Why We Built 40+ Connectors Instead of "Just Using a Warehouse"

The economics of buying versus building the fund-data layer

Philipp Starovoytov

CTO & Co-Founder

7 min read

Every quarter, a CTO at a $500M-$2B fund sends us the same architecture diagram. Custodians and prime brokers feed into a general-purpose warehouse. A team of two data engineers stitches it together with dbt models. Reports go out. The board approves. The CTO asks: why does PlexiFact exist?

Here's the cost model that answers it.

The Hidden Build Cost

A general-purpose warehouse is a query engine. It does not understand alternative assets.

To turn it into something that produces a defensible NAV, you write the schema for trades. The schema for positions. The schema for corporate actions. The schema for fund admin reconciliation. The mapping from custodian feed format A to internal canonical format. The mapping from B. From C. From the prime broker's overnight trade file. From the OMS's intraday feed.

Three of these mappings break every quarter when a counterparty updates their export format. None of them are mentioned in the warehouse vendor's pitch deck.
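To make the mapping work concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the two source layouts, the field names, and the `CanonicalTrade` shape are invented for illustration and do not describe any real custodian feed or PlexiFact's actual schema. It shows two differently shaped rows being normalized into one canonical record, with a check that fails loudly when a source layout changes.

```python
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class CanonicalTrade:
    """Hypothetical internal trade record; not an actual product schema."""
    trade_id: str
    trade_date: date
    instrument: str
    quantity: Decimal  # signed: positive = buy, negative = sell
    price: Decimal
    source: str


class FeedFormatError(ValueError):
    """Raised when a source row no longer matches the layout the mapper expects."""


def _require(row: dict, fields: set, source: str) -> None:
    # Fail early if the counterparty renamed or dropped a column.
    missing = fields - set(row)
    if missing:
        raise FeedFormatError(f"{source}: missing fields {sorted(missing)}")


def from_custodian_a(row: dict) -> CanonicalTrade:
    # Assumed format A: ISO dates and a signed quantity.
    _require(row, {"TradeRef", "TradeDate", "Ticker", "Qty", "Px"}, "custodian_a")
    return CanonicalTrade(
        trade_id=row["TradeRef"],
        trade_date=date.fromisoformat(row["TradeDate"]),
        instrument=row["Ticker"],
        quantity=Decimal(row["Qty"]),
        price=Decimal(row["Px"]),
        source="custodian_a",
    )


def from_custodian_b(row: dict) -> CanonicalTrade:
    # Assumed format B: YYYYMMDD dates, unsigned quantity plus a B/S side flag.
    _require(row, {"id", "dt", "symbol", "side", "quantity", "price"}, "custodian_b")
    sign = {"B": 1, "S": -1}[row["side"]]
    d = row["dt"]
    return CanonicalTrade(
        trade_id=row["id"],
        trade_date=date(int(d[:4]), int(d[4:6]), int(d[6:8])),
        instrument=row["symbol"],
        quantity=sign * Decimal(row["quantity"]),
        price=Decimal(row["price"]),
        source="custodian_b",
    )


trade_a = from_custodian_a(
    {"TradeRef": "A-1", "TradeDate": "2024-03-28", "Ticker": "XYZ", "Qty": "-100", "Px": "12.50"}
)
trade_b = from_custodian_b(
    {"id": "B-7", "dt": "20240328", "symbol": "XYZ", "side": "S", "quantity": "100", "price": "12.50"}
)
```

Both rows describe the same sale and end up with the same date, instrument, signed quantity, and price. If format B renames `quantity` in a quarterly export change, `from_custodian_b` raises `FeedFormatError` instead of silently loading bad data. Multiply this by every counterparty and every record type and the maintenance burden becomes visible.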

We have measured this work across 20+ deployments. The build-out is 18 to 30 months of two senior data engineers writing connector code, schema layers, validation rules, and reconciliation logic. Loaded cost: $1.2M to $2M before the platform produces a single defensible report.
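The range above can be reproduced with simple arithmetic. The sketch below assumes a loaded cost of $400K per engineer per year; that rate is not stated anywhere in this post, it is the figure the $1.2M and $2M endpoints imply for two engineers over 18 and 30 months. Substitute your own rate.

```python
def build_cost(engineers: int, months: int, loaded_cost_per_year: float) -> float:
    """Loaded cost of a build-out: headcount x duration in years x annual loaded cost."""
    return engineers * (months / 12) * loaded_cost_per_year


# Assumption: $400K loaded cost per senior data engineer per year,
# back-solved from the $1.2M to $2M range; not a quoted figure.
LOADED_COST_PER_YEAR = 400_000

low = build_cost(engineers=2, months=18, loaded_cost_per_year=LOADED_COST_PER_YEAR)
high = build_cost(engineers=2, months=30, loaded_cost_per_year=LOADED_COST_PER_YEAR)

print(f"Build-out: ${low:,.0f} to ${high:,.0f}")
```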

What PlexiFact Ships With

When we deploy at a new fund, we don't write any of that. We deploy:

  • 20+ live connectors today (40+ on the roadmap) covering custodians, prime brokers, fund admins, OMS, EMS, market data vendors, trade compression services, and corporate actions feeds
  • A canonical schema for trades, positions, NAV, performance, and investor allocations - already mapped from each connector's source format
  • A reconciliation engine that drives every break to one of two outcomes: auto-resolution or escalation to a named owner
  • A governance layer that records every transformation and supports lineage queries from any output back to its source
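As an illustration of the reconciliation item above, here is a minimal sketch of routing breaks to either auto-resolution or a named owner. It is not PlexiFact's engine. The tolerance rule, the owner lookup, and all names (`Break`, `find_breaks`, `route`) are assumptions made for the example; a real engine would apply many more resolution rules than a single quantity tolerance.

```python
from dataclasses import dataclass
from decimal import Decimal
from typing import Optional


@dataclass(frozen=True)
class Break:
    """A position mismatch between the custodian record and the OMS record."""
    instrument: str
    custodian_qty: Decimal
    oms_qty: Decimal

    @property
    def difference(self) -> Decimal:
        return self.custodian_qty - self.oms_qty


def find_breaks(custodian: dict, oms: dict) -> list:
    """Compare positions by instrument; a missing position counts as zero."""
    breaks = []
    for instrument in sorted(set(custodian) | set(oms)):
        c = custodian.get(instrument, Decimal(0))
        o = oms.get(instrument, Decimal(0))
        if c != o:
            breaks.append(Break(instrument, c, o))
    return breaks


def route(brk: Break, tolerance: Decimal, owners: dict, default_owner: str) -> tuple:
    """Return ("auto_resolved", None) or ("escalated", owner_name).

    Assumed rule: differences within `tolerance` are treated as rounding
    noise and closed automatically; everything else goes to a named person.
    """
    if abs(brk.difference) <= tolerance:
        return ("auto_resolved", None)
    return ("escalated", owners.get(brk.instrument, default_owner))


custodian_positions = {"AAA": Decimal("100"), "BBB": Decimal("50.001"), "CCC": Decimal("10")}
oms_positions = {"AAA": Decimal("100"), "BBB": Decimal("50"), "CCC": Decimal("25")}
owners = {"CCC": "ops.lead@example.com"}  # hypothetical owner mapping

breaks = find_breaks(custodian_positions, oms_positions)
outcomes = {
    b.instrument: route(b, Decimal("0.01"), owners, "recon.desk@example.com")
    for b in breaks
}
```

In this example `AAA` matches and produces no break, `BBB` differs by 0.001 and is auto-resolved, and `CCC` differs by 15 and is escalated to its named owner. No break is left without an outcome.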

This is what the license covers, at $95K to $295K per year depending on tier. Compare that to $1.2M or more to build it once, plus maintenance for as long as the platform runs.
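One rough way to read these figures is to ask how many years of license fees the low-end build cost alone would cover. The sketch below assumes the license is billed annually and deliberately ignores ongoing maintenance of a home-built platform, which would push the comparison further toward buying.

```python
def years_of_license(build_cost: float, annual_license: float) -> float:
    """Years of license fees that a one-time build cost alone would pay for."""
    return build_cost / annual_license


BUILD_COST_LOW = 1_200_000  # low end of the build-out range
LICENSE_LOW, LICENSE_HIGH = 95_000, 295_000  # annual license by tier (assumed annual)

top_tier_years = years_of_license(BUILD_COST_LOW, LICENSE_HIGH)    # roughly 4 years
entry_tier_years = years_of_license(BUILD_COST_LOW, LICENSE_LOW)   # roughly 12.6 years

print(f"Top tier: {top_tier_years:.1f} years, entry tier: {entry_tier_years:.1f} years")
```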

Why "Just Use Snowflake" is Wrong

Snowflake is excellent at what it does. So is BigQuery. So is Databricks. They are general-purpose query engines that scale spectacularly.

They are not, and do not pretend to be, fund-data platforms. They have no opinion on what a trade looks like, what a position looks like, what corporate actions are, what counterparty risk is, or how to reconcile a custodian record against an OMS record.

You can build a fund-data platform on top of Snowflake. We have done it for clients. We have also done it for ourselves and decided to ship the result as a product. The economics for any single fund building it from scratch do not work.

When Build-from-Scratch Makes Sense

Two cases:

  1. The fund has a unique strategy that requires data the existing connector library doesn't cover. We've seen this in distressed credit and in physical commodity strategies. In those cases, we extend our connector library - we don't ask the fund to build their own.

  2. The fund has a team of 10+ engineers and views the data platform as core IP it needs to own. This is rare outside of multi-strategy platforms with $5B+ AUM and quant DNA.

For everyone else - and that is the overwhelming majority of $300M-$7B funds - the buy-vs-build math is decided.

The Number That Matters

In every fund evaluation we've run, the deciding number is not the platform license cost. It is the loaded cost of the data engineers who would otherwise be writing the connector code, plus the opportunity cost of the 18 to 30 months they spend doing it instead of supporting the alpha-generating teams.

That number is always larger than $295K per year. Often by 4 to 8 times.