The Agent Puts Data Where Its Shape Wants to Live

An agent routing each data feed to the store whose shape fits it — documents to Mongo, relational to Postgres, files to object storage, embeddings to a vector DB — instead of collapsing everything into one warehouse

Ask an engineer where a new feed should land and you’ll get the answer before you finish the question. The warehouse. The lake. Whatever the company standardized on years ago. The destination was decided long before this feed existed, and every feed since has been bent to fit it.

That’s not a modeling decision. It’s a budget decision dressed up as one.

One warehouse was the affordable answer, not the right one

The data never asked to live in one place. A broker feed of deeply nested trade events doesn’t want to be flattened into wide relational tables. It wants to stay a document. A reference table of accounts and balances doesn’t want to be a pile of JSON blobs in object storage. It wants columns, types, a query planner. A daily dump of raw files nobody touches until an audit doesn’t want to sit in an expensive transactional store. It wants to be cheap, immutable, and out of the way. A corpus of filings and research you’ll search by meaning, not by exact match, doesn’t want to be any of those. It wants to be embeddings in a vector store.

Four shapes, four natural homes. We knew that. We standardized on one anyway, because the alternative cost too much.

Run Mongo and Postgres and an object store and a vector database, and you’ve signed up to maintain all four. Four runbooks. Four things that page someone at 3 a.m. Four security models to keep straight. So teams did the rational thing. They picked one store, called it the source of truth, and spent the next decade writing transformations whose only job was to force data that didn’t fit into the shape of the thing they could afford to keep running.

“One source of truth” got promoted from coping mechanism to architectural principle. We told ourselves the single warehouse was discipline. Mostly it was the cost of the second warehouse.

Storage was never the expensive part

What made polyglot persistence expensive was never the storage. Storage is cheap, and has been for years. It was the maintenance. The human attention each store demanded, multiplied by every store you ran. That was the line item that made one warehouse the only sane call.

An agent removes that line item. It doesn’t burn out switching between four operational models. It doesn’t need a runbook per store, because it doesn’t forget the procedure between Tuesdays. It stands up the destination with the same call it uses for everything else, applies the schema, and keeps the thing healthy without anyone holding the steps in their head. The cost that forced the collapse to one warehouse, human attention per store, is the cost that stops scaling with the number of stores.

So the question changes. It’s no longer which warehouse did we commit to. It’s what shape is this data, and where does that shape want to live. That’s a question you answer one feed at a time, not once for the whole company when the architecture got frozen years ago.

The destination gets chosen when the feed is set up

This is the actual shift. For thirty years the destination was one decision, made once, for the whole company, before any of these feeds existed. You built every pipeline to the same warehouse because that’s what the architecture said years ago. The warehouse was a given.

Now the decision moves down to the feed. When an agent wires up a new source, it looks at what it’s pulling from: the nesting, the relational structure, the volume, the access pattern. It can pull a sample first to see what actually comes back. Then it sends that feed to the store that fits. Document-shaped data goes to Mongo. Data you’ll slice and join goes to Postgres. Big write-once-read-rarely data goes to object storage. Text you’ll search by meaning gets embedded into a vector database. Same machinery underneath. Different home, chosen per feed by the agent that set it up, instead of by an org chart decision from three years ago.

Nobody had to guess the shape in advance, and nobody had to commit the whole company to one answer. The feed didn’t flatten a document into columns because columns were the only thing on the menu. The data landed where it fit, because for the first time offering it the right home cost almost nothing.

What you give up is a habit, not a principle

The urge to consolidate runs deep, and some of it is real. One store is easier to reason about. One query surface, one place to look, one backup you trust. Those are genuine goods, and an agent doesn’t hand them to you for free. A copy spread across several stores is now something you have to think about, the way you think about any distributed system.

But the consolidation we actually practiced wasn’t chasing those goods. It was dodging a staffing cost. We didn’t pick one warehouse because one warehouse modeled the world best. We picked it because we couldn’t afford the people to run the second one. Take that cost away and most of the case for cramming every shape into one store goes with it. What’s left, the real simplicity of one place to look, is a tradeoff to weigh per pipeline. Not a law to apply to all of them.

The single warehouse was a coping mechanism we called a principle. You find out which one it really was the moment the coping stops being necessary. An agent doesn’t add destinations as a feature. It removes the reason we ever collapsed to one, and gets out of the data’s way.

Todd Fearn is the founder and CEO of Datris, an open-source, agent-native data platform. He has spent thirty years building production data infrastructure for financial institutions including Goldman Sachs, Bridgewater, Deutsche Bank, and Freddie Mac, and has founded several venture-backed companies.