The Execution Layer Changed. The Understanding Layer Did Not.

A company running AI agents at scale generates the equivalent of 70 Wikipedias of text per month. Conversation logs, agent traces, tool calls, coding assistant sessions. Every failure, every unexpected behavior, every edge case your team has not seen yet is almost certainly in there.
Almost nobody reads it.
Not because they do not care. Because no human can. A production incident happens. A customer files a ticket. An agent behaves strangely in front of the board. The answer is usually in the traces. The problem is that there are millions of them.
This is a pattern the software industry has lived through before. Mainframes created systems monitoring. Distributed systems created distributed tracing. Microservices created Datadog. Every time execution became more complex, the industry eventually built a new understanding layer to match it.
AI created a different kind of complexity. The understanding layer has not caught up.

The problem is not volume. It is shape.
The existing tools were not built for this. The usual observability suspects treat agent traces the way they treat application logs. Shove the data into a proprietary backend, charge per GB ingested, surface dashboards over aggregated averages.
But a single agent session is megabytes of nested unstructured text: system prompts, tool calls, context injection, model responses across dozens of turns. Traditional observability was built for numeric metrics and structured events.
Traditional software produces metrics. AI systems produce explanations. Metrics can be aggregated. Explanations have to be understood.
A latency spike can be summarized on a dashboard. A reasoning failure cannot. To understand why an agent chose the wrong tool, ignored a constraint, or confidently produced the wrong answer, somebody eventually has to reconstruct what happened across prompts, context, tool calls, and model outputs.
The data exists. The understanding does not.
Most teams discover this the hard way. An agent breaks in production. Someone gets paged at 11pm. The instinct is to open the logs. But the session volume blows past any context window, and the tools that exist were not designed to let you ask questions across millions of cold traces. The fallback is grep. You find nothing useful. You give up.
The diagnosis was in the logs the whole time.
This is a data engineering problem
Databases were not invented because spreadsheets failed. They were invented because the data shape changed.
The tools that solved distributed data at scale already showed the architecture. Apache Iceberg, ClickHouse, Parquet: lightweight processing on top of raw files in storage you already own, without moving data to a vendor, at storage cost. That model works for every other kind of large-scale enterprise data. It should work for agent traces.
Once you believe that, a different product follows naturally. Not a better version of a traditional observability company. Something built from different assumptions about where the data lives and how it gets queried.
Built Native, Not Retrofitted
Kenny Daniel came to this conclusion the hard way. At his previous company, whenever an AI system misbehaved, he went into the logs. The answer was almost always there. The problem was never the absence of information. The problem was volume. As agent workloads scaled, no human had the patience to read enough of it, and the tools that existed were not built to let you ask questions across it.
Hyperparam starts from the opposite assumption to the incumbents. One command collects agent traces, gateway traffic, and Claude Code session logs and lands them as Apache Iceberg tables in your own S3 bucket. The entire stack is built around a simple idea: query the data where it already lives rather than moving it into somebody else’s system.
The primitives underneath it are open source. Hyparquet, a Parquet parser built entirely in JavaScript, was developed with an open-source grant from Hugging Face. HypGrep handles full-text search directly over Parquet files in cloud storage, with no Elasticsearch instance and no server to keep warm. The query engine runs in the browser, and LLM calls are available as async functions that run in parallel against your trace columns. You can ask a question across an entire dataset of agent responses and watch results populate concurrently, without a server.
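To make the "LLM call as an async function over a column" idea concrete, here is a minimal sketch in Python. It is not Hyperparam's implementation, which runs in the browser. The function classify_trace is a hypothetical stand-in for a real model call, and the trace strings are invented for illustration. What it shows is the pattern: fan out one call per row, cap how many run at once, and collect results in row order.

```python
import asyncio

# Hypothetical stand-in for an LLM call (an assumption, not a real API).
# A real version would await a model endpoint instead of sleeping.
async def classify_trace(trace: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return "failure" if "error" in trace.lower() else "ok"

async def map_column(traces: list[str], max_concurrency: int = 8) -> list[str]:
    """Apply classify_trace to every row, at most max_concurrency at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(trace: str) -> str:
        # The semaphore bounds how many calls are in flight at once.
        async with semaphore:
            return await classify_trace(trace)

    # gather preserves input order, so labels line up with rows.
    return await asyncio.gather(*(worker(t) for t in traces))

# Invented example rows standing in for a column of agent traces.
traces = [
    "tool call search() returned 3 results",
    "Error: tool call timed out after 30s",
    "model answered without calling any tool",
]
labels = asyncio.run(map_column(traces))
print(labels)  # ['ok', 'failure', 'ok']
```

The concurrency cap matters in practice because model APIs are rate limited. Without it, a million-row column becomes a million simultaneous requests.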
Better dashboards are not the point. The right substrate changes the question you can ask. Not "what happened here?" but "which failures look like this one?", "which prompt patterns correlate with bad outcomes?", "which executions resemble the complaint that just came in?" Those questions span millions of traces. Getting there requires the stack to be built correctly first.
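A question like "which executions resemble the complaint that just came in?" is, mechanically, a ranking problem. The sketch below is one simple way to frame it, under loud assumptions: it uses token overlap (Jaccard similarity) as a crude stand-in for the embedding or LLM-based similarity a real system would likely use, and the trace ids and texts are invented. It is not a description of how Hyperparam does it.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word set; a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a: set[str], b: set[str]) -> float:
    """Overlap between two token sets: 0.0 if disjoint, 1.0 if identical."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def most_similar(query: str, traces: dict[str, str], top_k: int = 2) -> list[tuple[str, float]]:
    """Rank trace ids by similarity to the query text, highest first."""
    q = tokens(query)
    scored = [(trace_id, jaccard(q, tokens(text))) for trace_id, text in traces.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Invented traces, keyed by a hypothetical trace id.
traces = {
    "t1": "agent refunded the wrong order after lookup_order returned two matches",
    "t2": "agent summarized the meeting notes and sent the email",
    "t3": "lookup_order returned two matches and agent picked the wrong order",
}
complaint = "the agent picked the wrong order for my refund"
ranked = most_similar(complaint, traces)
print([trace_id for trace_id, _ in ranked])  # ['t3', 't1']
```

The scoring function is the replaceable part. The shape of the question, one incoming complaint scored against every stored trace, stays the same at three traces or three million, which is why the storage and query layer underneath has to be right first.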
The architecture paper won best paper at the ACM Conference on AI and Agentic Systems last month. More at hyperparam.app.
What comes after execution
At the first Conference on AI and Agentic Systems last month, I wrote that the bottleneck was moving from models to systems. Researchers working on governance, evaluation, databases, and observability were all wrestling with the same underlying problem from different angles. The models are no longer the hard part. What is hard is everything that happens after the model produces an answer.
We spent the last decade building systems that execute software. We may spend the next decade building systems that explain it.
Every AI system already records what happened. The companies that matter next may be the ones that help us understand why.
Kenny Daniel is speaking at the Databricks Data+AI Summit in San Francisco this week. The session is called “AI Is a Data Engineering Problem.” If you are there and thinking seriously about what it means to operate agents in production, it is worth finding him.
Oana Olteanu is the founder and GP of Motive Force, an early-stage venture firm investing in Beautiful Software at pre-seed and seed.
oana@motiveforce.ai



Thanks Oana, this is the clearest articulation of the problem I've read. Understanding is really the second level of the hierarchy though. The base layer is just collecting the data, and most companies aren't doing even that. Ask an engineering org where their Claude Code logs go and it's usually nowhere. We're throwing away the clearest traces we have of how people reason with these tools, much less learning from them. Thanks for writing this!