When an OpenAI finance analyst needed to compare revenue across geographies and customer cohorts last year, it took hours of work — hunting through 70,000 datasets, writing SQL queries, verifying table schemas. Today, the same analyst types a plain-English question into Slack and gets a finished chart in minutes.
The tool behind that transformation was built by two engineers in three months. Seventy percent of its code was written by AI. And it is now used by more than 4,000 of OpenAI’s roughly 5,000 employees every day — making it one of the most aggressive deployments of an AI data agent inside any company, anywhere.
In an exclusive interview with VentureBeat, Emma Tang, the head of data infrastructure at OpenAI whose team built the agent, offered a rare look inside the system — how it works, how it fails, and what it signals about the future of enterprise data. The conversation, paired with the company’s blog post announcing the tool, paints a picture of a company that turned its own AI on itself and discovered something that every enterprise will soon confront: the bottleneck to smarter organizations isn’t better models. It’s better data.
“The agent is used for any kind of analysis,” Tang said. “Almost every team in the company uses it.”
A plain-English interface to 600 petabytes of corporate data
To understand why OpenAI built this system, consider the scale of the problem. The company’s data platform spans more than 600 petabytes across 70,000 datasets. Even locating the correct table can consume hours of a data scientist’s time. Tang’s Data Platform team — which sits under infrastructure and oversees big data systems, streaming, and the data tooling layer — serves a staggering internal user base. “There are 5,000 employees at OpenAI right now,” Tang said. “Over 4,000 use data tools that our team provides.”
The agent, built on GPT-5.2 and accessible wherever employees already work — Slack, a web interface, IDEs, the Codex CLI, and OpenAI’s internal ChatGPT app — accepts plain-English questions and returns charts, dashboards, and long-form analytical reports. In follow-up responses shared with VentureBeat on background, the team estimated it saves two to four hours of work per query. But Tang emphasized that the larger win is harder to measure: the agent gives people access to analysis they simply couldn’t have done before, regardless of how much time they had.
Her team noted that "engineers, growth, product, as well as non-technical teams who may not know all the ins and outs of the company data systems and table schemas" can now pull sophisticated insights on their own.
From revenue breakdowns to latency debugging, one agent does it all
Tang walked through several concrete use cases that illustrate the agent’s range. OpenAI’s finance team queries it for revenue comparisons across geographies and customer cohorts. “It can, just literally in plain text, send the agent a query, and it will be able to respond and give you charts and give you dashboards, all of these things,” she said.
But the real power lies in strategic, multi-step analysis. Tang described a recent case where a user spotted discrepancies between two dashboards tracking Plus subscriber growth. “The data agent can give you a chart and show you, stack rank by stack rank, exactly what the differences are,” she said. “There turned out to be five different factors. For a human, that would take hours, if not days, but the agent can do it in a few minutes.”
Product managers use it to understand feature adoption. Engineers use it to diagnose performance regressions — asking, for instance, whether a specific ChatGPT component really is slower than yesterday, and if so, which latency components explain the change. The agent can break it all down and compare prior periods from a single prompt.
What makes this especially unusual is that the agent operates across organizational boundaries. Most enterprise AI agents today are siloed within departments — a finance bot here, an HR bot there. OpenAI’s cuts horizontally across the company. Tang said they launched department by department, curating specific memory and context for each group, but “at some point it’s all in the same database.” A senior leader can combine sales data with engineering metrics and product analytics in a single query. “That’s a really unique feature of ours,” Tang said.
How Codex solved the hardest problem in enterprise data
Finding the right table among 70,000 datasets is, by Tang’s own admission, the single hardest technical challenge her team faces. “That’s the biggest problem with this agent,” she said. And it’s where Codex — OpenAI’s AI coding agent — plays its most inventive role.
Codex serves triple duty in the system. Users access the data agent through Codex via MCP. The team used Codex to generate more than 70% of the agent’s own code, enabling two engineers to ship in three months. But the third role is the most technically fascinating: a daily asynchronous process where Codex examines important data tables, analyzes the underlying pipeline code, and determines each table’s upstream and downstream dependencies, ownership, granularity, join keys, and similar tables.
“We give it a prompt, have Codex look at the code and respond with what we need, and then persist that to the database,” Tang explained. When a user later asks about revenue, the agent searches a vector database to find which tables Codex has already mapped to that concept.
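The retrieval flow Tang describes, enrich tables offline and then match a question to those enrichment records at query time, can be sketched in a few lines. Everything below is illustrative: the table names, descriptions, and the toy bag-of-words "embedding" are invented stand-ins (a real system would call an embedding model and a vector database).

```python
from collections import Counter
from math import sqrt

# Hypothetical enrichment records of the kind Codex might persist after
# analyzing pipeline code; table names and descriptions are invented.
ENRICHED_TABLES = {
    "fct_revenue_daily": "revenue by geography and customer cohort joins on account_id",
    "dim_subscribers": "plus subscriber counts granularity is user per day",
    "fct_latency_events": "request latency components per chatgpt surface",
}

def embed(text: str) -> Counter:
    """Toy bag-of-words 'embedding'; a real system would call an embedding model."""
    return Counter(text.lower().replace(",", " ").split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm = lambda v: sqrt(sum(x * x for x in v.values()))
    denom = norm(a) * norm(b)
    return dot / denom if denom else 0.0

def find_candidate_tables(question: str, top_k: int = 2) -> list[str]:
    """Rank enriched tables by similarity to the question, as the agent's
    vector lookup would, before any SQL is written."""
    q = embed(question)
    ranked = sorted(
        ENRICHED_TABLES,
        key=lambda t: cosine(q, embed(ENRICHED_TABLES[t])),
        reverse=True,
    )
    return ranked[:top_k]
```

With these toy records, a question like "compare revenue across geography and customer cohort" surfaces `fct_revenue_daily` first, which is the point of the enrichment step: the expensive table analysis happens once, asynchronously, so the lookup at question time is cheap.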
This “Codex Enrichment” is one of six context layers the agent uses. The layers range from basic schema metadata and curated expert descriptions to institutional knowledge pulled from Slack, Google Docs, and Notion, plus a learning memory that stores corrections from previous conversations. When no prior information exists, the agent falls back to live queries against the data warehouse.
The team also tiers historical query patterns. “All query history is everybody’s ‘select star, limit 10.’ It’s not really helpful,” Tang said. Canonical dashboards and executive reports — where analysts invested significant effort determining the correct representation — get flagged as “source of truth.” Everything else gets deprioritized.
The prompt that forces the AI to slow down and think
Even with six context layers, Tang was remarkably candid about the agent’s biggest behavioral flaw: overconfidence. It’s a problem anyone who has worked with large language models will recognize.
“It’s a really big problem, because what the model often does is feel overconfident,” Tang said. “It’ll say, ‘This is the right table,’ and just go forth and start doing analysis. That’s actually the wrong approach.”
The fix came through prompt engineering that forces the agent to linger in a discovery phase. “We found that the more time it spends gathering possible scenarios and comparing which table to use — just spending more time in the discovery phase — the better the results,” she said. The prompt reads almost like coaching a junior analyst: “Before you run ahead with this, I really want you to do more validation on whether this is the right table. So please check more sources before you go and create actual data.”
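The coaching Tang paraphrases typically lives in the agent's system prompt. A minimal sketch of such a discovery-first instruction follows; the wording and the helper are invented to illustrate the pattern, not OpenAI's actual prompt:

```python
# Illustrative only: this paraphrases the discovery-first behavior Tang
# describes; it is not OpenAI's production prompt.
DISCOVERY_RULES = """\
Before you run any analysis:
1. List every candidate table that could answer the question.
2. For each candidate, validate schema, granularity, and join keys
   against at least two context sources before trusting it.
3. State which table you chose and why the alternatives lost.
Only then write SQL. Do not skip the comparison step, even when one
table looks obviously correct.
"""

def build_system_prompt(user_role: str) -> str:
    """Compose a discovery-first system prompt for a given audience."""
    return f"You are a data analysis agent helping a {user_role}.\n" + DISCOVERY_RULES
```

The design choice worth noting is that the validation requirement is stated as an ordered gate, not a suggestion: the model must produce the candidate comparison before it is allowed to produce SQL, which is what keeps it "lingering" in discovery.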
The team also learned, through rigorous evaluation, that less context can produce better results. “It’s very easy to dump everything in and just expect it to do better,” Tang said. “From our evals, we actually found the opposite. The fewer things you give it, and the more curated and accurate the context is, the better the results.”
To build trust, the agent streams its intermediate reasoning to users in real time, exposes which tables it selected and why, and links directly to underlying query results. Users can interrupt the agent mid-analysis to redirect it. The system also checkpoints its progress, enabling it to resume after failures. And at the end of every task, the model evaluates its own performance. “We ask the model, ‘how did you think that went? Was that good or bad?'” Tang said. “And it’s actually fairly good at evaluating how well it’s doing.”
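The checkpointing and end-of-task self-review can be sketched with plain Python. The step names, JSON file format, and the review stub below are all assumptions; in the real system the final step would ask the model itself to grade the run.

```python
import json
from pathlib import Path

def run_with_checkpoints(steps, checkpoint: Path) -> dict:
    """Run named analysis steps in order, persisting intermediate state
    after each one, so an interrupted run resumes instead of restarting.
    The file format here is illustrative."""
    if checkpoint.exists():
        saved = json.loads(checkpoint.read_text())
    else:
        saved = {"done": [], "state": {}}
    done, state = saved["done"], saved["state"]
    for name, fn in steps:
        if name in done:
            continue  # completed before a failure; skipped on resume
        state[name] = fn(state)
        done.append(name)
        checkpoint.write_text(json.dumps({"done": done, "state": state}))
    return state

def self_review(state: dict) -> str:
    """Stand-in for the end-of-task self-evaluation; the real agent asks
    the model 'how did you think that went?' and records the answer."""
    return "ok" if state else "nothing was produced"
```

Rerunning the same task against an existing checkpoint file skips every completed step, which is the behavior that lets a long analysis survive a crash mid-query.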
Guardrails that are deliberately simple — and surprisingly effective
When it comes to safety, Tang took a pragmatic approach that may surprise enterprises expecting sophisticated AI alignment techniques.
“I think you just have to have even more dumb guardrails,” she said. “We have really strong access control. It’s always using your personal token, so whatever you have access to is only what you have access to.”
The agent operates purely as an interface layer, inheriting the same permissions that govern OpenAI’s data. It never appears in public channels — only in private channels or a user’s own interface. Write access is restricted to a temporary test schema that gets wiped periodically and can’t be shared. “We don’t let it randomly write to systems either,” Tang said.
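The "dumb guardrails" Tang describes amount to a thin gate in front of the warehouse: reads pass through under the user's own credentials, and writes are rejected unless they target the disposable scratch schema. A sketch under assumed names (the schema name, the naive SQL check, and the return shape are all invented for illustration):

```python
TEMP_SCHEMA = "agent_scratch"  # illustrative name; wiped periodically

class GuardrailError(Exception):
    """Raised when the agent attempts a disallowed operation."""

def guard_query(sql: str, user_token: str) -> dict:
    """Pass reads through under the user's personal token; reject any
    write that targets a table outside the scratch schema. The keyword
    check is deliberately naive; a real gate would parse the SQL."""
    statement = sql.strip().lower()
    is_write = statement.startswith(("insert", "update", "delete", "create", "drop"))
    if is_write and f"{TEMP_SCHEMA}." not in statement:
        raise GuardrailError("writes are only allowed in the scratch schema")
    # Real access control lives in the warehouse: the query runs with the
    # user's own token, so the agent never holds broader credentials.
    return {"sql": sql, "auth": user_token}
```

The notable design choice is that the agent adds no permissions of its own: because every query carries the user's personal token, the warehouse's existing ACLs do the heavy lifting, and the guardrail only has to prevent stray writes.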
User feedback closes the loop. Employees flag incorrect results directly, and the team investigates. The model’s self-evaluation adds another check. Longer term, Tang said, the plan is to move toward a multi-agent architecture where specialized agents monitor and assist each other. “We’re moving towards that eventually,” she said, “but right now, even as it is, we’ve gotten pretty far.”
Why OpenAI won’t sell this tool — but wants you to build your own
Despite the obvious commercial potential, OpenAI told VentureBeat that the company has no plans to productize its internal data agent. The strategy is to provide building blocks and let enterprises construct their own. And Tang made clear that everything her team used to build the system is already available externally.
“We use all the same APIs that are available externally,” she said. “The Responses API, the Evals API. We don’t have a fine-tuned model. We just use 5.2. So you can definitely build this.”
That message aligns with OpenAI’s broader enterprise push. The company launched OpenAI Frontier in early February, an end-to-end platform for enterprises to build and manage AI agents. It has since enlisted McKinsey, Boston Consulting Group, Accenture, and Capgemini to help sell and implement the platform. AWS and OpenAI are jointly developing a Stateful Runtime Environment for Amazon Bedrock that mirrors some of the persistent context capabilities OpenAI built into its data agent. And Apple recently integrated Codex directly into Xcode.
According to information shared with VentureBeat by OpenAI, Codex is now used by 95% of engineers at OpenAI and reviews all pull requests before they’re merged. Its global weekly active user base has tripled since the start of the year, surpassing one million. Overall usage has grown more than fivefold.
Tang described a shift in how employees use Codex that transcends coding entirely. “Codex isn’t even a coding tool anymore. It’s much more than that,” she said. “I see non-technical teams use it to organize thoughts and create slides and to create daily summaries.” One of her engineering managers has Codex review her notes each morning, identify the most important tasks, pull in Slack messages and DMs, and draft responses. “It’s really operating on her behalf in a lot of ways,” Tang said.
The unsexy prerequisite that will determine who wins the AI agent race
When asked what other enterprises should take away from OpenAI’s experience, Tang didn’t point to model capabilities or clever prompt engineering. She pointed to something far more mundane.
“This is not sexy, but data governance is really important for data agents to work well,” she said. “Your data needs to be clean enough and annotated enough, and there needs to be a source of truth somewhere for the agent to crawl.”
The underlying infrastructure — storage, compute, orchestration, and business intelligence layers — hasn’t been replaced by the agent, which still needs all of those tools to do its job. But the agent serves as a fundamentally new entry point for data intelligence, one that is more autonomous and accessible than anything that came before it.
Tang closed the interview with a warning for companies that hesitate. “Companies that adopt this are going to see the benefits very rapidly,” she said. “And companies that don’t are going to fall behind. It’s going to pull apart. The companies who use it are going to advance very, very quickly.”
Asked whether that acceleration worried her own colleagues — especially after a wave of recent layoffs at companies like Block — Tang paused. “How much we’re able to do as a company has accelerated,” she said, “but it still doesn’t match our ambitions, not even one bit.”
