
NEW YORK (WHN) – For two years, the engine room of generative AI development has hummed with a simple, almost elegant, transaction: the “completion.” A developer would send a text prompt, the model would spit back text, and the conversation, for all intents and purposes, would end. If you wanted to pick up the thread, you had to package the entire dialogue and send it back, a process that, while functional for basic chatbots, increasingly felt like trying to steer a freighter with a dinghy’s oars.
This “stateless” architecture, the bedrock of Google’s legacy `generateContent` endpoint, proved a bottleneck as developers pushed toward autonomous agents. These agents, designed to wield tools, manage intricate states, and ponder over extended timelines, found the old model wanting. The constant need to retransmit vast swathes of history became a drag, a digital weight slowing down innovation.
Last week, Google DeepMind finally addressed this fundamental infrastructure gap. The public beta launch of its Interactions API, accessible via the `/interactions` endpoint, signals a significant architectural shift, moving AI models from mere text generators to something far more akin to remote operating systems.
While OpenAI began a similar transition with its Responses API in March 2025, Google’s entry into this space is a clear statement of intent. The Interactions API isn’t just about managing conversational state; it’s a unified interface built to recognize that LLMs are evolving beyond simple text output. They are becoming sophisticated systems, capable of complex reasoning and tool utilization.
The core of this new paradigm lies in server-side state as a default. Previously, building a complex AI agent meant meticulously managing a growing JSON list of every user and model turn, and resending that ever-larger history with every API call. Now, with the Interactions API, developers simply pass a `previous_interaction_id`. Google’s infrastructure shoulders the burden of retaining conversation history, tool outputs, and the model’s internal “thought” processes.
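In practice, the pattern might look something like the sketch below. It assumes a plain REST call to the `/interactions` endpoint; apart from `previous_interaction_id`, the field names (`model`, `input`, the returned `id`) and the exact URL path are illustrative guesses, not the documented schema.

```python
import requests

API_KEY = "YOUR_GEMINI_API_KEY"  # placeholder, not a real key
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path

def send_turn(user_text, previous_interaction_id=None):
    """Send one conversational turn; chain turns via previous_interaction_id
    instead of replaying the full history (field names are illustrative)."""
    payload = {
        "model": "gemini-3-pro-preview",  # model string is an assumption
        "input": user_text,               # "input" field name is an assumption
    }
    if previous_interaction_id:
        # Server-side state: reference the prior turn instead of re-sending it.
        payload["previous_interaction_id"] = previous_interaction_id
    resp = requests.post(BASE_URL, params={"key": API_KEY}, json=payload, timeout=60)
    resp.raise_for_status()
    return resp.json()

# The first turn creates the interaction; follow-ups only reference its id.
first = send_turn("Summarize the key points of this earnings call transcript: ...")
followup = send_turn(
    "Now flag anything that contradicts last quarter.",
    previous_interaction_id=first.get("id"),  # "id" response field assumed
)
```

The payload for the second call stays the same size no matter how long the conversation gets; that is the whole point of the stateful design.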
“Models are becoming systems and over time, might even become agents themselves,” DeepMind’s Ali Çevik and Philipp Schmid noted in an official company blog post. “Trying to force these capabilities into generateContent would have resulted in an overly complex and fragile API.”
This architectural change directly enables Background Execution, a feature crucial for the burgeoning agentic era. Imagine a complex workflow: an agent tasked with browsing the web for an hour to synthesize a comprehensive report. Standard APIs would likely time out long before such a task concluded. The Interactions API, however, allows developers to initiate an agent with `background=true`, disconnect, and later poll for the result. It effectively transforms the API into a job queue for artificial intelligence.
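A rough sketch of that job-queue pattern follows. Only the `background=true` flag comes from the announcement; the endpoint path, the `GET`-based status poll, and the `status`/`output` fields are assumptions made for illustration.

```python
import time
import requests

API_KEY = "YOUR_GEMINI_API_KEY"  # placeholder
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path

# Kick off a long-running agent run and detach immediately.
start = requests.post(
    BASE_URL,
    params={"key": API_KEY},
    json={
        "model": "deep-research-pro-preview-12-2025",  # Deep Research agent from the release
        "input": "Survey the last year of research on battery recycling.",
        "background": True,  # the flag highlighted in the announcement
    },
    timeout=60,
)
start.raise_for_status()
interaction_id = start.json()["id"]  # response field name is an assumption

# Poll for completion later -- minutes or hours after the original request.
while True:
    status = requests.get(f"{BASE_URL}/{interaction_id}", params={"key": API_KEY}, timeout=60)
    status.raise_for_status()
    body = status.json()
    if body.get("status") in ("completed", "failed"):  # status values are illustrative
        break
    time.sleep(30)

print(body.get("output"))
```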
Google is already leveraging this new infrastructure to power its first built-in agent: Gemini Deep Research. This agent, accessible through the same `/interactions` endpoint, is designed for “long-horizon research tasks.” Unlike a standard model that predicts the next token based on a prompt, the Deep Research agent operates in a loop: it searches, reads, and synthesizes information autonomously.
Crucially, Google is also weaving itself into the broader ecosystem with native support for the Model Context Protocol (MCP). This allows Gemini models to directly invoke external tools—think weather services or databases—hosted on remote servers, eliminating the need for developers to write custom code to interpret tool calls. It’s a move that streamlines integration and lowers the barrier to entry for tool-augmented AI agents.
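Conceptually, pointing a Gemini model at a remote MCP server might look like the following. The shape of the `tools` block here is a guess patterned on how MCP servers are typically referenced, not the published schema, and the weather server URL is a made-up example.

```python
import requests

API_KEY = "YOUR_GEMINI_API_KEY"  # placeholder
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path

# Declare a remote MCP server so the model can invoke its tools directly.
payload = {
    "model": "gemini-2.5-flash",  # model string is an assumption
    "input": "What's the forecast for Berlin tomorrow?",
    "tools": [
        {
            "type": "mcp",                                    # hypothetical discriminator
            "server_url": "https://weather.example.com/mcp",  # example remote MCP server
            "allowed_tools": ["get_forecast"],                # optional allow-list (assumed)
        }
    ],
}

resp = requests.post(BASE_URL, params={"key": API_KEY}, json=payload, timeout=60)
resp.raise_for_status()
# With tool calls handled server-side, the response should already reflect the tool's output,
# rather than returning a function-call stub for the developer to execute and feed back.
print(resp.json())
```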
Google might be seen as playing catch-up, but it’s doing so with a distinct philosophical difference compared to competitors. OpenAI made its own move away from statelessness nine months earlier with the Responses API, and both tech titans are grappling with the challenge of “context bloat,” the ever-increasing size of conversational history that needs to be managed.
Yet, their solutions diverge sharply on transparency. OpenAI’s Responses API introduced “Compaction,” a feature that shrinks conversation history by replacing tool outputs and reasoning chains with opaque “encrypted compaction items.” This prioritizes token efficiency, a critical metric in AI cost structures, but it creates a “black box,” obscuring the model’s past reasoning from the developer. The internal workings become a mystery.
Google’s approach with the Interactions API, however, keeps the full history accessible and composable. Its data model is designed to allow developers to “debug, manipulate, stream and reason over interleaved messages.” The emphasis here is on inspectability, allowing for deeper understanding and control over the AI’s operational flow, rather than pure compression.
The Interactions API is currently in Public Beta, available immediately via Google AI Studio. It supports Google’s latest generation models, including Gemini 3.0 (Gemini 3 Pro Preview), Gemini 2.5 (Flash, Flash-lite, and Pro), and the aforementioned Deep Research agent (deep-research-pro-preview-12-2025). Developers can now select the model size best suited for their specific agentic task.
Commercially, the API integrates into Google’s existing pricing structure, charging standard rates for input and output tokens based on the selected model. However, the value proposition shifts significantly with the new data retention policies. Because the API is stateful, Google must store interaction history to enable features like implicit caching and context retrieval, and how long that history is kept depends on the pricing tier.
Developers on the Free Tier are limited to a 1-day retention policy, suitable for ephemeral testing but insufficient for agents requiring long-term memory. The Paid Tier, conversely, unlocks a 55-day retention policy. This extended retention isn’t merely for auditing; it’s a strategic move to lower total cost of ownership. By keeping interaction history “hot” on Google’s servers for nearly two months, developers avoid the cost of re-processing massive context windows for recurring users. This makes the Paid Tier a far more efficient option for production-grade agents.
It’s important to note that as this is a Beta release, Google has advised that features and schemas are subject to change—a standard caveat for early-stage product rollouts.
Sam Witteveen, a Google Developer Expert in Machine Learning and CEO of Red Dragon AI, views this release as a necessary evolution of the developer stack. “If we go back in history… the whole idea was simple text-in, text-out,” Witteveen observed in a technical breakdown of the release on YouTube. “But now… you are interacting with a system. A system that can use multiple models, do multiple loops of calls, use tools, and do code execution on the backend.”
Witteveen pointed to the immediate economic benefit of this architecture: Implicit Caching. Because conversation history resides on Google’s servers, developers are not charged for re-uploading the same context repeatedly. “You don’t have to pay as much for the tokens that you are calling,” he explained, a direct answer to the cost pressures that have long plagued AI development.
Yet, the release isn’t without its friction points. Witteveen raised a concern regarding the current implementation of the Deep Research agent’s citation system. While the agent provides sources, the returned URLs are often wrapped in internal Google/Vertex AI redirection links rather than raw, usable URLs. “My biggest gripe is that… these URLs, if I save them and try to use them in a different session, they’re not going to work,” Witteveen warned. “If I want to make a report for someone with citations, I want them to be able to click on the URLs from a PDF file… Having something like medium.com as a citation [without the direct link] is not very good.”
For Lead AI Engineers prioritizing rapid model deployment and fine-tuning, this release offers a direct architectural solution to the persistent “timeout” problem: Background Execution. Instead of building complex asynchronous handlers or managing separate job queues for long-running reasoning tasks, complexity can now be offloaded directly to Google. This convenience, however, comes with a strategic trade-off. The new Deep Research agent enables rapid deployment of sophisticated research capabilities, but it operates as a “black box” compared to custom-built LangChain or LangGraph flows.
Engineers are advised to prototype “slow thinking” features using the `background=true` parameter to assess if the speed of implementation outweighs the loss of fine-grained control over the research loop. This is a critical decision point for teams balancing speed to market with deep customization.
Senior engineers managing AI orchestration and budgets will find that the shift to server-side state via `previous_interaction_id` directly enables Implicit Caching. This is a massive win for both cost and latency metrics. By referencing history stored on Google’s servers, token costs associated with re-uploading massive context windows are automatically avoided, directly addressing budget constraints while maintaining high performance.
The challenge here lies in the tool supply chain: incorporating Remote MCP means agents connect directly to external services, which must be rigorously vetted for security and authentication. It is also time to audit current token spend on re-sending conversation history. If that spend is high, migrating to the stateful Interactions API could capture significant savings.
For Senior Data Engineers, the Interactions API presents a more structured data model than raw text logs. Its schema allows for complex histories to be debugged and reasoned over, improving overall Data Integrity across pipelines. However, vigilance regarding Data Quality remains paramount. The issue raised by Sam Witteveen concerning citations is particularly relevant here. The Deep Research agent currently returns “wrapped” URLs that may expire or break, rather than raw source links.
If pipelines rely on scraping or archiving these sources, a cleaning step to extract usable URLs may be necessary. Engineers should also test the structured output capabilities (`response_format`) to see if they can replace fragile regex parsing in current ETL pipelines. This offers a pathway to more reliable data extraction.
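As a rough illustration of that swap, the sketch below requests schema-constrained output instead of parsing free text with regex. Only the `response_format` parameter name appears in the release; the JSON-schema payload shape, model string, and response envelope are assumptions.

```python
import requests

API_KEY = "YOUR_GEMINI_API_KEY"  # placeholder
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path

# Ask for output that conforms to a schema instead of regex-scraping prose.
payload = {
    "model": "gemini-2.5-pro",  # model string is an assumption
    "input": "Extract the company, ticker, and headline figure from this filing: ...",
    "response_format": {          # parameter named in the release;
        "type": "json_schema",    # the inner structure here is illustrative
        "json_schema": {
            "type": "object",
            "properties": {
                "company": {"type": "string"},
                "ticker": {"type": "string"},
                "headline_figure": {"type": "string"},
            },
            "required": ["company", "ticker", "headline_figure"],
        },
    },
}

resp = requests.post(BASE_URL, params={"key": API_KEY}, json=payload, timeout=60)
resp.raise_for_status()
record = resp.json().get("output")  # schema-conforming JSON; envelope field name assumed
```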
Finally, for Directors of IT Security, moving state to Google’s centralized servers presents a paradox. It can enhance security by keeping API keys and conversation history off client devices, but it introduces a new data residency risk. The critical check involves Google’s Data Retention Policies. While the Free Tier retains data for only one day, the Paid Tier holds interaction history for 55 days.
This stands in contrast to OpenAI’s “Zero Data Retention” (ZDR) enterprise options. Organizations must ensure that storing sensitive conversation history for nearly two months complies with internal governance and regulatory requirements. If this violates policy, calls must be configured with `store=false`, though doing so disables the stateful features and their associated cost benefits, a trade-off that needs careful consideration.
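For teams whose governance rules out multi-week retention, the opt-out might look like the minimal sketch below. Only the `store=false` flag comes from the documentation; the rest of the request is illustrative, and note that disabling storage also forgoes `previous_interaction_id` chaining and implicit caching.

```python
import requests

API_KEY = "YOUR_GEMINI_API_KEY"  # placeholder
BASE_URL = "https://generativelanguage.googleapis.com/v1beta/interactions"  # assumed path

payload = {
    "model": "gemini-2.5-flash",  # model string is an assumption
    "input": "Summarize this customer complaint: ...",
    # Opt out of server-side retention; this also disables the stateful features
    # (history chaining, implicit caching) that the paid tier's 55-day window enables.
    "store": False,
}

resp = requests.post(BASE_URL, params={"key": API_KEY}, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```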
This analysis is for informational purposes only and not investment advice.