Amidst the rapidly escalating competition in the global artificial intelligence field, Google and OpenAI once again released major updates on the same day, drawing intense attention from the entire industry.
Last night, Google released a completely reimagined version of Gemini Deep Research and, for the first time, opened an API that lets developers embed the research agent in their own applications.
Almost simultaneously, OpenAI officially released the highly anticipated GPT-5.2 (codename Garlic). The two companies' competition over the future of intelligent agents, the limits of foundation model capabilities, and dominance of the application ecosystem is entering an unprecedentedly intense phase.
This time, the two companies' moves landed in almost exactly the same time window, giving outside observers a clear view of the pace of the strategic contest between these two global AI giants.
1. Google launches new Deep Research Agent
Google's new Gemini Deep Research tool is an intelligent agent capable of integrating massive amounts of information and processing large amounts of contextual data in prompts. Google says customers use the Deep Research Agent for a wide range of tasks, from due diligence to drug toxicity and safety studies.
Google also stated that it will soon integrate this new Deep Research Agent into its various services, including Google Search, Google Finance, the Gemini app, and the popular NotebookLM. This is another step by Google toward a future in which people no longer search Google themselves; instead, AI agents do the searching for them.
Specifically, what capabilities does a Deep Research Agent possess?
In this update, Google not only redesigned the Deep Research Agent at the architectural level, but also built a more stable, accurate, and traceable deep research system with Gemini 3 Pro as its core foundation model. The improvements in the new Deep Research Agent fall into three key areas: model upgrades, breakthroughs in reasoning stability, and comprehensive enhancements in interactive capabilities.
First, the model upgrade. The new Deep Research Agent is built entirely on Gemini 3 Pro, which Google describes as its most factual and reliable flagship model to date, and the one best suited to long-chain reasoning. Google emphasizes that this is not just a performance improvement, but a qualitative leap in the "reliability" of research agents.
To build such an agent, Google trained it with reinforcement learning over multi-step trajectories. The goal is clear: in complex research tasks involving tens or hundreds of steps, the AI must keep its reasoning path stable, reduce the probability of hallucinations, and stay consistent across a continuous sequence of decisions.
One of the main pain points of traditional LLMs in long-chain reasoning is that every step adds to the cumulative error, so a single hallucinated step can invalidate the entire output. Google emphasizes that the new version of Deep Research has made a significant breakthrough in this regard:
- Multi-round reinforcement learning that optimizes decision sequences
- Significantly less logical drift across lengthy task chains
- A more stable closed loop of retrieval, analysis, reasoning, and citation
This allows Deep Research to undertake tasks that LLMs were previously unable to perform, such as fully executing multi-day research, policy evaluation, multi-source data integration, and full-process due diligence.
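The cumulative-error problem described above can be made concrete with simple arithmetic. The following is a minimal sketch, assuming every step succeeds independently with the same probability; the accuracy values are hypothetical and are not figures published by Google.

```python
def chain_success_probability(step_accuracy: float, num_steps: int) -> float:
    """Probability that every step of a chain is correct.

    Assumption (not from the article): steps fail independently and share
    the same per-step accuracy.
    """
    if not 0.0 <= step_accuracy <= 1.0:
        raise ValueError("step_accuracy must be between 0 and 1")
    if num_steps < 0:
        raise ValueError("num_steps must be non-negative")
    return step_accuracy ** num_steps


if __name__ == "__main__":
    # Hypothetical per-step accuracies, chosen only for illustration.
    for accuracy in (0.99, 0.999):
        for steps in (10, 100):
            probability = chain_success_probability(accuracy, steps)
            print(f"accuracy={accuracy}, steps={steps}: {probability:.3f}")
```

Under these assumptions, a 99%-accurate step repeated 100 times yields a fully correct chain only about 37% of the time, which is why small per-step gains matter so much for long research tasks.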
Another core advantage of the new Deep Research Agent is its massive context processing capability. Powered by Gemini 3 Pro, it can process far more material in a single pass than before, including academic papers, official reports, and lengthy web pages. More importantly, Google has given Deep Research a research-grade capability: it automatically attaches traceable citations to every viewpoint and conclusion. These citations are not just URLs but structured references to key passages or paragraphs in the original text, which makes the output credible, makes each viewpoint verifiable, and lets users carry out follow-up investigation and review. Deep Research therefore does not merely "generate content"; it "provides research results with a chain of evidence."
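To show what a "structured reference to a key passage" might look like in practice, here is a minimal sketch. The `Citation` fields and the `is_supported` helper are hypothetical names invented for illustration; the article does not describe Google's actual citation format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Citation:
    """Hypothetical citation record; not Google's actual schema."""
    claim: str            # the statement made in the report
    source_url: str       # where the evidence came from
    quoted_passage: str   # the exact passage that supports the claim


def _normalize(text: str) -> str:
    # Collapse whitespace and ignore case so line wrapping does not matter.
    return " ".join(text.split()).lower()


def is_supported(citation: Citation, source_text: str) -> bool:
    """Return True when the quoted passage really appears in the source text."""
    passage = _normalize(citation.quoted_passage)
    return bool(passage) and passage in _normalize(source_text)
```

A reviewer could run a check like this over every citation in a report to flag quotes that cannot be found in the cited source, which is the kind of secondary review the article describes.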
This update is not just a feature upgrade, but a systematic release from Google around its "research agent ecosystem." In addition to the Deep Research Agent update, Google is also introducing two key new capabilities: the open-source DeepSearchQA benchmark for web research agents and the new Interactions API.
The industry currently lacks unified metrics for evaluating web research agents. To demonstrate its progress, Google has created a new benchmark called DeepSearchQA, which is designed to test how agents perform on complex, multi-step information retrieval tasks. Google has open-sourced this benchmark.
DeepSearchQA open source address: https://www.kaggle.com/benchmarks/google/dsqa/leaderboard
DeepSearchQA comprises 900 meticulously designed "causal chain" tasks across 17 domains, each step relying on prior analysis. Unlike traditional fact-based tests, DeepSearchQA measures comprehensiveness, requiring agents to generate exhaustive sets of answers. This assesses both the precision and recall of the research.
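The idea of scoring an exhaustive answer set on both precision and recall can be sketched as follows. This is an illustrative set-based metric, not DeepSearchQA's official scoring code; the exact-match comparison after lowercasing is an assumption.

```python
def set_precision_recall(predicted: list[str], gold: list[str]) -> tuple[float, float, float]:
    """Score a predicted answer set against a gold answer set.

    Returns (precision, recall, f1). Answers are compared by exact match
    after trimming and lowercasing, which is a simplifying assumption.
    """
    predicted_set = {answer.strip().lower() for answer in predicted}
    gold_set = {answer.strip().lower() for answer in gold}
    true_positives = len(predicted_set & gold_set)

    precision = true_positives / len(predicted_set) if predicted_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return precision, recall, f1
```

An agent that returns three answers, two of them correct, against a gold set of four scores a precision of 2/3 but a recall of only 1/2: it was mostly right, yet not comprehensive.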
Comparing the pass@8 and pass@1 results demonstrates the value of letting the agent explore multiple parallel paths to verify its answers. These results were computed on a subset of 200 prompts from DeepSearchQA.
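For readers unfamiliar with pass@k, the sketch below uses the unbiased estimator commonly applied in code-generation evaluation. The article does not say how Google computed its pass@8 and pass@1 figures, so treating this estimator as the method is an assumption.

```python
from math import comb


def pass_at_k(num_samples: int, num_correct: int, k: int) -> float:
    """Estimate the chance that at least one of k sampled attempts is correct.

    num_samples: total attempts generated for a task
    num_correct: how many of those attempts were correct
    k: number of attempts the agent is allowed
    """
    if not 0 <= num_correct <= num_samples:
        raise ValueError("num_correct must be between 0 and num_samples")
    if not 1 <= k <= num_samples:
        raise ValueError("k must be between 1 and num_samples")
    # Probability that all k drawn attempts are incorrect, subtracted from 1.
    return 1.0 - comb(num_samples - num_correct, k) / comb(num_samples, k)
```

With 8 attempts of which 2 are correct, pass@1 is 0.25 while pass@8 is 1.0, which illustrates why allowing several parallel attempts can raise a score so sharply.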
The all-new Deep Research Agent achieved state-of-the-art results on Humanity's Last Exam (HLE) and DeepSearchQA, and scored highly on BrowseComp. It is optimized to generate high-quality research reports at a lower cost.
The benchmark results are impressive. The agent is built on Gemini 3 Pro but uses an agentic workflow to reach state-of-the-art performance. Figures from Google's charts:
- Humanity's Last Exam (HLE): 46.4% (significantly better than GPT-5 Pro's 38.9%)
- DeepSearchQA: 66.1% (slightly better than GPT-5 Pro's 65.2%)
- BrowseComp: 59.2% (comparable to GPT-5 Pro)
Gemini Deep Research achieved a leading score of 46.4% on the full Humanity's Last Exam (HLE) dataset, 66.1% on DeepSearchQA, and 59.2% on BrowseComp.
The Interactions API is one of the most strategically significant capabilities in this release. For the first time, it lets developers control an agent's behavior, reasoning steps, long-chain task execution, and intermediate state storage in a structured way. Previously, developers could only "ask the model questions"; now they can "direct the agent on how to carry out tasks."
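The general idea of structured control over steps and stored intermediate state can be illustrated with a toy example. The sketch below is not the Interactions API and does not call any Google library; `AgentRun` and `run_steps` are hypothetical names used only to show the pattern of a resumable multi-step task.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class AgentRun:
    """Hypothetical record of a long-running task; not a type from Google's API."""
    task: str
    completed_steps: list[str] = field(default_factory=list)
    state: dict[str, str] = field(default_factory=dict)


# A step reads the run (including earlier results) and returns its own result.
Step = Callable[[AgentRun], str]


def run_steps(run: AgentRun, steps: dict[str, Step]) -> AgentRun:
    """Execute steps in order, skipping finished ones so a run can be resumed."""
    for name, step in steps.items():
        if name in run.completed_steps:
            continue  # intermediate state from an earlier session is reused
        run.state[name] = step(run)
        run.completed_steps.append(name)
    return run


if __name__ == "__main__":
    steps: dict[str, Step] = {
        "plan": lambda run: f"outline for {run.task}",
        "search": lambda run: "3 sources",
        "report": lambda run: f"report using {run.state['search']}",
    }
    finished = run_steps(AgentRun(task="due diligence"), steps)
    print(finished.completed_steps)
```

Keeping each step's result in an explicit state store is what allows a long task to be inspected, paused, and resumed, rather than being rerun from scratch after an interruption.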
2. What do netizens think?
The reaction from the tech community after Google released the new version of Deep Research Agent is also worth noting.
In related discussion threads on Hacker News and Reddit, many developers expressed their appreciation for Google's achievement of "truly making the Agent a professionally engineered product."
On Reddit, some users expressed their amazement at the progress of technology:
"It's incredible! I don't think we've fully realized this yet. The progress we've made over the past three years is simply unbelievable!"
Some netizens pointed out that this is the first time Google has emphasized "verifiable references" and "end-to-end multi-step reasoning stability" at the product level, which they see as a significant advance in the field of AI agents.
One user, who claims to have long been engaged in compliance review work, commented: "If Deep Research can really achieve step-by-step auditability, it will be the first time that a major company has truly pushed the agent from a toy to a production environment."
However, some remain cautious. One Reddit user objected: "Google has proven itself to be the best using its own benchmarks far too many times. What we need is third-party testing on real web pages and in real-world tasks."
Google's new agent was released on the same day as OpenAI's GPT-5.2, so it was inevitable that netizens would compare the two.
On Reddit, a user asked how this Deep Research Agent compares to GPT-5.2, which was released by OpenAI around the same time. Another user replied that they have different uses, but GPT-5.2 is better.
For a clearer comparison, some netizens pointed to a LinkedIn post by OpenAI researcher Sebastien Bubeck, which stated that GPT-5.2 scored 45% on Humanity's Last Exam (HLE). Google's new agent scored 46.4%, slightly higher than GPT-5.2.
Meanwhile, regarding the competition between Google and OpenAI, some people have made sarcastic comments: "Google just released Deep Research, and OpenAI immediately released Garlic (GPT-5.2). These two companies are practically fighting to release news against each other."
Some people summarized the pace of this fierce competition as: "This is no longer a model war, but a press conference war."
3. The close-quarters competition in model capabilities is intensifying
Foundation model capability has always been the defining battleground for both companies.
In November 2025, Google launched Gemini 3 Pro, aiming to rebuild its advantage in long-chain reasoning and specialized tasks with a model that is more factual, more reliable, and less prone to hallucination. Gemini 3 Pro emphasizes enhanced retrieval, multimodal processing, and large-scale context handling, and has performed exceptionally well in high-trust scenarios such as scientific research, law, and finance.
In its latest release, GPT-5.2 (Garlic), OpenAI has strengthened logical consistency, tool-calling stability, and agent autonomy, further improving cross-task generalization. According to OpenAI's internal benchmarks, GPT-5.2 maintains its lead over Gemini in reasoning, code generation, and multi-turn tool orchestration, and excels in particular on OpenAI's self-developed "Continuous Inference Consistency Benchmark."
Industry commentators consider the capability gap between the two to have "reached the millimeter level": the difference usually shows up only in specific task scenarios rather than as an across-the-board advantage.
If the foundation model determines whether an agent can think, then the agent platform determines whether the agent can get work done.
Google's complete overhaul of the Gemini Deep Research Agent can be seen as a key milestone in its formal entry into the intelligent agent war.
The new Deep Research Agent has three major highlights:
- A completely rewritten reasoning chain built on Gemini 3 Pro
- Multi-step reinforcement learning training that maintains decision consistency across long-chain tasks and significantly reduces the probability of hallucinations
- End-to-end citations that let users trace the evidence behind each viewpoint
This upgrades it from a "report generation tool" to a "professional intelligent agent capable of performing complete research tasks." More importantly, Google introduced the Interactions API, which provides structured control over the agent's behavior and lets developers manage the scheduling and state of each stage and subtask at a fine-grained level. This means that the Deep Research Agent is no longer just a capability within Google's product line, but a general-purpose agent execution engine.
OpenAI's intelligent agent system places more emphasis on versatility and freedom.
The Agent API, OpenAI Swarm, BrowserAgent, and CodeAgent together form a complete intelligent agent development framework. With the improved reasoning consistency of GPT-5.2, OpenAI maintains its advantages in automated task execution, complex tool calling, and environmental adaptability.
The competition between the two is ultimately about the next computing paradigm: future software development will be centered on intelligent agents, and whoever sets the standard for the agent framework will control that paradigm.
Reference links:
https://ai.google.dev/gemini-api/docs/deep-research?hl=zh-cn
https://techcrunch.com/2025/12/11/google-launched-its-deepest-ai-research-agent-yet-on-the-same-day-openai-dropped-gpt-5-2/
This article is from the WeChat official account "InfoQ", authored by Dongmei, and published with authorization from 36Kr.




