History is often shaped by countless "accidents and coincidences".
In 2022, when researcher Hunter Lightman joined OpenAI, his colleagues were busy launching ChatGPT, which went on to become one of the fastest-growing products in history.
However, Lightman quietly devoted himself to a seemingly inconspicuous team: MathGen.
Their only mission was to train AI models to tackle high school-level mathematical competition challenges.
Today, the work of this once-obscure team is considered a key reason OpenAI leads the industry in AI reasoning models!
On May 31, 2023, OpenAI published a research blog post, "Improving Mathematical Reasoning with Process Supervision", formally presenting evidence that process supervision, which rewards each correct reasoning step rather than only the final answer, is an effective training method.
The author list included researchers associated with the MathGen team, among them Hunter Lightman. This blog post is one of the first official publications connected to the team.
On the same day, Altman posted a congratulatory message on X - this was the first official confirmation of the MathGen Team's existence by OpenAI.
The "AI reasoning ability" they forged is the heart of what OpenAI sees as the ultimate technology: the AI agent!
Such an agent would independently complete any task you assign it on a computer, just like a human!
"Back then, AI's mathematical reasoning ability was terrible!" Lightman recalled, "Our mission was to make it learn to truly think."
Evolution from a "Dumb Student" to an "Olympic Math Gold Medalist"!
To be fair, today's OpenAI models are far from perfect - they still hallucinate, confidently stating things that are false, and the so-called AI agents often struggle with complex tasks.
But a massive transformation is happening!
OpenAI's top models have achieved an incredible breakthrough in mathematical reasoning!
Recently, an OpenAI model achieved a gold-medal-level score at the International Mathematical Olympiad (IMO), the world's top high school math competition!
OpenAI believes that this powerful reasoning ability can be replicated in any field!
This is the cornerstone of building a universal AI agent, the ultimate dream they have been pursuing since their founding!
If ChatGPT's success was an "unintentional world-changing masterpiece", a miracle accidentally ignited by what was meant to be a quiet test, then the AI agent is the strategic crystallization of years of careful planning at OpenAI!
"In the future, you just need to give instructions to the computer, and it will take care of everything for you!" OpenAI's CEO Altman declared at the 2023 developer conference, "This ability is the AI agent. The disruption it brings will be unprecedented!"
Whether Altman's prophecy will come true remains to be seen. But OpenAI has already made its move!
In the fall of 2024, their first AI reasoning model, o1, made its debut!
Less than a year later, the 21 core researchers behind it have become the talent Silicon Valley is competing for most fiercely!
Zuckerberg spared no expense, offering some compensation packages of over 100 million dollars, to poach 5 core members of the o1 team from OpenAI and build Meta's "superintelligence" legion.
One of them, Tsinghua alumnus Zhao Shengjia, was appointed Chief Scientist of Meta Superintelligence Labs!
A talent war surrounding the "AI brain" has become white-hot!
Reinforcement Learning: The Ancient Art that Ignites the Intelligence Revolution
Behind OpenAI's reasoning revolution is the rebirth of a long-established technique called Reinforcement Learning (RL).
It's like a strict coach who constantly rewards and punishes AI's choices in a simulated environment, teaching AI what is "correct".
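The "strict coach" idea can be sketched with a toy example. The code below is only an illustration, not anything OpenAI has described: it uses tabular Q-learning (a classic RL algorithm chosen here for brevity) on a hypothetical five-cell corridor, where the agent is rewarded for reaching the right end and slightly penalized for every other step. The function names and all parameter values are assumptions made for this sketch.

```python
import random


def train_q_table(n_states=5, episodes=500, alpha=0.5, gamma=0.9,
                  epsilon=0.1, seed=0):
    """Tabular Q-learning on a toy corridor (hypothetical environment).

    States are cells 0..n_states-1; the goal is the rightmost cell.
    """
    rng = random.Random(seed)
    moves = (-1, +1)  # action 0 = step left, action 1 = step right
    q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1

    for _ in range(episodes):
        state = 0
        while state != goal:
            # Explore occasionally; otherwise take the action believed best.
            if rng.random() < epsilon:
                action = rng.randrange(2)
            else:
                action = max((0, 1), key=lambda i: q[state][i])

            # Walls clamp the agent inside the corridor.
            next_state = min(max(state + moves[action], 0), goal)

            # The "coach": +1 for reaching the goal, a small penalty otherwise.
            reward = 1.0 if next_state == goal else -0.01

            # Standard Q-learning update toward reward + discounted future value.
            best_next = 0.0 if next_state == goal else max(q[next_state])
            q[state][action] += alpha * (reward + gamma * best_next
                                         - q[state][action])
            state = next_state
    return q


def greedy_policy(q):
    """Read off the preferred move in every non-goal state."""
    return ["left" if row[0] > row[1] else "right" for row in q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train_q_table()))
```

After training, the learned policy prefers stepping right in every cell, which is what the reward signal was designed to teach. Real systems replace the table with a neural network and the corridor with far richer environments, but the reward-driven update is the same basic idea.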
This technology is not new.
As early as 2016, Google DeepMind's AlphaGo used it to defeat a world champion Go player, causing a sensation.
At that time, OpenAI's veteran employee Andrej Karpathy had begun to conceive how to use Reinforcement Learning (RL) to create an AI agent that could skillfully operate a computer.
However, it took OpenAI years to turn that idea into reality.
In 2018, OpenAI launched the groundbreaking large language model GPT series.
Paper address: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
With massive data and GPU clusters, it became a genius in text processing and ultimately gave birth to ChatGPT.
But its fatal weakness was that it couldn't even handle basic mathematics.
Then, in 2023, a striking breakthrough arrived!
A project codenamed "Q*" (later called "Strawberry") combined large language models, Reinforcement Learning (RL), and a technique called "test-time compute"!
It gave the model extra time and computing power to think, allowing the AI to repeatedly plan, deduce, and verify before giving an answer.
This enabled an approach known as "chain of thought" (CoT), and the AI's performance on math problems it had never seen before improved dramatically!
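One simple way to picture spending more compute at answer time is to sample several independent attempts and keep the most common answer. The sketch below illustrates that majority-voting idea only; it is not OpenAI's method, which the article does not detail. The `noisy_solver` function is a hypothetical stand-in for a single model attempt that is wrong 30% of the time, and every name and number here is an assumption.

```python
import random
from collections import Counter


def noisy_solver(a, b, rng, error_rate=0.3):
    """Hypothetical stand-in for one sampled model answer to a * b."""
    correct = a * b
    if rng.random() < error_rate:
        # Simulate a reasoning slip that lands near the right answer.
        return correct + rng.choice([-2, -1, 1, 2])
    return correct


def answer_with_budget(a, b, n_samples, seed=0):
    """Spend more test-time compute: sample n_samples attempts, then vote."""
    rng = random.Random(seed)
    votes = Counter(noisy_solver(a, b, rng) for _ in range(n_samples))
    # The most frequent answer wins.
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    for budget in (1, 5, 101):
        print(budget, answer_with_budget(6, 7, n_samples=budget))
```

With a budget of one sample, the answer is only as reliable as a single attempt. As the budget grows, the occasional slips are outvoted, which is the intuition behind giving a model more "thinking time" before it commits to an answer.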
"I witnessed the model start to truly reason," researcher El Kishky said excitedly, "It would find its own mistakes and then backtrack to correct them. It even showed frustration. It felt like reading someone's thoughts!"
These technologies are not groundbreaking when taken individually.
But OpenAI's genius lies in combining them in an unprecedented way, directly giving birth to their ace - o1.
At that moment, OpenAI suddenly realized: Isn't this planning and fact-checking capability the perfect engine driving AI agents?
"We've solved a problem I've been pondering for years!" Lightman said, "It was the most exciting moment of my research career!"
Igniting Reasoning: A Bottom-Up Gamble
With the AI reasoning model, OpenAI's ambition was completely ignited.
They discovered two entirely new evolutionary paths:
1. Invest more computing power in the post-training stage of the model!
2. Provide the model with more thinking time and computing power when answering questions!
"OpenAI has always been thinking not just about the present, but how to infinitely expand its advantages in the future!" Lightman said.
After the 2023 breakthrough of the "Strawberry" project, OpenAI quickly formed a dedicated "Agents" team led by researcher Daniel Selsam.
Their goal was simple: Push this new capability to the extreme!
Initially, the company didn't even strictly distinguish between "reasoning models" and "AI agents".
There was only one shared goal: create a super AI capable of completing complex tasks!
Eventually, the work of this team was folded into the more ambitious o1 model project, whose leaders included co-founder Ilya Sutskever and other top experts.
To create o1, OpenAI had to stake its most valuable resources: top talent and GPUs.
At OpenAI, resources are not allocated by seniority, but by capability.
Researchers must earn the company's full support through astonishing breakthroughs.
"At OpenAI, all research innovations come from the frontline, bottom-up," Lightman explained.
"When we laid out the amazing evidence of o1, the entire company immediately reached a consensus: this is it, full speed ahead!"
Many former employees believe that OpenAI's near-obsessive pursuit of Artificial General Intelligence (AGI) gave birth to this reasoning revolution.
They were single-minded, not swayed by short-term products, betting everything on building the strongest AI brain. Such a cost-insensitive gamble would be almost impossible at other AI giants.
Looking back, this decision was incredibly far-sighted!
By the end of 2024, many AI giants discovered that the traditional "pile up data, pile up computing power" model was yielding diminishing returns.
And the most exciting pulse in the AI field was coming from the progress of "AI reasoning"!
This is the ultimate form of ChatGPT: an all-powerful AI agent that can handle everything on the internet for you and understand your intentions!
Compared to today's ChatGPT, this is worlds apart. But there is no doubt that OpenAI's research is rapidly moving towards this future.
However, the track is already overcrowded!
The undisputed leader of a few years ago is now surrounded by strong competitors. DeepSeek, Google, Anthropic, xAI, Meta... are all eyeing the prize.
The question is no longer whether OpenAI can realize its agent future, but whether it can be the first to cross the finish line in this fierce contest.
References:
https://techcrunch.com/2025/08/03/inside-openais-quest-to-make-ai-do-anything-for-you/
This article is from the WeChat public account "New Intelligence", author: New Intelligence, editors: Ding Hui, Hao Kun, published with authorization from 36kr.




