OpenAI today announced an improved version of its most capable artificial intelligence model to date—one that takes even more time to deliberate over questions—just a day after Google announced its first model of this type.
OpenAI’s new model, called o3, replaces o1, which the company introduced in September. Like o1, the new model spends time ruminating over a problem in order to deliver better answers to questions that require step-by-step logical reasoning. (OpenAI chose to skip the “o2” moniker because it's already the name of a mobile carrier in the UK.)
“We view this as the beginning of the next phase of AI,” said OpenAI CEO Sam Altman on a livestream Friday. “Where you can use these models to do increasingly complex tasks that require a lot of reasoning.”
The o3 model scores much higher than its predecessor on several measures, OpenAI says, including ones that gauge complex coding-related skills and advanced math and science competency. It is three times better than o1 at answering questions posed by ARC-AGI, a benchmark designed to test an AI model’s ability to reason over extremely difficult mathematical and logic problems it is encountering for the first time.
Google is pursuing a similar line of research. Noam Shazeer, a Google researcher, yesterday revealed in a post on X that the company has developed its own reasoning model, called Gemini 2.0 Flash Thinking. Google’s CEO, Sundar Pichai, called it “our most thoughtful model yet” in his own post.
The two dueling models show competition between OpenAI and Google to be fiercer than ever. It is crucial for OpenAI to demonstrate that it can keep making advances as it seeks to attract more investment and build a profitable business. Google is meanwhile desperate to show that it remains at the forefront of AI research.
The new models also show how AI companies are increasingly looking beyond simply scaling up AI models in order to wring greater intelligence out of them.
OpenAI says there are two versions of the new model, o3 and o3-mini. The company is not making the models publicly available yet but says it will invite outside researchers to apply to test them. OpenAI today also revealed more details of the technique used to align o1, which involves having the model reason about the nature of a request it is given in order to judge whether fulfilling it would contravene its guardrails.
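OpenAI has described that technique only at a high level. As a rough sketch of the idea, the snippet below builds a prompt that asks a model to weigh a request against a written policy before answering; the `ask_model` helper and the policy text are hypothetical stand-ins for illustration, not OpenAI’s actual implementation.

```python
# Rough sketch of "reason about the request before answering."
# ask_model is a hypothetical stand-in for any chat-style LLM API call;
# the policy text below is invented for illustration.

POLICY = "Refuse requests for instructions that could cause physical harm."


def build_deliberation_prompt(user_request: str) -> str:
    """Ask the model to check the request against the policy, then respond."""
    return (
        f"Policy: {POLICY}\n"
        f"User request: {user_request}\n"
        "First, reason step by step about whether answering would violate the policy. "
        "Then either refuse or answer, and state which you chose."
    )


def ask_model(prompt: str) -> str:
    # Placeholder: swap in a real API call to whichever model you use.
    return "[model response would appear here]"


print(ask_model(build_deliberation_prompt("How do I sharpen a kitchen knife?")))
```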
Large language models can answer many questions remarkably well, but they often stumble when asked to solve puzzles that require basic math or logic. OpenAI’s o1 incorporates training on step-by-step problem-solving that makes an AI model better able to tackle these types of problems.
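As a concrete illustration of the difference, the snippet below contrasts a direct prompt with one that asks for step-by-step working, the kind of intermediate reasoning that models like o1 and o3 are trained to perform internally rather than relying on the user to request it. The prompt wording is an assumption for illustration, not OpenAI’s.

```python
# Illustration of a direct prompt vs. a step-by-step prompt for a classic logic puzzle.

question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more than the ball. "
    "How much does the ball cost?"
)

# One-shot prompt: plain language models often jump to the intuitive (wrong) answer of 10 cents.
direct_prompt = question

# Step-by-step prompt: asking for intermediate reasoning makes the correct
# answer (5 cents) far more likely.
step_by_step_prompt = (
    question + "\nWork through the problem step by step before giving a final answer."
)

for label, prompt in [("direct", direct_prompt), ("step-by-step", step_by_step_prompt)]:
    print(f"--- {label} prompt ---")
    print(prompt)
```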
Models that reason over problems will also be important as companies seek to deploy so-called AI agents that can reliably figure out how to solve complex problems on a user’s behalf. The o3 model is 20 percent better than o1 on SWE-Bench, a test that measures a model’s agentic abilities.
“This really signifies that we are really climbing the frontier of utility,” Mark Chen, senior vice president of research at OpenAI, said on today’s livestream.
“This model is incredible at programming,” Altman added.
While a true year-end breakthrough moment has eluded the tech giants, the pace of AI announcements has been dizzying of late.
Early this month Google announced a new version of its flagship model, called Gemini 2.0, and demonstrated it as a web browsing helper and as an assistant that sees the world through a smartphone or a pair of smart glasses.
OpenAI has made numerous announcements in the run-up to Christmas, including a new version of its video-generating model, a free version of its ChatGPT-powered search engine, and a way to access ChatGPT over the phone by calling 1-800-ChatGPT.
Update 12/20/24 1:16pm ET: This story has been updated with further comment and detail from OpenAI.