GPT-5.2's perceived "intelligence degradation" has drawn widespread criticism online, throwing Sam Altman's fans into a panic.


In the year-end AI showdown, did OpenAI lose to Google? GPT-5.2 was met with widespread criticism online within 48 hours of its release, while third-party data confirms that Gemini 3 Pro is the true champion.

OpenAI played its "trump card" of the year, GPT-5.2, but still couldn't beat Google...

According to Epoch AI's latest report, GPT-5.2 scored 152 on the Epoch Capability Index (ECI), second only to Gemini 3 Pro.

In multiple benchmark tests, GPT-5.2 did not dominate across the board.

On FrontierMath, the exam created in collaboration with Terence Tao and hundreds of mathematicians, GPT-5.2 leads only on Tiers 1-3, while Tier 4 remains Gemini 3's stronghold.

In addition, GPT-5.2 achieved first place in Chess Puzzles.

The one exception: on SimpleQA Verified, GPT-5.2 scores below GPT-5.1, meaning this iteration is actually less reliable.

Moreover, multiple third-party benchmark tests show that GPT-5.2 falls far short of expectations and fails to outperform Gemini 3.

On OCR-Arena, SimpleBench, and LiveBench, GPT-5.2 even ranks behind Claude Opus 4.5.

In the two days since its release, GPT-5.2 has barely made a splash; if anything, it has drawn plenty of criticism from developers in the community.

In order to win this tough battle, OpenAI has sounded a "red alert" and prioritized improvements to ChatGPT.

More drastically, internal AGI development was paused entirely, and Sora was also suspended for eight weeks, signaling a do-or-die stance.

However, from the industry's perspective, OpenAI has yet to escape its defensive position.

Heavy GPT-5 users have spoken out, saying, "GPT-5.2 is not far from becoming a stone."

Did OpenAI lose the year-end battle?

Three years ago, Google missed its moment and was overshadowed by OpenAI's ChatGPT.

Recently, Google co-founder Sergey Brin returned to Stanford University for a talk, where he publicly admitted his "biggest mistake":

We messed up—we were too afraid that AI would say the wrong thing, and as a result, we lost an era.

Now, with the Gemini 3 Pro and Nano Banana Pro, Google has returned to the forefront of the AI wave.

What goes around comes around. This time it was OpenAI's turn to stumble, losing its footing in the crucial battle of late 2025.

On launch day, Altman excitedly announced that API usage had already exceeded one trillion tokens and was growing extremely fast.

Earlier, The Information reported that GPT-5.2, codenamed Garlic, had originally been scheduled for an unveiling early next year.

A rumor circulated throughout Silicon Valley that OpenAI's pre-training progress had stalled, and that GPT-5.1 might even just be additional post-training on top of GPT-4o, hence the limited improvement.

Indeed, OpenAI has encountered a scaling bottleneck in pre-training.

Pre-training scaling may not be very effective.

As for the development of GPT-5.2 (Garlic), the report claimed that OpenAI solved several key problems it had run into during the pre-training stage.

The result is said to improve on OpenAI's previous "best" and "much larger" pre-trained model.

Internally, OpenAI folded in the bug fixes made during the development of "Shallotpeat" and accumulated a good deal of pre-training experience.

As The Information put it, the most crucial breakthrough came in the "pre-training phase."

However, all of the above information comes from news reports. Whether OpenAI has actually achieved a major breakthrough in pre-training remains unknown.

That said, GPT-5.2's official benchmark numbers, which show it beating Gemini 3 across the board, do suggest some improvement in pre-training.

Judging from third-party reviews and user feedback, however, GPT-5.2 has not delivered a breakthrough at the level of its underlying technology.

In another Epoch AI evaluation, Gemini 3 still led top-tier AI models on long-horizon task performance:

Gemini 3 Pro: 4.9 hours

GPT-5.2: 3.5 hours

Opus 4.5: 2.6 hours

As engineer Dan Mac put it, Gemini 3 Pro has the deeper intelligence, thanks to Google's strongest pre-training.

GPT-5.2 has the best specialized intelligence, the result of OpenAI's post-training optimizations.

Even bigger things are coming early next year

According to a recent report by The New York Times, OpenAI will continue to focus on optimizing ChatGPT in the coming weeks.

They are preparing for a larger launch early next year.

Internally, OpenAI is running a "dual-track" approach in parallel, covering both B2B and B2C strategies.

OpenAI is also advancing other projects, including trials related to advertising and e-commerce.

Despite the criticism, they are still exploring "more restrained" methods, such as completing shopping via ChatGPT and taking a cut from the transactions.

In the enterprise market, OpenAI is introducing the same set of AI technologies that underpin ChatGPT into the enterprise software field.

Data shows that ChatGPT has over 800 million weekly users, representing a market share of approximately 76%.

An AI expert said, "Consumer AI is almost synonymous with OpenAI. If this were lost, the company would not have the value it has today."

However, in the past 12 months, many AI startups around the world have developed technologies that can match, or even surpass, OpenAI’s leading model in some aspects.

The release of Google Gemini 3 Pro is a significant blow to OpenAI's business.

Gemini 3 outperforms GPT-5.2; was OpenAI just making a feint?

From the perspective of users' actual testing, GPT-5.2 still has a lot of room for improvement.

Some netizens, unable to take it any longer, bluntly said that OpenAI has completely lost its mind:

GPT-5.2's tone is icy, Arctic-level cold, with no regard for the user experience. "It keeps regressing, turning language that used to be normal and natural into something more and more outrageous, until it ends up as a pile of insults and preaching, and then sells that as some kind of victory."

OpenAI deserves to be spooked by Gemini 3.

For example, in visual reasoning, Gemini 3 Pro comprehensively outperforms GPT-5.2.

In 3D model generation, GPT-5.2 is slower and more expensive, and its overall results fall short of Gemini 3's.

In transgressive-fiction generation, GPT-5.2 ranks last, behind Gemini 3 Pro, Claude Opus 4.5, and Grok 4.

Transgressive fiction is a literary genre that centers on characters who yearn to break free from social constraints and basic norms.

These works typically involve a range of taboo themes, dark subjects, and extreme issues.

In front-end code generation, Gemini 3 is far ahead, while GPT-5.2 is still far behind.

Given the same prompt, fitness-dashboard homepage designs from Gemini 3, GPT-5.2, and Claude Opus 4.5 were put side by side, drawing more than 530,000 people into the discussion.

Prompt: Fitness dashboard homepage. The top shows a compact weekly activity overview, followed by today's calories burned with a circular progress ring (compact card). Below the calorie card is a workout-streak counter, and the bottom features a weekly workout bar chart. Mobile app, single-screen layout. Visual style: light color scheme, soft milky-white background, rounded cards with subtle shadows, coral as the primary accent color, and electric blue for charts and highlights. Clean sans-serif typography, modern card layout. Mood: inspiring and energetic; fresh, pure, and approachable; modern health aesthetics.
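For a sense of what this prompt is actually asking the models to build, here is a minimal illustrative sketch in React/TypeScript, assuming a typical React setup. The component name, placeholder data, and exact color values are assumptions for illustration only, not taken from the article or from any model's output.

import React from "react";

// Placeholder data; a real app would pull these from a fitness-tracking API.
const weeklyMinutes = [32, 45, 20, 60, 0, 50, 40];
const caloriesBurned = 420;
const calorieGoal = 600;
const streakDays = 12;

// Shared card style: rounded corners, subtle shadow, per the prompt's visual spec.
const card: React.CSSProperties = {
  background: "#FFFFFF",
  borderRadius: 16,
  padding: 16,
  marginBottom: 12,
  boxShadow: "0 2px 8px rgba(0,0,0,0.06)",
};

// Single-screen fitness dashboard: weekly overview, calorie ring,
// workout-streak counter, and a weekly bar chart, stacked top to bottom.
export default function FitnessDashboard() {
  const progress = Math.min(caloriesBurned / calorieGoal, 1);
  const circumference = 2 * Math.PI * 30;
  return (
    <main style={{ background: "#FBF7F2", padding: 16, fontFamily: "sans-serif" }}>
      <section style={card}>
        Weekly activity: {weeklyMinutes.reduce((a, b) => a + b, 0)} min
      </section>
      <section style={card}>
        Today: {caloriesBurned} kcal
        {/* Circular progress ring in the coral accent color */}
        <svg width="72" height="72" viewBox="0 0 72 72">
          <circle cx="36" cy="36" r="30" stroke="#EEEEEE" strokeWidth="8" fill="none" />
          <circle
            cx="36" cy="36" r="30" stroke="#FF6F61" strokeWidth="8" fill="none"
            strokeDasharray={`${progress * circumference} ${circumference}`}
            transform="rotate(-90 36 36)"
          />
        </svg>
      </section>
      <section style={card}>Workout streak: {streakDays} days</section>
      <section style={card}>
        {/* Weekly bar chart in electric blue; bar height maps to minutes */}
        {weeklyMinutes.map((minutes, day) => (
          <div
            key={day}
            style={{
              display: "inline-block",
              width: 20,
              marginRight: 4,
              height: minutes,
              background: "#2D9CDB",
              borderRadius: 4,
              verticalAlign: "bottom",
            }}
          />
        ))}
      </section>
    </main>
  );
}

A real submission would add proper charts, typography, and polish; the point is only that the prompt maps to four stacked cards on a single screen.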

GPT-5.2 almost always ranked last:

Developer Mattia used the AI search engine Perplexity to comb through all the reviews, and Gemini 3 emerged as the overall winner.

If the cases above seem like isolated examples, the data below doesn't lie: GPT-5.2 trails Gemini 3 Pro.

GPT-5.2 suffered a crushing defeat.

On the prediction-market site Polymarket, most users believe Google will have the best AI model by the end of this year.

On Dubesor, a small, manually run benchmark from user Lisan al Gaib, Gemini 3 Pro ranked first while GPT-5.2 came in 16th.

CAIS (the Center for AI Safety), an organization dedicated to advancing AI safety research and raising public awareness, released its latest CAIS AI Dashboard. The results show Gemini 3 Pro outperforming GPT-5.2 on text and visual capability but trailing it on the risk index.

On the text capability index, Gemini 3 Pro trailed only on ARC-AGI-2 and beat GPT-5.2 almost everywhere else.

On the visual capability index, Gemini 3 Pro again won nearly every test, averaging 4.5 points higher than GPT-5.2.

In the risk index test, GPT-5.2 outperformed Gemini 3 Pro, but lagged behind Claude Opus 4.5 and Claude Sonnet 4.5.

On Terminus, a platform for evaluating how well language models can drive autonomous agents in terminal environments, Gemini 3.0 Pro and GPT-5.2 are nearly tied, though in high reasoning mode Gemini 3.0 Pro still edges out GPT-5.2 by an average of 0.2%.

In addition, netizens also verified other benchmark tests, such as SWE-Bench and IUMB:

In summary, GPT-5.2 appears to have fallen short, lagging behind Gemini 3 on several key benchmarks:

Altman's Christmas surprise

On the day GPT-5.2 was released, Altman also teased a "Christmas gift" coming the following week.

As for the new product, it will likely be the next-generation GPT Image v2 model.

A few days ago, two mysterious AI image models, "Chestnut" and "Hazelnut," were tested on the LM Arena platform.

After testing them, however, developers said the OpenAI image model looks less than promising.

In image generation and editing, the GPT image model lags far behind the Gemini 3-powered Nano Banana Pro.

Moreover, its outputs show a string of problems: a yellowish cast, poor logic, weak consistency, low image quality, and insufficient world knowledge.

It is said that the base of this model may still be GPT-4o.

Has the final battle of 2025 truly come to an end?

References:

https://www.nytimes.com/2025/12/11/technology/openai-google-ai-technology-gap.html

https://dashboard.safe.ai/

This article is from the WeChat public account "New Intelligence", author PeachKingHZ, and is published with authorization from 36Kr.

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.