Is GPT-5.2 out of its mind, costing 400 times more than DeepSeek?

12-15

It is 400 times more expensive than DeepSeek and nearly 10 times more expensive than Google's Gemini 3 Pro.

So just how good is OpenAI's newly released GPT-5.2?

Put simply, it may be the AI best suited to working professionals, because it is likely to kick off AI's shift from human assistant to expert.

First, on professional knowledge: GPT-5.2 has about a 70% chance of matching or beating the industry experts sitting in front of their screens right now.

Looking at benchmark scores alone, GPT-5.2 slightly outperforms Gemini 3 Pro across the board.

Of course, the margin is small, and it's possible that OpenAI tuned the model specifically to edge out Gemini's scores.

But what OpenAI cares about most this time is actually the final GDPval test result.

This is a brand new testing method they proposed on September 25th this year, used to measure whether AI can truly help workers complete their jobs.

They invited experts in 44 occupations across nine industries to write a set of tasks based on their actual work.

Then they checked whether AI could do the work of these experts.

The result is that the latest GPT-5.2 can match, or even outperform, humans in 70% of the tasks.

We also briefly tried the new model ourselves, asking GPT-5.2 to gather information from the internet on every model these AI companies have released.

It then had to tally those models' scores on the various leaderboards and tabulate them by month.

After a full 14 minutes of deliberation, GPT-5.2 successfully completed the tasks of data collection, statistical analysis, and table creation for us.

The level of completion is indeed quite good.

GPT-5.2 can also handle some complex spreadsheet tasks, and the tables it produces look much better than those from earlier versions.

Across the various tasks, its test metrics have improved by about 9%.

GPT-5.2 has also seen significant improvements in coding performance.

Its hallucination rate has dropped by 38% compared with before.

The goal is to give everyone more peace of mind when using it.

We ran a simple test, but perhaps because Gemini has set the bar so high, GPT-5.2 felt somewhat unremarkable to me.

We asked it to write an Aimlab clone (a small game for practicing aiming).

It did write one: the program not only runs but also lets you adjust basic parameters such as target size and game duration.

There's nothing wrong with these, but they're just a bit too conventional.

In terms of aesthetics, it was somewhat outclassed by the Gemini 3 released last month.

Given the same one-sentence prompt for the same game, Gemini was already experimenting with trendy color schemes, while GPT was still painting plain white walls in a bare house.

Of course, it's also possible that I simply didn't tell GPT what style to aim for.

In addition to improvements in various work capabilities, GPT-5.2 has another very interesting change.

It has become better at understanding what people actually ask for.

In testing, when GPT was asked to write 50 ideas, it really did write 50, instead of slacking off after about 10 as previous models did.
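A check like this is easy to automate. The following is only a minimal sketch, assuming the model answers with a numbered or bulleted list; the helper names `count_list_items` and `followed_count_instruction` are made up for illustration and are not part of any OpenAI tooling.

```python
import re

# Matches lines that begin like "1. ", "2) ", "- ", "* " or "• " followed by text.
# Assumption: one idea per line, formatted as a list item.
_ITEM = re.compile(r"^\s*(?:\d+[.)]|[-*•])\s+\S")


def count_list_items(response: str) -> int:
    """Count lines in the response that look like list items."""
    return sum(1 for line in response.splitlines() if _ITEM.match(line))


def followed_count_instruction(response: str, requested: int) -> bool:
    """True if the response contains exactly the requested number of items."""
    return count_list_items(response) == requested


# A diligent answer with 50 items, and a lazy one that stops at 10.
full_answer = "\n".join(f"{i}. Idea number {i}" for i in range(1, 51))
lazy_answer = "\n".join(f"{i}. Idea number {i}" for i in range(1, 11)) + "\n...and so on"
```

Counting items only tells you whether the model stopped early; it says nothing about whether the 50 ideas are any good.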

OpenAI has also strengthened its long-context capabilities. In the needle-in-a-haystack test, even when the text length reached 256K tokens, the success rate was still close to 100%.

That is like it being able to pinpoint exactly where, in a classic novel of hundreds of thousands of words, you secretly slipped in a few lines of your own or a few insults.
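For readers curious how a test of this kind is put together, here is a rough sketch, not OpenAI's actual harness. It assumes a hypothetical `ask_model` callable that takes a prompt and returns text, measures length in words rather than tokens for simplicity, and uses a made-up passcode as the hidden fact.

```python
import random


def build_haystack(filler_sentences, needle, total_words, depth, seed=0):
    """Build a long text and insert `needle` at a relative depth (0.0 = start, 1.0 = end)."""
    rng = random.Random(seed)
    sentences, words = [], 0
    while words < total_words:
        sentence = rng.choice(filler_sentences)
        sentences.append(sentence)
        words += len(sentence.split())
    sentences.insert(int(len(sentences) * depth), needle)
    return " ".join(sentences)


def run_trials(ask_model, filler_sentences, needle, question, expected,
               total_words, depths):
    """Return the fraction of insertion depths at which the answer contains `expected`."""
    hits = 0
    for i, depth in enumerate(depths):
        text = build_haystack(filler_sentences, needle, total_words, depth, seed=i)
        answer = ask_model(text + "\n\nQuestion: " + question)
        if expected.lower() in answer.lower():
            hits += 1
    return hits / len(depths)


def fake_model(prompt):
    """Stand-in for a real model call: returns the sentence stating the passcode."""
    for sentence in prompt.split(". "):
        if "passcode is" in sentence:
            return sentence
    return "I don't know."


FILLER = ["The sky was grey that morning.", "Nobody spoke on the train."]
rate = run_trials(
    fake_model, FILLER,
    needle="The secret passcode is 7421.",
    question="What is the secret passcode?",
    expected="7421",
    total_words=2000,
    depths=[0.0, 0.25, 0.5, 0.75, 1.0],
)
```

A real run would replace `fake_model` with an API call and sweep both the text length and the insertion depth, since models often do worse when the needle sits in the middle of a very long context.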

This is another major boost for working professionals and researchers who write code, do academic research, and summarize and organize documents.

Impressive as it looks on paper, it still stumbled in some areas.

For example, in the image recognition cases showcased by OpenAI itself, people found that Gemini 3 Pro's recognition was far more fine-grained than GPT-5.2's.

Some people are also complaining that now that the new model is out, the old version will probably get even dumber.

It's the same old story.

Finally, the release of GPT-5.2 actually shows us a trend.

That is, the differences between top models may become more and more obvious, with each one specializing in its own strong subjects.

For example, Gemini may be far ahead in the multimodal domain; GPT is still ahead of its peers in logical reasoning and productivity; and Claude continues to lead by a wide margin in coding ability and writing.

Ultimately, the differences among major companies regarding how to achieve AGI have become apparent. Google may believe that multimodal perception of the world is the future; OpenAI, on the other hand, believes in extreme logical reasoning and productivity improvements; and Anthropic believes that high-dimensional semantic understanding and alignment are the keys to AGI.

The pattern of AI labs taking turns at the top continues, and by that order Anthropic should be the next to make a move.

By the way, one more reminder for Altman: when are you going to release the adult mode you promised?

This article is from the WeChat public account "Cha Ping X.PIN" , authored by Jiang Jiang & Zao Qi, and published with authorization from 36Kr.
