Hands-on test of GPT-5.2: prices up a little, capability up a lot. Is that enough to counter Gemini?

36kr
12-12

GPT-5.2, built to go head-to-head with Gemini, was officially released early this morning and is rolling out to all users.

I canceled my ChatGPT Plus subscription just last month and switched to Gemini. Does GPT-5.2 mean I should switch back?

After reading the hands-on impressions users have shared online, along with APPSO's own tests, you may find your answer.

This time, the chart finally came out right

The GPT-5.2 release actually covers three models: GPT-5.2 Instant, Thinking, and Pro. If you're used to the careful deliberation Gemini 3.0 Pro puts into every answer, you'll find that GPT-5.2 Thinking/Pro thinks even more slowly, taking longer per answer than before.

That matches the feedback most early-access users have shared on social media. In short, GPT-5.2 improves on 5.1 across the board, and GPT-5.2 Pro is well suited to professional reasoning tasks that take a long time to complete; the trade-off is a longer wait for results.

For example, one user shared that when they entered the prompt "Help me draw a chart of HLE test scores", it took GPT-5.2 Pro a full 24 minutes to generate the chart.

Image source: https://x.com/emollick/status/1999185755617300796/photo/1

Fortunately, all the information was accurate, and the chart even showed that the top score still belongs to Gemini 3.0 Pro.

That is partly because GPT-5.2's knowledge cutoff has moved up to August 2025, while GPT-5.1's cutoff was September 2024 and Gemini 3.0's, released just last month, is January 2025.

When we used GPT-5.2 Thinking to generate a chart of OpenAI's model release history, it didn't take long, and the information was largely accurate. For simple tasks, the Thinking model is noticeably faster than the Pro model.

Prompt: Generate a chart graph of OpenAI model release over time
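For reference, here is a hand-written Python (matplotlib) sketch of the kind of timeline such a prompt asks for. The release dates included are a small, well-known subset and purely illustrative; this is not GPT-5.2's output.

```python
# Minimal timeline sketch of the kind of chart the prompt asks for.
# Dates are a small, illustrative subset of well-known OpenAI releases,
# not GPT-5.2's actual output.
import matplotlib.pyplot as plt
from datetime import date

releases = {
    "GPT-2": date(2019, 2, 14),
    "GPT-3": date(2020, 6, 11),
    "ChatGPT (GPT-3.5)": date(2022, 11, 30),
    "GPT-4": date(2023, 3, 14),
    "GPT-4o": date(2024, 5, 13),
    "o1": date(2024, 12, 5),
}

fig, ax = plt.subplots(figsize=(9, 3))
for i, (name, d) in enumerate(sorted(releases.items(), key=lambda kv: kv[1])):
    ax.plot(d, 0, "o")  # one marker per release on the timeline
    ax.annotate(name, (d, 0),
                xytext=(0, 12 if i % 2 == 0 else -20),
                textcoords="offset points", ha="center", rotation=20)

ax.get_yaxis().set_visible(False)  # a timeline only needs the x axis
ax.set_title("OpenAI model releases over time (illustrative subset)")
plt.tight_layout()
plt.show()
```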

With its "ultra-high intensity" reasoning and the latest world knowledge, combined with multimodal understanding and reasoning capabilities of images, GPT 5.2 quickly soared to second place in the large model arena. GPT-5.2-High ranked second in the WebDev (web development) project , while GPT-5.2 ranked sixth. In comparison, Gemini 3.0 Pro ranked third, with Claude still holding the top spot.

LMArena also released a test video showing GPT-5.2 completing a series of 3D modeling tasks with very high accuracy. Some commenters below, however, asked, "Is this still 2003?"

Video source: https://x.com/arena/status/1999189215603753445

Achieving this 3D effect with three.js demands strong multimodal understanding and reasoning from the model, along with solid programming and program design; GPT-5.2 earns this 0.1 version bump.

Many of the tests users have shared so far focus on building complete 3D engines, and GPT-5.2 performs very well. One test, for example, used GPT-5.2 Thinking's high-difficulty reasoning mode to build, in a single-page file, a 3D snowy ice-kingdom scene that supports interactive controls and can be exported at 4K resolution.

Source: https://x.com/skirano/status/1999182295685644366

There is also a stormy 3D city of neo-Gothic towers created with GPT-5.2 Pro.

Prompt: create a visually interesting shader that can run in twigl-dot-app make it like an infinite city of neo-gothic towers partially drowned in a stormy ocean with large waves. | Source: https://x.com/emollick/status/1999185085719887978?s=20

For 3D understanding and reasoning, we also reused the prompt Ian Goodfellow tried after Gemini 3.0 Pro's release: upload an image, then ask the model to generate a beautiful voxel-art Three.js single-page application scene based on it.

Since ChatGPT didn't render it in the canvas for me, I copied the code it produced in the chat and opened it in an HTML viewer, as shown in the image on the right.
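If you run into the same thing, one quick way to preview model-generated code locally is to dump it to a file and open it in the default browser. A minimal Python sketch (the file name and placeholder markup are arbitrary):

```python
# Write model-generated HTML to disk and preview it in the default browser.
# Replace the placeholder string with the code ChatGPT returned in the chat.
import webbrowser
from pathlib import Path

generated_html = "<!DOCTYPE html><html><body><h1>paste the generated page here</h1></body></html>"

out = Path("voxel_scene.html")           # arbitrary file name
out.write_text(generated_html, encoding="utf-8")
webbrowser.open(out.resolve().as_uri())  # open the file:// URL in the browser
```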

The difference is quite obvious. Although ChatGPT also picked up the content of the uploaded image (a pink book, a green field, a gray sink, and white water), the 3D animation it generated looked rudimentary next to Gemini 3.0 Pro's.

All I can say is that the "red alert" Sam Altman sounded reflects just how strong Gemini really is.

Tests of coding ability inevitably include the classic bouncing-ball-in-a-hexagon physics simulation. One blogger raised the difficulty by using glowing red 3D balls. The effect looks very cool, and many netizens asked how it was done; some, however, pointed out that the balls seem unaffected by gravity.

Then some netizens replied that this was simulating space.

Video source: https://x.com/flavioAd/status/1999183432203567339
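For anyone wondering what the gravity debate is about, here is a hand-written, text-only Python sketch of the underlying physics: a ball bouncing inside a hexagon, with a flag that turns gravity off to mimic the "space" behaviour. It is only an illustration of the task, not the blogger's code.

```python
# Minimal 2D sketch of the "ball inside a hexagon" physics test, with a
# gravity toggle. Illustration only, not the blogger's actual code.
import math

R = 1.0                      # hexagon "radius" (center to vertex)
BALL_R = 0.05
GRAVITY_ON = True            # flip to False to mimic the "space" behaviour

# Hexagon vertices and inward edge normals.
verts = [(R * math.cos(a), R * math.sin(a))
         for a in (math.pi / 3 * i for i in range(6))]
edges = list(zip(verts, verts[1:] + verts[:1]))

def inward_normal(p1, p2):
    ex, ey = p2[0] - p1[0], p2[1] - p1[1]
    nx, ny = -ey, ex                        # rotate the edge by 90 degrees
    length = math.hypot(nx, ny)
    nx, ny = nx / length, ny / length
    # Make sure the normal points toward the hexagon's center (the origin).
    if nx * -p1[0] + ny * -p1[1] < 0:
        nx, ny = -nx, -ny
    return nx, ny

normals = [inward_normal(p1, p2) for p1, p2 in edges]

pos = [0.0, 0.3]
vel = [0.6, 0.2]
dt, restitution = 1 / 120, 0.9

for _ in range(1200):                       # 10 simulated seconds
    if GRAVITY_ON:
        vel[1] -= 9.8 * dt
    pos[0] += vel[0] * dt
    pos[1] += vel[1] * dt
    for (p1, _p2), (nx, ny) in zip(edges, normals):
        # Signed distance from ball center to the edge line (positive = inside).
        d = (pos[0] - p1[0]) * nx + (pos[1] - p1[1]) * ny
        if d < BALL_R:
            # Push the ball back inside and reflect the outward velocity component.
            pos[0] += (BALL_R - d) * nx
            pos[1] += (BALL_R - d) * ny
            vn = vel[0] * nx + vel[1] * ny
            if vn < 0:
                vel[0] -= (1 + restitution) * vn * nx
                vel[1] -= (1 + restitution) * vn * ny

print(f"final position: ({pos[0]:.2f}, {pos[1]:.2f}), speed: {math.hypot(*vel):.2f}")
```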

There's also the classic SVG code test: a pelican riding a bicycle.

Image source: https://arena.jit.dev/

Some netizens also shared a forest-fire simulator built with GPT-5.2, which lets you adjust the burn speed, the size of the area, the spread range of the fire, and so on.

Image source: https://x.com/1littlecoder/status/1999191170581434557?s=20
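As a rough idea of how such a simulator is usually parameterized, here is a hand-written sketch of the classic grid-based forest-fire model in Python, exposing the spread probability, grid size, and number of ticks as the kind of knobs the post describes. It is not the code from the post.

```python
# Toy grid-based forest-fire model illustrating the kind of tunable parameters
# (spread probability, grid size, burn ticks) such a simulator exposes.
# Self-contained sketch, not the code from the post.
import random

EMPTY, TREE, FIRE = 0, 1, 2

def step(grid, spread_prob=0.6):
    """Advance the fire by one tick: burning cells ignite tree neighbours."""
    size = len(grid)
    nxt = [row[:] for row in grid]
    for y in range(size):
        for x in range(size):
            if grid[y][x] == FIRE:
                nxt[y][x] = EMPTY                      # this cell burns out
                for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                    ny, nx_ = y + dy, x + dx
                    if (0 <= ny < size and 0 <= nx_ < size
                            and grid[ny][nx_] == TREE
                            and random.random() < spread_prob):
                        nxt[ny][nx_] = FIRE
    return nxt

def run(size=40, tree_density=0.7, spread_prob=0.6, ticks=30):
    grid = [[TREE if random.random() < tree_density else EMPTY
             for _ in range(size)] for _ in range(size)]
    grid[size // 2][size // 2] = FIRE                  # ignite the center
    for _ in range(ticks):
        grid = step(grid, spread_prob)
    cleared = sum(row.count(EMPTY) for row in grid)    # burnt plus initially empty
    print(f"empty or burnt cells after {ticks} ticks: {cleared}/{size * size}")

run()
```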

We built a satellite-signal webpage, and its layout turned out almost identical to this forest-fire visualization page. The only difference is that the content shown on the left changes from a scatter of small dots to celestial bodies.

Prompt: Create an interactive HTML, CSS, and JavaScript simulation of a satellite system that transmits signals to ground receivers. The simulation should show a satellite orbiting the Earth and periodically sending signals that are received by multiple ground receivers.

We also reran the instant-camera test we had built with Gemini 3, giving GPT-5.2 the same prompt and asking it to develop a retro instant-camera-style web application.

Prompt: Develop a retro-style skeuomorphic single-page camera app. The page background should be designed as corkboard or dark wood grain material. A skeuomorphic instant camera model, drawn entirely with CSS or SVG, should be fixed in the lower left corner, with the lens area displaying the user's camera view in real-time. In terms of interaction logic, when the user clicks the shutter button, a shutter sound effect should play, and a photo paper with a white border should slowly emerge from the top of the camera. Use CSS filters to make the emerging photo initially highly blurred and black and white, smoothly transitioning to a clear, full-color state within 5 seconds. Finally, all developed photos must be draggable, allowing users to freely place them anywhere on the page, with random slight rotation angles and shadows. Clicking on a photo should bring it to the top, creating a realistic free-form photo collage wall.

Somewhat surprisingly, ChatGPT also produced working instant photos in one go.

When we tested Gemini 3.0 Pro earlier, its biggest strengths were coding and how little prompting it needed: we could simply give it a screenshot or a video, tell it to replicate what it saw, and Gemini would do it.

This time, we likewise gave GPT-5.2 a video and asked it to replicate the classical-poetry-generating webpage shown in it.

https://chatgpt.com/canvas/shared/693b6d1b8fa881919c6298a4aed05581

Compared with GPT-5.1, which completely ignored the color scheme of the uploaded video, this version seems to have learned something. However, while Gemini-generated pages can call AI directly through its API, ChatGPT doesn't yet wire AI into the pages it generates, so the poems available here are limited to the few written in advance.

Beyond the classic coding tests and single-page HTML files, some users also had it write Python code.

The prompt the user entered was "write a Python code that visualizes how a traffic light works in a one-way street with cars entering at random rate."

He tested both GPT-5.2 Extended Thinking and Claude Opus 4.5, and the difference was obvious. We're often asked which model is best for coding, and there's a reason Claude is so popular among developers.

The image below shows GPT-5.2. Source: https://x.com/diegocabezas01/status/1999228052379754508
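For context on what the prompt demands, here is a stripped-down, text-only Python sketch of the same idea: a timed light and cars arriving at a random rate, queuing on red and clearing on green. It is not the output of either model.

```python
# Text-only sketch of the prompt's task: a one-way street with a timed traffic
# light and cars arriving at a random rate. Not GPT-5.2's or Claude's output.
import random

GREEN_TICKS, RED_TICKS = 10, 6          # light cycle lengths
ARRIVAL_PROB = 0.4                      # chance a car arrives each tick
DEPART_PER_TICK = 1                     # cars that clear the light per green tick

def simulate(ticks=60, seed=0):
    random.seed(seed)
    queue, phase, timer = 0, "GREEN", GREEN_TICKS
    for t in range(ticks):
        if random.random() < ARRIVAL_PROB:
            queue += 1                  # a new car joins the street
        if phase == "GREEN":
            queue = max(0, queue - DEPART_PER_TICK)
        timer -= 1
        if timer == 0:                  # switch the light
            phase = "RED" if phase == "GREEN" else "GREEN"
            timer = RED_TICKS if phase == "RED" else GREEN_TICKS
        print(f"t={t:02d} light={phase:<5} queue={'#' * queue}")

simulate()
```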

Moreover, the biggest complaint about Claude used to be its price: Claude Opus 4.5 costs $5 per million input tokens and $25 per million output tokens. Now GPT-5.2's pricing has caught up, coming in about 40% higher than GPT-5.1, and GPT-5.2 Pro costs $21 per million input tokens and $168 per million output tokens.
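Plugging the per-million-token prices quoted above into a quick back-of-the-envelope calculation shows what a single request might cost; the 20k-input / 5k-output example is arbitrary.

```python
# Back-of-the-envelope cost comparison using the per-million-token prices
# quoted above, as (input, output) in USD.
PRICES = {
    "Claude Opus 4.5": (5.00, 25.00),
    "GPT-5.2 Pro": (21.00, 168.00),
}

def request_cost(model, input_tokens, output_tokens):
    pin, pout = PRICES[model]
    return (input_tokens / 1_000_000) * pin + (output_tokens / 1_000_000) * pout

# Example: a 20k-token prompt with a 5k-token answer.
for model in PRICES:
    print(f"{model}: ${request_cost(model, 20_000, 5_000):.2f}")
```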

In its official release blog, OpenAI mentioned that GPT-5.2 has also improved its image processing capabilities.

GPT-5.2 Thinking is our most powerful visual model to date, reducing the error rate by about half in graph reasoning and software interface understanding.

The blog also shows an example of having the AI draw bounding boxes on a blurry photo of a motherboard; compared with GPT-5.1, GPT-5.2 still makes mistakes, but it marks more regions.

But what about Nano Banana Pro? Some users used Nano Banana Pro to remove the annotations from the image and then asked it to add new bounding boxes for the targets. Which do you think does better?

From left to right: GPT-5.1, GPT-5.2, Nano Banana Pro | Image source: https://x.com/bcaine/status/1999212747213656072

My feeling is that ChatGPT is asking for humiliation in an area where someone else excels. Nano Banana is now the undisputed leader in image work: even though GPT-5.2 produces more annotations, many of its bounding boxes are still not accurately placed.

Coding and image processing are both clearly better than in the previous-generation GPT-5.1; if you've been a ChatGPT user for a while, you should feel the difference right away after the upgrade. Against other models, though, its coding and image work still fall short of the kind of dominance Nano Banana showed at launch.

As for the aesthetics of web design, some netizens have shared front-end pages they built with GPT-5.2. Let's see whether front-end programmers will be "dragged out and executed" again this time.

Image source: https://x.com/secondfret/status/1999235822034547011

Compared with the once-ubiquitous purple gradients, GPT-5.2's design sense has indeed improved. But as the blogger himself noted, GPT-5.2 seems especially fond of drawing boxes on the screen, with layers of grids everywhere.

There is also a dedicated leaderboard for design ability, where GPT-5.2 has made a big leap: GPT-5.1 previously ranked outside the top ten, while GPT-5.2 now sits in third place. The top score, however, still belongs to Gemini 3.0 Pro.

Image source: https://www.designarena.ai/leaderboard

We also gave GPT-5.2 a brief to build a "high-end" website, specifically an AI company's homepage. The result? GPT-5.2 really does love boxes, and somehow I ended up with a purple gradient again.

Prompt: You are the top 0.1% designer and developer for the world's cutting-edge innovation on front-end design and development. You are tasked to create a full landing page with {Dither + Shaders} using {WebGL + ThreeJs} in the styling of an uploaded image for the AI company. - Focus mainly on the design part, not the development. Import all necessary files and libraries: Three.js, WebGL, GSAP, Any other animation libraries related to 3D development.

Finally, on writing: according to feedback from early-access users, GPT-5.2 is beginning to be able to carry long-form fiction through to completion.

For example, when ChatGPT is asked to generate 50 plot ideas, it completes them all, instead of generating only a portion like other models. And when asked to write a 200-page book, ChatGPT doesn't simply say it can't do it; instead, it actually tries, not only constructing the entire book's structure but also generating a PDF file.

Netizens commented that the pages are rather thin and the book is short; after all, it still can't write a publishable novel in one go. But the fact that it actually starts the job, producing 50 ideas and writing a 200-page book, shows it has enough depth of thought.

The most remarkable thing about GPT-5.2 is its ability to follow instructions very well... not just basically do what I say, but actually complete the entire task I describe.

GPT-5.2 should now be rolling out gradually to all users. What has your hands-on experience been like?

For me, the GPT-5.2 upgrade wasn't enough to pull me away from Gemini. It has topped plenty of leaderboards, with good results in both its own and public benchmarks, but the actual experience falls short: in the 3D program-generation tests, code errors were frequent, the overall aesthetic hasn't noticeably improved, and all of that comes at a higher price.

Netizens' sharp comments

Google hasn't stood still either and keeps putting pressure on Sam Altman. Early this morning, although no new model was released, a redesigned Gemini Deep Research went live. It is accessible via the API and will later be upgraded inside Gemini, Google Search, and NotebookLM.

The new Gemini Deep Research agent scored 46.4% on Humanity's Last Exam (HLE), beating the newly released GPT-5.2 Thinking at 45.5% (the top score being GPT-5.2 Pro's 50.0%), and it also posted good results on the DeepSearchQA and BrowseComp benchmarks.

Sam Altman's red alert will probably stay switched on for a while yet.

This article is from the WeChat official account "APPSO" , authored by Discover Tomorrow's Products, and published with authorization from 36Kr.
