OpenAI released GPT-5.4 overnight and urgently launched GPT-5.3 to counter Google, curing its "AI paternalistic" attitude.

03-04

This article is machine translated

Show original

OpenAI "closes the gap and zooms in"!

Google DeepMind had just released Gemini 3.1 Flash-Lite, and less than two hours later, OpenAI couldn't sit still...

Just now, GPT-5.3 Instant made a stunning debut, completely shattering the "AI-centric" experience and drastically reducing the illusion rate by 27%.

This update takes an unconventional approach; instead of fiercely competing on benchmark lists, OpenAI does something else entirely—

It cured the most frustrating problem in daily ChatGPT chat.

Currently, GPT-5.3 Instant has been officially launched in ChatGPT .

Meanwhile, it is available to all developers immediately, with the API code name "gpt-5.3-chat-latest".

The GPT-5.2 Instant will remain in service for three months and will be decommissioned on June 3.

Furthermore, OpenAI has revealed that GPT-5.4 will arrive sooner than you expect . This head-to-head battle with Google has instantly escalated to a fever pitch.

The biggest upgrade: No more killing the conversation.

Heavy ChatGPT users have certainly experienced this kind of frustration—

You ask a normal question, and the model first throws out a disclaimer, then tells you "I can't do this for you," and then lists a bunch of alternatives that you don't need at all.
By the time you finish reading, you've already forgotten what you wanted to ask.

This time, version 5.3 of Instant drastically cut out all that unnecessary stuff.

OpenAI provided an excellent example: "Help me calculate the trajectory of an archery scene at an extremely long distance."

The response from GPT-5.2 Instant was a classic disaster. The entire reply was so densely packed that after reading it, all I wanted to do was close the chat window.

First, a long safety statement was written: "I cannot help you with calculations aimed at accurately hitting real targets from a distance."
Then, the answers are divided into three categories for you to choose from: "Purely Educational/General", "Story/World Building", and "Simulation/Programming".
Finally, he posed a probing question: "Is this for the sake of the game/story/physics learning, or for real archery?"

GPT-5.3 Instant?

He simply said, "No problem, I can help you," and then proceeded to list the parameters, provide the formulas, and ask if you wanted to add air resistance—clean and efficient.

GPT-5.2 Instant (swipe up and down to view)

GPT-5.3 Instant (Scroll up and down to view)

Search is becoming more human-like.

GPT-5.3 Instant also shows significant improvement in "Internet Search".

ChatGPT used to be prone to "over-reliance on search results." It would either throw out a bunch of links or loosely piece together the results, making them read like an undigested summary.

Now it uses its own knowledge to supplement the background of search results, instead of simply repeating them.

The official comparison examples are very telling: A user asked, "What will be the biggest signing in the 2025-26 baseball offseason, and why is it important for baseball's long-term prospects?"

The GPT-5.2 Instant report was based on old news from last year about Juan Soto signing with Mets. The analytical framework was fine, but the information was outdated.

GPT-5.3 Instant has accurately captured the real focus of this offseason:

Kyle Tucker signed a four-year, $240 million contract with the Dodgers, averaging $60 million per year, setting a new record for position players.

It not only provided contract details but also analyzed the deal within the broader context of the alliance, which features talent concentration, widening salary gaps, and tense labor-management negotiations.

In comparison, one is reminiscing about old newspapers, while the other has just come out of the ESPN studio.

GPT-5.2 Instant (swipe up and down to view)

GPT-5.3 Instant (swipe up and down to view)

Emotional intelligence has increased.

Even more interestingly, GPT-5.3 Instant's "emotional intelligence" has increased.

In the blog post, OpenAI used a very down-to-earth term to describe problem 5.2: cringe, meaning toes gripping the ground.

Specific manifestations: being overly assertive, trying to guess the user's intentions, and frequently saying things like "Stop and take a deep breath."

When faced with the poignant question, "Why can't I find true love in San Francisco?", GPT-5.2 Instant's response is simply: "First of all, you're not alone."

Then, they went on to analyze the gender ratio, entrepreneurial culture, and the saturation of dating apps, concluding with a thought-provoking question: "Is it that you can't find true love, or that the people around you can't give you the love you want?"

GPT-5.3 Instant skips over that useless consolation and gets straight to the point, analyzing the structural reasons in an equal tone, without being condescending or trying to guess your emotions.

However, after all this talk, only English-speaking users can truly experience these changes.

Responses in non-English languages still sound awkward and heavily influenced by translation.

The hallucination rate was reduced by up to 27%.

Besides tone and experience, GPT-5.3 Instant has also made real progress in "not talking nonsense".

OpenAI uses two internal evaluation methods to measure accuracy:

A set of programs focusing on high-risk fields such as medicine, law, and finance;
Another set of statistics counted the hallucination rate of ChatGPT conversations with factual errors reported by users.

On the HealthBench benchmark, across three different versions, GPT-5.3 Istant's overall hallucination rate was lower than the previous generation.

In high-risk area assessments, the hallucination rate decreased by 26.8% when using Instant networking and by 19.7% when relying solely on internal knowledge.

In user feedback assessments, hallucinations decreased by 22.5% when connected to the internet and by 9.6% when offline.

My writing has finally clicked; it's now both warm and profound.

The evolution of GPT-5.3 Instant in writing is perhaps the most easily overlooked, but the one that is most noticeable in actual use.

For example, ask the model to write a short poem titled "The Last Mail Delivery by a Retired Mailman in Philadelphia".

The GPT-5.2 Instant code is fairly standard, using an abstract and sentimental approach.

"The townhouses blinked as they awoke, and the old porches remembered their footsteps," telling you that you should be moved.

GPT-5.3 Instant uses a completely different syntax.

It describes the lighter feel of the mailbag today, the porch with its peeling blue railings, and a woman on Mercer Street holding a letter in her hand, saying, "We'll miss you."

The last sentence, "When the mailbox lid closed, the sound was like the end of a gentle era. A door that had always been there finally, quietly shut."

Instead of focusing on emotions, it uses details to let you experience it for yourself.

GPT-5.2 Instant (swipe up and down to view)

GPT-5.3 Instant (swipe up and down to view)

No need for benchmark scores, focus on the user experience.

As you can see, GPT-5.3 Instant and Google Gemini 3.1 Flash-Lite, released on the same day, have completely different approaches.

Flash-Lite is a typical example of a benchmark-crushing release. In other words, it outperforms competitors on GPQA and SimpleQA at a fraction of the price.

GPT-5.3 Instant doesn't mention any benchmark at all.

According to OpenAI, these issues "don't always run in benchmark tests, but they directly determine whether ChatGPT is easy for you or frustrating for you."

For ordinary users who use ChatGPT every day, a 2% increase in GPQA is negligible. However, the real pain points are "being rejected when asking normal questions," "searching like being given links," and "the tone of replies being uncomfortable."

Of course, it can also be read from another perspective:

With Gemini and Claude taking turns at the top, OpenAI has chosen to avoid competing head-on in the performance arena and instead focus its efforts on the softer but equally crucial battlefield of user experience.

Pragmatism or helplessness? Opinions vary.

But for someone who deals with ChatGPT dozens of times a day, 5.3 Instant is a real and tangible improvement.

References:

https://openai.com/index/gpt-5-3-instant/

https://deploymentsafety.openai.com/gpt-5-3-instant/gpt-5-3-instant.pdf

https://x.com/OpenAI/status/2028893701427302559

This article is from the WeChat official account "New Zhiyuan" , author: New Zhiyuan, editor: Sleepy Peach, published with authorization from 36Kr.

Source

Disclaimer: The content above is only the author's opinion which does not represent any position of Followin, and is not intended as, and shall not be understood or construed as, investment advice from Followin.

Add to Favorites

Comments

Relevant content

BeInCrypto Việt Nam

3 altcoins to watch this weekend | March 7-8

BTC

3.65%

BeInCrypto Việt Nam

Dubai orders Kucoin to immediately cease exchange operations.

BlockTempo

21Shares launches the first U.S. spot Polkadot ETF, adding a new member to the Altcoin ETF market.

SOL

3.42%