The first batch of GPT-5.6 tests is here, taking direct aim at Mythos

36kr
06-10

Anthropic has just released the trump card it had kept hidden for two months: Claude Fable 5 and Mythos 5. The launch landed like a bombshell.

The pressure is now directly on OpenAI.

At the same time, details of GPT-5.6 have leaked.

Since last week, OpenAI has been internally testing two new checkpoints, codenamed kepler and kindle. Kindle-alpha has reportedly been selected as a release candidate.

Overseas developers and the leak community have begun testing the internal build of GPT-5.6 extensively. Its codenames, candidate version, benchmark results, and user impressions have all been dug up.

Whether racing toward an IPO or shipping flagship models, the two companies keep matching each other move for move: "you file, I file" and "you release a new model, I release one too."

The fight is fierce.

But the question is, can GPT-5.6 really beat Mythos?

GPT-5.6 surfaces

As of now, OpenAI has made no official announcement about GPT-5.6, and the model has not been released.

However, many overseas users have already probed the unreleased "internal checkpoints."

A checkpoint is a snapshot of the model's parameters at a specific point in time during the training process.

OpenAI keeps many such versions internally, compares them side by side, and then selects one that is considered "good enough to be released." That version is called the release candidate (RC).
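To make the idea of choosing a release candidate concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that each checkpoint is scored on a few benchmarks and that the one with the highest weighted average wins. OpenAI's actual selection process is not public; the checkpoint names, benchmark names, scores, and weights below are all hypothetical placeholders, not leaked results.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Checkpoint:
    name: str
    scores: Dict[str, float]  # benchmark name -> score between 0 and 1


def pick_release_candidate(checkpoints: List[Checkpoint],
                           weights: Dict[str, float]) -> Checkpoint:
    """Return the checkpoint with the highest weighted average score."""
    total_weight = sum(weights.values())

    def weighted_average(cp: Checkpoint) -> float:
        return sum(cp.scores[b] * w for b, w in weights.items()) / total_weight

    return max(checkpoints, key=weighted_average)


# Hypothetical checkpoints with made-up scores (illustration only).
candidates = [
    Checkpoint("ckpt-a", {"coding": 0.70, "frontend": 0.60}),
    Checkpoint("ckpt-b", {"coding": 0.65, "frontend": 0.80}),
]
weights = {"coding": 1.0, "frontend": 1.0}

rc = pick_release_candidate(candidates, weights)
print(rc.name)  # ckpt-b
```

Changing the weights can change the winner, which is one reason different testers may disagree about which checkpoint looks best.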

Of the two checkpoints currently under test, kindle and kepler, kindle-alpha is reportedly the one chosen as the release candidate.

Based on leaked user feedback, the most frequently mentioned upgrade in GPT-5.6 is front-end/UI generation.

According to Pankaj Kumar, Kindle Alpha's front-end generation has improved greatly: it produces stronger interface output directly, without complex prompts or extra tricks.

In addition, its visual capabilities are also very strong, performing well in image understanding and image referencing tasks, and showing significant improvements in reasoning, coding, and UI generation overall.

This is a test that user Chris ran on Kindle at the medium setting:

This is the result of another user's earlier test on the non-reasoning version of Joule:

The former is clearly far more polished.

However, user Leo tested both Kepler and Kindle with the same prompt at the xhigh setting.

He found that Kindle had actually regressed compared with Kepler.

Hmm, the results are genuinely hard to judge.

He even predicted that OpenAI would likely keep refining the model and might eventually drop Kindle as the candidate.

The latest news is that Kindle has been removed from Arena, and a new model, Levi, has appeared.

Some netizens speculated that Levi might be a codename for an internal version of GPT-5.6, and compared its front-end capabilities with those of GPT-5.5:

It's clear that Levi's front-end is quite impressive, with a clean, simple, and sophisticated style, and excellent attention to detail.

However, other users dug further and found that Levi may be a Meta model rather than GPT-5.6.

So, can GPT-5.6 actually beat Mythos?

User mark_k claims that GPT-5.6 "beats Mythos on multiple agentic coding benchmarks".

However, the most convincing evidence so far is the test by user Leo shown earlier. He is not optimistic about GPT-5.6:

Kindle is a step backward compared to Kepler. In its current form, it would be easily defeated by Mythos.

June's "Fast and Furious" three-way race

June brings the arrival of summer, and the large-model arena is heating up too.

The release windows of the three leading overseas AI companies have converged: Fable 5, Gemini 3.5 Pro, and GPT-5.6 are all due at once, creating a "race against time."

Moreover, they are targeting the same set of capabilities: reasoning, agents, coding, and front-end generation.

Interestingly, although all three companies set June as their deadline, only Anthropic has actually handed in its work so far.

Gemini 3.5 Pro was unveiled at Google I/O on May 19, with its 2-million-token context window and Deep Think reasoning as the highlights.

However, it has not launched yet; the official release is set for June.

GPT-5.6 is rumored to be released later this month.

This adds another layer of tension for OpenAI: its competitors have already posted their scores, while internally it may still be struggling to decide which RC to ship.

But besides benchmark scores, pricing is also an important factor.

Fable 5 and Mythos 5 are priced at $10 per million input tokens and $50 per million output tokens.

That is roughly twice the price of the existing Opus.
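As a rough illustration of what those rates mean per request, the short Python sketch below computes the cost of a single call at the quoted $10 and $50 per million tokens. The workload size (20,000 input tokens and 4,000 output tokens) is a hypothetical example, and the function name is made up for this sketch.

```python
# Prices quoted in the article, in US dollars per one million tokens.
INPUT_PRICE_PER_MTOK = 10.0
OUTPUT_PRICE_PER_MTOK = 50.0


def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of one request at the quoted rates."""
    return (input_tokens * INPUT_PRICE_PER_MTOK
            + output_tokens * OUTPUT_PRICE_PER_MTOK) / 1_000_000


# Hypothetical workload: 20,000 input tokens and 4,000 output tokens.
cost = request_cost(20_000, 4_000)
print(f"${cost:.2f}")  # $0.40
```

At these rates output tokens cost five times as much as input tokens, so generation-heavy workloads such as long front-end code outputs are dominated by the output price.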

If GPT-5.6 can match or slightly outperform Mythos in capability while being significantly cheaper, it could win back some ground in real-world adoption.

For now, OpenAI has made no official announcement. The real showdown will have to wait for the official release of GPT-5.6 and head-to-head benchmarks between Fable and GPT-5.6.

The outcome will most likely be revealed this month, so stay tuned!

Reference links:

[1] https://x.com/mark_k/status/2063922897341567488?s=20

[2] https://x.com/AiBattle_/status/2064078302394917157?s=20

[3] https://x.com/pankajkumar_dev/status/2063272015214354908?s=20

[4] https://x.com/synthwavedd/status/2063245096951160865?s=20

[5] https://x.com/ChrissGPT/status/2063135842906808579?s=20

[6] https://x.com/koltregaskes/status/2062806155139912164?s=20

This article is from the WeChat public account "Quantum Bit", author: Tingyu, and published with authorization from 36Kr.
