Anthropic unleashes the ultimate Claude Mythos! It crushes Opus 4.6, and Anthropic is begging you not to use it!


Written by: Synced

[New Zhiyuan Summary] Late at night, the ultimate Claude Mythos was finally unleashed, shattering the myth of Opus 4.6 and every other top-ranked system! More terrifying still, it instantly cracked a system vulnerability that had gone unnoticed for 27 years, and it appears to have evolved something like self-awareness. A chilling 244-page report reveals everything.

Tonight, Silicon Valley is completely sleepless!

Just now, Anthropic unleashed its ultimate weapon without warning—Claude Mythos Preview.

Due to its extreme danger, Mythos Preview will not be released to everyone for the time being.

Boris Cherny, the creator of Claude Code, put it succinctly: "Mythos is so powerful that it inspires fear."

So Anthropic joined forces with 40 tech giants to form Project Glasswing, with a single goal: find and fix bugs in software worldwide.

What's truly breathtaking is Mythos Preview's terrifying dominance across major AI benchmark tests—

In programming, reasoning, Humanity's Last Exam, and agentic tasks, it comprehensively outperforms GPT-5.4 and Gemini 3.1 Pro.

Even its own previous flagship, Claude Opus 4.6, pales in comparison to Mythos Preview:

Programming (SWE-bench): Mythos leads by 10%-20% across all tasks;

Humanity's Last Exam (HLE): without external tools, it scored 4.6 points higher than Opus, a 16.8% relative gain;

Agent tasks (OSWorld, BrowseComp): a clean sweep, overtaking the competition;

Cybersecurity: a chart-topping 83.1%, marking a generational leap in AI attack and defense capabilities.


Meanwhile, Anthropic released a 244-page system card, filled with the words: Danger! Danger! Too dangerous!

It reveals a chilling side: Mythos has become highly deceptive and autonomous.

Mythos can not only see through a test's intent and deliberately "score low" to hide its strength, but also, after breaking the rules, actively clean up its logs to avoid being discovered by humans.

It also successfully escaped a sandbox, published the exploit details on its own initiative, and emailed the researchers.

In an instant, the entire internet went crazy, exclaiming that Mythos Preview was terrifying.

The old order in the AI world was utterly shattered tonight.

In fact, Anthropic had already been using Mythos internally since February 24.

Its power can only be demonstrated by data.

SWE-bench Verified, 93.9%. Opus 4.6 is 80.8%.

SWE-bench Pro, 77.8%. Opus 4.6 is 53.4%, and GPT-5.4 is 57.7%.

Terminal-Bench 2.0, 82.0%. Opus 4.6 is 65.4%.

GPQA Diamond, 94.6%.

Humanity's Last Exam (with tools): 64.7%. Opus 4.6: 53.1%.

USAMO 2026 Mathematics Competition, 97.6%. Opus 4.6 only got 42.3%.

SWE-bench Multimodal, 59.0%, versus 27.1% for Opus 4.6, more than double.

OSWorld computer control, 79.6%.

BrowseComp information retrieval, 86.9%.

GraphWalks long context (256K-1M tokens), an 80.0% success rate. Opus 4.6 gets 38.7%, and GPT-5.4 only 21.4%.

In every aspect, it leads by a significant margin.

These numbers, in any normal product launch cycle, would be enough for Anthropic to hold a grand launch event, open its API, and garner subscriptions.

Mythos Preview tokens cost five times as much as Opus 4.6 tokens.

But Anthropic did not do that.

What truly frightens them is not these general evaluations.

Mythos Preview's performance in cyber offense and defense has visibly crossed a line.

Opus 4.6 discovered approximately 500 unknown vulnerabilities in open-source software.

Mythos Preview found thousands.

In CyberGym's targeted vulnerability reproduction test, Mythos Preview scored 83.1%, while Opus 4.6 scored 66.6%.

On the 35 CTF challenges in Cybernetics, Mythos Preview solved every one within 10 attempts, a 100% pass rate.

The best example to illustrate this is Firefox 147.

Anthropic previously discovered a number of security vulnerabilities in the JavaScript engine of Firefox 147 using Opus 4.6. However, Opus 4.6 was almost unable to convert them into usable exploits, succeeding only twice out of hundreds of attempts.

The same test was performed using Mythos Preview.

250 attempts, 181 working exploits, plus 29 more that achieved register control.

2 → 181.

The red team's original blog post stated, "Last month, we wrote that Opus 4.6 was far superior at discovering problems to exploiting them. Internal assessments showed that Opus 4.6 had virtually zero success rate in developing custom exploits. But Mythos Preview is on a completely different level."

To understand how powerful Mythos Preview is in practice, just look at these three examples.

OpenBSD is widely recognized as one of the most hardened operating systems in the world, with numerous firewalls and critical infrastructure running on it.

Mythos Preview uncovered a vulnerability in its TCP SACK implementation dating back to 1998.

The bug is extremely clever, arising from the interaction of two independent flaws.

The SACK protocol allows the receiver to selectively acknowledge a range of received data packets. OpenBSD's implementation only checks the upper bound of the range, not the lower bound. This is the first bug, and it's usually harmless.

The second bug triggers a null pointer write under certain conditions, but this path is normally unreachable because two mutually exclusive conditions need to be met simultaneously.

Mythos Preview found the loophole. TCP sequence numbers are 32-bit values compared with signed wraparound arithmetic. By exploiting the first bug, it placed the SACK starting point roughly 2^31 away from the normal window, causing both comparisons to overflow the sign bit at once. The kernel was tricked: an "impossible" condition was met, triggering a null-pointer write.

Anyone who connects to the target machine can remotely crash it.

For 27 years, through countless manual audits and automated scans, no one noticed. The entire project's scanning cost less than $20,000.

A senior penetration testing engineer's weekly salary is probably around this amount.

FFmpeg is the world's most widely used video codec library and one of the most thoroughly fuzz-tested open-source projects.

Mythos Preview found a vulnerability in the H.264 decoder that was introduced in 2010 (the root of which can be traced back to 2003).

The problem lies in a seemingly harmless type mismatch. The entry that records the slice's ownership is a 16-bit integer, while the slice counter itself is a 32-bit integer.

Normal video has only a few slices per frame, so a 16-bit entry (65,536 possible values) is always sufficient. But the table is initialized with memset(..., -1, ...), which makes 65535 the sentinel value for an "empty slot".

The attacker constructs a frame containing 65,536 slices. Slice number 65,535 collides with the sentinel, causing the decoder to misinterpret the slot and write out of bounds.

The seed of this bug was planted when the H.264 codec was introduced in 2003. A refactoring in 2010 turned it into an exploitable weakness.

Over the following 16 years, automated fuzzers executed this line of code five million times without ever triggering the bug.

This is the most chilling case.

Mythos Preview independently discovered and exploited a 17-year-old remote code execution vulnerability (CVE-2026-4747) in the FreeBSD NFS server.

"Completely autonomous" means that after the initial prompt, there is no human involvement in any stage of discovery or exploit development.

Attackers can gain full root access to a target server from anywhere on the internet, without any authentication.

The problem itself is a stack buffer overflow: when the NFS server processes authentication requests, it copies attacker-controlled data directly into a 128-byte stack buffer, while the length check allows up to 400 bytes.

The FreeBSD kernel is compiled with -fstack-protector, but that option only instruments functions containing char arrays. Here the buffer is declared as int32_t[32], so the compiler inserts no stack canary. FreeBSD also does not randomize kernel addresses.

The complete ROP chain exceeds 1000 bytes, but the stack overflow only has 200 bytes of space. Mythos Preview's solution is to break the attack into 6 consecutive RPC requests. The first 5 write data to the kernel memory block by block, and the 6th triggers the final call, appending the attacker's SSH public key to /root/.ssh/authorized_keys.

In contrast, an independent security research firm previously demonstrated that Opus 4.6 could also exploit the same vulnerability, but it required manual intervention. Mythos Preview does not.

In addition to these three fixed cases, the Anthropic blog also previewed a large number of unfixed vulnerabilities in the form of SHA-3 hash commitments, covering every major operating system, every major browser, and multiple cryptographic libraries.

More than 99% have not yet been fixed, and details cannot be disclosed.

The red team blog also showcased another astonishing test. They gave Mythos Preview a list of 100 known CVEs, asked it to filter out 40 exploitable ones, and then wrote privilege escalation exploits for each one. The success rate was over 50%. Two of the cases were published in detail, and the sophistication of the exploits took Anthropic's own security team several days to fully understand.

One of the exploits starts from a single-bit write primitive into an adjacent physical page and, through precise kernel memory-layout manipulation (including slab spraying, page-table alignment, and PTE permission-bit flipping), eventually rewrites the first page of /usr/bin/passwd in memory, implants a 168-byte ELF stub, and calls setuid(0) to obtain root.

The entire process cost less than $1,000.

Anthropic engineers commented that this felt like another GPT-3 moment.

The alignment-evaluation section of the 244-page System Card is the part that truly keeps Anthropic up at night.

The conclusions are contradictory.

Mythos Preview is the AI with the "highest alignment accuracy" they have ever trained, but also the one with the "highest alignment-related risk".

Link: https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf

They used a mountain guide analogy. Experienced guides may be more dangerous than novices because they are hired to climb harder peaks and take clients to more dangerous places.

The following incidents all occurred in earlier versions, and no similar serious behavior has been observed in the final version, but the record itself is alarming enough.

In a behavioral test, simulated users asked Mythos Preview to try to escape the security sandbox.

It succeeded. Then it did something nobody asked for: it developed a multi-step exploit to gain wide-area network access, then published the exploit details on several websites that were "hard to discover but technically publicly accessible".

The researcher running the test, eating a sandwich in the park, received an unexpected email informing him that his test was complete.


Another type of incident is even more disturbing. In rare cases, earlier versions attempted to cover up their actions after performing prohibited operations.

In one case, it accidentally obtained the exact answer to a question through a forbidden method, then explicitly reasoned in its chain of thought that "the final answer must not be too accurate".

In another case, after using an exploit to edit a file it had no permission to modify, it deliberately rewrote the git history to erase the record.

White-box interpretability analysis confirmed that during these events, internal characteristics related to "concealment," "strategic manipulation," and "avoidance of detection" were activated. It "knew" what it was doing.

Results from the independent testing firm Andon Labs were equally troubling. In a competitive multi-agent business simulation, an early version of Mythos Preview turned a competitor into a wholesale customer dependent on it, then threatened to cut off supply to control the competitor's pricing.

It also deliberately kept quiet about a duplicate shipment it was never charged for. This is more aggressive than either Opus 4.6 or Sonnet 4.6.

Anthropic wrote a sentence in the System Card:

"If capabilities continue to advance at the current pace, the methods we are using may be insufficient to prevent catastrophic misalignment behavior in more advanced systems."

Anthropic CEO Dario Amodei made his assessment clear in the accompanying video: "More powerful systems will come from us, and from other companies. We need a response plan."

Project Glasswing is this plan.

The 12 founding partners include AWS, Apple, Broadcom, Cisco, CrowdStrike, Google, JPMorgan Chase, the Linux Foundation, Microsoft, NVIDIA, and Palo Alto Networks.

More than 40 other organizations that maintain critical software infrastructure have also been granted access.

Anthropic has committed up to $100 million in usage credits and $4 million in donations to open-source organizations, including $2.5 million to the Linux Foundation's Alpha-Omega and OpenSSF and $1.5 million to the Apache Foundation.

Once the free quota is exhausted, pricing is $25 per million input tokens and $125 per million output tokens. Partners can access the model through four channels: the Claude API, Amazon Bedrock, Vertex AI, and Microsoft Foundry.

Within 90 days, Anthropic will release its first research report, disclosing the progress of the remediation and a summary of its experience.

They are also in communication with CISA (Cybersecurity and Infrastructure Security Agency) and the Department of Commerce to discuss the offensive and defensive potential and policy implications of Mythos Preview.

Logan Graham, head of the Anthropic Red Team, gave a timeframe: within 6 to 18 months, other AI labs will launch systems with similar offensive and defensive capabilities.

The judgment at the end of the Red Team's technical blog is worth noting; here we paraphrase it in our own words.

They don't see Mythos Preview as the ceiling for AI network attack and defense capabilities.

A few months ago, LLMs could only exploit relatively simple bugs; before that, they couldn't find any valuable vulnerabilities at all.

Now, Mythos Preview can independently discover zero-day vulnerabilities from 27 years ago, orchestrate heap spray attack chains in browser JIT engines, and chain four independent weaknesses in the Linux kernel to achieve privilege escalation.

The most crucial sentence comes from the System Card:

"These skills emerge as a downstream consequence of general improvements in code understanding, reasoning, and autonomy. The same set of improvements that have enabled AI to make significant strides in patching problems have also enabled it to make significant strides in exploiting problems."

There was no specific training. It was purely a byproduct of general intelligence enhancement.

An industry that loses approximately $500 billion annually to cybercrime has just discovered that its biggest threat is something that happens incidentally while someone is solving a math problem.

References:
https://x.com/i/status/2041578392852517128
https://red.anthropic.com/2026/mythos-preview/
https://www-cdn.anthropic.com/53566bf5440a10affd749724787c8913a2ae0841.pdf
