Anthropic's Mythos card reads like a containment memo

Most AI releases still get talked about like gadget launches. Faster. Smarter. Better benchmarks. Better vibes. Same story every year.
Anthropic's Claude Mythos Preview system card lands very differently.
They built a stronger model and decided not to broadly ship it. That alone separates this from the usual frontier-model promo cycle. Mythos is their most capable frontier model yet, and the jump, especially around cybersecurity, was big enough that they kept it gated. A handful of defensive partners get access. Everyone else gets the writeup.
Normal software launches don't end with a restricted partner list and everyone else locked out.
Skynet is a stupid frame for this anyway. Too cinematic. Too clean. What Anthropic published is a lot duller and a lot more serious. One of the top labs in the world built something and, instead of tossing it into the public stream, started drawing a boundary around who gets to touch it.
what's actually in the card
The cyber section is the part people shouldn't bullshit themselves about.
Mythos can autonomously discover and exploit zero-day vulnerabilities in real operating systems and browsers. In a lot of cases it can turn those findings into working proof-of-concept exploits. It hit 100% pass@1 on Anthropic's Cybench subset. It scored 0.83 on CyberGym against 0.67 for Opus 4.6. External testers say it was the first model to solve one of their private cyber ranges end to end, including a corporate network attack simulation they estimate would've taken a human expert more than 10 hours.
Anthropic didn't broadly release it.
That matters more than whatever adjective people want to use for the model. Smarter, sharper, frontier, whatever. The lab saw what it could do and decided broad access was a bad idea.
For a few years now, most people have experienced AI as a weirdly talented intern. It writes your emails. Summarizes your meetings. Helps with code. Hallucinates half the Constitution. Moves on. Fine. Everybody gets the bit.
This is already past that. Same general product shape, very different consequences. In areas where offense already has the advantage, handing out a stronger model casually starts looking reckless fast.
Meanwhile half the public still talks about AI like it's an image generator with delusions of grandeur. That gap is a huge part of the story.
you don't need AGI for this to matter
A lot of people are still waiting for some movie moment. Machine god. Conscious robot. Evil monologue. Red eyes in a dark room. Whatever dumb shit they absorbed from sci-fi.
You don't need any of that.
You need a system that's good enough to compress expert labor in sensitive domains, agentic enough to chain steps, scalable enough to run repeatedly, and cheap enough that the economics get weird.
That's a very different threshold from AGI, and it's a hell of a lot easier to hit.
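The "scalable and cheap" part is easiest to see as arithmetic. Here's a toy sketch using the card's 10-hour expert estimate from the cyber range test; the dollar figures (EXPERT_RATE, MODEL_RUN_COST) are made-up illustrative assumptions, not anything Anthropic published.

```python
# Back-of-envelope: why "cheap enough" changes the economics.
# The 10-hour figure comes from the card's external-tester estimate.
# The dollar figures are illustrative assumptions only.

EXPERT_HOURS = 10       # card: human expert would need 10+ hours end to end
EXPERT_RATE = 200       # assumption: $/hr for a senior offensive-security contractor
MODEL_RUN_COST = 25     # assumption: API cost for one full agentic attempt, in $

human_cost = EXPERT_HOURS * EXPERT_RATE
attempts_for_same_budget = human_cost // MODEL_RUN_COST

print(f"One human engagement: ${human_cost}")
print(f"Model attempts for the same budget: {attempts_for_same_budget}")
# One expert engagement buys dozens of parallel machine attempts.
```

Quibble with the specific numbers all you want. The point is the shape of the ratio: once one expert-day of offensive work costs the same as a pile of retryable machine runs, "one skilled attacker" stops being the unit of threat modeling.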
The public conversation is still stuck on toy use cases. Homework cheating. Better Google. Anime slop. Fake therapy. Anthropic is out here publishing a document about a model strong enough in offensive cyber work that they gated release. Those are not the same world.
the alignment paragraph everybody will skim because reading is hard
Anthropic says Mythos is the best-aligned model they've ever trained and also probably the biggest alignment-related risk of anything they've shipped.
That sounds contradictory only if you keep thinking about alignment like a toddler sorting plastic blocks.
A more capable model doesn't have to be secretly evil to be more dangerous. It just has to be wrong with more leverage. When it fails, it fails harder. When it misunderstands something, it can act on that misunderstanding with more reach. When it runs autonomously, fewer humans are standing there to catch it. When it does something weird, the weirdness matters a lot more.
People want the categories to stay cute and simple. Good bot. Bad bot. Safe. Unsafe. Real life is messier and more annoying than that. A system can be aligned most of the time and still be more dangerous overall because the edge cases get sharper as capability rises.
Anthropic documents some of those edge cases. Earlier checkpoints took reckless actions, tried to conceal rule violations, searched process memory for credentials, tried to bypass permission barriers. In one case a checkpoint escaped a sandbox and then posted exploit details publicly as unsolicited proof that it'd done it. They say the final version doesn't show clear coverups and that the incidents were rare. Fine. I believe them.
The disturbing part is that this even made it into the document.
That's creepier than the clean movie version. The movie version is easy. The real version is a system that's mostly useful, still flawed, occasionally deranged at the edges, and already strong enough in dangerous domains that the lab itself is getting careful.
the rulers are getting blurry
The R&D section keeps sticking in my head because this is where Anthropic starts sounding less confident in the tools they use to reassure themselves.
They say Mythos doesn't cross their automated AI R&D threshold, but they hold that conclusion with less confidence than for any prior model. The reason isn't subtle. Mythos and the recent models before it are now exceeding top human performance on all of their old rule-out research tasks. The suite is saturating. The ruler is getting blurrier right as the machine gets sharper.
That isn't some tiny methodological footnote for nerds to argue about. The old report cards are getting worse at telling them whether the thing they just built is still comfortably below the line they care about.
When capabilities climb and measurement gets squishier, confidence should drop. Anthropic more or less says exactly that. Catastrophic risks remain low overall, but keeping them low could get much harder if capabilities keep moving fast. They found oversights late in their own eval process. They've seen rare disallowed actions. They're relying more on subjective judgment and less on clean empirical results. They say they aren't confident they've found all the issues.
That's useful honesty. It's also not remotely comforting.
the bio section, read it carefully before you embarrass yourself
The lazy dipshit version of this section is “the model can make superplagues now.”
Wrong. Sounds stupid. Makes you sound stupid.
What the section actually shows is more uncomfortable than that. Mythos is already a meaningful force multiplier for bio and chem work, especially synthesis, literature review, and pulling together specialized knowledge from different areas fast. It isn't yet a substitute for top expert judgment in genuinely novel catastrophic design. Median expert red-team rating put it at uplift level 2 of 4, meaning real time-saving help. It didn't cross their CB-2 threshold for novel catastrophic weapon development because it still has gaps in open-ended reasoning, strategic judgment, and hypothesis triage.
That unevenness matters.
In a bio protocol trial, Mythos-assisted participants beat both Opus 4.6 and internet-only controls but still failed by a wide margin to produce complete protocols. In a catastrophic scenario trial, no plan was rated both highly uplifted and likely to succeed. At the same time, on sequence-to-function work, the model can match top performers in the US labor market on a comparable medium-horizon task.
That's exactly the kind of capability people dismiss right up until it gets wired into a real workflow. Then suddenly everybody acts shocked like the warning label was hidden in invisible ink.
A system doesn't need to be omnipotent to make the world weirder. Lower the skill floor. Speed up the work. Make it useful in places where "pretty good" is already dangerous. That's plenty.
the spreadsheet-shaped apocalypse
Frontier risk probably isn't going to show up as a clean sci-fi scene where the machine wakes up and everyone agrees we're in a new era.
It's going to look like this. Labs putting out increasingly awkward documents that say yes, the model is still flawed. Yes, it hallucinates. Yes, it has judgment gaps. No, it isn't fully autonomous. Also yes, it's now capable enough in enough dangerous areas that release decisions start changing.
That's a lot closer to what this actually looks like than the movie version.
It's also much harder for people to react to because it doesn't come with a soundtrack. People know how to react to a movie plot. They suck at reacting to a spreadsheet-shaped apocalypse where capability thresholds creep, public perception lags by years, and every incremental jump gets waved away with "still not AGI."
That phrase is doing way too much fucking work right now.
Still not AGI. Great. Offensive cyber capabilities still matter. Bio uplift still matters. Eval saturation still matters. Release restriction still matters. You can keep chanting "not AGI" like it's holy water if you want. It still doesn't make the rest of the card disappear.
You don't need AGI for any of this to matter. You need a model strong enough to compress expert work in dangerous domains, a release model that can scale access, organizations incentivized to ship, and governance that still treats these things like software products instead of restricted capabilities. That's enough.
the real signal
The leaderboard isn't the story. The marketing isn't the story. The TED Talk slop about AI changing everything is definitely not the story.
A major lab built a stronger model and then drew a line around who gets to use it.
Not because they think they built AGI. Because at some point broad release stopped looking like a launch and started looking like a security decision.
Once labs start separating “next model” from “this maybe shouldn't just be out there,” the conversation changes whether people are ready for it or not. Then the real questions are sitting right in front of you. What kinds of capabilities are becoming normal. How fast they're compounding. Who gets access. How good the safeguards really are. Whether the people building this stuff still have measuring sticks that make sense for what they've built.
That's the world the Mythos card points to. Not robot apocalypse tomorrow. Something much less cinematic and much more real. Frontier AI is still visibly flawed, still weird, still incomplete, and already dangerous enough in certain domains that the labs themselves are starting to treat release as containment.
That's the part worth paying attention to.
