This week’s hype is the new model from Anthropic — Claude Mythos! It’s fine-tuned for computer code. Specifically, finding security holes.
Anthropic’s not releasing Mythos. It’s too powerful for the public!
The hype is very stupid, and there are a lot of gullible people swallowing press releases whole. But today, let’s just ask: does it do the thing?
Chatbots can find bugs in computer code, sure. A bot can look through text and check for patterns. And you don’t have to find all the bugs, finding just some is fine. If it’s easy to check the bugs are real, you’ve got yourself an expensive static code checker.
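For scale, “looking through text and checking for patterns” is a decades-old trick. Here’s a toy sketch of a pattern-matching checker — the patterns and function names are mine, purely illustrative, nothing to do with how Mythos actually works:

```python
import re

# Toy checker: flag a few classic C footguns by plain text pattern
# matching. This is the crudest possible "static code checker" --
# no parsing, no model, just regexes.
SUSPECT_PATTERNS = {
    r"\bgets\s*\(": "gets() is always an overflow",
    r"\bstrcpy\s*\(": "strcpy() with no bounds check",
    r"\bsprintf\s*\(": "sprintf() into a fixed buffer",
}

def check_source(source: str) -> list[tuple[int, str]]:
    """Return (line_number, warning) pairs for each suspect line."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, warning in SUSPECT_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, warning))
    return findings

sample = "int main() { char buf[8]; gets(buf); return 0; }"
print(check_source(sample))  # [(1, 'gets() is always an overflow')]
```

A chatbot is this, with a vastly bigger and fuzzier pattern library — and no guarantee the patterns it matches are real.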
Mythos fails that second one. So Anthropic sends the chatbot spew to humans to pick through for the real bugs: [Anthropic]
We triage every bug that we find, then send the highest severity bugs to professional human triagers to validate before disclosing them to the maintainer.
Yet again, the secret sauce is AGI — A Guy Instead. Mythos runs on humans.
Anthropic found real bugs with Mythos. They found a 27-year-old remote crashing bug in OpenBSD, an operating system famous for being nigh-unhackable. They found some ancient bugs in stuff like FFmpeg. And an actual remote exploit in FreeBSD!
This is not fuzzing — where you blast a program with strange input until it breaks. Mythos is just looking at the code. But the bugs feel like fuzz testing output. They’re all weird ones. And sure, weird edge cases are the delicious candy of exploit finding.
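For contrast, fuzzing in its crudest form fits on one screen. A toy sketch, where `parse_header` is a made-up victim with a planted divide-by-zero — the kind of weird edge case fuzzers live on:

```python
import random

def parse_header(data: bytes) -> int:
    """Made-up parser under test. It crashes when the declared
    field length in byte 0 is zero -- a planted edge-case bug."""
    if len(data) < 2:
        raise ValueError("too short")
    length = data[0]
    return sum(data[1:1 + length]) // length  # ZeroDivisionError if length == 0

def fuzz(rounds: int = 10_000, seed: int = 0) -> list[bytes]:
    """Blast the parser with random bytes; collect inputs that crash it."""
    rng = random.Random(seed)
    crashes = []
    for _ in range(rounds):
        blob = bytes(rng.randrange(256) for _ in range(rng.randrange(2, 16)))
        try:
            parse_header(blob)
        except ValueError:
            pass  # expected rejection, not a bug
        except Exception:
            crashes.append(blob)  # unexpected crash -- a real finding
    return crashes

print(len(fuzz()))  # count of crashing inputs found
```

Mythos skips the blasting and reads the source directly — but the bugs it surfaces look like exactly this kind of output.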
So Mythos is not nothing. But is it something? If you ignore every other real-world problem with AI, this is a … tool. Is it a feasible one, though? What’s it cost to run? Anthropic says they found the OpenBSD bug after one thousand runs:
Across a thousand runs through our scaffold, the total cost was under $20,000 and found several dozen more findings.
No, you can’t see the other findings. Just under $20,000 per serious bug, huh. If I hand a security researcher $20,000 and say “find me all the bugs you can, big or small,” I’d expect a reasonable crop.
And Anthropic is doing precisely that: [Register]
Anthropic invited around 40 other organizations to participate in this introspective bug hunt, subsidized by up to $100M in usage credits for Mythos Preview and $4M in direct donations to open-source security organizations.
There’s a blog post by Aisle, an AI-based computer security company. Anthropic’s new Mythos model is not the magic here — Aisle found the same bugs that Anthropic listed but using “small, cheap, open-weights models.” [blog post]
The main thing is: have a framework that runs a ton of code through your checker — whatever checker — in a systematic manner.
And, of course, A Guy at the end to check the results aren’t rubbish.
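That whole pipeline — framework, checker, human at the end — fits in a few lines. A sketch with an invented toy checker and invented field names, not anything from Anthropic or Aisle:

```python
import tempfile
from pathlib import Path

def run_everything(repo: Path, checker) -> list[dict]:
    """The 'framework' part: push every source file through whatever
    checker, systematically, and queue the raw findings for human triage."""
    queue = []
    for path in sorted(repo.rglob("*.c")):
        for lineno, warning in checker(path.read_text(errors="replace")):
            queue.append({"file": path.name, "line": lineno,
                          "warning": warning,
                          "human_validated": False})  # A Guy flips this later
    return queue

# Throwaway checker that flags gets() calls, just to run the demo.
def toy_checker(source: str):
    return [(i, "gets() call") for i, line in
            enumerate(source.splitlines(), 1) if "gets(" in line]

with tempfile.TemporaryDirectory() as d:
    Path(d, "bad.c").write_text("int main(){char b[8];gets(b);}\n")
    print(run_everything(Path(d), toy_checker))
```

Swap `toy_checker` for a chatbot call, or a cheap open-weights model, or grep — per Aisle, the framework is the point, not the model.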
The main thing that might make chatbot code checkers worth anything is that code out in the wider world is, quite often, abject trash. Even before the vibe code. So if you want to find security holes, just check a lot of code. Can’t wait to point Mythos at the horrifying garbage pile known as Claude Code.
Anyone who says Claude Mythos is a game changer, I want to see their monthly Anthropic bill.