Anthropic meeting with Commerce after Claude Mythos model outage
Logan Graham, Dave Orr and Nicholas Carlini were reported to be meeting with the Commerce Department in D.C. amid government scrutiny of a Claude jailbreak.
TL;DR
- Logan Graham, Dave Orr and Nicholas Carlini were reported to be meeting with the Commerce Department in D.C. amid government scrutiny of a Claude jailbreak.
- Anthropic's models were taken offline amid internal clashes and a government probe, according to a link post published on 15th June 2026.
- The post frames the episode around personality tensions and a jailbreak that prompted U.S. government scrutiny of the Mythos/Fable export control matter.
Anthropic's models were taken offline amid internal clashes and a government probe, according to a link post published on 15th June 2026. The post frames the episode around personality tensions and a jailbreak that prompted U.S. government scrutiny of the Mythos/Fable export control matter.
What happened
The post links to an Axios piece, which it describes as a collection of behind-the-scenes gossip about the Mythos/Fable U.S. government export control story. It says three people were reported to be meeting with the Commerce Department in Washington, D.C., on the day of the post: Logan Graham, whose self-description is quoted as "I lead the Frontier Red Team at Anthropic"; Dave Orr, named as Head of Safeguards and previously a Director of Engineering at Google DeepMind; and Nicholas Carlini. The author also notes Graham's prior government role, stating that he was "Special Adviser to the Prime Minister" in the Boris Johnson era, covering AI, science, and technology policy.
The post relays the administration's calculus as summarized in the Axios reporting: one path is to harden systems so models cannot be jailbroken, though the post quotes that perfect jailbreak resistance "may be impossible." The alternative, paraphrased from a source familiar with the administration's thinking, is more of an attitude fix so that, instead of feeling dismissed, "everyone feels safe, secure and happy."
Technical and procedural context
The post asks whether Anthropic ever successfully addressed the attack class described in the 2023 paper "Universal and Transferable Adversarial Attacks on Aligned Language Models." It points readers to Anthropic's January work on Constitutional Classifiers as relevant to that class of attacks. The post notes Anthropic's ongoing claim that no "universal jailbreak" has been found against Claude Mythos, and that the jailbreak which triggered the U.S. government response has been classified by the company as "a potential narrow, non-universal jailbreak."
That framing separates a hard-to-eliminate theoretical risk, universal jailbreaks, from the narrower exploit the company says precipitated the current regulatory attention.
Why it matters
Anthropic is meeting with a federal agency while the company debates technical fixes and cultural remedies. The dual thrust—technical defenses against jailbreaks and attempts to restore trust—highlights two levers available when models draw government scrutiny: engineering changes and administrative engagement. The post underscores that perfect technical immunity may be unattainable, which makes governance, disclosure and interpersonal dynamics inside companies part of the security calculus.
What to watch
Watch the Commerce Department meeting outcomes and any follow-up statements about the classification of the jailbreak. The post signals two concrete threads to follow: updates on whether Anthropic can demonstrate the narrow nature of the exploit and any further details about the company’s Constitutional Classifiers work and its relevance to the 2023 adversarial-attacks paper.
Written by The Brieftide · Source: Simon Willison