Anthropic meeting with Commerce after Claude Mythos model outage

Logan Graham, Dave Orr and Nicholas Carlini were reported to be meeting with the Commerce Department in D.C. as a jailbreak probe left Claude offline.

The Brieftide

TL;DR

  • 01 Logan Graham, Dave Orr and Nicholas Carlini were reported to be meeting with the Commerce Department in D.C. as a jailbreak probe left Claude offline.
  • 02 Anthropic's models were taken offline amid internal clashes and a government probe, according to a link post published on 15th June 2026.
  • 03 The post frames the episode around personality tensions and a jailbreak that prompted U.S. government scrutiny in the Mythos/Fable export control matter.

Anthropic's models were taken offline amid internal clashes and a government probe, according to a link post published on 15th June 2026. The post frames the episode around personality tensions and a jailbreak that prompted U.S. government scrutiny in the Mythos/Fable export control matter.

What happened

The post links to an Axios piece, which it describes as a collection of behind-the-scenes gossip about the U.S. government's Mythos/Fable export control story. It says three Anthropic figures were reported to be meeting with the Commerce Department in Washington, D.C., on the day of the post: Logan Graham, whose bio as quoted in the post reads "I lead the Frontier Red Team at Anthropic"; Dave Orr, named as Head of Safeguards and previously a Director of Engineering at Google DeepMind; and Nicholas Carlini. The author also notes Graham's prior government role, stating that he was "Special Adviser to the Prime Minister" in the Boris Johnson era, covering AI, science, and technology policy.

The post relays the administration's calculus as summarized in the Axios reporting: one path is to harden systems so models cannot be jailbroken, though the post quotes that perfect jailbreak resistance "may be impossible." The alternative, paraphrased from a source familiar with the administration's thinking, is more of an attitude fix so that, instead of feeling dismissed, "everyone feels safe, secure and happy."

Technical and procedural context

The post asks whether Anthropic ever successfully addressed the attack class described in the 2023 paper "Universal and Transferable Adversarial Attacks on Aligned Language Models." It points readers to Anthropic's January work on Constitutional Classifiers as relevant to that class of attacks. The post notes Anthropic's ongoing claim that no "universal jailbreak" has been found against Claude Mythos, and that the jailbreak which triggered the U.S. government response has been classified by the company as "a potential narrow, non-universal jailbreak."

That framing separates a hard-to-eliminate theoretical risk, universal jailbreaks, from the narrower exploit the company says precipitated the current regulatory attention.

Why it matters

Anthropic is reportedly meeting with a federal agency while both technical fixes and cultural remedies are under discussion. That pairing of technical defenses against jailbreaks with attempts to restore trust highlights two levers available when models draw government scrutiny: engineering changes and administrative engagement. The post underscores that perfect technical immunity may be unattainable, which makes governance, disclosure and interpersonal dynamics inside companies part of the security calculus.

What to watch

Watch the Commerce Department meeting outcomes and any follow-up statements about the classification of the jailbreak. The post signals two concrete threads to follow: updates on whether Anthropic can demonstrate the narrow nature of the exploit and any further details about the company’s Constitutional Classifiers work and its relevance to the 2023 adversarial-attacks paper.

Written by The Brieftide · Source: Simon Willison
