Anthropic Named Names. The Timing Wasn’t Accidental.
A distillation disclosure, a Pentagon standoff, and a copyright dispute — all on the same morning.
Thirteen months ago, distillation was an accusation without forensics. Today, Anthropic published the forensic receipts, naming DeepSeek, Moonshot AI, and MiniMax as running “industrial-scale campaigns” to extract Claude’s capabilities through 24,000 fraudulent accounts and over 16 million exchanges.[1] The blog post reads less like a terms-of-service complaint and more like an intelligence briefing. That shift — from contract violation to national security threat — is what matters here. And the timing tells you why.
The escalation. In January 2025, OpenAI and Microsoft quietly investigated whether DeepSeek had distilled ChatGPT’s outputs to build R1.[2] It stayed at the level of press statements and unnamed sources. On February 12, 2026, OpenAI escalated, sending a formal memo to the House Select Committee on China naming DeepSeek.[3] The same day, Google’s Threat Intelligence Group documented distillation attacks on Gemini but declined to name the attackers.[4] Today, Anthropic went further than either of them. Three named labs. Specific account volumes. Capabilities targeted. Attribution traced, in DeepSeek’s case, to individual researchers.[5]
The industry moved from “we’re looking into it” to “here are the names, the numbers, and the forensic trail.” That’s a convergent escalation with clear policy objectives.
The reframing. The argument isn’t primarily about intellectual property. It’s about export controls. The key passage: “Without visibility into these attacks, the apparently rapid advancements made by these labs are incorrectly taken as evidence that export controls are ineffective... In reality, these advancements depend in significant part on capabilities extracted from American models.”[6]
When DeepSeek’s R1 landed in January 2025, it undermined the case for chip restrictions: if a Chinese lab could match US frontier models at a lower cost, export controls looked futile. Anthropic is now arguing the opposite: DeepSeek’s performance proves export controls work, because Chinese labs couldn’t build these capabilities independently. They had to extract them.
What the forensics show. Three things stood out. First, scale: MiniMax generated over 13 million exchanges (nearly four-fifths of the total), while DeepSeek’s operation was comparatively small at roughly 150,000.[7] DeepSeek gets the political attention, but MiniMax ran the largest campaign by far — nearly two orders of magnitude bigger than DeepSeek’s.
Second, the infrastructure: proxy networks managing over 20,000 fraudulent accounts simultaneously, mixing distillation traffic with legitimate requests to evade detection.[8] This is mature operational infrastructure, not a few engineers running scripts after hours.
Third, the censorship detail. Anthropic observed DeepSeek using Claude to “generate censorship-safe alternatives to politically sensitive queries like questions about dissidents, party leaders, or authoritarianism.”[9] Using an American AI to build a better censorship system is the kind of detail that writes its own congressional testimony.
Two other fronts. Anthropic published this while fighting on two other fronts, and the disclosure does strategic work on both.
Front one: the Pentagon. The same morning the distillation post went live, Defense Secretary Hegseth was sitting across from Dario Amodei, delivering what officials describe as an ultimatum.[10] The threat: designate Anthropic a “supply chain risk” — a label that secondary reporting describes as normally reserved for foreign adversaries — because the company won’t agree to “all lawful uses” of Claude by the military. Anthropic’s reported red lines are autonomous weapons without human involvement and mass domestic surveillance. OpenAI, Google, and xAI have all agreed to supply their models for unclassified military systems without comparable restrictions. Anthropic is the only holdout.[11] The “supply chain risk” label matters less for the $200 million DoD contract than for the cascade: every Pentagon contractor would need to certify they don’t use Claude.[12]
Reread the distillation post through that lens. “Our models are so valuable that Chinese labs run industrial-scale operations to steal them” is a much stronger hand in a Pentagon negotiation than “we have ethical concerns about drone targeting.” Whether the timing was orchestrated or merely convenient, the effect is the same: the post repositions Anthropic from unreliable partner to national security asset.
Front two: copyright. Three weeks ago, music publishers sued Anthropic for $3 billion, alleging the company illegally torrented over 20,000 copyrighted songs from pirate sites to train Claude.[13] This follows a $1.5 billion settlement with book authors over the use of pirated training data.[14] A federal district judge ruled that training on copyrighted content is transformative fair use, but acquiring it via piracy is not.[15]
The temptation is to draw a parallel. It’s the wrong one. The copyright cases are about content — using copyrighted songs and books as training material. The distillation campaign is about behavior — replicating a model’s capabilities, its reasoning patterns, the engineering know-how built at enormous cost. It’s closer to industrial espionage than piracy: not stealing the product, but extracting the expertise that built it.
The legal framework is different too: terms-of-service violation and systematic access circumvention, not copyright. But in both cases, Anthropic must argue that the value embedded in a system’s outputs belongs to the system’s creator. The copyright plaintiffs are making exactly the same argument about Anthropic. No legal framework currently reconciles these positions, and the resolution will shape whether frontier AI is a product with defensible IP or a commodity that absorbs and redistributes everyone else’s.
The moat problem. The policy implications are real: the export control reframing will shape regulation, and frontier labs will invest in detection countermeasures that add friction for legitimate customers. But the deeper question is about business models, and it starts with an inconvenient fact. Distillation works. (Full disclosure: I spent almost two years as Chief Evangelist at Arcee AI, the creators of DistillKit, an open-source distillation toolkit, and before that at Hugging Face. I know this technique from the inside.[16]) It is not an exotic attack vector; it is standard practice at every frontier lab. Meta explicitly promotes distillation as a core feature of Llama, offering official guides for distilling the 405B model into 70B and 8B variants. Google distills Gemini Flash from its larger Gemini models.[17] AWS and OpenAI offer model distillation as part of their products.
The critical distinction is the nature and the amount of knowledge transferred. Internal distillation — the kind every lab does with its own models — is vastly more powerful because it accesses the teacher’s full probability distribution over every possible token, what Geoffrey Hinton called “dark knowledge” in the seminal 2015 paper that formalized the modern framework.[18] That distribution reveals not just what the model chose, but how confident it was, which alternatives it considered, and the relationships between concepts: a complete landscape of learned reasoning. External distillation, the kind Anthropic is describing, captures only the final output: question-and-answer pairs stripped of the internal state that produced them. A recent information-theoretic analysis quantifies that gap.[19]
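The gap between the two targets is easy to see numerically. The sketch below is a toy NumPy illustration — not code from any lab or from the 2015 paper — that softens a hypothetical teacher’s logits with a temperature, as the Hinton framework prescribes, and compares the information content of the resulting distribution against the one-hot label that output-only distillation has to settle for.

```python
import numpy as np

def soften(logits, temperature=1.0):
    """Temperature-scaled softmax: T > 1 flattens the distribution,
    exposing the teacher's 'dark knowledge' about near-miss tokens."""
    z = logits / temperature
    z = z - z.max()              # subtract max for numerical stability
    exp_z = np.exp(z)
    return exp_z / exp_z.sum()

def entropy_bits(p):
    """Shannon entropy in bits; zero for a one-hot (hard) label."""
    p = p[p > 0]
    return float(-(p * np.log2(p)).sum())

# Hypothetical teacher logits over a 4-token vocabulary.
teacher_logits = np.array([5.0, 3.0, 1.0, 0.0])

# Internal distillation target: the full softened distribution.
soft_target = soften(teacher_logits, temperature=2.0)
# roughly [0.63, 0.23, 0.09, 0.05] -- every alternative keeps some mass

# External distillation target: only the top token survives sampling,
# so the student sees a one-hot label.
hard_target = np.zeros_like(teacher_logits)
hard_target[teacher_logits.argmax()] = 1.0

print(f"soft target entropy: {entropy_bits(soft_target):.2f} bits")
print(f"hard target entropy: {entropy_bits(hard_target):.2f} bits")
```

The soft target carries about 1.4 bits of information per token in this toy case; the hard label carries zero. Internal distillation trains the student against the first target, output-only distillation against the second — which is why external attackers can only approximate the effect through brute volume.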
What Anthropic is describing is the weaker version of this technique: applied externally, without permission, at an adversarial scale. Yet 16 million exchanges suggest MiniMax found the weaker version worth the effort. That’s the structural problem: if meaningful capabilities can be extracted even from outputs alone, the model itself is a depreciating asset. The moat is the rate of improvement, not today’s benchmark score. The labs that can stay ahead of the distillation cycle have defensible positions. The ones that can’t are selling last quarter’s capability at this quarter’s price.
Anthropic appears to understand this. Claude Code hit $2.5 billion in annualized revenue in February 2026, more than doubling since the start of the year.[20] Add Cowork — launched earlier this year for non-developers — plus the Excel, PowerPoint, and Word integrations, and a pattern emerges. Anthropic is migrating its moat from the model layer to the application layer. Whether this is offensive (capturing more of the value chain) or defensive (hedging model depreciation), the distillation disclosure makes the defensive reading harder to dismiss. You can distill a model’s reasoning. You cannot distill a developer’s muscle memory with Claude Code, or an enterprise’s workflow integration with Cowork, or the switching costs of 500 companies spending over $1 million annually on a platform they’ve built processes around.
Anthropic’s post is the most forensically detailed public accusation one AI lab has ever made against named competitors. It is also a national security credential, an export-control argument, and a positioning document published the same morning its CEO sat across from the Secretary of Defense. The geopolitical implications land first: if Anthropic’s forensics hold, export controls aren’t failing; they’re being circumvented. That reframing gives Washington a reason to tighten restrictions rather than abandon them, and it gives Anthropic a stronger case for being in the room where those restrictions are designed.
The business implications land second, but may prove more durable. Apply the distillation test to any AI company — if a competitor could replicate your model through API calls, what would remain? — and you get the real moat map of the industry. Anthropic’s answer is developer tools and enterprise workflow. OpenAI: consumer brand and distribution. Google: search integration and the data flywheel.
The two threads converge on the same logic: name the threat to the model — whether it’s a Chinese lab running 16 million prompts or a Pentagon official threatening banishment — then try to build your position on something the threat cannot reach.
Notes
[1] Anthropic, “Detecting and preventing distillation attacks,” February 24, 2026.
[2] Bloomberg, “OpenAI Accuses DeepSeek of Distilling US Models to Gain an Edge,” February 12, 2026. OpenAI began privately raising concerns shortly after R1’s release in January 2025.
[3] OpenAI memo to the US House Select Committee on Strategic Competition between the United States and the Chinese Communist Party, February 12, 2026. Reviewed by Bloomberg and Reuters.
[4] Google Threat Intelligence Group, “GTIG AI Threat Tracker,” February 12, 2026. Chief analyst John Hultquist declined to name specific companies (NBC News, February 12, 2026).
[5] Anthropic, “Detecting and preventing distillation attacks.” “By examining request metadata, we were able to trace these accounts to specific researchers at the lab” (DeepSeek section).
[6] Anthropic, “Detecting and preventing distillation attacks,” section “Distillation attacks and export controls.”
[7] Anthropic, “Detecting and preventing distillation attacks.” DeepSeek: 150,000+ exchanges; Moonshot: 3.4 million; MiniMax: 13 million+.
[8] Anthropic, “Detecting and preventing distillation attacks,” section “How distillers access frontier models.”
[9] Anthropic, “Detecting and preventing distillation attacks,” DeepSeek section.
[10] Axios, “Scoop: Hegseth to meet Anthropic CEO as Pentagon threatens banishment,” February 23, 2026. Meeting confirmed for Tuesday morning, February 24 — the same day as the distillation post.
[11] AP via Boston Globe, “Hegseth and Anthropic CEO set to meet as debate intensifies over the military’s use of AI,” February 24, 2026. Anthropic is “the only one of its peers to not supply its technology to a new U.S. military internal network.”
[12] Anthropic, Series G funding announcement, February 12, 2026: “Eight of the Fortune 10 are now Claude customers.” The $200M contract is a fraction of Anthropic’s $14B annualized revenue, but the supply chain risk designation would cascade across Anthropic’s commercial customer base.
[13] TheWrap, “Universal Music Group, Concord and More Sue Anthropic Over Alleged Piracy,” January 28, 2026. Filed in the Northern District of California, naming Anthropic, CEO Dario Amodei, and co-founder Benjamin Mann as defendants.
[14] TechCrunch, “Music publishers sue Anthropic for $3B over ‘flagrant piracy’ of 20,000 works,” January 29, 2026. Notes the Bartz v. Anthropic settlement of $1.5 billion covering approximately 500,000 works. The new lawsuit emerged from discovery in the Bartz case.
[15] TechCrunch (same source as [14]): “Judge William Alsup ruled that it is legal for Anthropic to train its models on copyrighted content. However, he pointed out that it was not legal for Anthropic to acquire that content via piracy.” Ruling issued June 2025.
[16] Arcee AI’s DistillKit supports both logit-based and hidden-state distillation and is used to build production models. For background, see Julien Simon, “Deep Dive: Model Distillation with DistillKit,” January 2025 (YouTube).
[17] Meta explicitly describes distillation as a key use case for Llama 3.1 405B: “We believe the latest generation of Llama will ignite new applications and modeling paradigms, including synthetic data generation to enable the improvement and training of smaller models, as well as model distillation” (Meta AI blog, “Introducing Llama 3.1,” July 2024). OpenAI offers “Model Distillation” as an API product (OpenAI, “Model Distillation in the API,” October 2024). Llama 4’s Maverick and Scout models were distilled from the larger Behemoth (Cameron R. Wolfe, “Llama 4: The Challenges of Creating a Frontier-Level LLM,” April 2025; Behemoth has not been publicly released). Google describes Gemini 1.5 Flash as “distilled from” the larger Gemini 1.5 Pro (Google, “Gemini 1.5 Flash,” Google for Developers blog, May 2024).
[18] Geoffrey Hinton, Oriol Vinyals, and Jeff Dean, “Distilling the Knowledge in a Neural Network,” NeurIPS Deep Learning Workshop, 2015. Hinton introduced the concept of “dark knowledge” — the rich information embedded in a teacher model’s full probability distribution over classes (logits), as opposed to the hard labels of the final prediction. By “softening” the distribution with a temperature parameter, the student learns not just what the teacher predicted but how confident it was and which alternatives it considered.
[19] “Towards Distillation-Resistant Large Language Models: An Information-Theoretic Perspective,” preprint (not peer-reviewed), February 4, 2026 (Tsinghua University, Shanghai AI Lab, and Shanghai Jiao Tong University). The paper proposes defense mechanisms against logit-based distillation, noting that “a model’s output logits reveal the full probability distribution over tokens, conveying substantially richer information than the sampled token labels.”
[20] Sacra, “Anthropic Revenue, Valuation & Funding,” February 2026: “Claude Code hit $2.5B in annualized revenue in February 2026, with this figure more than doubling since the beginning of 2026.” SaaStr independently reports the same figure, noting “business subscriptions to Claude Code have quadrupled since January.” Both estimates are based on industry reporting (originally The Information); Anthropic has not publicly confirmed Claude Code-specific revenue. The 500+ companies spending over $1M annually is from Anthropic’s Series G announcement, February 12, 2026.

