AI's Extraction Problem: Distillation and Data Moats

There is a technique in AI called distillation. A smaller model learns by studying the outputs of a larger, more capable one. The student queries the teacher at scale, and the teacher’s answers become training data. The smaller model often achieves comparable performance at a fraction of the cost.
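For readers who want to see the mechanics, here is a minimal sketch of the classic soft-target form of distillation. It assumes the student can see the teacher's raw output scores (logits) over a few candidate answers. The function names, the temperature of 2.0, and the example scores are illustrative assumptions, not any lab's actual pipeline. When a model is reached only through a public interface, those scores are usually hidden, so the student is typically fine-tuned on the teacher's generated text instead. The underlying idea, a student learning to match a teacher's outputs, is the same.

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities, softened by a temperature."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    Scaled by temperature**2, as in Hinton et al. (2015), so gradient
    magnitudes stay comparable across temperatures.
    """
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(p_teacher, p_student))
    return kl * temperature ** 2


# Hypothetical scores for three candidate answers to one prompt.
teacher = [4.0, 1.5, 0.2]
close_student = [3.8, 1.4, 0.3]  # already imitates the teacher well
far_student = [0.1, 2.0, 3.5]    # prefers a different answer

print(round(distillation_loss(close_student, teacher), 4))
print(round(distillation_loss(far_student, teacher), 4))
```

The loss is zero when the student's softened distribution matches the teacher's and grows as the two diverge; training the student means pushing that number down. Raising the temperature flattens both distributions, which exposes more of the teacher's relative preferences among the answers it did not pick.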

This is not controversial. Every major AI lab distills its own models. It is how frontier systems get compressed into products affordable enough to deploy.

This week, Anthropic published a detailed account of what it described as “industrial-scale” distillation campaigns against Claude by three Chinese AI labs: DeepSeek, Moonshot, and MiniMax. Over 16 million exchanges through roughly 24,000 fraudulent accounts. OpenAI has made similar claims. Google has reported the same pattern against Gemini.

Training, or theft?

Here is the part worth sitting with. When AI companies train on massive volumes of publicly available text, images, and code created by millions of people who never consented to the arrangement, that process is called training.

When a competitor systematically queries a model to extract its capabilities, that process is called theft. Or attack.

The mechanisms are different. The legal frameworks are different. But the structural relationship is similar: one party builds capability by extracting value from another party’s work at far lower marginal cost, without the original party participating in the terms.

Anthropic settled a copyright lawsuit last September for $1.5 billion after a federal judge found the company had downloaded millions of books from pirate libraries to train Claude. The judge ruled that AI training on legally acquired content was fair use; downloading and keeping the pirated copies was not. So the same company that just accused competitors of illicit extraction also paid the largest copyright settlement in U.S. history for its own extraction practices.

I am not saying these are equivalent. They are not. Terms of service violations, national security framing, and copyright infringement are different legal categories with different implications. But the structure repeats.

Someone builds something of value.
Someone else extracts that value at scale, at a fraction of the original cost.
The extraction gets reframed as innovation, standard practice, or strategic necessity.
Governance catches up later, if it catches up at all.

The pattern doesn’t stop at model compression

That pattern does not stop at model compression. Follow the chain.

Figure: The universal extraction pattern (creation, extraction, reframing, governance lag). Value is built, extracted at low cost, and reframed as standard practice, while governance lags.

If you have been reading this series, you will remember the grid from Part 2: 2,500 squares representing the global population, organized by level of AI interaction. Most are grey: people who have never used AI. A band of green: free chatbot users. A smaller band of orange: paying subscribers. A thin red line: people building with APIs and advanced tools.

Figure: The Part 2 grid, organized by level of AI interaction. Each dot represents roughly 3.2 million people.
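The per-square figure is simple division. Assuming a world population of roughly 8 billion (an assumption for this check, not a number stated in the series), spreading it across the 2,500 squares gives:

$$\frac{8.0 \times 10^{9}\ \text{people}}{2{,}500\ \text{squares}} = 3.2 \times 10^{6}\ \text{people per square}$$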

That grid is usually read as adoption. It can also be read as a supply chain.

Large language models were trained on massive corpora of publicly available text, code, images, and cultural production. Much of that material was created by people who had no role in negotiating how it would be reused. Frontier labs converted that corpus into proprietary systems. Subscription users pay for access to compressed capability. Free users consume output without visibility into the layers beneath it.

Then distillation happens again. Capabilities from one proprietary system get compressed into another. The economic logic repeats at a new layer. At each step, value moves upward. Governance evolves more slowly.

The governance retreat

Anthropic released version 3.0 of its Responsible Scaling Policy yesterday. It is a significant rewrite. In its own assessment, the company acknowledged that key parts of its original theory of change did not play out as hoped. Capability thresholds proved more ambiguous than anticipated. Government action on safety moved slowly. The political environment shifted toward prioritizing competitiveness over regulation. And the safeguards required at higher capability levels may be, in Anthropic’s words, “very hard to meet unilaterally.”

The company’s response was to separate what it plans to do on its own from what it believes the whole industry needs to do collectively. Earlier RSP versions had included commitments to pause development if certain capability thresholds were crossed without adequate safeguards in place. Version 3.0 replaces those commitments with publicly declared goals, and Anthropic will grade its own progress against them. Nonbinding, but transparent.

That logic is coherent. It also names the structural problem clearly: governance frameworks are strongest when they align with competitive incentives. When the two conflict, the frameworks get revised. This is not unique to Anthropic. It is a pattern that shows up wherever an emerging infrastructure market moves faster than the institutions designed to govern it.

The bias debate asks whether data is representative. That matters. The deeper question is structural.

Who extracts?
Who compresses?
Who deploys?
Who governs?
And what happens to governance when it becomes strategically inconvenient?

The leadership decision

If you are leading an organization that depends on AI systems, this is not abstract.

You are not choosing a model. You are choosing a position inside an extraction infrastructure.

Figure: The leadership decision (data sourcing, compression economics, and governance durability).

That position carries assumptions about data sourcing, compression economics, competitive dynamics, and governance durability that most vendor evaluations never surface. This is not a feature decision. It is an alignment decision at the organizational level.

I wrote this with the help of an AI model built by one of the companies discussed above. I used it to verify facts, pressure-test claims, and draft language. That is not irony. It is the condition. The infrastructure is pervasive enough that opting out is no longer a meaningful stance.

The question is whether we examine the chain we are part of, or treat it as neutral substrate.

Part 4 of a series examining the structural challenges AI is surfacing.

Joe Post