TL;DR

A Stanford–Yale research team tested four production large language models and extracted substantial portions of Harry Potter and the Sorcerer's Stone. Extraction success varied by model and prompting technique; the team reported results to the vendors and noted implications for ongoing legal debates about training data and fair use.

What happened

A group of researchers from Stanford and Yale examined whether commercial, production large language models (LLMs) memorize and can reproduce copyrighted texts. The paper, titled "Extracting books from production language models," evaluated Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3. Using prompting strategies, in some cases jailbreaking the models to bypass safety filters, the team recovered large swaths of Harry Potter and the Sorcerer's Stone. Reported recall rates were 95.8% for a jailbroken Claude 3.7 Sonnet, 76.8% for Gemini 2.5 Pro and 70.3% for Grok 3 without jailbreaking, and 4% for GPT-4.1. The researchers caution that recall varied with experimental settings and does not necessarily reflect an upper bound. They disclosed their findings to Anthropic, Google DeepMind, OpenAI, and xAI; xAI did not acknowledge the report. The work is offered as technical evidence relevant to discussions about training data, memorization, and legal liability.
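
The source does not detail the team's exact prompting protocol, but memorization probes of this kind are commonly built by feeding a model a short excerpt from the book and measuring how much of the true continuation it reproduces. The following is a minimal Python sketch of that idea; the generate() placeholder and probe_passage() helper are hypothetical illustrations, not the paper's harness.

    # Hypothetical prefix-continuation memorization probe. generate()
    # stands in for a call to any production chat-completion API; the
    # paper's actual prompts and decoding settings are not given in
    # the source.
    from difflib import SequenceMatcher

    def generate(prompt: str) -> str:
        """Placeholder for a real model client call."""
        raise NotImplementedError("wire up an actual LLM API here")

    def probe_passage(prefix: str, true_continuation: str) -> float:
        """Ask the model to continue a book excerpt, then score the
        overlap between its output and the real next passage (0-1)."""
        prompt = f"Continue this passage exactly:\n\n{prefix}"
        output = generate(prompt)[: len(true_continuation)]
        return SequenceMatcher(None, output, true_continuation).ratio()

Sliding such a probe over the whole book in fixed-size windows would yield per-window recovery scores of the kind that recall figures like those above could summarize.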

Why it matters

  • If models can reproduce copyrighted works verbatim, claims that their outputs are "transformative" for fair-use purposes may be harder to sustain.
  • Safety mechanisms and guardrails designed to prevent large-scale verbatim reproduction can be bypassed in some cases, exposing companies to legal and reputational risk.
  • Commercial AI providers have been sued over alleged unauthorized use of copyrighted content; technical demonstrations of memorization feed into those legal disputes.
  • Lack of transparency about training corpora makes it difficult for users, rights holders, and courts to assess whether copyrighted material is present in models.

Key facts

  • The paper authors are Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, and Percy Liang, affiliated with Stanford and Yale.
  • Models tested: Anthropic's Claude 3.7 Sonnet, OpenAI's GPT-4.1, Google's Gemini 2.5 Pro, and xAI's Grok 3.
  • Researchers report extracting nearly all of Harry Potter and the Sorcerer's Stone from a jailbroken Claude 3.7 Sonnet (95.8% recall).
  • Gemini 2.5 Pro and Grok 3 produced substantial portions without jailbreaking (76.8% and 70.3% recall, respectively); GPT-4.1 reproduced about 4% of the book when prompted.
  • The authors cautioned that reported recall rates vary by experimental settings and are not necessarily the maximum achievable.
  • The team notified Anthropic, Google DeepMind, OpenAI, and xAI; only xAI failed to acknowledge the disclosure within the researchers' reporting process.
  • Anthropic removed Claude 3.7 Sonnet as a customer option on November 29, 2025; the company indicated the model may simply have been superseded.
  • The researchers noted their findings could be relevant to ongoing legal debates but left detailed legal analysis to others.

What to watch next

  • Court rulings in the many pending lawsuits over training-data use and fair use; the researchers say their findings may inform those debates.
  • Whether providers update or harden safety filters and guardrails in response to extraction techniques (not confirmed in the source).
  • Moves toward greater transparency about training datasets or mandatory disclosures (not confirmed in the source).

Quick glossary

  • Guardrails: Filtering and safety mechanisms implemented around AI models to limit harmful or undesired outputs, including reproduction of copyrighted content.
  • Model weights: Numerical parameters learned during training that determine how a machine learning model transforms inputs into outputs.
  • Memorization: Phenomenon where a model learns and can reproduce portions of its training data verbatim rather than generating novel, generalized outputs.
  • Jailbreaking: Prompting or input strategies designed to bypass a model's safety filters and elicit restricted or disallowed outputs.
  • Recall rate: In this context, the percentage of a target text (e.g., a book) that researchers were able to recover from a model; a worked example follows this list.
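
As that worked example, here is a toy calculation that reproduces the 95.8% figure. It is a minimal sketch assuming recall is the fraction of fixed-size book chunks recovered above a similarity threshold; the paper's exact methodology is not specified in the source, and recall_rate() is a hypothetical helper.

    # Toy recall calculation. The chunk-based definition here is an
    # assumption for illustration, not the paper's stated metric.
    def recall_rate(chunk_scores: list[float], threshold: float = 0.9) -> float:
        """Fraction of book chunks whose model output matched the
        original text at or above `threshold` similarity."""
        recovered = sum(1 for score in chunk_scores if score >= threshold)
        return recovered / len(chunk_scores)

    # 958 of 1,000 chunks recovered -> 95.8%, the figure reported for
    # the jailbroken Claude 3.7 Sonnet.
    scores = [1.0] * 958 + [0.0] * 42
    print(f"{recall_rate(scores):.1%}")  # 95.8%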

Reader FAQ

Did the researchers extract the full Harry Potter book from commercial models?
They report recovering nearly all of Harry Potter and the Sorcerer's Stone from a jailbroken Claude 3.7 Sonnet (95.8% recall) and substantial portions from other models; complete word-for-word extraction of the entire book is not confirmed in the source.

Which commercial models were tested?
The study evaluated Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3.

Did the companies respond to the disclosure?
The researchers said they reported the findings to Anthropic, Google DeepMind, OpenAI, and xAI; xAI did not acknowledge the report.

Does this prove the companies infringed copyrights?
The source does not provide a legal determination. The researchers noted their findings may be relevant to legal debates but did not make legal conclusions.

Sources

  • Thomas Claburn, "Boffins probe commercial AI models, find an entire Harry Potter book," The Register, 9 January 2026.
