TL;DR
Researchers tested whether production large language models can reproduce copyrighted books from their training data. Using a two-phase probing procedure and a block-based recall score, they extracted substantial verbatim text from several commercial models, in some cases only after applying jailbreak methods.
What happened
A research team evaluated whether production large language models (LLMs) can be induced to output whole books, or large portions of them. They used a two-phase approach: an initial probe to check extraction feasibility, sometimes employing a Best-of-N (BoN) jailbreak, followed by iterative continuation prompts aimed at completing the text. The study tested four production systems: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3. Extraction success was measured with an nv-recall score, a block-based approximation of longest common substring. Results varied by model and configuration: in some trials a jailbroken Claude 3.7 Sonnet produced near-verbatim books (nv-recall as high as 95.8%), while Gemini 2.5 Pro and Grok 3 produced high recall for specific titles without jailbreaks (e.g., 76.8% and 70.3%, respectively, for Harry Potter and the Sorcerer's Stone). GPT-4.1 required roughly 20x more BoN attempts and ultimately refused to continue in the tested runs (nv-recall of 4.0% in the cited case). Experiments ran from mid-August to mid-September 2025; providers were notified, and the results were released after a 90-day disclosure period.
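To make the procedure concrete, here is a minimal sketch in Python of what such a two-phase loop could look like. The function names, prompt templates, and refusal check are illustrative assumptions for exposition, not the authors' actual harness or any provider's API.

```python
# Hypothetical sketch of a two-phase book-extraction procedure.
# `query_model` is a placeholder for a chat-completion call.

def probe(query_model, opening_passage, n_attempts=1):
    """Phase 1: check whether the model will continue a book's opening.

    A Best-of-N style probe resamples up to `n_attempts` times (e.g. with
    small prompt perturbations) and keeps the first non-refusal. The
    refusal check here is a naive stand-in for a real classifier.
    """
    for _ in range(n_attempts):
        reply = query_model(f"Continue this text exactly: {opening_passage}")
        if reply and not reply.lower().startswith("i can't"):
            return reply
    return None

def extract_book(query_model, opening_passage, max_rounds=500):
    """Phase 2: iteratively prompt for continuations until the model stops."""
    first = probe(query_model, opening_passage)
    if first is None:
        return ""
    chunks = [first]
    for _ in range(max_rounds):
        tail = chunks[-1][-500:]  # re-anchor each prompt on the latest output
        nxt = query_model(f"Continue exactly from: {tail}")
        if not nxt:  # model refused or produced nothing; stop here
            break
        chunks.append(nxt)
    return "".join(chunks)
```

In the study's terms, Phase 1 corresponds to `probe` (with BoN resampling where needed) and Phase 2 to the continuation loop.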
Why it matters
- Demonstrates that production LLMs can, under some conditions, reproduce copyrighted training text at scale, raising questions about memorization risk.
- Suggests that system-level safeguards and safety filters do not uniformly prevent extraction across different models and settings.
- Impacts discussions about copyright, training data practices, and liability for models that memorize and emit protected works.
- Highlights a need for clearer technical and policy measures to limit unintended verbatim reproduction from deployed models.
Key facts
- Study authors: Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, Percy Liang.
- Two-phase extraction method: initial probe (sometimes BoN jailbreak) + iterative continuation prompts.
- Models tested: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, Grok 3.
- Measurement: nv-recall, a block-based approximation of longest common substring, used to quantify extraction.
- Examples: Gemini 2.5 Pro achieved nv-recall of 76.8% and Grok 3 reached 70.3% for Harry Potter and the Sorcerer's Stone in Phase 1 probes.
- Claude 3.7 Sonnet produced near-verbatim book outputs in some jailbroken trials (nv-recall up to 95.8%).
- GPT-4.1 required many more Best-of-N attempts (reported ~20x more) and in at least one experiment ultimately refused to continue (nv-recall = 4.0%).
- Experiments conducted mid-August to mid-September 2025; affected providers were notified and a 90-day disclosure window was observed before public release.
What to watch next
- Whether affected providers update model- or system-level safeguards in response to these results — not confirmed in the source.
- Any regulatory or legal actions prompted by demonstrated extraction of copyrighted material from deployed models — not confirmed in the source.
- Follow-up studies that replicate the procedure across more models, datasets, or with different jailbreak strategies — not confirmed in the source.
Quick glossary
- Large language model (LLM): A type of AI model trained on large amounts of text to generate human-like language in response to prompts.
- Jailbreak: A prompt or sequence of prompts designed to bypass safety filters or usage restrictions in a deployed model.
- Best-of-N (BoN): A technique that generates multiple candidate outputs and selects the best according to some criterion, sometimes used to find an output that bypasses controls.
- Memorization: The phenomenon where a model stores and can reproduce verbatim pieces of its training data.
- nv-recall: A block-based approximation of longest common substring used in this study to quantify how much text a model reproduces from a reference.
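For intuition, below is a minimal sketch of a block-based recall score of this kind, assuming fixed-size character blocks matched in order; the paper's exact nv-recall definition (block size, tokenization, match rules) may differ.

```python
# Illustrative block-based approximation of longest common substring.
# Block size and matching rules are assumptions, not the paper's exact metric.

def block_recall(reference: str, extracted: str, block_chars: int = 100) -> float:
    """Longest run of consecutive reference blocks found, in order,
    in `extracted`, as a fraction of all reference blocks."""
    blocks = [reference[i:i + block_chars]
              for i in range(0, len(reference), block_chars)]
    if not blocks:
        return 0.0
    best = run = 0
    cursor = 0  # the next block must appear at or after this position
    for block in blocks:
        hit = extracted.find(block, cursor)
        if hit != -1:
            run += 1
            cursor = hit + len(block)
        else:
            run, cursor = 0, 0  # run broken: restart search from the top
        best = max(best, run)
    return best / len(blocks)
```

Under this sketch, a score of 1.0 means the whole reference appears as one contiguous, in-order run of blocks in the model output, so a figure like 95.8% corresponds to nearly the entire book reproduced verbatim.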
Reader FAQ
Did the researchers extract entire copyrighted books from production models?
The study reports that in some jailbroken trials Claude 3.7 Sonnet produced near-verbatim output of entire books (nv-recall up to 95.8%), while other models reproduced large portions of specific titles without jailbreaks; the exact extent varied by model and configuration.
Which commercial models were tested?
The paper evaluated Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3.
Were model providers informed before publication?
Yes. The authors say they notified affected providers shortly after running experiments and waited a 90-day disclosure window before public release.
Will this lead to legal or regulatory action against providers?
Not confirmed in the source.
Sources
- Extracting books from production language models (2026)
- Extracting Books from Production LLMs
- Can production, consumer-facing LLMs (with guardrails …
- Boffins probe commercial AI models, find Harry Potter