TL;DR

Researchers tested whether production large language models can reproduce copyrighted books from their training data. Using a two-phase probing procedure scored with a block-based recall metric, they extracted substantial portions of books from several commercial models, in some cases only after applying a jailbreak.

What happened

A research team evaluated whether production large language models (LLMs) can be induced to output whole books or large portions of them. They used a two-phase approach: an initial probe to check whether extraction was feasible, in some cases applying a Best-of-N (BoN) jailbreak, followed by iterative continuation prompts aimed at completing the text. The study tested four production systems: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3. Extraction success was measured with nv-recall, a block-based approximation of longest common substring. Results varied by model and configuration: in some trials a jailbroken Claude 3.7 Sonnet produced near-verbatim books (nv-recall as high as 95.8%), while Gemini 2.5 Pro and Grok 3 reached high recall on specific titles without any jailbreak (76.8% and 70.3%, respectively, for Harry Potter and the Sorcerer's Stone). GPT-4.1 required many more BoN attempts and ultimately refused to continue in the tested runs (nv-recall of 4.0% in the cited case). Experiments ran from mid-August to mid-September 2025; providers were notified, and the results were released after a 90-day disclosure period.
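
For orientation, here is a minimal sketch of what such a two-phase loop could look like, assuming a hypothetical query_model helper; the authors' actual prompts, jailbreak handling, and stopping rules are not detailed here, so treat this as an illustration rather than the paper's method.

```python
def query_model(prompt: str) -> str:
    """Hypothetical stand-in for a call to a production LLM API."""
    raise NotImplementedError

def extract_book(opening_passage: str, max_rounds: int = 500) -> str:
    # Phase 1: probe whether the model will continue the text at all.
    # (The study sometimes applied a Best-of-N jailbreak at this step.)
    reply = query_model(f"Continue this text exactly:\n\n{opening_passage}")
    if not reply.strip():
        return ""  # probe failed: the model refused

    # Phase 2: iteratively request continuations, feeding back the tail
    # of the transcript as context until the model stops producing text.
    transcript = opening_passage + reply
    for _ in range(max_rounds):
        tail = transcript[-2000:]  # arbitrary context budget for this sketch
        reply = query_model(f"Continue exactly from here:\n\n{tail}")
        if not reply.strip():
            break  # refusal or end of book
        transcript += reply
    return transcript
```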

Why it matters

  • Demonstrates that production LLMs can, under some conditions, reproduce copyrighted training text at scale, raising questions about memorization risk.
  • Suggests that system-level safeguards and safety filters do not uniformly prevent extraction across different models and settings.
  • Impacts discussions about copyright, training data practices, and liability for models that memorize and emit protected works.
  • Highlights a need for clearer technical and policy measures to limit unintended verbatim reproduction from deployed models.

Key facts

  • Study authors: Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, Percy Liang.
  • Two-phase extraction method: initial probe (sometimes BoN jailbreak) + iterative continuation prompts.
  • Models tested: Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, Grok 3.
  • Measurement: nv-recall, a block-based approximation of longest common substring, used to quantify extraction (see the sketch after this list).
  • Examples: Gemini 2.5 Pro achieved nv-recall of 76.8% and Grok 3 reached 70.3% for Harry Potter and the Sorcerer's Stone in Phase 1 probes.
  • Claude 3.7 Sonnet produced near-verbatim book outputs in some jailbroken trials (nv-recall up to 95.8%).
  • GPT-4.1 required many more Best-of-N attempts (reported as roughly 20x more) and in at least one experiment ultimately refused to continue (nv-recall = 4.0%).
  • Experiments conducted mid-August to mid-September 2025; affected providers were notified and a 90-day disclosure window was observed before public release.
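
The source describes nv-recall only as a block-based approximation of longest common substring, so the following is one plausible reading rather than the paper's exact metric: split the reference book into fixed-size character blocks and report the fraction that appear verbatim in the model's output. The block size and the exact matching rule are assumptions.

```python
def block_recall(reference: str, output: str, block_size: int = 100) -> float:
    """Fraction of fixed-size reference blocks found verbatim in output.

    Illustrative only: the paper's nv-recall (block size, tokenization,
    near-verbatim matching) may be defined differently.
    """
    blocks = [
        reference[i : i + block_size]
        for i in range(0, len(reference), block_size)
    ]
    if not blocks:
        return 0.0
    hits = sum(1 for block in blocks if block in output)
    return hits / len(blocks)

# Sanity check: identical texts score 1.0.
assert block_recall("abcdef" * 200, "abcdef" * 200) == 1.0
```

On this reading, the 95.8% figure would mean roughly 96 of every 100 reference blocks reappeared verbatim in the model's output.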

What to watch next

  • Whether affected providers update model- or system-level safeguards in response to these results — not confirmed in the source.
  • Any regulatory or legal actions prompted by demonstrated extraction of copyrighted material from deployed models — not confirmed in the source.
  • Follow-up studies that replicate the procedure across more models, datasets, or with different jailbreak strategies — not confirmed in the source.

Quick glossary

  • Large language model (LLM): A type of AI model trained on large amounts of text to generate human-like language in response to prompts.
  • Jailbreak: A prompt or sequence of prompts designed to bypass safety filters or usage restrictions in a deployed model.
  • Best-of-N (BoN): A technique that generates multiple candidate outputs and selects the best according to some criterion, sometimes used to find an output that bypasses controls (see the sketch after this glossary).
  • Memorization: The phenomenon where a model stores and can reproduce verbatim pieces of its training data.
  • nv-recall: A block-based approximation of longest common substring used in this study to quantify how much text a model reproduces from a reference.
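
Since BoN is at heart just sample-and-select, a generic sketch may help; this shows the general pattern, not the specific BoN jailbreak used in the study, whose perturbation and scoring details are not given here.

```python
import random

def best_of_n(generate, score, n: int = 8):
    """Draw n candidates from generate() and keep the highest-scoring one."""
    return max((generate() for _ in range(n)), key=score)

# Toy usage: among n random strings, keep the longest.
longest = best_of_n(
    generate=lambda: "".join(random.choices("ab", k=random.randint(1, 12))),
    score=len,
)
print(longest)
```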

Reader FAQ

Did the researchers extract entire copyrighted books from production models?
In some jailbroken trials, yes: the study reports near-verbatim book outputs (nv-recall as high as 95.8%), though the extent varied by model and configuration.

Which commercial models were tested?
The paper evaluated Claude 3.7 Sonnet, GPT-4.1, Gemini 2.5 Pro, and Grok 3.

Were model providers informed before publication?
Yes. The authors say they notified affected providers shortly after running the experiments and observed a 90-day disclosure window before public release.

Will this lead to legal or regulatory action against providers?
Not confirmed in the source.

Sources

  • Ahmed Ahmed, A. Feder Cooper, Sanmi Koyejo, Percy Liang, "Extracting books from production language models," arXiv (Computer Science > Computation and Language), submitted 6 Jan 2026.
