Extracting Books from Production Language Models

Ahmed Ahmed1* ahmedah@cs.stanford.edu
A. Feder Cooper1,2* a.feder.cooper@yale.edu
Sanmi Koyejo1 sanmi@cs.stanford.edu
Percy Liang1 pliang@cs.stanford.edu
1Stanford University 2Yale University *Equal contribution

January 2026

Read the Paper on arXiv

We extract thousands of words of memorized book text from production language models.

To do so, we use a simple two-phase procedure that evades guardrails:

  1. Instruct the LLM to continue a short prefix from a book (querying directly for Gemini 2.5 Pro and Grok 3, and using Best-of-N jailbreaking for Claude 3.7 Sonnet and GPT-4.1)
  2. If the model complies (it did not always), repeatedly query it to continue the book (see the sketch after this list)
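The loop itself is simple enough to sketch in a few lines of Python. The sketch below is illustrative only: `query_model` is a hypothetical stand-in for a production LLM API call, and the prompt wording, context length, and stopping thresholds are placeholders rather than the settings used in the paper (which also applies Best-of-N jailbreaking for Claude 3.7 Sonnet and GPT-4.1 before this loop).

```python
# Minimal sketch of the two-phase extraction loop, under the assumptions above.
from typing import Callable


def extract_book(
    query_model: Callable[[str], str],  # hypothetical LLM API wrapper
    prefix: str,                        # short prefix copied from the book
    max_rounds: int = 50,
    min_new_chars: int = 200,
) -> str:
    # Phase 1: instruct the model to continue the prefix.
    text = query_model(
        "Continue the following passage from the book, word for word:\n\n" + prefix
    )
    if len(text.strip()) < min_new_chars:
        return ""  # the model refused or produced too little; extraction failed

    transcript = prefix + text
    # Phase 2: repeatedly ask the model to continue from the tail of what we have.
    for _ in range(max_rounds):
        tail = transcript[-2000:]  # keep the prompt short; size is a placeholder
        continuation = query_model(
            "Continue this passage from exactly where it leaves off:\n\n" + tail
        )
        if len(continuation.strip()) < min_new_chars:
            break  # the model stopped cooperating
        transcript += continuation
    return transcript
```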

We define near-verbatim recall (nv-recall) to quantify book extraction success: the proportion of long-form, near-verbatim blocks of text shared by both the book and the generation. For Harry Potter and the Sorcerer's Stone:

  • (jailbroken) Claude 3.7 Sonnet→95.8%, GPT-4.1→4.0%
  • (not jailbroken) Gemini 2.5 Pro→76.8%, Grok 3→70.3%
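These percentages come from the paper. As a rough illustration, the snippet below approximates an nv-recall-style score by checking how many fixed-size blocks of the book have a near-verbatim match somewhere in the generation; the paper's actual definition (block length, what counts as near-verbatim, normalization) may differ, and the thresholds here are placeholders.

```python
# Rough approximation of an nv-recall-style score; block size, stride, and the
# similarity threshold are placeholders, not the paper's actual parameters.
from difflib import SequenceMatcher


def nv_recall_approx(
    book: str,
    generation: str,
    block_chars: int = 500,   # "long-form" block size (placeholder)
    threshold: float = 0.9,   # how close a match must be to count as near-verbatim
    stride: int = 100,        # step between candidate windows in the generation
) -> float:
    """Fraction of non-overlapping book blocks with a near-verbatim match
    in the generation. Quadratic and slow; for illustration only."""
    blocks = [
        book[i : i + block_chars]
        for i in range(0, len(book) - block_chars + 1, block_chars)
    ]
    if not blocks:
        return 0.0

    matched = 0
    for block in blocks:
        for j in range(0, max(1, len(generation) - block_chars + 1), stride):
            window = generation[j : j + block_chars]
            sim = SequenceMatcher(None, block, window, autojunk=False).ratio()
            if sim >= threshold:
                matched += 1
                break
    return matched / len(blocks)
```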

Our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.

Also check out our prior work, which extracted copyrighted books from open-weight models.

Extraction from Harry Potter and the Sorcerer's Stone using Claude 3.7 Sonnet.

Extraction from A Game of Thrones using Claude 3.7 Sonnet.