Extracting Books from Production Language Models

Ahmed Ahmed1* ahmedah@cs.stanford.edu
A. Feder Cooper1,2* a.feder.cooper@yale.edu
Sanmi Koyejo1 sanmi@cs.stanford.edu
Percy Liang1 pliang@cs.stanford.edu
1Stanford University 2Yale University *Equal contribution

January 2026

Read the Paper on arXiv

We extract thousands of words of memorized book text from production language models.

To do so, we use a simple two-phase procedure that evades guardrails:

  1. Instruct the LLM to continue a short prefix from a book (querying directly for Gemini 2.5 Pro and Grok 3, and using Best-of-N jailbreaking for Claude 3.7 Sonnet and GPT-4.1)
  2. If the model complies (it did not always), repeatedly query it to continue the book (see the sketch after this list)
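The loop itself is simple enough to sketch in a few lines of Python. The sketch below is illustrative only: `query_model` is a hypothetical stand-in for a production LLM API call, and the prompt wording, context length, and stopping thresholds are placeholders rather than the settings used in the paper (which also applies Best-of-N jailbreaking for Claude 3.7 Sonnet and GPT-4.1 before this loop).

```python
# Minimal sketch of the two-phase extraction loop, under the assumptions above.
from typing import Callable


def extract_book(
    query_model: Callable[[str], str],  # hypothetical LLM API wrapper
    prefix: str,                        # short prefix copied from the book
    max_rounds: int = 50,
    min_new_chars: int = 200,
) -> str:
    # Phase 1: instruct the model to continue the prefix.
    text = query_model(
        "Continue the following passage from the book, word for word:\n\n" + prefix
    )
    if len(text.strip()) < min_new_chars:
        return ""  # the model refused or produced too little; extraction failed

    transcript = prefix + text
    # Phase 2: repeatedly ask the model to continue from the tail of what we have.
    for _ in range(max_rounds):
        tail = transcript[-2000:]  # keep the prompt short; size is a placeholder
        continuation = query_model(
            "Continue this passage from exactly where it leaves off:\n\n" + tail
        )
        if len(continuation.strip()) < min_new_chars:
            break  # the model stopped cooperating
        transcript += continuation
    return transcript
```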

We define near-verbatim recall (nv-recall) to quantify book extraction success: the proportion of long-form, near-verbatim blocks of text shared by both the book and the generation. For Harry Potter and the Sorcerer's Stone:

  • (jailbroken) Claude 3.7 Sonnet→95.8%, GPT-4.1→4.0%
  • (not jailbroken) Gemini 2.5 Pro→76.8%, Grok 3→70.3%
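These percentages come from the paper. As a rough illustration, the snippet below approximates an nv-recall-style score by checking how many fixed-size blocks of the book have a near-verbatim match somewhere in the generation; the paper's actual definition (block length, what counts as near-verbatim, normalization) may differ, and the thresholds here are placeholders.

```python
# Rough approximation of an nv-recall-style score; block size, stride, and the
# similarity threshold are placeholders, not the paper's actual parameters.
from difflib import SequenceMatcher


def nv_recall_approx(
    book: str,
    generation: str,
    block_chars: int = 500,   # "long-form" block size (placeholder)
    threshold: float = 0.9,   # how close a match must be to count as near-verbatim
    stride: int = 100,        # step between candidate windows in the generation
) -> float:
    """Fraction of non-overlapping book blocks with a near-verbatim match
    in the generation. Quadratic and slow; for illustration only."""
    blocks = [
        book[i : i + block_chars]
        for i in range(0, len(book) - block_chars + 1, block_chars)
    ]
    if not blocks:
        return 0.0

    matched = 0
    for block in blocks:
        for j in range(0, max(1, len(generation) - block_chars + 1), stride):
            window = generation[j : j + block_chars]
            sim = SequenceMatcher(None, block, window, autojunk=False).ratio()
            if sim >= threshold:
                matched += 1
                break
    return matched / len(blocks)
```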

Our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.

Also check out our prior work, which extracted copyrighted books from open-weight models.

Extraction from Harry Potter and the Sorcerer's Stone using Claude 3.7 Sonnet.

Extraction from A Game of Thrones using Claude 3.7 Sonnet.