January 2026
We extract thousands of words of memorized books from production language models.
To do so, we use a simple two-phase procedure that evades guardrails:
We define near-verbatim recall (nv-recall) to quantify book extraction success: the proportion of a book's text that appears in long, near-verbatim blocks shared by the book and the generation. For Harry Potter and the Sorcerer's Stone:
Our work highlights that, even with model- and system-level safeguards, extraction of (in-copyright) training data remains a risk for production LLMs.
Also check out our prior work, which extracted copyrighted books from open-weight models.
Extraction from Harry Potter and the Sorcerer's Stone using Claude 3.7 Sonnet.
Extraction from A Game of Thrones using Claude 3.7 Sonnet.
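To make the nv-recall definition above concrete, here is a minimal sketch of one way such a metric could be computed. It is an illustration under stated assumptions, not the procedure used in this work: the function name `nv_recall_sketch`, the whitespace tokenization, and the thresholds `min_block_words` and `max_gap_words` are all hypothetical choices. It aligns the book and the generation word by word with Python's `difflib.SequenceMatcher`, merges exact matches separated by small gaps to approximate "near-verbatim", and reports the fraction of the book's words that fall in sufficiently long merged blocks.

```python
from difflib import SequenceMatcher


def nv_recall_sketch(book: str, generation: str,
                     min_block_words: int = 100,
                     max_gap_words: int = 2) -> float:
    """Illustrative approximation of near-verbatim recall.

    All parameters and thresholds are assumptions made for this sketch;
    they are not taken from the original work.
    """
    book_words = book.split()
    gen_words = generation.split()
    if not book_words:
        return 0.0

    # Align the two word sequences; autojunk=False disables difflib's
    # heuristic that ignores very frequent items in long sequences.
    matcher = SequenceMatcher(None, book_words, gen_words, autojunk=False)
    blocks = [b for b in matcher.get_matching_blocks() if b.size > 0]

    # Merge consecutive exact matches separated by small gaps on both sides,
    # so a block may contain a few substituted, inserted, or dropped words.
    # Each entry: (book_start, book_end, gen_end, matched_word_count).
    merged = []
    for b in blocks:
        if merged:
            start, book_end, gen_end, matched = merged[-1]
            small_gap = (b.a - book_end <= max_gap_words
                         and b.b - gen_end <= max_gap_words)
            if small_gap:
                merged[-1] = (start, b.a + b.size, b.b + b.size,
                              matched + b.size)
                continue
        merged.append((b.a, b.a + b.size, b.b + b.size, b.size))

    # Count matched book words only in blocks spanning enough of the book.
    covered = sum(matched for start, end, _, matched in merged
                  if end - start >= min_block_words)
    return covered / len(book_words)
```

Under these assumptions, a generation that reproduces the whole book scores 1.0 and an unrelated generation scores 0.0. Raising `min_block_words` restricts the score to longer shared passages, which discounts short matches that could arise by chance.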