Anchored Decoding: Provably Reducing Copyright Risk for Any Language Model
Abstract
Anchored Decoding suppresses verbatim copying in language models while maintaining fluency and factual accuracy, using constrained generation that balances copyright risk against utility.
Modern language models (LMs) tend to memorize portions of their training data and emit verbatim spans. When the underlying sources are sensitive or copyright-protected, such reproduction raises issues of consent and compensation for creators and compliance risks for developers. We propose Anchored Decoding, a plug-and-play inference-time method for suppressing verbatim copying: it enables decoding from any risky LM trained on mixed-license data by keeping generation in bounded proximity to a permissively trained safe LM. Anchored Decoding adaptively allocates a user-chosen information budget over the generation trajectory and enforces per-step constraints that yield a sequence-level guarantee, enabling a tunable risk-utility trade-off. To make Anchored Decoding practically useful, we introduce a new permissively trained safe model (TinyComma 1.8B), as well as Anchored-Byte Decoding, a byte-level variant of our method that enables cross-vocabulary fusion via the ByteSampler framework (Hayase et al., 2025). We evaluate our methods across six model pairs on long-form evaluations of copyright risk and utility. Anchored and Anchored-Byte Decoding define a new Pareto frontier, preserving near-original fluency and factuality while eliminating up to 75% of the measurable copying gap (averaged over six copying metrics) between the risky baseline and a safe reference, at a modest inference overhead.
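To make the budget mechanism concrete, one plausible formalization (a sketch under assumed notation; the paper's exact construction may differ) is a per-step KL constraint against the safe model, with per-step allowances that sum to the user-chosen budget. By the chain rule for KL divergence, the per-step constraints then compose into a sequence-level bound:

```latex
% Sketch (assumed notation): q_t is the anchored sampling distribution at step t,
% p_s the safe model, b_t the per-step allowance, B the user-chosen total budget.
\[
  D_{\mathrm{KL}}\!\bigl(q_t(\cdot \mid x_{<t}) \,\big\|\, p_s(\cdot \mid x_{<t})\bigr) \le b_t
  \quad \text{for all } t,
  \qquad \sum_{t=1}^{T} b_t \le B,
\]
\[
  \text{so that}\qquad
  D_{\mathrm{KL}}\bigl(Q \,\big\|\, P_s\bigr)
  \;=\; \sum_{t=1}^{T} \mathbb{E}_{x_{<t}\sim Q}\Bigl[ D_{\mathrm{KL}}\bigl(q_t \,\big\|\, p_s\bigr) \Bigr]
  \;\le\; B.
\]
```

Here Q and P_s denote the sequence-level distributions induced by anchored decoding and by the safe model, respectively; the budget B thus directly caps how far a generation can drift from what the safe model would produce.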
Community
Memorization and reproduction of copyrighted text by LLMs has potentially harmful repercussions for both data creators and AI developers. To address this, Anchored Decoding is a decoding technique for language models (LMs) that provably reduces the likelihood of generating copyrighted text. It requires two LMs: a safe model trained exclusively on permissively licensed data, and a higher-utility risky model trained on mixed-license data.
Anchored Decoding works for both token-level and byte-level decoding. To make the algorithm as practical as possible, we release (1) TinyComma 1.8B, a safe base LM that is tokenizer-compatible with the Llama 3 model family, and (2) byte-level support to facilitate mixed-tokenizer decoding.
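As a rough illustration of the decoding loop described above, here is a minimal sketch in Python (our own simplification, not the released implementation). It assumes the per-step constraint is a KL bound between the sampling distribution and the safe model's next-token distribution, and that the total information budget is split evenly over the remaining steps; the paper's actual allocation rule may differ.

```python
# Minimal sketch of anchored decoding (not the paper's reference implementation).
import numpy as np

def softmax(logits):
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

def kl(p, q, eps=1e-12):
    # KL(p || q) in nats.
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def anchored_step(risky_logits, safe_logits, step_budget, rng):
    """Sample one token from a mixture that stays within `step_budget` nats of the safe model."""
    p_risky, p_safe = softmax(risky_logits), softmax(safe_logits)
    # KL(alpha*p_risky + (1-alpha)*p_safe || p_safe) is convex in alpha and equals 0
    # at alpha=0, hence is nondecreasing on [0, 1]; binary search therefore finds the
    # largest admissible weight on the risky model.
    lo, hi = 0.0, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl(mid * p_risky + (1 - mid) * p_safe, p_safe) <= step_budget:
            lo = mid   # constraint holds: lean further toward the risky model
        else:
            hi = mid   # constraint violated: fall back toward the safe model
    p = lo * p_risky + (1 - lo) * p_safe
    p /= p.sum()
    token = int(rng.choice(len(p), p=p))
    return token, kl(p, p_safe)

def anchored_decode(risky_next_logits, safe_next_logits, prompt_ids,
                    total_budget, max_new_tokens, seed=0):
    """`risky_next_logits` / `safe_next_logits`: callables mapping a token-id list to
    next-token logits (hypothetical interfaces standing in for the two LMs)."""
    rng = np.random.default_rng(seed)
    ids, remaining = list(prompt_ids), total_budget
    for t in range(max_new_tokens):
        step_budget = remaining / (max_new_tokens - t)  # even split of the leftover budget
        token, spent = anchored_step(risky_next_logits(ids), safe_next_logits(ids),
                                     step_budget, rng)
        remaining = max(remaining - spent, 0.0)
        ids.append(token)
    return ids
```

For a toy run, `risky_next_logits` and `safe_next_logits` can simply wrap two small LMs that share a tokenizer; setting `total_budget=0` recovers the safe model's behavior, while a very large budget recovers the risky model's.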
The following similar papers were recommended by the Semantic Scholar API (automated message from Librarian Bot):
- AdaFuse: Adaptive Ensemble Decoding with Test-Time Scaling for LLMs (2026)
- Distilling Token-Trained Models into Byte-Level Models (2026)
- Cross-Tokenizer Likelihood Scoring Algorithms for Language Model Distillation (2025)
- Context-Aware Initialization for Reducing Generative Path Length in Diffusion Language Models (2025)
- CD4LM: Consistency Distillation and aDaptive Decoding for Diffusion Language Models (2026)
- Flatter Tokens are More Valuable for Speculative Draft Model Training (2026)
- Bolmo: Byteifying the Next Generation of Language Models (2025)