MemTrain: Self-Supervised Context Memory Training
Abstract
A self-supervised training framework called MemTrain enhances long-horizon language model agents' memory capabilities through proxy tasks optimized via GRPO, improving downstream reasoning performance.
Memory is an indispensable capability for long-horizon LLM agents, enabling them to preserve and use information accumulated across extended interactions. Existing memory-agent approaches are typically trained end-to-end with reinforcement learning on downstream tasks. However, collecting high-quality annotated problems for memory-intensive scenarios is costly, and the resulting training data often lack the diversity needed to cover general memory behaviors. In this work, we propose MemTrain, a self-supervised training framework that improves the general context-memory capability of LLM agents so that downstream post-training becomes more effective. MemTrain introduces two coupled proxy tasks over unlabeled Wikipedia corpora: (1) an end-to-end masked reconstruction objective, which requires the model to recover masked entities after multiple rounds of memory updates, thereby rewarding memory maintenance as judged by the final outcome; and (2) an intermediate memory recall objective, which requires the model to reconstruct masked historical information from intermediate memory states, encouraging faithful compression and memory completeness throughout the interaction. The two objectives are jointly optimized using GRPO. Extensive experiments on long-text QA and search-based QA benchmarks demonstrate that MemTrain consistently improves downstream memory-intensive reasoning performance across different models, achieving gains of up to 17.67 points over direct task-specific post-training.
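To make the two proxy objectives described above more concrete, here is a minimal Python sketch of how a masked-entity reward and a GRPO-style group-relative advantage could be computed. It is an illustration under stated assumptions, not the authors' implementation: the function names (`mask_entities`, `reconstruction_reward`, `combined_reward`, `group_relative_advantages`), the exact-match scoring rule, and the `recall_weight` used to mix the two rewards are all hypothetical, since the abstract does not specify them. The only part taken from GRPO as commonly described is that each sampled rollout's reward is standardized against the other rollouts in its group.

```python
import statistics


def mask_entities(text, entities):
    """Replace each entity with a numbered placeholder.

    Returns the masked text and a dict mapping placeholder -> original entity.
    (Hypothetical helper; the abstract does not describe the masking scheme.)
    """
    answers = {}
    masked = text
    for i, entity in enumerate(entities):
        tag = f"[MASK_{i}]"
        if entity in masked:
            masked = masked.replace(entity, tag)
            answers[tag] = entity
    return masked, answers


def reconstruction_reward(predictions, answers):
    """Fraction of masked entities recovered (case-insensitive exact match).

    Assumption: exact match is used only for illustration.
    """
    if not answers:
        return 0.0
    hits = sum(
        predictions.get(tag, "").strip().lower() == entity.lower()
        for tag, entity in answers.items()
    )
    return hits / len(answers)


def combined_reward(final_reward, recall_reward, recall_weight=0.5):
    """Mix the end-to-end reward with the intermediate recall reward.

    Assumption: a simple weighted sum; recall_weight is a made-up parameter.
    """
    return final_reward + recall_weight * recall_reward


def group_relative_advantages(rewards, eps=1e-6):
    """Standardize each reward against the mean and std of its rollout group."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


if __name__ == "__main__":
    masked, answers = mask_entities(
        "Marie Curie was born in Warsaw.", ["Marie Curie", "Warsaw"]
    )
    print(masked)
    # Three hypothetical rollouts answering from their final memory state.
    rollouts = [
        {"[MASK_0]": "Marie Curie", "[MASK_1]": "Warsaw"},
        {"[MASK_0]": "marie curie", "[MASK_1]": "Krakow"},
        {"[MASK_0]": "Pierre Curie", "[MASK_1]": "Paris"},
    ]
    rewards = [reconstruction_reward(p, answers) for p in rollouts]
    print(rewards)
    print(group_relative_advantages(rewards))
```

In this sketch the same `reconstruction_reward` could score both objectives: once on the answer produced from the final memory (end-to-end reconstruction) and once on answers produced from an intermediate memory state (intermediate recall), with `combined_reward` merging the two before the group-relative normalization.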
Community
A simple yet effective approach to enhancing agent memory in a general-purpose manner. By performing self-supervised training solely on Wikipedia, it produces a stronger starter checkpoint that substantially improves the effectiveness of subsequent post-training across a wide range of memory-intensive tasks.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- InfoMem: Training Long-Context Memory Agents with Answer-Conditioned Information Gain (2026)
- Meta-Cognitive Memory Policy Optimization for Long-Horizon LLM Agents (2026)
- Memory-R2: Fair Credit Assignment for Long-Horizon Memory-Augmented LLM Agents (2026)
- ElasticMem: Latent Memory as a Learnable Resource for LLM Agents (2026)
- Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory (2026)
- R^2-Mem: Reflective Experience for Memory Search (2026)
- Tree-based Credit Assignment for Multi-Agent Memory System (2026)
Get this paper in your agent:
hf papers read 2606.03197
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash