δ-mem: Efficient Online Memory for Large Language Models Paper • 2605.12357 • Published 12 days ago • 120
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information Paper • 2605.11609 • Published 12 days ago • 189
Entropy-Adaptive Fine-Tuning: Resolving Confident Conflicts to Mitigate Forgetting Paper • 2601.02151 • Published Jan 5 • 115
Rewarding the Rare: Uniqueness-Aware RL for Creative Problem Solving in LLMs Paper • 2601.08763 • Published Jan 13 • 150
Improving Data and Reward Design for Scientific Reasoning in Large Language Models Paper • 2602.08321 • Published Feb 9 • 44
SpargeAttention2: Trainable Sparse Attention via Hybrid Top-k+Top-p Masking and Distillation Fine-Tuning Paper • 2602.13515 • Published Feb 13 • 45
Good SFT Optimizes for SFT, Better SFT Prepares for Reinforcement Learning Paper • 2602.01058 • Published Feb 1 • 45
Length-Unbiased Sequence Policy Optimization: Revealing and Controlling Response Length Variation in RLVR Paper • 2602.05261 • Published Feb 5 • 53
BitDance: Scaling Autoregressive Generative Models with Binary Tokens Paper • 2602.14041 • Published Feb 15 • 54
HyTRec: A Hybrid Temporal-Aware Attention Architecture for Long Behavior Sequential Recommendation Paper • 2602.18283 • Published Feb 20 • 57
SLA2: Sparse-Linear Attention with Learnable Routing and QAT Paper • 2602.12675 • Published Feb 13 • 59
F-GRPO: Don't Let Your Policy Learn the Obvious and Forget the Rare Paper • 2602.06717 • Published Feb 6 • 75
Composition-RL: Compose Your Verifiable Prompts for Reinforcement Learning of Large Language Models Paper • 2602.12036 • Published Feb 12 • 94
Golden Goose: A Simple Trick to Synthesize Unlimited RLVR Tasks from Unverifiable Internet Text Paper • 2601.22975 • Published Jan 30 • 112