- Revisiting On-Policy Distillation: Empirical Failure Modes and Simple Fixes
  Paper • 2603.25562 • Published • 19
- Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe
  Paper • 2604.13016 • Published • 108
- From P(y|x) to P(y): Investigating Reinforcement Learning in Pre-train Space
  Paper • 2604.14142 • Published • 30
- TIP: Token Importance in On-Policy Distillation
  Paper • 2604.14084 • Published • 15
Hugo Laurençon
HugoLaurencon
AI & ML interests
None yet
Recent Activity
upvoted a paper 7 days ago
Anti-Self-Distillation for Reasoning RL via Pointwise Mutual Information
upvoted a paper 7 days ago
You Only Need Minimal RLVR Training: Extrapolating LLMs via Rank-1 Trajectories