Do not copy and paste! Rewriting strategies for code retrieval Paper • 2605.08299 • Published 6 days ago
Continual Harness: Online Adaptation for Self-Improving Foundation Agents Paper • 2605.09998 • Published 3 days ago
Beyond GRPO and On-Policy Distillation: An Empirical Sparse-to-Dense Reward Principle for Language-Model Post-Training Paper • 2605.12483 • Published 1 day ago
LLMs Improving LLMs: Agentic Discovery for Test-Time Scaling Paper • 2605.08083 • Published 6 days ago
AI Co-Mathematician: Accelerating Mathematicians with Agentic AI Paper • 2605.06651 • Published 7 days ago
Skill1: Unified Evolution of Skill-Augmented Agents via Reinforcement Learning Paper • 2605.06130 • Published 7 days ago
SkillOS: Learning Skill Curation for Self-Evolving Agents Paper • 2605.06614 • Published 7 days ago
Auto Research with Specialist Agents Develops Effective and Non-Trivial Training Recipes Paper • 2605.05724 • Published 7 days ago
Nonsense Helps: Prompt Space Perturbation Broadens Reasoning Exploration Paper • 2605.05566 • Published 7 days ago
Accelerating RL Post-Training Rollouts via System-Integrated Speculative Decoding Paper • 2604.26779 • Published 15 days ago
Learning from Noisy Preferences: A Semi-Supervised Learning Approach to Direct Preference Optimization Paper • 2604.24952 • Published 17 days ago
Compliance versus Sensibility: On the Reasoning Controllability in Large Language Models Paper • 2604.27251 • Published 15 days ago
Nemotron 3 Nano Omni: Efficient and Open Multimodal Intelligence Paper • 2604.24954 • Published 17 days ago
Length Value Model: Scalable Value Pretraining for Token-Level Length Modeling Paper • 2604.27039 • Published 15 days ago
Intern-Atlas: A Methodological Evolution Graph as Research Infrastructure for AI Scientists Paper • 2604.28158 • Published 14 days ago
Efficient Training on Multiple Consumer GPUs with RoundPipe Paper • 2604.27085 • Published 15 days ago