GRACE: Generative Representation Learning via Contrastive Policy Optimization Paper • 2510.04506 • Published Oct 6
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window Paper • 2510.08276 • Published Oct 9
From Captions to Rewards (CAREVL): Leveraging Large Language Model Experts for Enhanced Reward Modeling in Large Vision-Language Models Paper • 2503.06260 • Published Mar 8
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2