Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models Paper • 2606.03988 • Published 7 days ago • 109
iVGR: Internalizing Visually Grounded Reasoning for MLLMs with Reinforcement Learning Paper • 2605.31096 • Published 12 days ago • 7
Crafter: A Multi-Agent Harness for Editable Scientific Figure Generation from Diverse Inputs Paper • 2605.30611 • Published 13 days ago • 192
MRT: Masked Region Transformer for Layered Image Generation and Editing at Scale Paper • 2605.27235 • Published 15 days ago • 8
Multi-view Consistent 3D Gaussian Head Avatars 'without' Multi-view Generation Paper • 2605.25220 • Published 17 days ago • 7
DelTA: Discriminative Token Credit Assignment for Reinforcement Learning from Verifiable Rewards Paper • 2605.21467 • Published 21 days ago • 204
OpenSearch-VL: An Open Recipe for Frontier Multimodal Search Agents Paper • 2605.05185 • Published May 6 • 102
UniDoc-RL: Coarse-to-Fine Visual RAG with Hierarchical Actions and Dense Rewards Paper • 2604.14967 • Published Apr 16 • 15
RationalRewards: Reasoning Rewards Scale Visual Generation Both Training and Test Time Paper • 2604.11626 • Published Apr 13 • 102
SkillClaw: Let Skills Evolve Collectively with Agentic Evolver Paper • 2604.08377 • Published Apr 9 • 293
Faithful GRPO: Improving Visual Spatial Reasoning in Multimodal Language Models via Constrained Policy Optimization Paper • 2604.08476 • Published Apr 9 • 8
GrandCode: Achieving Grandmaster Level in Competitive Programming via Agentic Reinforcement Learning Paper • 2604.02721 • Published Apr 3 • 632
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 343
Bootstrapping Exploration with Group-Level Natural Language Feedback in Reinforcement Learning Paper • 2603.04597 • Published Mar 4 • 211
VESPO: Variational Sequence-Level Soft Policy Optimization for Stable Off-Policy LLM Training Paper • 2602.10693 • Published Feb 11 • 221