SePO: Self-Evolving Prompt Agent for System Prompt Optimization Paper • 2606.04465 • Published 3 days ago
Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction Paper • 2606.05769 • Published 1 day ago
World-Language-Action Model for Unified World Modeling, Language Reasoning, and Action Synthesis Paper • 2606.05979 • Published 1 day ago
The Shadow Price of Reasoning: Economic Perspective on Optimal Budget Allocation for LLMs Paper • 2606.03092 • Published 4 days ago
ArcANE: Do Role-Playing Language Agents Stay in Character at the Right Time? Paper • 2606.05553 • Published 1 day ago
Benchmarks are Not Enough: RAMP for Runtime Assessing of Agentic Models in Production Systems Paper • 2605.27492 • Published 11 days ago
ThoughtFold: Folding Reasoning Chains via Introspective Preference Learning Paper • 2606.03503 • Published 3 days ago
Where Do Deep-Research Agents Go Wrong? Span-Level Error Localization in Agent Trajectories Paper • 2606.02060 • Published 5 days ago
AutoMedBench: Towards Medical AutoResearch with Agentic AI Models Paper • 2606.01961 • Published 5 days ago
Humanoid-GPT: Scaling Data and Structure for Zero-Shot Motion Tracking Paper • 2606.03985 • Published 4 days ago
Skill is Not One-Size-Fits-All: Model-Aware Skill Alignment for LLM Agents Paper • 2605.30723 • Published 8 days ago
Harness-1: Reinforcement Learning for Search Agents with State-Externalizing Harnesses Paper • 2606.02373 • Published 5 days ago
A Matter of TASTE: Improving Coverage and Difficulty of Agent Benchmarks Paper • 2605.28556 • Published 10 days ago