How Much Heavy Lifting Can an Agent Harness Do?: Measuring the LLM's Residual Role in a Planning Agent Paper • 2604.07236 • Published 3 days ago • 1
Becoming Experienced Judges: Selective Test-Time Learning for Evaluators Paper • 2512.06751 • Published Dec 7, 2025 • 1
V-Agent: An Interactive Video Search System Using Vision-Language Models Paper • 2512.16925 • Published Nov 4, 2025 • 2
ClawsBench: Evaluating Capability and Safety of LLM Productivity Agents in Simulated Workspaces Paper • 2604.05172 • Published 25 days ago • 24
SpatialBoost: Enhancing Visual Representation through Language-Guided Reasoning Paper • 2603.22057 • Published Mar 23 • 46
RoboAlign: Learning Test-Time Reasoning for Language-Action Alignment in Vision-Language-Action Models Paper • 2603.21341 • Published Mar 22 • 23