LightReasoner Models Collection https://arxiv.org/abs/2510.07962 • 3 items • Updated Oct 19, 2025 • 5
ScaleCUA: Scaling Open-Source Computer Use Agents with Cross-Platform Data Paper • 2509.15221 • Published Sep 18, 2025 • 111
Harnessing Uncertainty: Entropy-Modulated Policy Gradients for Long-Horizon LLM Agents Paper • 2509.09265 • Published Sep 11, 2025 • 47
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2, 2025 • 84
CODA: Coordinating the Cerebrum and Cerebellum for a Dual-Brain Computer Use Agent with Decoupled Reinforcement Learning Paper • 2508.20096 • Published Aug 27, 2025 • 37
WideSearch: Benchmarking Agentic Broad Info-Seeking Paper • 2508.07999 • Published Aug 11, 2025 • 110
Is Extending Modality The Right Path Towards Omni-Modality? Paper • 2506.01872 • Published Jun 2, 2025 • 24
A Controllable Examination for Long-Context Language Models Paper • 2506.02921 • Published Jun 3, 2025 • 33
ARIA: Training Language Agents with Intention-Driven Reward Aggregation Paper • 2506.00539 • Published May 31, 2025 • 30
BookWorld: From Novels to Interactive Agent Societies for Creative Story Generation Paper • 2504.14538 • Published Apr 20, 2025 • 30 • 2
JiraiBench: A Bilingual Benchmark for Evaluating Large Language Models' Detection of Human Self-Destructive Behavior Content in Jirai Community Paper • 2503.21679 • Published Mar 27, 2025 • 1
ValueFX9507/Tifa-DeepsexV2-7b-MGRPO-GGUF-Q8 Reinforcement Learning • 8B • Updated Mar 28, 2025 • 1.71k • 189
CapArena: Benchmarking and Analyzing Detailed Image Captioning in the LLM Era Paper • 2503.12329 • Published Mar 16, 2025 • 27