Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing Paper • 2601.04575 • Published 15 days ago • 8
Implicit Neural Representation Facilitates Unified Universal Vision Encoding Paper • 2601.14256 • Published 2 days ago • 5
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning Paper • 2601.11141 • Published 7 days ago • 8
Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics Paper • 2601.14027 • Published 2 days ago • 9
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning Paper • 2601.14750 • Published 1 day ago • 14
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 1 day ago • 36
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 5 days ago • 41
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation Paper • 2601.13976 • Published 2 days ago • 10
KAGE-Bench: Fast Known-Axis Visual Generalization Evaluation for Reinforcement Learning Paper • 2601.14232 • Published 2 days ago • 8
ToolPRMBench: Evaluating and Advancing Process Reward Models for Tool-using Agents Paper • 2601.12294 • Published 5 days ago • 14
OmniTransfer: All-in-one Framework for Spatio-temporal Video Transfer Paper • 2601.14250 • Published 2 days ago • 35
Being-H0.5: Scaling Human-Centric Robot Learning for Cross-Embodiment Generalization Paper • 2601.12993 • Published 3 days ago • 70
Toward Efficient Agents: Memory, Tool learning, and Planning Paper • 2601.14192 • Published 2 days ago • 35
FutureOmni: Evaluating Future Forecasting from Omni-Modal Context for Multimodal LLMs Paper • 2601.13836 • Published 2 days ago • 30