LVOmniBench: Pioneering Long Audio-Video Understanding Evaluation for Omnimodal LLMs Paper • 2603.19217 • Published about 19 hours ago • 25
Anatomy of a Lie: A Multi-Stage Diagnostic Framework for Tracing Hallucinations in Vision-Language Models Paper • 2603.15557 • Published 4 days ago • 28
ReasonMap Collection A fine-grained visual reasoning benchmark (We show more question types in the extension dataset.) • 5 items • Updated 3 days ago • 8
ViFeEdit: A Video-Free Tuner of Your Video Diffusion Transformer Paper • 2603.15478 • Published 4 days ago • 24
The Trinity of Consistency as a Defining Principle for General World Models Paper • 2602.23152 • Published 22 days ago • 198
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights Paper • 2512.01816 • Published Dec 1, 2025 • 94
Forging Spatial Intelligence: A Roadmap of Multi-Modal Data Pre-Training for Autonomous Systems Paper • 2512.24385 • Published Dec 30, 2025 • 8
OmniAgent: Audio-Guided Active Perception Agent for Omnimodal Audio-Video Understanding Paper • 2512.23646 • Published Dec 29, 2025 • 15
WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion Paper • 2512.19678 • Published Dec 22, 2025 • 30
In-Video Instructions: Visual Signals as Generative Control Paper • 2511.19401 • Published Nov 24, 2025 • 32