VideoKR: Towards Knowledge- and Reasoning-Intensive Video Understanding Paper • 2606.05259 • Published 3 days ago • 31
From Pixels to Words -- Towards Native One-Vision Models at Scale Paper • 2605.28820 • Published 10 days ago • 72
GEM: Generative Supervision Helps Embodied Intelligence Paper • 2605.28548 • Published 10 days ago • 41
LongAV-Compass: Towards Unified Evaluation of Minute-Scale Audio-Visual Generation Across T2AV, I2AV, and V2AV Paper • 2605.26244 • Published 12 days ago • 38
Artifact-Bench: Evaluating MLLMs on Detecting and Assessing the Artifacts of AI-Generated Videos Paper • 2605.18984 • Published 19 days ago • 22
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture Paper • 2605.12500 • Published 25 days ago • 191
SenseNova-U1 Collection SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-Unify Architecture • 9 items • Updated 9 days ago • 69
SWE-chat: Coding Agent Interactions From Real Users in the Wild Paper • 2604.20779 • Published Apr 22 • 16
Agent-World: Scaling Real-World Environment Synthesis for Evolving General Agent Intelligence Paper • 2604.18292 • Published Apr 20 • 85
Rethinking On-Policy Distillation of Large Language Models: Phenomenology, Mechanism, and Recipe Paper • 2604.13016 • Published Apr 14 • 109
Prompt Relay: Inference-Time Temporal Control for Multi-Event Video Generation Paper • 2604.10030 • Published Apr 11 • 15
VisionFoundry: Teaching VLMs Visual Perception with Synthetic Images Paper • 2604.09531 • Published Apr 10 • 9
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 189