shannu122/whisper-small-en-ja-distil Automatic Speech Recognition • 0.2B • Updated 16 days ago • 28 • 1
Mean Mode Screaming: Mean--Variance Split Residuals for 1000-Layer Diffusion Transformers Paper • 2605.06169 • Published 27 days ago • 233
When Numbers Speak: Aligning Textual Numerals and Visual Instances in Text-to-Video Diffusion Models Paper • 2604.08546 • Published Apr 9 • 115
HY-Embodied-0.5: Embodied Foundation Models for Real-World Agents Paper • 2604.07430 • Published Apr 8 • 189
DynaVid: Learning to Generate Highly Dynamic Videos using Synthetic Motion Data Paper • 2604.01666 • Published Apr 2 • 10
CARLA-Air: Fly Drones Inside a CARLA World -- A Unified Infrastructure for Air-Ground Embodied Intelligence Paper • 2603.28032 • Published Mar 30 • 343