DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 6 days ago • 181
TUNA: Taming Unified Visual Representations for Native Unified Multimodal Models Paper • 2512.02014 • Published 7 days ago • 57
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation Paper • 2511.20714 • Published 14 days ago • 45
Kinematify: Open-Vocabulary Synthesis of High-DoF Articulated Objects Paper • 2511.01294 • Published Nov 3 • 13
FlashWorld: High-quality 3D Scene Generation within Seconds Paper • 2510.13678 • Published Oct 15 • 71
FlashVSR: Towards Real-Time Diffusion-Based Streaming Video Super-Resolution Paper • 2510.12747 • Published Oct 14 • 37
InfiniHuman: Infinite 3D Human Creation with Precise Control Paper • 2510.11650 • Published Oct 13 • 5
Thinking with Camera: A Unified Multimodal Model for Camera-Centric Understanding and Generation Paper • 2510.08673 • Published Oct 9 • 125
Webscale-RL: Automated Data Pipeline for Scaling RL Data to Pretraining Levels Paper • 2510.06499 • Published Oct 7 • 31
VideoCanvas: Unified Video Completion from Arbitrary Spatiotemporal Patches via In-Context Conditioning Paper • 2510.08555 • Published Oct 9 • 63
Triangle Splatting+: Differentiable Rendering with Opaque Triangles Paper • 2509.25122 • Published Sep 29 • 8
Rolling Forcing: Autoregressive Long Video Diffusion in Real Time Paper • 2509.25161 • Published Sep 29 • 24
Kling-Avatar: Grounding Multimodal Instructions for Cascaded Long-Duration Avatar Animation Synthesis Paper • 2509.09595 • Published Sep 11 • 48
ToonComposer: Streamlining Cartoon Production with Generative Post-Keyframing Paper • 2508.10881 • Published Aug 14 • 52
VLM4D: Towards Spatiotemporal Awareness in Vision Language Models Paper • 2508.02095 • Published Aug 4 • 9