Thinking-while-Generating: Interleaving Textual Reasoning throughout Visual Generation Paper • 2511.16671 • Published 20 days ago • 15
Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm Paper • 2511.04570 • Published Nov 6 • 208
IMG: Calibrating Diffusion Models via Implicit Multimodal Guidance Paper • 2509.26231 • Published Sep 30 • 17
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 188
ART: Anonymous Region Transformer for Variable Multi-Layer Transparent Image Generation Paper • 2502.18364 • Published Feb 25 • 37
Efficient Diffusion Transformer with Step-wise Dynamic Attention Mediators Paper • 2408.05710 • Published Aug 11, 2024 • 2
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models Paper • 2405.15738 • Published May 24, 2024 • 46