Collections
Collections including paper arxiv:2403.03206
- OneIG-Bench: Omni-dimensional Nuanced Evaluation for Image Generation
  Paper • 2506.07977 • Published • 41
- Rethinking Cross-Modal Interaction in Multimodal Diffusion Transformers
  Paper • 2506.07986 • Published • 19
- STARFlow: Scaling Latent Normalizing Flows for High-resolution Image Synthesis
  Paper • 2506.06276 • Published • 26
- Aligning Latent Spaces with Flow Priors
  Paper • 2506.05240 • Published • 27

- SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher
  Paper • 2408.14176 • Published • 62
- Diffusion Models Are Real-Time Game Engines
  Paper • 2408.14837 • Published • 126
- Transfusion: Predict the Next Token and Diffuse Images with One Multi-Modal Model
  Paper • 2408.11039 • Published • 63
- OD-VAE: An Omni-dimensional Video Compressor for Improving Latent Video Diffusion Model
  Paper • 2409.01199 • Published • 14

- FastVLM: Efficient Vision Encoding for Vision Language Models
  Paper • 2412.13303 • Published • 72
- rStar2-Agent: Agentic Reasoning Technical Report
  Paper • 2508.20722 • Published • 116
- AgentScope 1.0: A Developer-Centric Framework for Building Agentic Applications
  Paper • 2508.16279 • Published • 53
- OmniWorld: A Multi-Domain and Multi-Modal Dataset for 4D World Modeling
  Paper • 2509.12201 • Published • 104

- aMUSEd: An Open MUSE Reproduction
  Paper • 2401.01808 • Published • 31
- black-forest-labs/FLUX.1-dev
  Text-to-Image • Updated • 1.22M • 12k
- Qwen/Qwen2-VL-7B-Instruct
  Image-Text-to-Text • 8B • Updated • 1.57M • 1.24k
- zer0int/CLIP-GmP-ViT-L-14
  Zero-Shot Image Classification • 0.4B • Updated • 7.23k • 507