Collections
Collections including paper arxiv:2403.03206
- Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding
  Paper • 2405.08748 • Published • 24
- Grounding DINO 1.5: Advance the "Edge" of Open-Set Object Detection
  Paper • 2405.10300 • Published • 30
- Chameleon: Mixed-Modal Early-Fusion Foundation Models
  Paper • 2405.09818 • Published • 132
- OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework
  Paper • 2405.11143 • Published • 41

- No "Zero-Shot" Without Exponential Data: Pretraining Concept Frequency Determines Multimodal Model Performance
  Paper • 2404.04125 • Published • 29
- CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching
  Paper • 2404.03653 • Published • 36
- Cross-Attention Makes Inference Cumbersome in Text-to-Image Diffusion Models
  Paper • 2404.02747 • Published • 13
- 3D Congealing: 3D-Aware Image Alignment in the Wild
  Paper • 2404.02125 • Published • 10

- Rich feature hierarchies for accurate object detection and semantic segmentation
  Paper • 1311.2524 • Published • 1
- DeepPose: Human Pose Estimation via Deep Neural Networks
  Paper • 1312.4659 • Published • 1
- Generative Adversarial Networks
  Paper • 1406.2661 • Published • 5
- scikit-image: Image processing in Python
  Paper • 1407.6245 • Published • 1

- Bigger is not Always Better: Scaling Properties of Latent Diffusion Models
  Paper • 2404.01367 • Published • 22
- On the Scalability of Diffusion-based Text-to-Image Generation
  Paper • 2404.02883 • Published • 19
- Scaling Rectified Flow Transformers for High-Resolution Image Synthesis
  Paper • 2403.03206 • Published • 71
- Improved Denoising Diffusion Probabilistic Models
  Paper • 2102.09672 • Published • 2