jmkim0309's Collections: daily papers
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation • arXiv:2312.04557 • 13 upvotes
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models • arXiv:2312.04410 • 15 upvotes
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding • arXiv:2312.04461 • 62 upvotes
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively • arXiv:2401.02955 • 23 upvotes
Denoising Vision Transformers • arXiv:2401.02957 • 31 upvotes
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation • arXiv:2312.16272 • 7 upvotes
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion • arXiv:2312.16486 • 7 upvotes
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models • arXiv:2411.07126 • 30 upvotes
Motion Control for Enhanced Complex Action Video Generation • arXiv:2411.08328 • 5 upvotes
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation • arXiv:2411.07975 • 31 upvotes
Pyramidal Flow Matching for Efficient Video Generative Modeling • arXiv:2410.05954 • 40 upvotes
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation • arXiv:2412.04432 • 16 upvotes
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment • arXiv:2412.04814 • 47 upvotes
Mind the Time: Temporally-Controlled Multi-Event Video Generation • arXiv:2412.05263 • 11 upvotes
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows • arXiv:2412.01169 • 13 upvotes
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation • arXiv:2410.13861 • 56 upvotes
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency • arXiv:2412.15216 • 5 upvotes
MotiF: Making Text Count in Image Animation with Motion Focal Loss • arXiv:2412.16153 • 6 upvotes
Large Motion Video Autoencoding with Cross-modal Video VAE • arXiv:2412.17805 • 24 upvotes
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation • arXiv:2501.09503 • 14 upvotes
Do generative video models learn physical principles from watching videos? • arXiv:2501.09038 • 34 upvotes
Small Models Struggle to Learn from Strong Reasoners • arXiv:2502.12143 • 40 upvotes
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness • arXiv:2503.21755 • 33 upvotes
Efficient Generative Model Training via Embedded Representation Warmup • arXiv:2504.10188 • 12 upvotes
Video-As-Prompt: Unified Semantic Control for Video Generation • arXiv:2510.20888 • 45 upvotes
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation • arXiv:2511.09611 • 68 upvotes
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models • arXiv:2511.13704 • 42 upvotes
Back to Basics: Let Denoising Generative Models Denoise • arXiv:2511.13720 • 63 upvotes
DiP: Taming Diffusion Models in Pixel Space • arXiv:2511.18822 • 24 upvotes
Diffusion Transformers with Representation Autoencoders • arXiv:2510.11690 • 165 upvotes
PixelDiT: Pixel Diffusion Transformers for Image Generation • arXiv:2511.20645 • 24 upvotes