jmkim0309's Collections: daily papers
GenTron: Delving Deep into Diffusion Transformers for Image and Video Generation • arXiv:2312.04557 • 13 upvotes
Smooth Diffusion: Crafting Smooth Latent Spaces in Diffusion Models • arXiv:2312.04410 • 15 upvotes
PhotoMaker: Customizing Realistic Human Photos via Stacked ID Embedding • arXiv:2312.04461 • 62 upvotes
Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively • arXiv:2401.02955 • 23 upvotes
Denoising Vision Transformers • arXiv:2401.02957 • 31 upvotes
SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation • arXiv:2312.16272 • 7 upvotes
PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion • arXiv:2312.16486 • 7 upvotes
Edify Image: High-Quality Image Generation with Pixel Space Laplacian Diffusion Models • arXiv:2411.07126 • 30 upvotes
Motion Control for Enhanced Complex Action Video Generation • arXiv:2411.08328 • 5 upvotes
JanusFlow: Harmonizing Autoregression and Rectified Flow for Unified Multimodal Understanding and Generation • arXiv:2411.07975 • 31 upvotes
Pyramidal Flow Matching for Efficient Video Generative Modeling • arXiv:2410.05954 • 40 upvotes
Divot: Diffusion Powers Video Tokenizer for Comprehension and Generation • arXiv:2412.04432 • 16 upvotes
LiFT: Leveraging Human Feedback for Text-to-Video Model Alignment • arXiv:2412.04814 • 47 upvotes
Mind the Time: Temporally-Controlled Multi-Event Video Generation • arXiv:2412.05263 • 11 upvotes
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows • arXiv:2412.01169 • 13 upvotes
PUMA: Empowering Unified MLLM with Multi-granular Visual Generation • arXiv:2410.13861 • 56 upvotes
UIP2P: Unsupervised Instruction-based Image Editing via Cycle Edit Consistency • arXiv:2412.15216 • 5 upvotes
MotiF: Making Text Count in Image Animation with Motion Focal Loss • arXiv:2412.16153 • 6 upvotes
Large Motion Video Autoencoding with Cross-modal Video VAE • arXiv:2412.17805 • 24 upvotes
AnyStory: Towards Unified Single and Multiple Subject Personalization in Text-to-Image Generation • arXiv:2501.09503 • 14 upvotes
Do generative video models learn physical principles from watching videos? • arXiv:2501.09038 • 34 upvotes
Small Models Struggle to Learn from Strong Reasoners • arXiv:2502.12143 • 40 upvotes
VBench-2.0: Advancing Video Generation Benchmark Suite for Intrinsic Faithfulness • arXiv:2503.21755 • 33 upvotes
Efficient Generative Model Training via Embedded Representation Warmup • arXiv:2504.10188 • 12 upvotes
Video-As-Prompt: Unified Semantic Control for Video Generation • arXiv:2510.20888 • 45 upvotes
MMaDA-Parallel: Multimodal Large Diffusion Language Models for Thinking-Aware Editing and Generation • arXiv:2511.09611 • 68 upvotes
TiViBench: Benchmarking Think-in-Video Reasoning for Video Generative Models • arXiv:2511.13704 • 42 upvotes
Back to Basics: Let Denoising Generative Models Denoise • arXiv:2511.13720 • 63 upvotes
DiP: Taming Diffusion Models in Pixel Space • arXiv:2511.18822 • 24 upvotes
Diffusion Transformers with Representation Autoencoders • arXiv:2510.11690 • 165 upvotes
PixelDiT: Pixel Diffusion Transformers for Image Generation • arXiv:2511.20645 • 24 upvotes