FLUX-Makeup: High-Fidelity, Identity-Consistent, and Robust Makeup Transfer via Diffusion Transformer Paper • 2508.05069 • Published Aug 7, 2025 • 1
Generation-Augmented Generation: A Plug-and-Play Framework for Private Knowledge Injection in Large Language Models Paper • 2601.08209 • Published 19 days ago • 1
RzenEmbed: Towards Comprehensive Multimodal Retrieval Paper • 2510.27350 • Published Oct 31, 2025 • 1
EVTAR: End-to-End Try on with Additional Unpaired Visual Reference Paper • 2511.00956 • Published Nov 2, 2025 • 5
FG-CLIP 2: A Bilingual Fine-grained Vision-Language Alignment Model Paper • 2510.10921 • Published Oct 13, 2025 • 11 • 2
FG-CLIP 2 Collection FG-CLIP 2 is the foundation model for fine-grained vision-language understanding in both English and Chinese. • 10 items • Updated Nov 6, 2025 • 5
Prompt as Knowledge Bank: Boost Vision-language model via Structural Representation for zero-shot medical detection Paper • 2502.16223 • Published Feb 22, 2025
FG-CLIP Collection New generation of CLIP with strong fine-grained discrimination capability • 6 items • Updated Oct 15, 2025 • 4
FG-CLIP: Fine-Grained Visual and Textual Alignment Paper • 2505.05071 • Published May 8, 2025 • 18 • 3
RelaCtrl: Relevance-Guided Efficient Control for Diffusion Transformers Paper • 2502.14377 • Published Feb 20, 2025 • 12
Bridge Diffusion Model: bridge non-English language-native text-to-image diffusion model with English communities Paper • 2309.00952 • Published Sep 2, 2023
FancyVideo: Towards Dynamic and Consistent Video Generation via Cross-frame Textual Guidance Paper • 2408.08189 • Published Aug 15, 2024 • 17
Qihoo-T2X: An Efficiency-Focused Diffusion Transformer via Proxy Tokens for Text-to-Any-Task Paper • 2409.04005 • Published Sep 6, 2024 • 19