arxiv:2602.17270

Unified Latents (UL): How to train your latents

Published on Feb 19 · Submitted by taesiri on Feb 20 · #2 Paper of the day

Abstract

AI-generated summary: The Unified Latents (UL) framework learns joint latent representations using diffusion-prior regularization and diffusion-model decoding, achieving competitive FID scores with reduced training compute.

We present Unified Latents (UL), a framework for learning latent representations that are jointly regularized by a diffusion prior and decoded by a diffusion model. By linking the encoder's output noise to the prior's minimum noise level, we obtain a simple training objective that provides a tight upper bound on the latent bitrate. On ImageNet-512, our approach achieves a competitive FID of 1.4 with high reconstruction quality (PSNR) while requiring fewer training FLOPs than models trained on Stable Diffusion latents. On Kinetics-600, we set a new state-of-the-art FVD of 1.3.
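
To make the training setup more concrete, here is a rough PyTorch-style sketch of what a joint objective of this kind could look like. The module names (`encoder`, `latent_prior`, `pixel_decoder`), their call signatures, the linear noise schedule, and the value of `sigma_min` are all illustrative assumptions rather than the paper's actual implementation; the only ideas taken from the abstract are that the encoder's output noise is tied to the prior's minimum noise level and that a diffusion prior loss and a diffusion decoder loss are trained jointly.

```python
import torch

# Rough sketch of one UL-style training step. Everything here is an assumption
# for illustration: module names and call signatures (encoder, latent_prior,
# pixel_decoder), the linear noise schedule, and sigma_min are NOT from the paper.
def ul_training_step(x, encoder, latent_prior, pixel_decoder, sigma_min=0.05):
    # Encode the image and inject Gaussian noise at the prior's minimum noise
    # level, tying the encoder's output noise to the prior (per the abstract).
    z = encoder(x)                                  # latents assumed (B, C, H, W)
    z_noisy = z + sigma_min * torch.randn_like(z)

    # Diffusion-prior regularization on the latents: a standard denoising loss
    # at a random noise level sampled above sigma_min (schedule is assumed).
    t = torch.rand(z.shape[0], device=z.device)
    sigma_t = sigma_min + (1.0 - sigma_min) * t
    eps = torch.randn_like(z_noisy)
    z_t = z_noisy + sigma_t.view(-1, 1, 1, 1) * eps
    prior_loss = ((latent_prior(z_t, sigma_t) - eps) ** 2).mean()

    # Diffusion decoder: denoise pixels conditioned on the noisy latent.
    s = torch.rand(x.shape[0], device=x.device)
    noise = torch.randn_like(x)
    x_s = x + s.view(-1, 1, 1, 1) * noise
    decode_loss = ((pixel_decoder(x_s, s, z_noisy) - noise) ** 2).mean()

    # Joint objective; per the abstract, the prior term is what yields the
    # upper bound on the latent bitrate.
    return decode_loss + prior_loss
```

Under this reading, the prior term plays a dual role: it shapes the latents toward the diffusion prior and simultaneously acts as the bitrate regularizer described in the abstract.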

Community

Paper submitter

Unified Latents (UL) jointly regularizes encoders with a diffusion prior and decodes with a diffusion model, giving a tight latent bitrate bound and strong ImageNet/Kinetics performance.
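
For context on the "tight latent bitrate bound" claim, the usual argument (familiar from variational diffusion models; the paper's exact derivation is not reproduced here, so this is a generic sketch) is that the expected code length of a latent under the learned prior is bounded by its negative log-likelihood, which the diffusion prior's denoising loss in turn upper-bounds:

$$
\mathbb{E}_{z}\big[\mathrm{bits}(z)\big] \;\lesssim\; \mathbb{E}_{z}\big[-\log_2 p_\theta(z)\big] \;\le\; \tfrac{1}{\ln 2}\,\mathcal{L}_{\text{prior}} + \text{const.}
$$

Here $p_\theta$ denotes the diffusion prior over latents and $\mathcal{L}_{\text{prior}}$ its denoising training loss; the notation is illustrative, not the paper's.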

arXivLens breakdown of this paper 👉 https://arxivlens.com/PaperView/Details/unified-latents-ul-how-to-train-your-latents-6833-6cd65751

  • Executive Summary
  • Detailed Breakdown
  • Practical Applications

