InsightTok

InsightTok is a discrete visual tokenizer designed to improve the fidelity of text and faces, two of the most challenging yet perceptually important structures in autoregressive image generation.

It was introduced in the paper InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation.

Paper: InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation
Code: https://github.com/LeapLabTHU/InsightTok

Model Details

Property	Value
Downsampling rate	16×
Codebook size	16,384
Latent dimension	256
Number of parameters	426M
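
As a rough illustration of what these numbers imply, the sketch below computes the token grid and the raw bit cost per image. It assumes that the image height and width are multiples of 16 and that each spatial position is encoded as a single index into the 16,384-entry codebook. The helper name `token_budget` and the 256×256 example resolution are illustrative assumptions and are not taken from the InsightTok code. The latent dimension (256) does not affect the token count, so it is not used here.

```python
DOWNSAMPLE = 16        # spatial downsampling rate, from the table above
CODEBOOK_SIZE = 16384  # number of codebook entries, from the table above


def token_budget(height: int, width: int) -> dict:
    """Hypothetical helper: token grid and raw bit cost for one image.

    Assumes one codebook index per spatial position of the latent grid.
    """
    if height % DOWNSAMPLE or width % DOWNSAMPLE:
        raise ValueError("height and width must be multiples of the downsampling rate")

    grid_h = height // DOWNSAMPLE
    grid_w = width // DOWNSAMPLE
    num_tokens = grid_h * grid_w

    # Bits needed to address any codebook entry: ceil(log2(CODEBOOK_SIZE)).
    bits_per_token = (CODEBOOK_SIZE - 1).bit_length()

    return {
        "grid": (grid_h, grid_w),
        "num_tokens": num_tokens,
        "bits_per_token": bits_per_token,
        "total_bits": num_tokens * bits_per_token,
    }


if __name__ == "__main__":
    print(token_budget(256, 256))
```

Under these assumptions, a 256×256 input maps to a 16×16 grid of 256 tokens at 14 bits each, or 3,584 bits in total.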

Performance

InsightTok achieves strong text and face reconstruction quality while maintaining a compact discrete representation.

Usage

For usage instructions, please refer to our GitHub repository: https://github.com/LeapLabTHU/InsightTok

Citation

@article{yue2026insighttok,
  title={InsightTok: Improving Text and Face Fidelity in Discrete Tokenization for Autoregressive Image Generation},
  author={Yue, Yang and Wei, Fangyun and He, Tianyu and Zhao, Jinjing and Ni, Zanlin and Liu, Zeyu and Guo, Jiayi and Shi, Lei and Dong, Yue and Chen, Li and Li, Ji and Huang, Gao and Chen, Dong},
  journal={arXiv preprint arXiv:TODO},
  year={2026}
}
