arxiv:2603.25240

Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

Published on Mar 26 · Submitted by Han Zhang on Apr 1 · #3 Paper of the day

Abstract

AI-generated summary: Lingshu-Cell is a masked discrete diffusion model that learns transcriptomic state distributions and enables conditional simulation of cellular perturbations across diverse tissues and species.

Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. Here, we introduce Lingshu-Cell, a masked discrete diffusion model that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that is compatible with the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as filtering by high variability or ranking by expression level. Across diverse tissues and species, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions, demonstrating its ability to capture complex cellular heterogeneity. Moreover, by jointly embedding cell type or donor identity with perturbation, Lingshu-Cell can predict whole-transcriptome expression changes for novel combinations of identity and perturbation. It achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs. Together, these results establish Lingshu-Cell as a flexible cellular world model for in silico simulation of cell states and perturbation responses, laying the foundation for a new paradigm in biological discovery and perturbation screening.
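The abstract describes Lingshu-Cell as a masked discrete diffusion model over discrete tokens for ~18,000 genes. As a rough illustration of the generic technique only (this page does not specify the paper's architecture, noise schedule, or unmasking order), the sketch below assumes an absorbing [MASK] state, independent per-position masking with probability t, and a placeholder `denoiser` callable standing in for a trained network. `MASK`, `forward_mask`, `sample`, and `masked_cross_entropy` are hypothetical names.

```python
import numpy as np

MASK = -1  # hypothetical id for the absorbing [MASK] state; real value tokens are >= 0


def forward_mask(tokens, t, rng):
    """Forward (noising) process: each gene position is independently masked with probability t."""
    keep = rng.random(tokens.shape) >= t
    return np.where(keep, tokens, MASK)


def masked_cross_entropy(logits, clean_tokens, noisy_tokens):
    """Training objective sketch: cross-entropy of the clean token, scored only where the input was masked.

    Left unweighted here for simplicity.
    """
    log_probs = logits - logits.max(axis=1, keepdims=True)
    log_probs -= np.log(np.exp(log_probs).sum(axis=1, keepdims=True))  # row-wise log-softmax
    masked = noisy_tokens == MASK
    if not masked.any():
        return 0.0
    picked = log_probs[np.arange(len(clean_tokens)), clean_tokens]
    return float(-picked[masked].mean())


def sample(denoiser, n_genes, n_steps, rng):
    """Reverse process: start fully masked and reveal a share of the remaining positions per step.

    `denoiser` maps a partially masked cell to per-position logits of shape (n_genes, vocab_size).
    """
    x = np.full(n_genes, MASK)
    for steps_left in range(n_steps, 0, -1):
        masked = np.flatnonzero(x == MASK)
        if masked.size == 0:
            break
        logits = denoiser(x)
        probs = np.exp(logits - logits.max(axis=1, keepdims=True))  # row-wise softmax
        probs /= probs.sum(axis=1, keepdims=True)
        # Reveal ceil(masked / steps_left) positions, so the final step unmasks everything left.
        n_reveal = int(np.ceil(masked.size / steps_left))
        for i in rng.choice(masked, size=n_reveal, replace=False):
            x[i] = rng.choice(probs.shape[1], p=probs[i])
    return x


rng = np.random.default_rng(0)
uniform_denoiser = lambda x: np.zeros((x.size, 8))  # placeholder: uniform over 8 value tokens
cell = sample(uniform_denoiser, n_genes=20, n_steps=4, rng=rng)
```

Because every gene position is a slot in one fixed-length vector rather than a step in a left-to-right sequence, the model can fill positions in any order, which is one reason masked diffusion suits non-sequential data such as a cell's expression profile.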

Community

Paper author · Paper submitter

✨ Highlights

  • Lingshu-Cell introduces a generative cellular world model for single-cell transcriptomics based on a masked discrete diffusion framework.

  • Lingshu-Cell performs transcriptome-wide modeling over ~18,000 genes directly in a discrete token space that is compatible with the sparse, non-sequential nature of scRNA-seq data, without prior gene selection.

  • Lingshu-Cell reproduces realistic cell populations across diverse tissues and species, capturing marker-gene expression patterns, cell-subtype proportions, and transcriptomic distributions.

  • Lingshu-Cell achieves strong performance in response prediction under both genetic and cytokine perturbations.
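The abstract attributes generalization to novel identity-perturbation combinations to jointly embedding cell type or donor identity with the perturbation. The page does not give the mechanism. A minimal sketch of one common way to get that compositional behavior, assuming separate embedding tables combined by summation (the class `JointCondition`, the factor names, and the dimension are all hypothetical), looks like this:

```python
import numpy as np


class JointCondition:
    """Hypothetical conditioning module: one embedding table per factor, combined by summation.

    Because identity and perturbation are embedded separately, any pairing of a
    known identity with a known perturbation gets a vector, including pairs that
    never co-occurred in training.
    """

    def __init__(self, identities, perturbations, dim, seed=0):
        rng = np.random.default_rng(seed)
        self.id_index = {name: i for i, name in enumerate(identities)}
        self.pert_index = {name: i for i, name in enumerate(perturbations)}
        # Random init stands in for learned parameters.
        self.id_table = rng.normal(size=(len(identities), dim))
        self.pert_table = rng.normal(size=(len(perturbations), dim))

    def __call__(self, identity, perturbation):
        return (self.id_table[self.id_index[identity]]
                + self.pert_table[self.pert_index[perturbation]])


cond = JointCondition(["donor_1", "donor_2"], ["control", "IFNG"], dim=8)
# Suppose training covered donor_1 + IFNG and donor_2 + control only;
# donor_2 + IFNG still has a well-defined conditioning vector.
novel = cond("donor_2", "IFNG")
```

With summation, a perturbation shifts the conditioning vector by the same amount for every identity; concatenation or cross-attention would let the effect depend on identity, at the cost of weaker extrapolation to unseen pairs.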

one question that sticks with me: how robust is the fixed discrete token vocabulary to skewed expression, especially for rare transcripts that often drive fine-grained subtypes?

they claim no gene filtering and model all ~18k genes in a single vocab, but would varying tokenization granularity or using adaptive binning change the recovery of marker genes and cell-subtype proportions?

the arxivlens breakdown helped me parse the method details, particularly the discrete diffusion in token space and the conditioning scheme.

have you done any ablation with different vocab sizes or quantization levels to quantify how much the discreteness itself, beyond model size, drives performance?

overall it's a neat step toward truly generative, perturbation-aware cell modeling, and I’m curious how you see this scaling to longitudinal trajectories or multi-omics in the future.
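The vocabulary-size and binning ablation asked about here can be prototyped cheaply before any retraining, by comparing how different schemes treat a heavy-tailed count distribution. The sketch below is a hypothetical harness, not something from the paper: it contrasts fixed equal-width bins with adaptive quantile bins and reports, for each vocabulary size, how many tokens are actually used and the mean relative reconstruction error on nonzero counts. The counts are synthetic negative-binomial draws, not real scRNA-seq data.

```python
import numpy as np


def equal_width_edges(counts, n_bins):
    """Fixed scheme: n_bins equal-width bins spanning 0..max(counts)."""
    return np.linspace(0, counts.max() + 1, n_bins + 1)


def quantile_edges(counts, n_bins):
    """Adaptive scheme: edges at empirical quantiles; duplicate edges (e.g. from many zeros) collapse."""
    return np.unique(np.quantile(counts, np.linspace(0, 1, n_bins + 1)))


def quantization_report(counts, edges):
    """Return (tokens actually used, mean relative error on nonzero counts).

    Each bin is reconstructed as the mean of the counts that fall into it.
    """
    ids = np.digitize(counts, edges[1:-1])  # bin index per entry, using interior edges only
    recon = np.empty(counts.shape, dtype=float)
    for b in np.unique(ids):
        recon[ids == b] = counts[ids == b].mean()
    nz = counts > 0
    rel_err = np.abs(recon[nz] - counts[nz]) / counts[nz]
    return len(np.unique(ids)), float(rel_err.mean())


rng = np.random.default_rng(0)
counts = rng.negative_binomial(0.5, 0.05, size=50_000)  # synthetic heavy-tailed, zero-inflated counts
for n_bins in (16, 64, 256):
    for name, make in (("equal-width", equal_width_edges), ("quantile", quantile_edges)):
        used, err = quantization_report(counts, make(counts, n_bins))
        print(f"{name:>11} bins={n_bins:<4} tokens used={used:<4} mean rel. error={err:.3f}")
```

Equal-width bins spread their budget evenly over a range that is mostly sparse high tail, so small counts get lumped together, which is the failure mode the question raises for rare transcripts. Quantile edges adapt to the data but collapse when many values are tied, for example at zero.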

Paper author

Thanks for the thoughtful question.

Just to clarify one point: in our setup, each cell is represented over a fixed list of ~18k genes, while the vocab is used for discretized expression values rather than gene identities.
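To make that distinction concrete, here is a minimal sketch assuming a plain Python dict of gene to count per cell and a placeholder value tokenizer (`GENES`, `cell_to_tokens`, `tokens_to_cell`, and the clipping rule are hypothetical). Gene identity lives in the position, and the vocabulary only has to cover expression values, so its size does not grow with the number of genes.

```python
GENES = ["CD3E", "MS4A1", "LYZ", "NKG7"]  # hypothetical stand-in for the fixed ~18k-gene list


def cell_to_tokens(sparse_counts, genes, value_to_token):
    """Lay out one cell over a fixed gene order.

    Position i always means genes[i]; the token stored there encodes only the
    discretized expression value. Genes absent from sparse_counts get count 0.
    """
    return [value_to_token(sparse_counts.get(gene, 0)) for gene in genes]


def tokens_to_cell(tokens, genes, token_to_value):
    """Inverse layout: recover a sparse gene -> value dict, dropping zeros."""
    cell = {}
    for gene, tok in zip(genes, tokens):
        value = token_to_value(tok)
        if value:
            cell[gene] = value
    return cell


clip_tokenizer = lambda count: min(count, 63)  # placeholder: 64 value tokens, regardless of gene count
tokens = cell_to_tokens({"LYZ": 5, "CD3E": 2}, GENES, clip_tokenizer)
```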

The quantization is also not uniform: low counts are kept at fine resolution, while higher counts are compressed more coarsely, but still with roughly two significant digits preserved. So the design is meant to retain low-abundance signals while handling the heavy-tailed count range efficiently.
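The reply describes the quantization only qualitatively. One scheme consistent with that description, sketched here under assumptions (the cutoff of 100 and the token layout are hypothetical, not taken from the paper), keeps every count below 100 as its own token and rounds larger counts to two significant digits, so the relative rounding error stays at or below 5%.

```python
EXACT_LIMIT = 100  # hypothetical: counts 0..99 each get their own token
MANTISSAS = 90     # two-significant-digit mantissas 10..99 per decade


def quantize(count):
    """Map a raw count to a token id: exact below 100, two significant digits above."""
    if count < 0:
        raise ValueError("counts must be non-negative")
    if count < EXACT_LIMIT:
        return count
    e = len(str(count)) - 2              # so that count // 10**e has two digits
    m = (count + 10**e // 2) // 10**e    # round half up to two significant digits
    if m == 100:                         # e.g. 996 -> 1000: carry into the next decade
        m, e = 10, e + 1
    return EXACT_LIMIT + (e - 1) * MANTISSAS + (m - 10)


def dequantize(token):
    """Inverse map from a token id to its representative count."""
    if token < EXACT_LIMIT:
        return token
    k = token - EXACT_LIMIT
    e, m = k // MANTISSAS + 1, k % MANTISSAS + 10
    return m * 10**e


print(quantize(12345), dequantize(quantize(12345)))  # 282 12000
```

Under this layout, token ids stay monotone in the count, and fewer than 500 tokens cover every count below one million (999,999 rounds to 1,000,000, token 460), which shows how a small value vocabulary can still span a heavy-tailed range while leaving low counts untouched.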

So for skewed expression distributions, especially low-abundance signals, we do not think the current discretization should wash them out in practice. Empirically, it also works well across the single-cell transcriptomic datasets in the paper, including good recovery of marker-gene patterns and subtype proportions. A more dedicated ablation over different quantization schemes or adaptive binning would definitely be worth exploring next.

We also agree that longitudinal trajectories and multi-omics are meaningful directions, and both feel like very natural next steps for this line of work.
