arxiv:2603.25240

Lingshu-Cell: A generative cellular world model for transcriptome modeling toward virtual cells

Published on Mar 26

Submitted by Han Zhang on Apr 1

#3 Paper of the day

DAMO Academy


Authors: Han Zhang, Yu Rong

AI-generated summary

Lingshu-Cell is a masked discrete diffusion model that learns transcriptomic state distributions and enables conditional simulation of cellular perturbations across diverse tissues and species.

Abstract

Modeling cellular states and predicting their responses to perturbations are central challenges in computational biology and the development of virtual cells. Existing foundation models for single-cell transcriptomics provide powerful static representations, but they do not explicitly model the distribution of cellular states for generative simulation. Here, we introduce Lingshu-Cell, a masked discrete diffusion model that learns transcriptomic state distributions and supports conditional simulation under perturbation. By operating directly in a discrete token space that is compatible with the sparse, non-sequential nature of single-cell transcriptomic data, Lingshu-Cell captures complex transcriptome-wide expression dependencies across approximately 18,000 genes without relying on prior gene selection, such as filtering by high variability or ranking by expression level. Across diverse tissues and species, Lingshu-Cell accurately reproduces transcriptomic distributions, marker-gene expression patterns and cell-subtype proportions, demonstrating its ability to capture complex cellular heterogeneity. Moreover, by jointly embedding cell type or donor identity with perturbation, Lingshu-Cell can predict whole-transcriptome expression changes for novel combinations of identity and perturbation. It achieves leading performance on the Virtual Cell Challenge H1 genetic perturbation benchmark and in predicting cytokine-induced responses in human PBMCs. Together, these results establish Lingshu-Cell as a flexible cellular world model for in silico simulation of cell states and perturbation responses, laying the foundation for a new paradigm in biological discovery and perturbation screening.
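To make "masked discrete diffusion" concrete, here is a minimal sketch of the forward (corruption) step as it is commonly defined for absorbing-state discrete diffusion: each token is independently replaced by a mask symbol with probability t. This illustrates the general technique only and is not taken from the paper; the `MASK` sentinel, the `mask_tokens` helper, and the toy token values are all hypothetical.

```python
import random

MASK = -1  # hypothetical sentinel standing in for the absorbing [MASK] state


def mask_tokens(tokens, t, rng):
    """Forward corruption: mask each token independently with probability t.

    tokens: list of discrete expression tokens, one per gene position.
    t:      noise level in [0, 1]; 0 keeps the cell intact, 1 masks everything.
    rng:    a random.Random instance, passed in for reproducibility.
    """
    return [MASK if rng.random() < t else tok for tok in tokens]


rng = random.Random(0)
cell = [0, 0, 3, 0, 17, 1, 0, 42]  # toy cell: position = gene, value = token
noisy = mask_tokens(cell, 0.5, rng)

# Every position is either masked or still carries its original token.
assert all(n == MASK or n == c for n, c in zip(noisy, cell))
```

A model trained in this family learns to predict the original tokens at the masked positions, so generation can start from a fully masked cell and fill positions in over several steps. Because masking acts on each position independently, this kind of scheme does not require any particular ordering of the genes.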


Community

bibona

Paper author Paper submitter about 13 hours ago

✨ Highlights

Lingshu-Cell introduces a generative cellular world model for single-cell transcriptomics based on a masked discrete diffusion framework.
Lingshu-Cell performs transcriptome-wide modeling over ~18,000 genes directly in a discrete token space that is compatible with the sparse, non-sequential nature of scRNA-seq data, without prior gene selection.
Lingshu-Cell reproduces realistic cell populations across diverse tissues and species, capturing marker-gene expression patterns, cell-subtype proportions, and transcriptomic distributions.
Lingshu-Cell achieves strong performance in response prediction under both genetic and cytokine perturbations.

avahal

about 4 hours ago

one question that sticks with me: how robust is the fixed discrete token vocabulary to skewed expression, especially for rare transcripts that often drive fine-grained subtypes?

they claim no gene filtering and model all ~18k genes in a single vocab, but would varying tokenization granularity or using adaptive binning change the recovery of marker genes and cell-subtype proportions?

the arxivlens breakdown helped me parse the method details, particularly the discrete diffusion in token space and the conditioning scheme.

have you done any ablation with different vocab sizes or quantization levels to quantify how much the discreteness itself, beyond model size, drives performance?

overall it's a neat step toward truly generative, perturbation-aware cell modeling, and i’m curious how you see this scaling to longitudinal trajectories or multi-omics in the future.

bibona

Paper author about 1 hour ago

Thanks for the thoughtful question.

Just to clarify one point: in our setup, each cell is represented over a fixed list of ~18k genes, while the vocab is used for discretized expression values rather than gene identities.

The quantization is also not uniform: low counts are kept at fine resolution, while higher counts are compressed more coarsely, but still with roughly two significant digits preserved. So the design is meant to retain low-abundance signals while handling the heavy-tailed count range efficiently.
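For readers who want a concrete picture, below is one possible way to build such a non-uniform quantizer. It is an illustrative sketch, not the scheme used in the paper: the cutoff of 100, the use of truncation rather than rounding, and the `encode_count` / `decode_token` names are all assumptions. Counts below 100 get their own token, and larger counts keep only their two leading digits.

```python
def encode_count(count: int) -> int:
    """Map a raw count to a token id (hypothetical scheme).

    Counts 0..99 are kept exactly. Larger counts are truncated to their
    two leading digits, so each further decade adds 90 tokens.
    """
    if count < 0:
        raise ValueError("counts must be non-negative")
    if count < 100:
        return count
    exponent = len(str(count)) - 1            # 100 -> 2, 1234 -> 3
    mantissa = count // 10 ** (exponent - 1)  # two leading digits, 10..99
    return 100 + (exponent - 2) * 90 + (mantissa - 10)


def decode_token(token: int) -> int:
    """Return the representative count (lower bin edge) for a token id."""
    if token < 100:
        return token
    exponent = 2 + (token - 100) // 90
    mantissa = 10 + (token - 100) % 90
    return mantissa * 10 ** (exponent - 1)


for c in (0, 7, 99, 100, 1234, 98765):
    print(c, encode_count(c), decode_token(encode_count(c)))
```

Under these assumptions a count of 1234 decodes back to 1200, while any count below 100 is recovered exactly, so low-abundance values lose no resolution and the relative error on large counts stays under 10%.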

So for skewed expression distributions, especially low-abundance signals, we do not think the current discretization should wash them out in practice. Empirically, it also works well across the single-cell transcriptomic datasets in the paper, including good recovery of marker-gene patterns and subtype proportions. A more dedicated ablation over different quantization schemes or adaptive binning would definitely be worth exploring next.

We also agree that longitudinal trajectories and multi-omics are meaningful directions, and both feel like very natural next steps for this line of work.