Instructions to use multimolecule/mtsplice with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- MultiMolecule
How to use multimolecule/mtsplice with MultiMolecule:
pip install multimolecule
from multimolecule import AutoModel, AutoTokenizer tokenizer = AutoTokenizer.from_pretrained("multimolecule/mtsplice") model = AutoModel.from_pretrained("multimolecule/mtsplice") inputs = tokenizer("UAGCUUAUCAGACUGAUGUUGA", return_tensors="pt") outputs = model(**inputs) embeddings = outputs.last_hidden_state - Notebooks
- Google Colab
- Kaggle
MTSplice
Tissue-specific modeling of the effects of genetic variants on splicing.
Disclaimer
This is an UNOFFICIAL implementation of the MTSplice predicts effects of genetic variants on tissue-specific splicing by Jun Cheng, et al.
The OFFICIAL repository of MTSplice is at gagneurlab/MMSplice_MTSplice.
The MultiMolecule team has confirmed that the provided model and checkpoints are producing the same intermediate representations as the original implementation.
The team releasing MTSplice did not write this model card for this model so this model card has been written by the MultiMolecule team.
Model Details
MTSplice is the tissue-specific second generation of MMSplice. It predicts the effect of genetic variants on cassette-exon splicing across 56 GTEx tissues. The cassette exon together with its flanking introns is fed into two parallel sequence towers whose outputs are combined into a per-tissue delta-logit-PSI splicing-effect vector. Please refer to the Training Details section for more information on the training process.
MTSplice is distributed as a deep four-member ensemble (mtsplice_deep0..3) and an earlier eight-member ensemble (mtsplice0..7). The architecture is shared across ensemble members.
Model Specification
| Num Blocks | Hidden Size | Num Tissues | Num Parameters (M) | FLOPs (M) | MACs (M) |
|---|---|---|---|---|---|
| 8 | 64 | 56 | 0.211 | 164.36 | 80.90 |
(Num Blocks is per tower; FLOPs and MACs measured on an 800 bp cassette-exon-with-flanks input.)
Links
- Code: multimolecule.mtsplice
- Data: GTEx cassette-exon PSI quantifications across 56 tissues
- Paper: MTSplice predicts effects of genetic variants on tissue-specific splicing
- Developed by: Jun Cheng, Muhammed Hasan Çelik, Anshul Kundaje, Julien Gagneur
- Model type: Two parallel dilated 1D CNN towers with positional B-spline re-weighting for tissue-specific delta-logit-PSI prediction
- Original Repository: gagneurlab/MMSplice_MTSplice
Usage
The model file depends on the multimolecule library. You can install it using pip:
pip install multimolecule
Direct Use
Tissue Scores
>>> import torch
>>> from multimolecule import RnaTokenizer, MtSpliceModel
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/mtsplice")
>>> model = MtSpliceModel.from_pretrained("multimolecule/mtsplice")
>>> reference = tokenizer("AGCAGUCAUUAUGGCGAAUCUGGCAAGUA", return_tensors="pt")
>>> output = model(**reference)
>>> output["logits"].shape
torch.Size([1, 56])
Variant Effect
>>> import torch
>>> from multimolecule import RnaTokenizer, MtSpliceForSequencePrediction
>>> tokenizer = RnaTokenizer.from_pretrained("multimolecule/mtsplice")
>>> model = MtSpliceForSequencePrediction.from_pretrained("multimolecule/mtsplice")
>>> reference = tokenizer("AGCAGUCAUUAUGGCGAAUCUGGCAAGUA", return_tensors="pt")
>>> alternative = tokenizer("AGCAGUCAUUAUGGCUAAUCUGGCAAGUA", return_tensors="pt")
>>> output = model(
... reference["input_ids"],
... alternative_input_ids=alternative["input_ids"],
... )
>>> output["logits"].shape
torch.Size([1, 56])
Interface
- Input length: cassette exon with flanking intronic context (typical ~800 bp)
- Output (reference-only call,
input_ids/inputs_embeds): per-tissue score vectorlogitsof shape(batch_size, 56)
Variant Effect
- Reference + alternative call (also pass
alternative_input_ids/alternative_inputs_embeds): additionally returnsalternative_logitsand per-tissuedelta_logits = alternative_logits - logits MtSpliceForSequencePrediction: returns per-tissue deltas (or per-tissue scores when no alternative is supplied); applies standard regression loss when labels are provided
Training Details
MTSplice was trained to predict tissue-specific percent-spliced-in (PSI) of cassette exons across GTEx tissues, building on the MMSplice modular splicing model with an added tissue-specific neural module.
Training Data
MTSplice was trained on cassette-exon PSI quantifications across 56 GTEx tissues, together with the human reference splice-site and exon sequence context. The variant-effect predictions were validated against tissue-specific splicing quantitative trait loci (sQTL) and MPRA exon-skipping data.
Training Procedure
Pre-training
The two sequence towers consume one-hot encoded RNA. A dilated-convolution stack with positional B-spline re-weighting extracts splicing features, which a dense head maps to per-tissue delta-logit-PSI. The tissue-resolved predictions are formed from the reference/alternative score deltas.
Citation
@article{cheng2021mtsplice,
title = {MTSplice predicts effects of genetic variants on tissue-specific splicing},
author = {Cheng, Jun and {\c{C}}elik, Muhammed Hasan and Kundaje, Anshul and Gagneur, Julien},
journal = {Genome Biology},
volume = 22,
number = 1,
pages = {94},
year = 2021,
publisher = {Springer},
doi = {10.1186/s13059-021-02273-7}
}
The artifacts distributed in this repository are part of the MultiMolecule project. If MultiMolecule supports your research, please cite the MultiMolecule project as follows:
@software{chen_2024_12638419,
author = {Chen, Zhiyuan and Zhu, Sophia Y.},
title = {MultiMolecule},
doi = {10.5281/zenodo.12638419},
publisher = {Zenodo},
url = {https://doi.org/10.5281/zenodo.12638419},
year = 2024,
month = may,
day = 4
}
Contact
Please use GitHub issues of MultiMolecule for any questions or comments on the model card.
Please contact the authors of the MTSplice paper for questions or comments on the paper/model.
License
This model implementation is licensed under the GNU Affero General Public License.
For additional terms and clarifications, please refer to our License FAQ.
SPDX-License-Identifier: AGPL-3.0-or-later
- Downloads last month
- 54