GO-GPT is a decoder-only transformer model for predicting Gene Ontology (GO) terms from protein sequences. It combines ESM2 protein language model embeddings with an autoregressive decoder to generate GO term annotations across all three ontology aspects: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC).
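To make "generating GO term annotations across all three aspects" concrete, the sketch below flattens a protein's annotations into a single target token sequence, grouped by aspect, and parses a generated sequence back into per-aspect term lists. The aspect marker tokens (`<MF>`, `<BP>`, `<CC>`), the `<eos>` token, the MF→BP→CC ordering, and the `serialize`/`parse` helpers are all hypothetical illustrations. GO-GPT's actual vocabulary and ordering are not described here and may differ.

```python
from typing import Dict, List, Optional

# Hypothetical special tokens; GO-GPT's real vocabulary may differ.
ASPECT_TOKENS = {"MF": "<MF>", "BP": "<BP>", "CC": "<CC>"}
EOS = "<eos>"


def serialize(annotations: Dict[str, List[str]]) -> List[str]:
    """Flatten {aspect: [GO IDs]} into one target token sequence.

    Aspects are emitted in a fixed order and terms are de-duplicated and
    sorted, so the same annotation set always yields the same sequence.
    """
    tokens: List[str] = []
    for aspect in ("MF", "BP", "CC"):
        terms = sorted(set(annotations.get(aspect, [])))
        if terms:
            tokens.append(ASPECT_TOKENS[aspect])
            tokens.extend(terms)
    tokens.append(EOS)
    return tokens


def parse(tokens: List[str]) -> Dict[str, List[str]]:
    """Invert serialize(): split a generated sequence back into aspects."""
    marker_to_aspect = {v: k for k, v in ASPECT_TOKENS.items()}
    result: Dict[str, List[str]] = {a: [] for a in ASPECT_TOKENS}
    current: Optional[str] = None
    for tok in tokens:
        if tok == EOS:
            break  # generation stops at the end-of-sequence token
        if tok in marker_to_aspect:
            current = marker_to_aspect[tok]
        elif current is not None and tok not in result[current]:
            result[current].append(tok)  # ignore repeats and stray terms
    return result


if __name__ == "__main__":
    ann = {"MF": ["GO:0003677"], "CC": ["GO:0005634"], "BP": ["GO:0006355"]}
    seq = serialize(ann)
    print(seq)
    print(parse(seq))
```

Putting all three aspects in one sequence is what allows a later term (for example a CC term) to be conditioned on terms already generated for MF and BP.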
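Unlike discriminative methods, GO-GPT treats GO prediction as a sequence-generation task. Because terms are generated one at a time, each prediction is conditioned on the terms already emitted, which lets the model capture hierarchical and cross-aspect dependencies and reach a state-of-the-art weighted F_max of 0.65–0.70.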
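For readers unfamiliar with the metric, the sketch below computes a simplified, CAFA-style protein-centric weighted F_max. It assumes each GO term has a non-negative weight (for example information accretion, which up-weights specific terms), that predictions and ground truth are already propagated to ancestors, and a threshold grid of 0.01 to 1.00. These are assumptions for illustration; the exact weighting, propagation, and evaluation protocol behind the reported numbers are not specified here.

```python
from typing import Dict, Iterable, Optional, Set


def weighted_fmax(
    predictions: Dict[str, Dict[str, float]],  # protein -> {GO term: score}
    truth: Dict[str, Set[str]],                # protein -> true GO terms
    weights: Dict[str, float],                 # GO term -> weight (assumed IA)
    thresholds: Optional[Iterable[float]] = None,
) -> float:
    """Max over thresholds of the harmonic mean of weighted precision/recall."""
    if thresholds is None:
        thresholds = [i / 100 for i in range(1, 101)]

    def w(terms: Set[str]) -> float:
        return sum(weights.get(t, 0.0) for t in terms)

    best = 0.0
    for tau in thresholds:
        precisions, recalls = [], []
        for protein, true_terms in truth.items():
            true_weight = w(true_terms)
            if true_weight == 0:
                continue  # nothing to recall for this protein
            scores = predictions.get(protein, {})
            predicted = {t for t, s in scores.items() if s >= tau}
            hit = w(predicted & true_terms)
            # Recall is averaged over every evaluated protein.
            recalls.append(hit / true_weight)
            # Precision is averaged only over proteins with predictions.
            pred_weight = w(predicted)
            if pred_weight > 0:
                precisions.append(hit / pred_weight)
        if not precisions:
            continue  # no predictions survive this threshold
        pr = sum(precisions) / len(precisions)
        rc = sum(recalls) / len(recalls)
        if pr + rc > 0:
            best = max(best, 2 * pr * rc / (pr + rc))
    return best


if __name__ == "__main__":
    truth = {"P1": {"A", "B"}}
    preds = {"P1": {"A": 0.9, "B": 0.4, "C": 0.8}}
    weights = {"A": 1.0, "B": 3.0, "C": 2.0}
    print(weighted_fmax(preds, truth, weights))
```

In this toy case the best threshold keeps all three predicted terms: weighted precision is 4/6 and weighted recall is 4/4, giving F = 0.8. Setting every weight to 1 recovers the ordinary unweighted F_max.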
| Component | Description |
|---|---|
| Protein Encoder | ESM2-3B (facebook/esm2_t36_3B_UR50D) |
| Decoder | 12-layer GPT with prefix causal attention |
| Total Parameters | ~3.2B (3B ESM2 + 200M decoder) |
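The table lists "prefix causal attention" for the decoder. The sketch below builds the standard prefix-LM attention mask, assuming that is what the term means here: the protein-embedding prefix attends bidirectionally within itself, while each GO-token position attends to the entire prefix plus earlier GO tokens. The `prefix_causal_mask` helper is hypothetical, and how many prefix positions GO-GPT uses (per-residue ESM2 embeddings or a pooled summary) is not stated in this card.

```python
from typing import List


def prefix_causal_mask(prefix_len: int, total_len: int) -> List[List[bool]]:
    """Return mask[i][j] == True if query position i may attend to key j.

    Positions [0, prefix_len) hold the protein (ESM2) embeddings and see
    each other bidirectionally, but never the GO tokens that follow.
    Positions [prefix_len, total_len) hold GO tokens and see the whole
    prefix plus all GO tokens up to and including themselves.
    """
    if not 0 <= prefix_len <= total_len:
        raise ValueError("need 0 <= prefix_len <= total_len")
    return [
        [j < prefix_len or j <= i for j in range(total_len)]
        for i in range(total_len)
    ]


if __name__ == "__main__":
    # 2 prefix positions followed by 2 GO-token positions.
    for row in prefix_causal_mask(2, 4):
        print(["x" if allowed else "." for allowed in row])
```

The resulting boolean matrix follows the convention of PyTorch's `torch.nn.functional.scaled_dot_product_attention`, where `True` marks positions that take part in attention; `torch.nn.MultiheadAttention` uses the opposite convention for boolean masks, so the mask would need to be inverted there.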
Training data: wanglab/gogpt-training-data
Code: github.com/bowang-lab/BioReason-Pro/gogpt
If you find this work useful, please cite our papers:
@article{Fallahpour2026.03.19.712954,
author = {Fallahpour, Adibvafa and Seyed-Ahmadi, Arman and Idehpour, Parsa and Ibrahim, Omar and Gupta, Purav and Naimer, Jack and Zhu, Kevin and Shah, Arnav and Ma, Shihao and Adduri, Abhinav and G{\"u}loglu, Talu and Liu, Nuo and Cui, Haotian and Jain, Arihant and de Castro, Max and Fallahpour, Amirfaham and Cembellin-Prieto, Antonio and Stiles, John S. and Nem{\v c}ko, Filip and Nevue, Alexander A. and Moon, Hyungseok C. and Sosnick, Lucas and Markham, Olivia and Duan, Haonan and Lee, Michelle Y. Y. and Salvador, Andrea F. M. and Maddison, Chris J. and Thaiss, Christoph A. and Ricci-Tam, Chiara and Plosky, Brian S. and Burke, Dave P. and Hsu, Patrick D. and Goodarzi, Hani and Wang, Bo},
title = {BioReason-Pro: Advancing Protein Function Prediction with Multimodal Biological Reasoning},
elocation-id = {2026.03.19.712954},
year = {2026},
doi = {10.64898/2026.03.19.712954},
publisher = {Cold Spring Harbor Laboratory},
URL = {https://www.biorxiv.org/content/early/2026/03/20/2026.03.19.712954},
eprint = {https://www.biorxiv.org/content/early/2026/03/20/2026.03.19.712954.full.pdf},
journal = {bioRxiv}
}
@misc{fallahpour2025bioreasonincentivizingmultimodalbiological,
title={BioReason: Incentivizing Multimodal Biological Reasoning within a DNA-LLM Model},
author={Adibvafa Fallahpour and Andrew Magnuson and Purav Gupta and Shihao Ma and Jack Naimer and Arnav Shah and Haonan Duan and Omar Ibrahim and Hani Goodarzi and Chris J. Maddison and Bo Wang},
year={2025},
eprint={2505.23579},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2505.23579},
}