DiscoPhon: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units
Paper • 2603.18612 • Published
HuBERT MMS-ulab is a HuBERT base model pretrained on the Segmented MMS ulab v2 dataset
for the DiscoPhon benchmark.
It was pretrained using the minimal_hubert library.
You can load it with HuggingFace Transformers:
from transformers import HubertModel
model = HubertModel.from_pretrained("coml/hubert-base-mmsulab")
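When working with the loaded model's outputs, it helps to know how many frames to expect for a given waveform. The following is a minimal sketch, not part of the model card's own tooling. It assumes this checkpoint keeps the standard HuBERT base convolutional front end (the `HubertConfig` defaults for `conv_kernel` and `conv_stride`) and takes 16 kHz audio; check `config.json` to confirm both. The function name `num_output_frames` is hypothetical.

```python
# Assumed HuBERT base front end (Transformers HubertConfig defaults).
# Confirm against conv_kernel / conv_stride in this model's config.json.
CONV_KERNEL = (10, 3, 3, 3, 3, 2, 2)
CONV_STRIDE = (5, 2, 2, 2, 2, 2, 2)


def num_output_frames(num_samples: int) -> int:
    """Number of frames the conv feature encoder yields for a waveform."""
    length = num_samples
    for kernel, stride in zip(CONV_KERNEL, CONV_STRIDE):
        # Unpadded 1D convolution: floor((length - kernel) / stride) + 1
        length = (length - kernel) // stride + 1
    # Inputs shorter than the receptive field produce no frames.
    return max(length, 0)


one_second = num_output_frames(16_000)  # 1 s of audio, assuming 16 kHz
print(one_second)  # 49
```

Under these assumptions the model emits roughly one frame every 20 ms, so any frame-level labels or discrete units line up with the audio at that rate.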
Or with minimal_hubert:
from minimal_hubert import HuBERT, HuBERTPretrain
# Standard model
model = HuBERT.from_pretrained("coml/hubert-base-mmsulab")
# With pretraining head for classification
model_for_pretraining = HuBERTPretrain.from_pretrained("https://huggingface.co/coml/hubert-base-mmsulab/resolve/main/it2.pt")
Check out minimal_hubert if you are interested in pretraining or want
to load HuBERT checkpoints from different libraries.
- model.safetensors and config.json: HuggingFace Transformers checkpoint and config.
- it1.pt: 1st iteration checkpoint.
- it2.pt: 2nd iteration checkpoint. Converted to HuggingFace state_dict to get model.safetensors.
- km100-mfcc.joblib: K-means trained on MFCCs of MMS-ulab. Used to train the 1st iteration.
- km500-it1-l9.joblib: K-means trained on features from the 9th layer of the 1st iteration model. Used to train the 2nd iteration.
- km256-it2-l12.joblib: K-means trained on features from the 12th layer of the 2nd iteration model. Used for DiscoPhon finetuning.

Please cite the DiscoPhon paper
@misc{poli2026discophon,
title={{DiscoPhon}: Benchmarking the Unsupervised Discovery of Phoneme Inventories With Discrete Speech Units},
author={Maxime Poli and Manel Khentout and Angelo Ortiz Tandazo and Ewan Dunbar and Emmanuel Chemla and Emmanuel Dupoux},
year={2026},
eprint={2603.18612},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.18612},
}
along with XEUS and MMS to reference the Segmented MMS ulab v2 dataset.
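The K-means files in this repository (km100-mfcc.joblib, km500-it1-l9.joblib, km256-it2-l12.joblib) are what turn frame-level features into discrete units. The sketch below illustrates only the assignment step, nearest-centroid lookup, on toy data. It does not load the joblib files. The function names, the centroids, and the frames are all made up for illustration, and real features would be much higher-dimensional. In practice the centroids would presumably come from the loaded K-means object (for example a `cluster_centers_` attribute, if the files store scikit-learn estimators, which is an assumption here).

```python
from typing import List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_units(
    frames: Sequence[Sequence[float]],
    centroids: Sequence[Sequence[float]],
) -> List[int]:
    """Map each frame to the index of its nearest centroid."""
    units = []
    for frame in frames:
        distances = [squared_distance(frame, c) for c in centroids]
        units.append(distances.index(min(distances)))
    return units


# Toy stand-ins: 3 hypothetical centroids and 4 hypothetical 2-D frames.
centroids = [[0.0, 0.0], [1.0, 1.0], [4.0, 0.0]]
frames = [[0.1, -0.2], [0.9, 1.2], [3.5, 0.4], [0.2, 0.1]]

units = assign_units(frames, centroids)
print(units)  # [0, 1, 2, 0]
```

With a real K-means model, the same idea yields one integer per frame in the range of the codebook size (100, 500, or 256 for the three files, going by their names).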