arXiv:2507.14129

OpenBEATs: A Fully Open-Source General-Purpose Audio Encoder

Published on Jul 18 · Submitted by Shikhar Bharadwaj on Jul 21
AI-generated summary

OpenBEATs is an open-source framework that extends BEATs with multi-domain audio pre-training, achieving state-of-the-art performance across a range of audio tasks at a fraction of the parameter count of billion-parameter models.

Abstract

Masked token prediction has emerged as a powerful pre-training objective across language, vision, and speech, offering the potential to unify these diverse modalities through a single pre-training task. However, its application to general audio understanding remains underexplored, with BEATs being the only notable example. BEATs has seen limited modification because its pre-training code was never open-sourced. Furthermore, BEATs was trained only on AudioSet, restricting its broader downstream applicability. To address these gaps, we present OpenBEATs, an open-source framework that extends BEATs via multi-domain audio pre-training. We conduct comprehensive evaluations across six task types, twenty-five datasets, and three audio domains, including audio reasoning tasks such as audio question answering, entailment, and captioning. OpenBEATs achieves state-of-the-art performance on six bioacoustics datasets, two environmental sound datasets, and five reasoning datasets, outperforming models that exceed a billion parameters at one-fourth their size. These results demonstrate the effectiveness of multi-domain datasets and the masked token prediction task for learning general-purpose audio representations. To promote further research and reproducibility, we release all pre-training and evaluation code, pretrained and fine-tuned checkpoints, and training logs at https://shikhar-s.github.io/OpenBEATs
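As a rough illustration of the masked token prediction objective the abstract describes, the sketch below shows BERT-style masked prediction over discrete audio tokens in PyTorch. All shapes, module names, the patch size, the vocabulary size, and the mask ratio are illustrative assumptions, not OpenBEATs' actual architecture; the real tokenizer, encoder, and training loop are in the released code linked above.

```python
# Minimal sketch of masked token prediction over spectrogram patches.
# Assumptions throughout: patch size, vocab size, mask ratio, and module
# layout are illustrative, not taken from the OpenBEATs release.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MaskedTokenPredictor(nn.Module):
    def __init__(self, vocab_size=1024, dim=256, n_layers=4, n_heads=4):
        super().__init__()
        self.patch_embed = nn.Linear(16, dim)             # hypothetical patch size
        self.mask_embed = nn.Parameter(torch.zeros(dim))  # learned [MASK] vector
        layer = nn.TransformerEncoderLayer(dim, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(dim, vocab_size)            # predicts discrete token ids

    def forward(self, patches, target_tokens, mask):
        # patches:       (B, T, 16) spectrogram patches
        # target_tokens: (B, T)     discrete ids from a frozen acoustic tokenizer
        # mask:          (B, T)     True where the input patch is masked out
        x = self.patch_embed(patches)
        x = torch.where(mask.unsqueeze(-1), self.mask_embed.expand_as(x), x)
        logits = self.head(self.encoder(x))
        # BERT-style objective: compute the loss only on masked positions
        return F.cross_entropy(logits[mask], target_tokens[mask])

# Toy usage with random inputs
model = MaskedTokenPredictor()
patches = torch.randn(2, 100, 16)
tokens = torch.randint(0, 1024, (2, 100))
mask = torch.rand(2, 100) < 0.5          # illustrative mask ratio
loss = model(patches, tokens, mask)
loss.backward()
```

The key design point, common to this family of objectives, is that the encoder only ever sees corrupted inputs and the loss is restricted to the masked positions, forcing the model to infer missing acoustic content from context.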


Models citing this paper: 74
Datasets citing this paper: 1
Spaces citing this paper: 0
Collections including this paper: 3