The first OPEn Respiratory Acoustic foundation model pretraining and benchmarking system

OPERA is a system that curates large-scale unlabelled respiratory audio datasets to pretrain generalizable audio encoders that can be adapted to various health tasks with limited labelled data.

The OPERA system allows us to:

  • Curate a unique large-scale (~136K samples, 400+ hours), multi-source (5 datasets), multi-modal (breathing, coughing, and lung sounds) and publicly available (or under controlled access) dataset for model pretraining.
  • Pretrain 3 generalizable acoustic models with the curated unlabeled data using contrastive learning and generative pretraining, and release the model checkpoints.
  • Employ 10 labeled datasets (6 not covered by pretraining) to formulate 19 respiratory health tasks, ensuring fair, comprehensive and reproducible downstream evaluation.
  • Enable researchers and developers to extract features using our models, or to develop new models with our data and system, as a starting point for future exploration.
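The feature-extraction workflow above can be sketched as follows. This is a minimal illustration only: the encoder below is a random-weight placeholder standing in for a pretrained OPERA checkpoint, and all function and class names are illustrative assumptions, not the released API.

```python
import numpy as np

def log_spectrogram(audio, n_fft=512, hop=256):
    """Frame the waveform and take a log-magnitude STFT (simplified, Hann window)."""
    window = np.hanning(n_fft)
    n_frames = 1 + (len(audio) - n_fft) // hop
    frames = np.stack([audio[i * hop : i * hop + n_fft] * window
                       for i in range(n_frames)])
    spec = np.abs(np.fft.rfft(frames, axis=1))
    return np.log1p(spec)  # shape: (n_frames, n_fft // 2 + 1)

class PlaceholderEncoder:
    """Stand-in for a pretrained acoustic encoder (random projection, NOT OPERA)."""
    def __init__(self, in_dim, feat_dim=768, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((in_dim, feat_dim)) / np.sqrt(in_dim)

    def __call__(self, spec):
        # Mean-pool per-frame embeddings into one fixed-length feature vector,
        # which a downstream health-task classifier could consume.
        return (spec @ self.w).mean(axis=0)

# One second of synthetic audio at 16 kHz; replace with a real breathing,
# cough, or lung-sound recording.
audio = np.random.default_rng(1).standard_normal(16000)
spec = log_spectrogram(audio)
encoder = PlaceholderEncoder(in_dim=spec.shape[1])
feature = encoder(spec)
print(feature.shape)  # (768,)
```

In practice one would load a released OPERA checkpoint instead of the placeholder, then train a lightweight classifier on the extracted features for the downstream task of interest.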

Supported by

University of Cambridge · European Research Council (ERC) · Engineering and Physical Sciences Research Council (EPSRC)

The study has been approved by the Ethics Committee of the Department of Computer Science and Technology, University of Cambridge, and is partly funded by the European Research Council through Project EAR and by the Engineering and Physical Sciences Research Council through Project Reload.

Publications

See the paper introducing the OPERA datasets, pretrained models, and benchmark.

  • Zhang Y, Xia T, Han J, Wu Y, Rizos G, Liu Y, Mosuily M, Chauhan J, Mascolo C. Towards open respiratory acoustic foundation models: Pretraining and benchmarking. In Thirty-eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2024.