Audio self-supervised learning: A survey

Similar to humans’ cognitive ability to generalize knowledge and skills, self-supervised learning (SSL) targets discovering general representations from large-scale data. This, through the use of pre-trained SSL models for downstream tasks, alleviates the need for human annotation, which is an expen...

Bibliographic Details
Main authors: Liu, Shuo, Mallol-Ragolta, Adria, Parada-Cabaleiro, Emilia, Qian, Kun, Jing, Xin, Kathan, Alexander, Hu, Bin, Schuller, Björn W.
Format: Online Article Text
Language: English
Published: Elsevier 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9768631/
https://www.ncbi.nlm.nih.gov/pubmed/36569546
http://dx.doi.org/10.1016/j.patter.2022.100616
_version_ 1784854213942050816
author Liu, Shuo
Mallol-Ragolta, Adria
Parada-Cabaleiro, Emilia
Qian, Kun
Jing, Xin
Kathan, Alexander
Hu, Bin
Schuller, Björn W.
author_sort Liu, Shuo
collection PubMed
description Similar to humans’ cognitive ability to generalize knowledge and skills, self-supervised learning (SSL) targets discovering general representations from large-scale data. This, through the use of pre-trained SSL models for downstream tasks, alleviates the need for human annotation, which is an expensive and time-consuming task. Its success in the fields of computer vision and natural language processing has prompted its recent adoption into the field of audio and speech processing. Comprehensive reviews summarizing the knowledge in audio SSL are currently missing. To fill this gap, we provide an overview of the SSL methods used for audio and speech processing applications. Herein, we also summarize the empirical works that exploit the audio modality in multi-modal SSL frameworks and the existing suitable benchmarks to evaluate the power of SSL in the computer audition domain. Finally, we discuss some open problems and point out the future directions in the development of audio SSL.
format Online
Article
Text
id pubmed-9768631
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9768631 2022-12-22 Audio self-supervised learning: A survey Liu, Shuo Mallol-Ragolta, Adria Parada-Cabaleiro, Emilia Qian, Kun Jing, Xin Kathan, Alexander Hu, Bin Schuller, Björn W. Patterns (N Y) Review Similar to humans’ cognitive ability to generalize knowledge and skills, self-supervised learning (SSL) targets discovering general representations from large-scale data. This, through the use of pre-trained SSL models for downstream tasks, alleviates the need for human annotation, which is an expensive and time-consuming task. Its success in the fields of computer vision and natural language processing has prompted its recent adoption into the field of audio and speech processing. Comprehensive reviews summarizing the knowledge in audio SSL are currently missing. To fill this gap, we provide an overview of the SSL methods used for audio and speech processing applications. Herein, we also summarize the empirical works that exploit the audio modality in multi-modal SSL frameworks and the existing suitable benchmarks to evaluate the power of SSL in the computer audition domain. Finally, we discuss some open problems and point out the future directions in the development of audio SSL. Elsevier 2022-12-09 /pmc/articles/PMC9768631/ /pubmed/36569546 http://dx.doi.org/10.1016/j.patter.2022.100616 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Audio self-supervised learning: A survey
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9768631/
https://www.ncbi.nlm.nih.gov/pubmed/36569546
http://dx.doi.org/10.1016/j.patter.2022.100616