Audio self-supervised learning: A survey
Similar to humans’ cognitive ability to generalize knowledge and skills, self-supervised learning (SSL) targets discovering general representations from large-scale data. This, through the use of pre-trained SSL models for downstream tasks, alleviates the need for human annotation, which is an expen...
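Main authors: | Liu, Shuo; Mallol-Ragolta, Adria; Parada-Cabaleiro, Emilia; Qian, Kun; Jing, Xin; Kathan, Alexander; Hu, Bin; Schuller, Björn W. |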
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | Review |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9768631/ https://www.ncbi.nlm.nih.gov/pubmed/36569546 http://dx.doi.org/10.1016/j.patter.2022.100616 |
_version_ | 1784854213942050816 |
---|---|
author | Liu, Shuo Mallol-Ragolta, Adria Parada-Cabaleiro, Emilia Qian, Kun Jing, Xin Kathan, Alexander Hu, Bin Schuller, Björn W. |
author_facet | Liu, Shuo Mallol-Ragolta, Adria Parada-Cabaleiro, Emilia Qian, Kun Jing, Xin Kathan, Alexander Hu, Bin Schuller, Björn W. |
author_sort | Liu, Shuo |
collection | PubMed |
description | Similar to humans’ cognitive ability to generalize knowledge and skills, self-supervised learning (SSL) targets discovering general representations from large-scale data. This, through the use of pre-trained SSL models for downstream tasks, alleviates the need for human annotation, which is an expensive and time-consuming task. Its success in the fields of computer vision and natural language processing has prompted its recent adoption into the field of audio and speech processing. Comprehensive reviews summarizing the knowledge in audio SSL are currently missing. To fill this gap, we provide an overview of the SSL methods used for audio and speech processing applications. Herein, we also summarize the empirical works that exploit audio modality in multi-modal SSL frameworks and the existing suitable benchmarks to evaluate the power of SSL in the computer audition domain. Finally, we discuss some open problems and point out the future directions in the development of audio SSL. |
format | Online Article Text |
id | pubmed-9768631 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9768631 2022-12-22 Audio self-supervised learning: A survey Liu, Shuo Mallol-Ragolta, Adria Parada-Cabaleiro, Emilia Qian, Kun Jing, Xin Kathan, Alexander Hu, Bin Schuller, Björn W. Patterns (N Y) Review Similar to humans’ cognitive ability to generalize knowledge and skills, self-supervised learning (SSL) targets discovering general representations from large-scale data. This, through the use of pre-trained SSL models for downstream tasks, alleviates the need for human annotation, which is an expensive and time-consuming task. Its success in the fields of computer vision and natural language processing has prompted its recent adoption into the field of audio and speech processing. Comprehensive reviews summarizing the knowledge in audio SSL are currently missing. To fill this gap, we provide an overview of the SSL methods used for audio and speech processing applications. Herein, we also summarize the empirical works that exploit audio modality in multi-modal SSL frameworks and the existing suitable benchmarks to evaluate the power of SSL in the computer audition domain. Finally, we discuss some open problems and point out the future directions in the development of audio SSL. Elsevier 2022-12-09 /pmc/articles/PMC9768631/ /pubmed/36569546 http://dx.doi.org/10.1016/j.patter.2022.100616 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Review Liu, Shuo Mallol-Ragolta, Adria Parada-Cabaleiro, Emilia Qian, Kun Jing, Xin Kathan, Alexander Hu, Bin Schuller, Björn W. Audio self-supervised learning: A survey |
title | Audio self-supervised learning: A survey |
title_full | Audio self-supervised learning: A survey |
title_fullStr | Audio self-supervised learning: A survey |
title_full_unstemmed | Audio self-supervised learning: A survey |
title_short | Audio self-supervised learning: A survey |
title_sort | audio self-supervised learning: a survey |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9768631/ https://www.ncbi.nlm.nih.gov/pubmed/36569546 http://dx.doi.org/10.1016/j.patter.2022.100616 |
work_keys_str_mv | AT liushuo audioselfsupervisedlearningasurvey AT mallolragoltaadria audioselfsupervisedlearningasurvey AT paradacabaleiroemilia audioselfsupervisedlearningasurvey AT qiankun audioselfsupervisedlearningasurvey AT jingxin audioselfsupervisedlearningasurvey AT kathanalexander audioselfsupervisedlearningasurvey AT hubin audioselfsupervisedlearningasurvey AT schullerbjornw audioselfsupervisedlearningasurvey |