
Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity

The sound of a person’s voice is commonly used to identify the speaker. The sound of speech is also starting to be used to detect medical conditions, such as depression. It is not known whether the manifestations of depression in speech overlap with those used to identify the speaker. In this paper,...


Bibliographic Details
Main Authors:	Dumpala, Sri Harsha, Dikaios, Katerina, Rodriguez, Sebastian, Langley, Ross, Rempel, Sheri, Uher, Rudolf, Oore, Sageev
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10333314/ https://www.ncbi.nlm.nih.gov/pubmed/37429935 http://dx.doi.org/10.1038/s41598-023-35184-7

_version_	1785070620624551936
author	Dumpala, Sri Harsha Dikaios, Katerina Rodriguez, Sebastian Langley, Ross Rempel, Sheri Uher, Rudolf Oore, Sageev
author_sort	Dumpala, Sri Harsha
collection	PubMed
description	The sound of a person’s voice is commonly used to identify the speaker. The sound of speech is also starting to be used to detect medical conditions, such as depression. It is not known whether the manifestations of depression in speech overlap with those used to identify the speaker. In this paper, we test the hypothesis that the representations of personal identity in speech, known as speaker embeddings, improve the detection of depression and estimation of depressive symptoms severity. We further examine whether changes in depression severity interfere with the recognition of speaker’s identity. We extract speaker embeddings from models pre-trained on a large sample of speakers from the general population without information on depression diagnosis. We test these speaker embeddings for severity estimation in independent datasets consisting of clinical interviews (DAIC-WOZ), spontaneous speech (VocalMind), and longitudinal data (VocalMind). We also use the severity estimates to predict presence of depression. Speaker embeddings, combined with established acoustic features (OpenSMILE), predicted severity with root mean square error (RMSE) values of 6.01 and 6.28 in DAIC-WOZ and VocalMind datasets, respectively, lower than acoustic features alone or speaker embeddings alone. When used to detect depression, speaker embeddings showed higher balanced accuracy (BAc) and surpassed previous state-of-the-art performance in depression detection from speech, with BAc values of 66% and 64% in DAIC-WOZ and VocalMind datasets, respectively. Results from a subset of participants with repeated speech samples show that the speaker identification is affected by changes in depression severity. These results suggest that depression overlaps with personal identity in the acoustic space. While speaker embeddings improve depression detection and severity estimation, deterioration or improvement in mood may interfere with speaker verification.
format	Online Article Text
id	pubmed-10333314
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10333314 2023-07-12 Sci Rep Article Nature Publishing Group UK 2023-07-10 /pmc/articles/PMC10333314/ /pubmed/37429935 http://dx.doi.org/10.1038/s41598-023-35184-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	Manifestation of depression in speech overlaps with characteristics used to represent and recognize speaker identity
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10333314/ https://www.ncbi.nlm.nih.gov/pubmed/37429935 http://dx.doi.org/10.1038/s41598-023-35184-7
