
The development of an automatic speech recognition model using interview data from long-term care for older adults


Bibliographic Details
Main Authors: Hacking, Coen, Verbeek, Hilde, Hamers, Jan P H, Aarts, Sil
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9933064/
https://www.ncbi.nlm.nih.gov/pubmed/36495570
http://dx.doi.org/10.1093/jamia/ocac241
author Hacking, Coen
Verbeek, Hilde
Hamers, Jan P H
Aarts, Sil
collection PubMed
description OBJECTIVE: In long-term care (LTC) for older adults, interviews are used to collect client perspectives that are often recorded and transcribed verbatim, which is a time-consuming, tedious task. Automatic speech recognition (ASR) could provide a solution; however, current ASR systems are not effective for certain demographic groups. This study aims to show how data from specific groups, such as older adults or people with accents, can be used to develop an effective ASR. MATERIALS AND METHODS: An initial ASR model was developed using the Mozilla Common Voice dataset. Audio and transcript data (34 h) from interviews with residents, family, and care professionals on quality of care were used. Interview data were continuously processed to reduce the word error rate (WER). RESULTS: Due to background noise and mispronunciations, an initial ASR model had a WER of 48.3% on interview data. After finetuning using interview data, the average WER was reduced to 24.3%. When tested on speech data from the interviews, a median WER of 22.1% was achieved, with residents displaying the highest WER (22.7%). The resulting ASR model was at least 6 times faster than manual transcription. DISCUSSION: The current method decreased the WER substantially, verifying its efficacy. Moreover, using local transcription of audio can be beneficial to the privacy of participants. CONCLUSIONS: The current study shows that interview data from LTC for older adults can be effectively used to improve an ASR model. While the model output does still contain some errors, researchers reported that it saved much time during transcription.
format Online
Article
Text
id pubmed-9933064
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
journal J Am Med Inform Assoc
section Research and Applications
published Oxford University Press 2022-12-10
rights © The Author(s) 2022. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The development of an automatic speech recognition model using interview data from long-term care for older adults
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9933064/
https://www.ncbi.nlm.nih.gov/pubmed/36495570
http://dx.doi.org/10.1093/jamia/ocac241
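
The abstract above reports word error rate (WER) figures before and after fine-tuning (48.3%, 24.3%, and a median of 22.1% on interview speech). As a minimal illustrative sketch only, not the authors' code, the Python snippet below shows how a hypothesis from a generic pretrained CTC acoustic model can be scored against a verbatim reference transcript. The checkpoint name, audio file, and reference sentence are assumptions introduced for illustration; the abstract does not name the toolkit or model that the study used.

# Illustrative only: the article's abstract does not specify the ASR toolkit.
# A generic Hugging Face CTC checkpoint and a hypothetical audio file are
# used here as stand-ins to show how WER is typically computed.
import torch
import torchaudio
from transformers import Wav2Vec2ForCTC, Wav2Vec2Processor

def word_error_rate(reference: str, hypothesis: str) -> float:
    # WER = (substitutions + deletions + insertions) / number of reference words,
    # computed as a word-level Levenshtein distance via dynamic programming.
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# Assumed example checkpoint; any CTC acoustic model with a matching processor works.
MODEL_ID = "facebook/wav2vec2-large-xlsr-53-dutch"
processor = Wav2Vec2Processor.from_pretrained(MODEL_ID)
model = Wav2Vec2ForCTC.from_pretrained(MODEL_ID)

# Hypothetical interview fragment, mixed down to mono and resampled to 16 kHz.
waveform, sample_rate = torchaudio.load("interview_fragment.wav")
waveform = torchaudio.functional.resample(waveform.mean(dim=0), sample_rate, 16_000)

inputs = processor(waveform.numpy(), sampling_rate=16_000, return_tensors="pt")
with torch.no_grad():
    logits = model(inputs.input_values).logits
hypothesis = processor.batch_decode(torch.argmax(logits, dim=-1))[0]

# Placeholder reference transcript; in the study this is the verbatim manual transcript.
reference = "de zorg hier is goed maar het eten kan beter"
print(f"WER: {word_error_rate(reference, hypothesis):.1%}")

Whatever acoustic model and training pipeline the study actually used, the reported figures refer to this same word-level edit-distance metric, so the scoring step is identical before and after fine-tuning.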