
Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning

Bibliographic Details
Main Authors: O'Donovan, Rebecca, Sezgin, Emre, Bambach, Sven, Butter, Eric, Lin, Simon
Format: Online Article Text
Language: English
Published: JMIR Publications 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327591/
https://www.ncbi.nlm.nih.gov/pubmed/32459656
http://dx.doi.org/10.2196/18279
_version_ 1783552576009011200
author O'Donovan, Rebecca
Sezgin, Emre
Bambach, Sven
Butter, Eric
Lin, Simon
author_facet O'Donovan, Rebecca
Sezgin, Emre
Bambach, Sven
Butter, Eric
Lin, Simon
author_sort O'Donovan, Rebecca
collection PubMed
description BACKGROUND: Qualitative self- or parent-reports used in assessing children’s behavioral disorders are often inconvenient to collect and can be misleading due to missing information, rater biases, and limited validity. A data-driven approach to quantify behavioral disorders could alleviate these concerns. This study proposes a machine learning approach to identify screams in voice recordings that avoids the need to gather large amounts of clinical data for model training. OBJECTIVE: The goal of this study is to evaluate if a machine learning model trained only on publicly available audio data sets could be used to detect screaming sounds in audio streams captured in an at-home setting. METHODS: Two sets of audio samples were prepared to evaluate the model: a subset of the publicly available AudioSet data set and a set of audio data extracted from the TV show Supernanny, which was chosen for its similarity to clinical data. Scream events were manually annotated for the Supernanny data, and existing annotations were refined for the AudioSet data. Audio feature extraction was performed with a convolutional neural network pretrained on AudioSet. A gradient-boosted tree model was trained and cross-validated for scream classification on the AudioSet data and then validated independently on the Supernanny audio. RESULTS: On the held-out AudioSet clips, the model achieved a receiver operating characteristic (ROC)–area under the curve (AUC) of 0.86. The same model applied to three full episodes of Supernanny audio achieved an ROC-AUC of 0.95 and an average precision (positive predictive value) of 42% despite screams only making up 1.3% (n=92/7166 seconds) of the total run time. CONCLUSIONS: These results suggest that a scream-detection model trained with publicly available data could be valuable for monitoring clinical recordings and identifying tantrums as opposed to depending on collecting costly privacy-protected clinical data for model training.
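The methods summarized above describe a transfer-learning pipeline: fixed audio embeddings from a convolutional neural network pretrained on AudioSet are fed to a gradient-boosted tree classifier, which is cross-validated on AudioSet clips and then validated on independent at-home-style audio. The snippet below is a minimal sketch of that setup, not the authors' code: it assumes precomputed 128-dimensional AudioSet-style (e.g., VGGish) embeddings and substitutes scikit-learn's GradientBoostingClassifier and randomly generated placeholder arrays (X_train, y_eval, and similar names are hypothetical stand-ins, not study data).

```python
# Sketch of scream classification on pretrained-CNN audio embeddings.
# Placeholder data only; the study trained on AudioSet clips and validated
# on Supernanny episodes, where screams were ~1.3% of the run time.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.metrics import roc_auc_score, average_precision_score

rng = np.random.default_rng(0)

# One 128-dimensional embedding per 1-second audio frame (placeholder values).
X_train = rng.normal(size=(2000, 128))           # stand-in for AudioSet embeddings
y_train = rng.integers(0, 2, size=2000)          # 1 = scream, 0 = non-scream
X_eval = rng.normal(size=(7166, 128))            # stand-in for held-out episode frames
y_eval = (rng.random(7166) < 0.013).astype(int)  # ~1.3% positive rate, as in the paper

# Cross-validated training on the (placeholder) AudioSet-style data.
clf = GradientBoostingClassifier()
cv_auc = cross_val_score(clf, X_train, y_train, cv=5, scoring="roc_auc")
print(f"cross-validated ROC-AUC: {cv_auc.mean():.2f}")

# Independent validation on the (placeholder) held-out recording.
clf.fit(X_train, y_train)
scores = clf.predict_proba(X_eval)[:, 1]
print(f"ROC-AUC: {roc_auc_score(y_eval, scores):.2f}")
print(f"average precision (PPV): {average_precision_score(y_eval, scores):.2f}")
```

Reporting average precision alongside ROC-AUC, as the abstract does, is the key design choice here: with roughly 1 positive second in 80, ROC-AUC alone can look strong while the precision of flagged events stays low.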
format Online
Article
Text
id pubmed-7327591
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-7327591 2020-07-06 Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning O'Donovan, Rebecca Sezgin, Emre Bambach, Sven Butter, Eric Lin, Simon JMIR Form Res Original Paper JMIR Publications 2020-06-16 /pmc/articles/PMC7327591/ /pubmed/32459656 http://dx.doi.org/10.2196/18279 Text en ©Rebecca O'Donovan, Emre Sezgin, Sven Bambach, Eric Butter, Simon Lin. Originally published in JMIR Formative Research (http://formative.jmir.org), 16.06.2020. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Formative Research, is properly cited. The complete bibliographic information, a link to the original publication on http://formative.jmir.org, as well as this copyright and license information must be included.
spellingShingle Original Paper
O'Donovan, Rebecca
Sezgin, Emre
Bambach, Sven
Butter, Eric
Lin, Simon
Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning
title Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning
title_full Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning
title_fullStr Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning
title_full_unstemmed Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning
title_short Detecting Screams From Home Audio Recordings to Identify Tantrums: Exploratory Study Using Transfer Machine Learning
title_sort detecting screams from home audio recordings to identify tantrums: exploratory study using transfer machine learning
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7327591/
https://www.ncbi.nlm.nih.gov/pubmed/32459656
http://dx.doi.org/10.2196/18279
work_keys_str_mv AT odonovanrebecca detectingscreamsfromhomeaudiorecordingstoidentifytantrumsexploratorystudyusingtransfermachinelearning
AT sezginemre detectingscreamsfromhomeaudiorecordingstoidentifytantrumsexploratorystudyusingtransfermachinelearning
AT bambachsven detectingscreamsfromhomeaudiorecordingstoidentifytantrumsexploratorystudyusingtransfermachinelearning
AT buttereric detectingscreamsfromhomeaudiorecordingstoidentifytantrumsexploratorystudyusingtransfermachinelearning
AT linsimon detectingscreamsfromhomeaudiorecordingstoidentifytantrumsexploratorystudyusingtransfermachinelearning