
Comparison of machine learning algorithms applied to symptoms to determine infectious causes of death in children: national survey of 18,000 verbal autopsies in the Million Death Study in India



Bibliographic Details
Main Authors: Idicula-Thomas, Susan; Gawde, Ulka; Jha, Prabhat
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Research
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8488544/
https://www.ncbi.nlm.nih.gov/pubmed/34607591
http://dx.doi.org/10.1186/s12889-021-11829-y
collection PubMed
description BACKGROUND: Machine learning (ML) algorithms have been successfully employed for prediction of outcomes in clinical research. In this study, we have explored the application of ML-based algorithms to predict cause of death (CoD) from verbal autopsy records available through the Million Death Study (MDS).

METHODS: From MDS, 18826 unique childhood deaths at ages 1–59 months during the time period 2004–13 were selected for generating the prediction models of which over 70% of deaths were caused by six infectious diseases (pneumonia, diarrhoeal diseases, malaria, fever of unknown origin, meningitis/encephalitis, and measles). Six popular ML-based algorithms such as support vector machine, gradient boosting modeling, C5.0, artificial neural network, k-nearest neighbor, classification and regression tree were used for building the CoD prediction models.

RESULTS: SVM algorithm was the best performer with a prediction accuracy of over 0.8. The highest accuracy was found for diarrhoeal diseases (accuracy = 0.97) and the lowest was for meningitis/encephalitis (accuracy = 0.80). The top signs/symptoms for classification of these CoDs were also extracted for each of the diseases. A combination of signs/symptoms presented by the deceased individual can effectively lead to the CoD diagnosis.

CONCLUSIONS: Overall, this study affirms that verbal autopsy tools are efficient in CoD diagnosis and that automated classification parameters captured through ML could be added to verbal autopsies to improve classification of causes of death.

SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12889-021-11829-y.
id pubmed-8488544
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8488544 2021-10-04 Idicula-Thomas, Susan; Gawde, Ulka; Jha, Prabhat. BMC Public Health. Research.
BioMed Central 2021-10-04 /pmc/articles/PMC8488544/ /pubmed/34607591 http://dx.doi.org/10.1186/s12889-021-11829-y Text en
© The Author(s) 2021. https://creativecommons.org/licenses/by/4.0/
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research