
Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa


Bibliographic details
Main authors: Mapundu, Michael T., Kabudula, Chodziwadziwa W., Musenge, Eustasius, Olago, Victor, Celik, Turgay
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Public Health
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9552851/
https://www.ncbi.nlm.nih.gov/pubmed/36238252
http://dx.doi.org/10.3389/fpubh.2022.990838
collection PubMed
description Computer Coded Verbal Autopsy (CCVA) algorithms are commonly used to determine the cause of death (CoD) from questionnaire responses extracted from verbal autopsies (VAs). However, they can only operate on structured data and cannot effectively harness information from unstructured VA narratives. Machine Learning (ML) algorithms have also been applied successfully to determine the CoD from VA narratives, allowing the use of auxiliary information that CCVA algorithms cannot directly utilize. However, most ML-based studies use only responses from the structured questionnaire, and their results lack generalisability and comparability across studies. We present a comparative performance evaluation of ML methods and CCVA algorithms on South African VA narrative data from the Agincourt Health and Demographic Surveillance Site (HDSS), with physicians' classifications as the gold standard. The data were collected from 1993 to 2015 and comprise 16,338 cases. The random forest and extreme gradient boosting classifiers outperformed the other classifiers on the combined dataset, each attaining an accuracy of 96%, and the differences in algorithmic performance were statistically significant (p < 0.0001). All our models attained an Area Under the Receiver Operating Characteristic curve (AUROC) greater than 0.884. The InterVA CCVA algorithm attained a Cause-Specific Mortality Fraction (CSMF) accuracy of 83% and an overall Chance-Corrected Concordance (CCC) of 0.36. We demonstrate that ML models can accurately determine the cause of death from VA narratives. Additionally, through an analysis of mortality trends and patterns, we found that in the first decade of the civil registration system in South Africa, the average life expectancy was approximately 50 years. In the second decade, however, life expectancy dropped significantly, and the population was dying at a much younger average age of 40 years, mostly from leading HIV-related causes.
Interestingly, in the third decade we see a gradual improvement in life expectancy, possibly attributable to effective health intervention programmes. Through a structural and semantic analysis of narratives on which experts disagree, we also identify the most frequent terms relating to traditional healer consultations and visits. The comparative approach also makes this study a baseline that future research can use to promote generalisation and comparability. Future work will explore deep learning models for CoD classification.
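The abstract reports InterVA performance as CSMF accuracy and overall CCC. As an illustration only, the Python sketch below computes both metrics using the definitions commonly cited from Murray et al. (2011); it is not the authors' evaluation code. The function names `csmf_accuracy`, `overall_ccc` and `_cause_list` are hypothetical, and taking the cause list to be the union of causes seen in the gold standard or the predictions is an assumption.

```python
from collections import Counter
from statistics import mean
from typing import Hashable, Sequence


def _cause_list(true_causes: Sequence[Hashable], pred_causes: Sequence[Hashable]) -> set:
    # Assumption: the cause list is every cause seen in either the gold standard or the predictions.
    causes = set(true_causes) | set(pred_causes)
    if len(causes) < 2:
        raise ValueError("need at least two causes for these metrics")
    return causes


def csmf_accuracy(true_causes: Sequence[Hashable], pred_causes: Sequence[Hashable]) -> float:
    """CSMF accuracy = 1 - sum_j |t_j - p_j| / (2 * (1 - min_j t_j))."""
    causes = _cause_list(true_causes, pred_causes)
    t_counts, p_counts = Counter(true_causes), Counter(pred_causes)
    # Cause-specific mortality fractions: share of all deaths assigned to each cause.
    t = {c: t_counts[c] / len(true_causes) for c in causes}
    p = {c: p_counts[c] / len(pred_causes) for c in causes}
    total_abs_error = sum(abs(t[c] - p[c]) for c in causes)
    # Dividing by the largest possible total error keeps the score in [0, 1].
    return 1 - total_abs_error / (2 * (1 - min(t.values())))


def overall_ccc(true_causes: Sequence[Hashable], pred_causes: Sequence[Hashable]) -> float:
    """Mean over causes of CCC_j = (TP_j / (TP_j + FN_j) - 1/N) / (1 - 1/N)."""
    if len(true_causes) != len(pred_causes):
        raise ValueError("true and predicted labels must align one-to-one")
    causes = _cause_list(true_causes, pred_causes)
    n = len(causes)
    true_counts = Counter(true_causes)
    # True positives per cause: deaths whose predicted cause matches the gold standard.
    hits = Counter(t for t, p in zip(true_causes, pred_causes) if t == p)
    # Per-cause concordance is only defined for causes with at least one true death.
    per_cause = [
        (hits[c] / true_counts[c] - 1 / n) / (1 - 1 / n)
        for c in causes
        if true_counts[c] > 0
    ]
    return mean(per_cause)
```

For example, with gold-standard causes A, A, B, B and predicted causes A, B, B, B, both functions return 0.5. CSMF accuracy looks only at population-level cause fractions, whereas CCC rewards correct individual assignments beyond what random guessing among N causes would achieve, which is why the two can diverge as they do for InterVA in the abstract.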
id pubmed-9552851
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-9552851 2022-10-12; Front Public Health; Public Health; Frontiers Media S.A. 2022-09-27; Text; en
Copyright © 2022 Mapundu, Kabudula, Musenge, Olago and Celik. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Public Health