Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa
Computer Coded Verbal Autopsy (CCVA) algorithms are commonly used to determine the cause of death (CoD) from questionnaire responses extracted from verbal autopsies (VAs). However, they can only operate on structured data and cannot effectively harness information from unstructured VA narratives. Ma...
Main authors: | Mapundu, Michael T.; Kabudula, Chodziwadziwa W.; Musenge, Eustasius; Olago, Victor; Celik, Turgay |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Public Health |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9552851/ https://www.ncbi.nlm.nih.gov/pubmed/36238252 http://dx.doi.org/10.3389/fpubh.2022.990838 |
_version_ | 1784806339963256832 |
---|---|
author | Mapundu, Michael T. Kabudula, Chodziwadziwa W. Musenge, Eustasius Olago, Victor Celik, Turgay |
author_sort | Mapundu, Michael T. |
collection | PubMed |
description | Computer Coded Verbal Autopsy (CCVA) algorithms are commonly used to determine the cause of death (CoD) from questionnaire responses extracted from verbal autopsies (VAs). However, they can only operate on structured data and cannot effectively harness information from unstructured VA narratives. Machine Learning (ML) algorithms have also been applied successfully in determining the CoD from VA narratives, allowing the use of auxiliary information that CCVA algorithms cannot directly utilize. However, most ML-based studies only use responses from the structured questionnaire, and the results lack generalisability and comparability across studies. We present a comparative performance evaluation of ML methods and CCVA algorithms on South African VA narratives data, using data from the Agincourt Health and Demographic Surveillance Site (HDSS) with physicians' classifications as the gold standard. The data were collected from 1993 to 2015 and comprise 16,338 cases. The random forest and extreme gradient boosting classifiers outperformed the other classifiers on the combined dataset, each attaining an accuracy of 96%, with statistically significant differences in algorithmic performance (p < 0.0001). All our models attained an Area Under the Receiver Operating Characteristic curve (AUROC) greater than 0.884. The InterVA CCVA attained 83% Cause Specific Mortality Fraction accuracy and an Overall Chance-Corrected Concordance of 0.36. We demonstrate that ML models can accurately determine the cause of death from VA narratives. Additionally, through mortality trends and pattern analysis, we discovered that in the first decade of the civil registration system in South Africa, the average life expectancy was approximately 50 years. However, in the second decade, life expectancy dropped significantly, and the population was dying at a much younger average age of 40 years, mostly from the leading HIV-related causes. Interestingly, in the third decade, we see a gradual improvement in life expectancy, possibly attributable to effective health intervention programmes. Through a structural and semantic analysis of narratives where experts disagree, we also show that the most frequent terms concern traditional healer consultations and visits. The comparative approach also makes this study a baseline for future research that promotes generalisation and comparability. Future work will explore deep learning models for CoD classification. |
format | Online Article Text |
id | pubmed-9552851 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9552851 2022-10-12 Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa Mapundu, Michael T. Kabudula, Chodziwadziwa W. Musenge, Eustasius Olago, Victor Celik, Turgay Front Public Health Public Health Frontiers Media S.A. 2022-09-27 /pmc/articles/PMC9552851/ /pubmed/36238252 http://dx.doi.org/10.3389/fpubh.2022.990838 Text en Copyright © 2022 Mapundu, Kabudula, Musenge, Olago and Celik. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Public Health Mapundu, Michael T. Kabudula, Chodziwadziwa W. Musenge, Eustasius Olago, Victor Celik, Turgay Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa |
title | Performance evaluation of machine learning and Computer Coded Verbal Autopsy (CCVA) algorithms for cause of death determination: A comparative analysis of data from rural South Africa |
topic | Public Health |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9552851/ https://www.ncbi.nlm.nih.gov/pubmed/36238252 http://dx.doi.org/10.3389/fpubh.2022.990838 |