
Automatically determining cause of death from verbal autopsy narratives


Bibliographic Details
Main authors: Jeblee, Serena; Gomes, Mireille; Jha, Prabhat; Rudzicz, Frank; Hirst, Graeme
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6617656/
https://www.ncbi.nlm.nih.gov/pubmed/31288814
http://dx.doi.org/10.1186/s12911-019-0841-9
Collection: PubMed

Abstract:
BACKGROUND: A verbal autopsy (VA) is a post-hoc written interview report of the symptoms preceding a person’s death in cases where no official cause of death (CoD) was determined by a physician. Current leading automated VA coding methods primarily use structured data from VAs to assign a CoD category. We present a method to automatically determine CoD categories from VA free-text narratives alone.

METHODS: After preprocessing and spelling correction, our method extracts word frequency counts from the narratives and uses them as input to four different machine learning classifiers: naïve Bayes, random forest, support vector machines, and a neural network.

RESULTS: For individual CoD classification, our best classifier achieves a sensitivity of .770 for adult deaths with 15 CoD categories (compared with the current best reported sensitivity of .57), and .662 with 48 WHO categories. When predicting the CoD distribution at the population level, our best classifier achieves a cause-specific mortality fraction (CSMF) accuracy of .962 for 15 categories and .908 for 48 categories, which is on par with leading CoD distribution estimation methods.

CONCLUSIONS: Our narrative-based machine learning classifier performs as well as classifiers based on structured data at the individual level. Moreover, our method demonstrates that VA narratives provide important information that can be used by a machine learning system for automated CoD classification. Unlike the structured questionnaire-based methods, this method can be applied to any verbal autopsy dataset, regardless of the collection process or country of origin.

ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12911-019-0841-9) contains supplementary material, which is available to authorized users.
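The pipeline summarized in the abstract (word frequency counts fed to a classifier, evaluated with CSMF accuracy) can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes plain lower-case regex tokenization in place of their preprocessing and spelling correction, shows only one of the four classifiers (a multinomial naive Bayes with add-one smoothing), and computes CSMF accuracy with the definition commonly used in the VA literature, 1 - sum_j |true_j - pred_j| / (2 * (1 - min_j true_j)). The names word_counts, MultinomialNB and csmf_accuracy, and the toy narratives and cause labels, are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def word_counts(narrative):
    """Bag of words: lower-case the text and count alphabetic tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z]+", narrative.lower()))


class MultinomialNB:
    """Hypothetical multinomial naive Bayes over word counts, with add-one smoothing."""

    def fit(self, narratives, labels):
        self.class_counts = Counter(labels)
        self.word_totals = defaultdict(Counter)  # per-cause word frequencies
        self.vocab = set()
        for text, label in zip(narratives, labels):
            counts = word_counts(text)
            self.word_totals[label].update(counts)
            self.vocab.update(counts)
        self.n_docs = len(labels)
        return self

    def predict(self, narrative):
        counts = word_counts(narrative)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            # Log prior plus the smoothed log likelihood of each observed word.
            score = math.log(n / self.n_docs)
            denom = sum(self.word_totals[label].values()) + len(self.vocab)
            for word, c in counts.items():
                if word in self.vocab:  # words unseen in training are ignored
                    score += c * math.log((self.word_totals[label][word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def csmf_accuracy(true_labels, pred_labels, causes):
    """CSMF accuracy over a fixed cause list (needs at least two causes)."""
    if len(causes) < 2:
        raise ValueError("CSMF accuracy needs at least two causes")
    true_counts, pred_counts = Counter(true_labels), Counter(pred_labels)
    true_csmf = {c: true_counts[c] / len(true_labels) for c in causes}
    pred_csmf = {c: pred_counts[c] / len(pred_labels) for c in causes}
    error = sum(abs(true_csmf[c] - pred_csmf[c]) for c in causes)
    return 1 - error / (2 * (1 - min(true_csmf.values())))


# Toy, made-up narratives and cause labels for illustration only.
TRAIN_TEXTS = [
    "she had fever and chills for a week",
    "high fever with shivering and sweating",
    "sudden chest pain and he collapsed",
    "chest pain spreading to left arm",
]
TRAIN_LABELS = ["malaria", "malaria", "cardiac", "cardiac"]

model = MultinomialNB().fit(TRAIN_TEXTS, TRAIN_LABELS)
print(model.predict("fever and chills before death"))  # malaria
print(model.predict("chest pain"))  # cardiac
print(csmf_accuracy(TRAIN_LABELS, ["malaria", "cardiac", "cardiac", "cardiac"],
                    ["malaria", "cardiac"]))  # 0.5
```

CSMF accuracy is a population-level measure: it compares predicted and true cause fractions, so individual misclassifications in opposite directions can cancel out, which is why it is reported separately from individual-level sensitivity.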

Record ID: pubmed-6617656
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Med Inform Decis Mak (Research Article)
Publication date: 2019-07-09
License: © The Author(s) 2019. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.