Using Natural Language Processing to Predict Fatal Drug Overdose From Autopsy Narrative Text: Algorithm Development and Validation Study

BACKGROUND: Fatal drug overdose surveillance informs prevention but is often delayed because of autopsy report processing and death certificate coding. Autopsy reports contain narrative text describing scene evidence and medical history (similar to preliminary death scene investigation reports) and...

Bibliographic Details
Main authors: Tang, Leigh Anne, Korona-Bailey, Jessica, Zaras, Dimitrios, Roberts, Allison, Mukhopadhyay, Sutapa, Espy, Stephen, Walsh, Colin G
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10238956/
https://www.ncbi.nlm.nih.gov/pubmed/37204824
http://dx.doi.org/10.2196/45246
author Tang, Leigh Anne
Korona-Bailey, Jessica
Zaras, Dimitrios
Roberts, Allison
Mukhopadhyay, Sutapa
Espy, Stephen
Walsh, Colin G
collection PubMed
description BACKGROUND: Fatal drug overdose surveillance informs prevention but is often delayed because of autopsy report processing and death certificate coding. Autopsy reports contain narrative text describing scene evidence and medical history (similar to preliminary death scene investigation reports) and may serve as early data sources for identifying fatal drug overdoses. To facilitate timely fatal overdose reporting, natural language processing was applied to narrative texts from autopsies.
OBJECTIVE: This study aimed to develop a natural language processing–based model that predicts the likelihood that an autopsy report narrative describes an accidental or undetermined fatal drug overdose.
METHODS: Autopsy reports of all manners of death (2019-2021) were obtained from the Tennessee Office of the State Chief Medical Examiner. The text was extracted from autopsy reports (PDFs) using optical character recognition. Three common narrative text sections were identified, concatenated, and preprocessed (bag-of-words) using term frequency–inverse document frequency scoring. Logistic regression, support vector machine (SVM), random forest, and gradient boosted tree classifiers were developed and validated. Models were trained and calibrated using autopsies from 2019 to 2020 and tested using those from 2021. Model discrimination was evaluated using the area under the receiver operating characteristic curve, precision, recall, F1-score, and F2-score (which prioritizes recall over precision). Calibration was performed using logistic regression (Platt scaling) and evaluated using the Spiegelhalter z test. Shapley additive explanations values were generated for models compatible with this method. In a post hoc subgroup analysis of the random forest classifier, model discrimination was evaluated by forensic center, race, age, sex, and education level.
RESULTS: A total of 17,342 autopsies (n=5934, 34.22% cases) were used for model development and validation. The training set included 10,215 autopsies (n=3342, 32.72% cases), the calibration set included 538 autopsies (n=183, 34.01% cases), and the test set included 6589 autopsies (n=2409, 36.56% cases). The vocabulary set contained 4002 terms. All models showed excellent performance (area under the receiver operating characteristic curve ≥0.95, precision ≥0.94, recall ≥0.92, F1-score ≥0.94, and F2-score ≥0.92). The SVM and random forest classifiers achieved the highest F2-scores (0.948 and 0.947, respectively). The logistic regression and random forest classifiers were calibrated (P=.95 and P=.85, respectively), whereas the SVM and gradient boosted tree classifiers were miscalibrated (P=.03 and P<.001, respectively). “Fentanyl” and “accident” had the highest Shapley additive explanations values. Post hoc subgroup analyses revealed lower F2-scores for autopsies from forensic centers D and E. Lower F2-scores were observed for the American Indian, Asian, ≤14 years, and ≥65 years subgroups, but larger sample sizes are needed to validate these findings.
CONCLUSIONS: The random forest classifier may be suitable for identifying potential accidental and undetermined fatal overdose autopsies. Further validation studies should be conducted to ensure early detection of accidental and undetermined fatal drug overdoses across all subgroups.
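The two evaluation measures named in the abstract can be illustrated with a short sketch. It is not the authors' code: the function names `f_beta` and `spiegelhalter_z` and the toy inputs are hypothetical, and the sketch assumes the standard textbook definitions of the F-beta score (with beta = 2 giving the recall-weighted measure described above) and of the Spiegelhalter z statistic, whose null hypothesis is that the predicted probabilities are calibrated.

```python
import math


def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily; beta = 2 gives the F2-score.
    """
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def spiegelhalter_z(y_true, y_prob):
    """Return (z, two-sided P value) for the Spiegelhalter calibration test.

    y_true holds 0/1 outcomes and y_prob the predicted probabilities.
    A small P value suggests miscalibration.
    """
    num = sum((y - p) * (1 - 2 * p) for y, p in zip(y_true, y_prob))
    var = sum((1 - 2 * p) ** 2 * p * (1 - p) for p in y_prob)
    if var == 0.0:
        raise ValueError("statistic undefined: all probabilities are 0, 0.5, or 1")
    z = num / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return z, p_value


# Hypothetical toy values, not taken from the study.
f1 = f_beta(precision=0.5, recall=1.0, beta=1.0)
f2 = f_beta(precision=0.5, recall=1.0, beta=2.0)
z, p = spiegelhalter_z([1, 0, 1, 0], [0.8, 0.2, 0.8, 0.2])
print(f"F1={f1:.3f} F2={f2:.3f} z={z:.3f} P={p:.3f}")
```

With high recall and low precision, the F2 value exceeds the F1 value, which is the sense in which the measure prioritizes recall. The study's own pipeline (vectorization, classifiers, Platt scaling) is not reproduced here.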
format Online
Article
Text
id pubmed-10238956
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher JMIR Publications
record_format MEDLINE/PubMed
spelling pubmed-10238956 2023-06-04 JMIR Public Health Surveill Original Paper JMIR Publications 2023-05-19 /pmc/articles/PMC10238956/ /pubmed/37204824 http://dx.doi.org/10.2196/45246 Text en ©Leigh Anne Tang, Jessica Korona-Bailey, Dimitrios Zaras, Allison Roberts, Sutapa Mukhopadhyay, Stephen Espy, Colin G Walsh.
Originally published in JMIR Public Health and Surveillance (https://publichealth.jmir.org), 19.05.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Public Health and Surveillance, is properly cited. The complete bibliographic information, a link to the original publication on https://publichealth.jmir.org, as well as this copyright and license information must be included.
title Using Natural Language Processing to Predict Fatal Drug Overdose From Autopsy Narrative Text: Algorithm Development and Validation Study
topic Original Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10238956/
https://www.ncbi.nlm.nih.gov/pubmed/37204824
http://dx.doi.org/10.2196/45246