
Automated detection of substance use information from electronic health records for a pediatric population

OBJECTIVE: Substance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learnin...


Bibliographic Details
Main authors: Ni, Yizhao; Bachtel, Alycia; Nause, Katie; Beal, Sarah
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8449626/
https://www.ncbi.nlm.nih.gov/pubmed/34333636
http://dx.doi.org/10.1093/jamia/ocab116
_version_ 1784569454671167488
author Ni, Yizhao
Bachtel, Alycia
Nause, Katie
Beal, Sarah
author_sort Ni, Yizhao
collection PubMed
description OBJECTIVE: Substance use screening in adolescence is unstandardized and often documented in clinical notes, rather than in structured electronic health records (EHRs). The objective of this study was to integrate logic rules with state-of-the-art natural language processing (NLP) and machine learning technologies to detect substance use information from both structured and unstructured EHR data.
MATERIALS AND METHODS: Pediatric patients (10-20 years of age) with any encounter between July 1, 2012, and October 31, 2017, were included (n = 3890 patients; 19 478 encounters). EHR data were extracted at each encounter, manually reviewed for substance use (alcohol, tobacco, marijuana, opiate, any use), and coded as lifetime use, current use, or family use. Logic rules mapped structured EHR indicators to screening results. A knowledge-based NLP system and a deep learning model detected substance use information from unstructured clinical narratives. System performance was evaluated using positive predictive value, sensitivity, negative predictive value, specificity, and area under the receiver-operating characteristic curve (AUC).
RESULTS: The dataset included 17 235 structured indicators and 27 141 clinical narratives. Manual review of clinical narratives captured 94.0% of positive screening results, while structured EHR data captured 22.0%. Logic rules detected screening results from structured data with 1.0 and 0.99 for sensitivity and specificity, respectively. The knowledge-based system detected substance use information from clinical narratives with 0.86, 0.79, and 0.88 for AUC, sensitivity, and specificity, respectively. The deep learning model further improved detection capacity, achieving 0.88, 0.81, and 0.85 for AUC, sensitivity, and specificity, respectively. Finally, integrating predictions from structured and unstructured data achieved high detection capacity across all cases (0.96, 0.85, and 0.87 for AUC, sensitivity, and specificity, respectively).
CONCLUSIONS: It is feasible to detect substance use screening and results among pediatric patients using logic rules, NLP, and machine learning technologies.
format Online
Article
Text
id pubmed-8449626
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8449626 2021-09-20 Automated detection of substance use information from electronic health records for a pediatric population Ni, Yizhao Bachtel, Alycia Nause, Katie Beal, Sarah J Am Med Inform Assoc Research and Applications Oxford University Press 2021-08-01 /pmc/articles/PMC8449626/ /pubmed/34333636 http://dx.doi.org/10.1093/jamia/ocab116 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Automated detection of substance use information from electronic health records for a pediatric population
topic Research and Applications