
Using Natural Language Processing and Machine Learning to Identify Opioids in Electronic Health Record Data

PURPOSE: This study evaluates the utility of machine learning (ML) and natural language processing (NLP) in the processing and initial analysis of data within the electronic health record (EHR). We present and evaluate a method to classify medication names as either opioids or non-opioids using ML a...

Bibliographic Details
Main Authors: McDermott, Sean P; Wasan, Ajay D
Format: Online Article Text
Language: English
Published: Dove 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10290467/
https://www.ncbi.nlm.nih.gov/pubmed/37361429
http://dx.doi.org/10.2147/JPR.S389160
collection PubMed
description
PURPOSE: This study evaluates the utility of machine learning (ML) and natural language processing (NLP) in the processing and initial analysis of data within the electronic health record (EHR). We present and evaluate a method to classify medication names as either opioids or non-opioids using ML and NLP.
PATIENTS AND METHODS: A total of 4216 distinct medication entries were obtained from the EHR and were initially labeled by human reviewers as opioid or non-opioid medications. An approach incorporating bag-of-words NLP and supervised ML classification was implemented in MATLAB and used to automatically classify medications. The automated method was trained on 60% of the input data, evaluated on the remaining 40%, and compared to manual classification results.
RESULTS: The human reviewers classified 3991 medication strings as non-opioid medications (94.7%) and 225 as opioid medications (5.3%). The algorithm achieved 99.6% accuracy, 97.8% sensitivity, 94.6% positive predictive value, an F1 score of 0.96, and a receiver operating characteristic (ROC) curve with 0.998 area under the curve (AUC). A secondary analysis indicated that approximately 15–20 opioids (and 80–100 non-opioids) were needed to achieve accuracy, sensitivity, and AUC values above 90–95%.
CONCLUSION: The automated approach achieved excellent performance in classifying medications as opioids or non-opioids, even with a practical number of human-reviewed training examples. This will allow a significant reduction in manual chart review and improve data structuring for retrospective analyses in pain studies. The approach may also be adapted to further analysis and predictive analytics of EHR and other “big data” studies.
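The bag-of-words step and the reported metrics can be illustrated with a minimal Python sketch. This is not the authors' MATLAB implementation: the tokenization rule, the function names, and the sample medication strings are all assumptions made for illustration, and the abstract does not say which supervised classifier was used, so none is shown. The last lines only check that the reported sensitivity (97.8%) and positive predictive value (94.6%) are consistent with the reported F1 of 0.96.

```python
import re
from collections import Counter


def tokenize(name):
    """Lowercase a medication string and keep its alphabetic word tokens.

    Assumption: dose numbers and punctuation are dropped; the abstract does
    not describe the tokenization the authors actually used.
    """
    return re.findall(r"[a-z]+", name.lower())


def build_vocabulary(names):
    """Collect every distinct token across the strings, in sorted order."""
    return sorted({token for name in names for token in tokenize(name)})


def bag_of_words(name, vocabulary):
    """Count how often each vocabulary token occurs in one string."""
    counts = Counter(tokenize(name))
    return [counts[token] for token in vocabulary]


def classification_metrics(tp, fp, tn, fn):
    """Compute the metrics named in the abstract from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # share of true opioids that were found
    ppv = tp / (tp + fp)          # share of predicted opioids that were correct
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "sensitivity": sensitivity,
        "ppv": ppv,
        "f1": 2 * ppv * sensitivity / (ppv + sensitivity),
    }


# Hypothetical medication strings, not taken from the study data.
names = [
    "oxycodone 5 mg tablet",
    "morphine sulfate 15 mg tablet",
    "lisinopril 10 mg tablet",
]
vocabulary = build_vocabulary(names)
vectors = [bag_of_words(name, vocabulary) for name in names]

# F1 is the harmonic mean of PPV and sensitivity; plug in the reported values.
reported_f1 = 2 * 0.946 * 0.978 / (0.946 + 0.978)

print(vocabulary)
print(vectors)
print(round(reported_f1, 2))
```

Each row of `vectors` is the numeric representation of one medication string. In a full pipeline these rows, together with the human-assigned labels, would be split 60/40 into training and evaluation sets and passed to a supervised classifier of the reader's choice.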
id pubmed-10290467
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10290467 2023-06-25 J Pain Res Original Research Dove 2023-06-20 Text en
© 2023 McDermott and Wasan. This work is published and licensed by Dove Medical Press Limited. The full terms of this license are available at https://www.dovepress.com/terms.php and incorporate the Creative Commons Attribution – Non Commercial (unported, v3.0) License (https://creativecommons.org/licenses/by-nc/3.0/). By accessing the work you hereby accept the Terms. Non-commercial uses of the work are permitted without any further permission from Dove Medical Press Limited, provided the work is properly attributed. For permission for commercial use of this work, please see paragraphs 4.2 and 5 of our Terms (https://www.dovepress.com/terms.php).
topic Original Research