Publicly available machine learning models for identifying opioid misuse from the clinical notes of hospitalized patients

Bibliographic Details
Main authors:	Sharma, Brihat; Dligach, Dmitriy; Swope, Kristin; Salisbury-Afshar, Elizabeth; Karnik, Niranjan S.; Joyce, Cara; Afshar, Majid
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2020
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191715/ https://www.ncbi.nlm.nih.gov/pubmed/32349766 http://dx.doi.org/10.1186/s12911-020-1099-y

Description:	BACKGROUND: Automated de-identification methods for removing protected health information (PHI) from the source notes of the electronic health record (EHR) rely on building systems to recognize mentions of PHI in text, but they remain inadequate at ensuring perfect PHI removal. As an alternative to relying on de-identification systems, we propose two solutions: (1) mapping the corpus of documents to a standardized medical vocabulary (concept unique identifier [CUI] codes mapped from the Unified Medical Language System), thus eliminating PHI as inputs to a machine learning model; and (2) training character-based machine learning models that obviate the need for a dictionary of input words/n-grams. We aim to test the performance of models with and without PHI in a use case for an opioid misuse classifier. METHODS: An observational cohort was sampled from adult hospital inpatient encounters at a health system between 2007 and 2017. Case-control stratified sampling (n = 1000) was performed to build an annotated dataset for a reference standard of cases and non-cases of opioid misuse. Models were trained and tested with CUI-code, character-based, and n-gram features. The models applied were machine learning models (neural networks and logistic regression) and a rule-based model for opioid misuse derived from expert consensus. Discrimination was compared between models using the area under the receiver operating characteristic curve (AUROC). Model fit and calibration were assessed with the Hosmer-Lemeshow test and visual plots. RESULTS: Machine learning models with CUI codes performed similarly to n-gram models with PHI. The top-performing models, with AUROCs > 0.90, used CUI codes as inputs to a convolutional neural network, a max pooling network, and a logistic regression model. The best-calibrated models with the best model fit were the CUI-based convolutional neural network and max pooling network. The most heavily weighted CUI codes in logistic regression had the related terms ‘Heroin’ and ‘Victim of abuse’. CONCLUSIONS: We demonstrate good test characteristics for an opioid misuse computable phenotype that is void of any PHI and performs similarly to models that use PHI. Herein we share a PHI-free, trained opioid misuse classifier for other researchers and health systems to use and benchmark to overcome privacy and security concerns.
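The two evaluation measures named in the description, AUROC for discrimination and the Hosmer-Lemeshow test for calibration, can be illustrated with a minimal pure-Python sketch. This is not the authors' code: the function names `auroc` and `hosmer_lemeshow`, the grouping into equal-size bins after sorting by predicted risk, and the default of 10 groups are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the probability that a random case scores above a random non-case.

    Ties between a case and a non-case count as one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow(
    labels: Sequence[int], probs: Sequence[float], groups: int = 10
) -> Tuple[float, int]:
    """Hosmer-Lemeshow statistic over roughly equal-size groups of sorted risk.

    Each group contributes (O - E)^2 / (N * p_bar * (1 - p_bar)), where O is the
    observed number of cases, E the sum of predicted probabilities, N the group
    size, and p_bar = E / N. Returns (statistic, degrees_of_freedom), with
    degrees of freedom = number of non-empty groups - 2.
    """
    pairs = sorted(zip(probs, labels))  # order by predicted risk
    n = len(pairs)
    stat = 0.0
    used = 0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        used += 1
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        size = len(chunk)
        mean_p = expected / size
        variance = size * mean_p * (1.0 - mean_p)
        if variance > 0:  # skip groups with all predictions at 0 or 1
            stat += (observed - expected) ** 2 / variance
    return stat, used - 2
```

A p-value can then be obtained by comparing the statistic against a chi-square distribution with the returned degrees of freedom, for example with `scipy.stats.chi2.sf(stat, df)`. Large statistics indicate that observed and predicted case counts diverge across risk groups, i.e. poor calibration.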
Journal:	BMC Med Inform Decis Mak (Research Article), BioMed Central, published 2020-04-29
Rights:	© The Author(s). 2020. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
