
Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency

BACKGROUND: Exocrine pancreatic insufficiency (EPI) is a serious condition characterized by a lack of functional exocrine pancreatic enzymes and the resultant inability to properly digest nutrients. EPI can be caused by a variety of disorders, including chronic pancreatitis, pancreatic cancer, and c...

Bibliographic Details
Main Authors: Pyenson, Bruce; Alston, Maggie; Gomberg, Jeffrey; Han, Feng; Khandelwal, Nikhil; Dei, Motoharu; Son, Monica; Vora, Jaime
Format: Online Article Text
Language: English
Published: Columbia Data Analytics, LLC 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7299452/
https://www.ncbi.nlm.nih.gov/pubmed/32685578
http://dx.doi.org/10.36469/9727
_version_ 1783547390113873920
author Pyenson, Bruce
Alston, Maggie
Gomberg, Jeffrey
Han, Feng
Khandelwal, Nikhil
Dei, Motoharu
Son, Monica
Vora, Jaime
author_facet Pyenson, Bruce
Alston, Maggie
Gomberg, Jeffrey
Han, Feng
Khandelwal, Nikhil
Dei, Motoharu
Son, Monica
Vora, Jaime
author_sort Pyenson, Bruce
collection PubMed
description BACKGROUND: Exocrine pancreatic insufficiency (EPI) is a serious condition characterized by a lack of functional exocrine pancreatic enzymes and the resultant inability to properly digest nutrients. EPI can be caused by a variety of disorders, including chronic pancreatitis, pancreatic cancer, and celiac disease. EPI remains underdiagnosed because of the nonspecific nature of clinical symptoms, lack of an ideal diagnostic test, and the inability to easily identify affected patients using administrative claims data. OBJECTIVES: To develop a machine learning model that identifies patients in a commercial medical claims database who likely have EPI but are undiagnosed. METHODS: A machine learning algorithm was developed in Scikit-learn, a Python module. The study population, selected from the 2014 Truven MarketScan® Commercial Claims Database, consisted of patients with EPI-prone conditions. Patients were labeled with 290 condition category flags and split into actual positive EPI cases, actual negative EPI cases, and unlabeled cases. The study population was then randomly divided into a training subset and a testing subset. The training subset was used to determine the performance metrics of 27 models and to select the highest performing model, and the testing subset was used to evaluate performance of the best machine learning model. RESULTS: The study population consisted of 2088 actual positive EPI cases, 1077 actual negative EPI cases, and 437 530 unlabeled cases. In the best performing model, the precision, recall, and accuracy were 0.91, 0.80, and 0.86, respectively. The best-performing model estimated that the number of patients likely to have EPI was about 12 times the number of patients directly identified as EPI-positive through a claims analysis in the study population. The most important features in assigning EPI probability were the presence or absence of diagnosis codes related to pancreatic and digestive conditions. 
CONCLUSIONS: Machine learning techniques demonstrated high predictive power in identifying patients with EPI and could facilitate an enhanced understanding of its etiology and help to identify patients for possible diagnosis and treatment.
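The abstract reports precision, recall, and accuracy for the selected model but does not give the underlying confusion matrix. As a minimal sketch of how those three metrics are defined, the following plain-Python example computes them from true and predicted labels. The function names and the toy labels are hypothetical and chosen only for illustration; they are not the authors' code and not data from the study.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn) for binary labels where 1 = EPI-positive."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float, float]:
    """Compute precision, recall, and accuracy from binary labels."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    total = tp + fp + fn + tn
    # Precision: share of predicted positives that are actual positives.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # Accuracy: share of all labeled cases classified correctly.
    accuracy = (tp + tn) / total if total else 0.0
    return precision, recall, accuracy


if __name__ == "__main__":
    # Hypothetical toy labels (NOT study data): 4 positives, 4 negatives.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0]
    y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
    precision, recall, accuracy = precision_recall_accuracy(y_true, y_pred)
    print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```

In a setting like the one described, these metrics can only be computed on the labeled (actual positive and actual negative) cases in the held-out testing subset, since the unlabeled cases have no ground truth to compare against.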
format Online
Article
Text
id pubmed-7299452
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Columbia Data Analytics, LLC
record_format MEDLINE/PubMed
spelling pubmed-72994522020-07-16 Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency Pyenson, Bruce Alston, Maggie Gomberg, Jeffrey Han, Feng Khandelwal, Nikhil Dei, Motoharu Son, Monica Vora, Jaime J Health Econ Outcomes Res Methodology and Health Care Policy BACKGROUND: Exocrine pancreatic insufficiency (EPI) is a serious condition characterized by a lack of functional exocrine pancreatic enzymes and the resultant inability to properly digest nutrients. EPI can be caused by a variety of disorders, including chronic pancreatitis, pancreatic cancer, and celiac disease. EPI remains underdiagnosed because of the nonspecific nature of clinical symptoms, lack of an ideal diagnostic test, and the inability to easily identify affected patients using administrative claims data. OBJECTIVES: To develop a machine learning model that identifies patients in a commercial medical claims database who likely have EPI but are undiagnosed. METHODS: A machine learning algorithm was developed in Scikit-learn, a Python module. The study population, selected from the 2014 Truven MarketScan® Commercial Claims Database, consisted of patients with EPI-prone conditions. Patients were labeled with 290 condition category flags and split into actual positive EPI cases, actual negative EPI cases, and unlabeled cases. The study population was then randomly divided into a training subset and a testing subset. The training subset was used to determine the performance metrics of 27 models and to select the highest performing model, and the testing subset was used to evaluate performance of the best machine learning model. RESULTS: The study population consisted of 2088 actual positive EPI cases, 1077 actual negative EPI cases, and 437 530 unlabeled cases. In the best performing model, the precision, recall, and accuracy were 0.91, 0.80, and 0.86, respectively. 
The best-performing model estimated that the number of patients likely to have EPI was about 12 times the number of patients directly identified as EPI-positive through a claims analysis in the study population. The most important features in assigning EPI probability were the presence or absence of diagnosis codes related to pancreatic and digestive conditions. CONCLUSIONS: Machine learning techniques demonstrated high predictive power in identifying patients with EPI and could facilitate an enhanced understanding of its etiology and help to identify patients for possible diagnosis and treatment. Columbia Data Analytics, LLC 2019-02-14 /pmc/articles/PMC7299452/ /pubmed/32685578 http://dx.doi.org/10.36469/9727 Text en This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (CCBY-4.0). View this license’s legal deed at http://creativecommons.org/licenses/by/4.0 and legal code at http://creativecommons.org/licenses/by/4.0/legalcode for more information.
spellingShingle Methodology and Health Care Policy
Pyenson, Bruce
Alston, Maggie
Gomberg, Jeffrey
Han, Feng
Khandelwal, Nikhil
Dei, Motoharu
Son, Monica
Vora, Jaime
Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency
title Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency
title_full Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency
title_fullStr Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency
title_full_unstemmed Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency
title_short Applying Machine Learning Techniques to Identify Undiagnosed Patients with Exocrine Pancreatic Insufficiency
title_sort applying machine learning techniques to identify undiagnosed patients with exocrine pancreatic insufficiency
topic Methodology and Health Care Policy
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7299452/
https://www.ncbi.nlm.nih.gov/pubmed/32685578
http://dx.doi.org/10.36469/9727
work_keys_str_mv AT pyensonbruce applyingmachinelearningtechniquestoidentifyundiagnosedpatientswithexocrinepancreaticinsufficiency
AT alstonmaggie applyingmachinelearningtechniquestoidentifyundiagnosedpatientswithexocrinepancreaticinsufficiency
AT gombergjeffrey applyingmachinelearningtechniquestoidentifyundiagnosedpatientswithexocrinepancreaticinsufficiency
AT hanfeng applyingmachinelearningtechniquestoidentifyundiagnosedpatientswithexocrinepancreaticinsufficiency
AT khandelwalnikhil applyingmachinelearningtechniquestoidentifyundiagnosedpatientswithexocrinepancreaticinsufficiency
AT deimotoharu applyingmachinelearningtechniquestoidentifyundiagnosedpatientswithexocrinepancreaticinsufficiency
AT sonmonica applyingmachinelearningtechniquestoidentifyundiagnosedpatientswithexocrinepancreaticinsufficiency
AT vorajaime applyingmachinelearningtechniquestoidentifyundiagnosedpatientswithexocrinepancreaticinsufficiency