Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis

OBJECTIVE: To develop a computer model to predict patients with nonalcoholic steatohepatitis (NASH) using machine learning (ML). MATERIALS AND METHODS: This retrospective study utilized two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty...

Bibliographic Details
Main Authors: Docherty, Matt; Regnier, Stephane A; Capkun, Gorana; Balp, Maria-Magdalena; Ye, Qin; Janssens, Nico; Tietz, Andreas; Löffler, Jürgen; Cai, Jennifer; Pedrosa, Marcos C; Schattenberg, Jörn M
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8200272/
https://www.ncbi.nlm.nih.gov/pubmed/33684933
http://dx.doi.org/10.1093/jamia/ocab003
_version_ 1783707570717851648
author Docherty, Matt
Regnier, Stephane A
Capkun, Gorana
Balp, Maria-Magdalena
Ye, Qin
Janssens, Nico
Tietz, Andreas
Löffler, Jürgen
Cai, Jennifer
Pedrosa, Marcos C
Schattenberg, Jörn M
author_sort Docherty, Matt
collection PubMed
description OBJECTIVE: To develop a computer model to predict patients with nonalcoholic steatohepatitis (NASH) using machine learning (ML). MATERIALS AND METHODS: This retrospective study utilized two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty liver disease (NAFLD) adult database (2004-2009), and b) the Optum® de-identified Electronic Health Record dataset (2007-2018), a real-world dataset representative of common electronic health records in the United States. We developed an ML model to predict NASH, using confirmed NASH and non-NASH based on liver histology results in the NIDDK dataset to train the model. RESULTS: Models were trained and tested on NIDDK NAFLD data (704 patients) and the best-performing models evaluated on Optum data (~3,000,000 patients). An eXtreme Gradient Boosting model (XGBoost) consisting of 14 features exhibited high performance as measured by area under the curve (0.82), sensitivity (81%), and precision (81%) in predicting NASH. Slightly reduced performance was observed with an abbreviated feature set of 5 variables (0.79, 80%, 80%, respectively). The full model demonstrated good performance (AUC 0.76) to predict NASH in Optum data. DISCUSSION: The proposed model, named NASHmap, is the first ML model developed with confirmed NASH and non-NASH cases as determined through liver biopsy and validated on a large, real-world patient dataset. Both the 14- and 5-feature versions exhibit high performance. CONCLUSION: The NASHmap model is a convenient and high-performing tool that could be used to identify patients likely to have NASH in clinical settings, allowing better patient management and optimal allocation of clinical resources.
format Online
Article
Text
id pubmed-8200272
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8200272 2021-06-14 Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis Docherty, Matt Regnier, Stephane A Capkun, Gorana Balp, Maria-Magdalena Ye, Qin Janssens, Nico Tietz, Andreas Löffler, Jürgen Cai, Jennifer Pedrosa, Marcos C Schattenberg, Jörn M J Am Med Inform Assoc Research and Applications OBJECTIVE: To develop a computer model to predict patients with nonalcoholic steatohepatitis (NASH) using machine learning (ML). MATERIALS AND METHODS: This retrospective study utilized two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty liver disease (NAFLD) adult database (2004-2009), and b) the Optum® de-identified Electronic Health Record dataset (2007-2018), a real-world dataset representative of common electronic health records in the United States. We developed an ML model to predict NASH, using confirmed NASH and non-NASH based on liver histology results in the NIDDK dataset to train the model. RESULTS: Models were trained and tested on NIDDK NAFLD data (704 patients) and the best-performing models evaluated on Optum data (~3,000,000 patients). An eXtreme Gradient Boosting model (XGBoost) consisting of 14 features exhibited high performance as measured by area under the curve (0.82), sensitivity (81%), and precision (81%) in predicting NASH. Slightly reduced performance was observed with an abbreviated feature set of 5 variables (0.79, 80%, 80%, respectively). The full model demonstrated good performance (AUC 0.76) to predict NASH in Optum data. DISCUSSION: The proposed model, named NASHmap, is the first ML model developed with confirmed NASH and non-NASH cases as determined through liver biopsy and validated on a large, real-world patient dataset. Both the 14- and 5-feature versions exhibit high performance. CONCLUSION: The NASHmap model is a convenient and high-performing tool that could be used to identify patients likely to have NASH in clinical settings, allowing better patient management and optimal allocation of clinical resources. Oxford University Press 2021-03-04 /pmc/articles/PMC8200272/ /pubmed/33684933 http://dx.doi.org/10.1093/jamia/ocab003 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis
topic Research and Applications
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8200272/
https://www.ncbi.nlm.nih.gov/pubmed/33684933
http://dx.doi.org/10.1093/jamia/ocab003