Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis
OBJECTIVE: To develop a computer model to predict patients with nonalcoholic steatohepatitis (NASH) using machine learning (ML). MATERIALS AND METHODS: This retrospective study utilized two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty...
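Main authors: | Docherty, Matt; Regnier, Stephane A; Capkun, Gorana; Balp, Maria-Magdalena; Ye, Qin; Janssens, Nico; Tietz, Andreas; Löffler, Jürgen; Cai, Jennifer; Pedrosa, Marcos C; Schattenberg, Jörn M |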
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2021 |
Subjects: | Research and Applications |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8200272/ https://www.ncbi.nlm.nih.gov/pubmed/33684933 http://dx.doi.org/10.1093/jamia/ocab003 |
_version_ | 1783707570717851648 |
---|---|
author | Docherty, Matt; Regnier, Stephane A; Capkun, Gorana; Balp, Maria-Magdalena; Ye, Qin; Janssens, Nico; Tietz, Andreas; Löffler, Jürgen; Cai, Jennifer; Pedrosa, Marcos C; Schattenberg, Jörn M |
author_sort | Docherty, Matt |
collection | PubMed |
description | OBJECTIVE: To develop a computer model to predict patients with nonalcoholic steatohepatitis (NASH) using machine learning (ML). MATERIALS AND METHODS: This retrospective study utilized two databases: a) the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) nonalcoholic fatty liver disease (NAFLD) adult database (2004-2009), and b) the Optum® de-identified Electronic Health Record dataset (2007-2018), a real-world dataset representative of common electronic health records in the United States. We developed an ML model to predict NASH, using confirmed NASH and non-NASH based on liver histology results in the NIDDK dataset to train the model. RESULTS: Models were trained and tested on NIDDK NAFLD data (704 patients) and the best-performing models were evaluated on Optum data (~3,000,000 patients). An eXtreme Gradient Boosting model (XGBoost) consisting of 14 features exhibited high performance as measured by area under the curve (0.82), sensitivity (81%), and precision (81%) in predicting NASH. Slightly reduced performance was observed with an abbreviated feature set of 5 variables (0.79, 80%, 80%, respectively). The full model demonstrated good performance (AUC 0.76) in predicting NASH in Optum data. DISCUSSION: The proposed model, named NASHmap, is the first ML model developed with confirmed NASH and non-NASH cases as determined through liver biopsy and validated on a large, real-world patient dataset. Both the 14- and 5-feature versions exhibit high performance. CONCLUSION: The NASHmap model is a convenient and high-performing tool that could be used to identify patients likely to have NASH in clinical settings, allowing better patient management and optimal allocation of clinical resources. |
format | Online Article Text |
id | pubmed-8200272 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8200272 2021-06-14 J Am Med Inform Assoc Research and Applications Oxford University Press 2021-03-04 /pmc/articles/PMC8200272/ /pubmed/33684933 http://dx.doi.org/10.1093/jamia/ocab003 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Development of a novel machine learning model to predict presence of nonalcoholic steatohepatitis |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8200272/ https://www.ncbi.nlm.nih.gov/pubmed/33684933 http://dx.doi.org/10.1093/jamia/ocab003 |
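The abstract reports three metrics for the classifier: area under the ROC curve (AUC), sensitivity, and precision. As a reading aid, the minimal sketch below shows how these three quantities are conventionally computed from true labels and predicted scores. It is not the authors' code and does not use the NIDDK or Optum data: the labels, scores, the 0.5 decision threshold, and the function names are all hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple


def sensitivity_and_precision(y_true: Sequence[int],
                              y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, precision) for binary labels where 1 = positive."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    # Sensitivity: share of true positives that the model catches.
    sens = tp / (tp + fn) if (tp + fn) else 0.0
    # Precision: share of positive predictions that are correct.
    prec = tp / (tp + fp) if (tp + fp) else 0.0
    return sens, prec


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = positive case, 0 = negative case.
y_true = [1, 1, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.7, 0.2, 0.1]

# Hypothetical decision threshold of 0.5 turns scores into class predictions.
y_pred = [1 if s >= 0.5 else 0 for s in scores]

sens, prec = sensitivity_and_precision(y_true, y_pred)
auc = roc_auc(y_true, scores)
print(f"sensitivity={sens:.2f} precision={prec:.2f} auc={auc:.4f}")
```

Sensitivity and precision depend on the chosen threshold, while AUC is computed from the ranking of scores alone, which is why the abstract can report a single AUC per model without naming a cutoff.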