Machine learning models for predicting non-alcoholic fatty liver disease in the general United States population: NHANES database


Bibliographic Details
Main authors:	Atsawarungruangkit, Amporn; Laoveeravat, Passisd; Promrat, Kittichai
Format:	Online Article Text
Language:	English
Published:	Baishideng Publishing Group Inc, 2021
Subjects:	Retrospective Study
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8568572/ https://www.ncbi.nlm.nih.gov/pubmed/34786176 http://dx.doi.org/10.4254/wjh.v13.i10.1417

author	Atsawarungruangkit, Amporn; Laoveeravat, Passisd; Promrat, Kittichai
collection	PubMed
description	BACKGROUND: Non-alcoholic fatty liver disease (NAFLD) is the most common chronic liver disease, affecting over 30% of the United States population. Early identification of patients using a simple method is highly desirable. AIM: To create machine learning models for predicting NAFLD in the general United States population. METHODS: We used data from the National Health and Nutrition Examination Survey (NHANES) 1988-1994. Thirty NAFLD-related factors were included. The dataset was divided into a training dataset (70%) and a testing dataset (30%). Twenty-four machine learning algorithms were applied to the training dataset. The best-performing models and an additional interpretable model (coarse trees) were evaluated on the testing dataset. RESULTS: A total of 3235 participants met the inclusion criteria. In the training phase, the ensemble of random undersampling (RUS) boosted trees had the highest F1 score (0.53). In the testing phase, we compared selected machine learning models with NAFLD indices. Based on F1, the ensemble of RUS boosted trees remained the top performer (accuracy 71.1%, F1 0.56), followed by the fatty liver index (accuracy 68.8%, F1 0.52). A simple model (coarse trees) had an accuracy of 74.9% and an F1 of 0.33. CONCLUSION: Not every machine learning model is complex. Using a simpler model such as coarse trees, we can create an interpretable model that predicts NAFLD from only two predictors: fasting C-peptide and waist circumference. Although the simpler model does not perform best, its simplicity is useful in clinical practice.
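The abstract reports both accuracy and F1, and the two metrics rank the models differently: coarse trees reached a higher accuracy (74.9%) than the RUS boosted ensemble (71.1%) but a much lower F1 (0.33 vs 0.56). The following minimal sketch, using only the Python standard library, shows how that can happen when the positive class is the minority, together with a 70/30 split and a random undersampling step of the kind RUS boosting applies. All function names and the confusion counts below are hypothetical illustrations, not the authors' code or data; `random_undersample` is only the resampling step, not a full RUSBoost ensemble. (The names "coarse trees" and "RUS boosted trees" match presets in MATLAB's Classification Learner app, but the record does not state which software the authors used.)

```python
import random


def accuracy_and_f1(y_true, y_pred):
    """Return (accuracy, F1) for binary labels, where 1 = NAFLD and 0 = no NAFLD."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # F1 is the harmonic mean of precision and recall; true negatives do not enter it.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, f1


def train_test_split(rows, train_frac=0.7, seed=0):
    """Shuffle a copy of rows and split it into training (70%) and testing (30%) parts."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    cut = round(train_frac * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def random_undersample(rows, label_of, seed=0):
    """Randomly keep, from every class, as many rows as the smallest class has."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    n_min = min(len(members) for members in by_class.values())
    balanced = []
    for members in by_class.values():
        balanced.extend(rng.sample(members, n_min))
    rng.shuffle(balanced)
    return balanced


if __name__ == "__main__":
    # Hypothetical data: 25 NAFLD cases among 100 participants (not from the study).
    y_true = [1] * 25 + [0] * 75
    cautious = [1] * 5 + [0] * 20 + [0] * 75            # flags 5 cases, all correct
    liberal = [1] * 20 + [0] * 5 + [1] * 20 + [0] * 55   # flags 40 cases, 20 correct
    print(accuracy_and_f1(y_true, cautious))
    print(accuracy_and_f1(y_true, liberal))
```

In this made-up example, the cautious model scores accuracy 0.80 and F1 of about 0.33, while the liberal model scores accuracy 0.75 and F1 of about 0.62. Accuracy is inflated by the many easy negatives; F1 ignores true negatives, which is why it is the more informative ranking criterion when cases are the minority.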
format	Online Article Text
id	pubmed-8568572
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Baishideng Publishing Group Inc
record_format	MEDLINE/PubMed
record	pubmed-8568572, 2021-11-15
journal	World J Hepatol
published online	2021-10-27
rights	©The Author(s) 2021. Published by Baishideng Publishing Group Inc. All rights reserved. This article is an open-access article that was selected by an in-house editor and fully peer-reviewed by external reviewers. It is distributed in accordance with the Creative Commons Attribution NonCommercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, and build upon this work non-commercially, and to license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: https://creativecommons.org/licenses/by-nc/4.0/
title	Machine learning models for predicting non-alcoholic fatty liver disease in the general United States population: NHANES database
topic	Retrospective Study
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8568572/ https://www.ncbi.nlm.nih.gov/pubmed/34786176 http://dx.doi.org/10.4254/wjh.v13.i10.1417