Using Machine Learning to Predict Obesity Based on Genome-Wide and Epigenome-Wide Gene–Gene and Gene–Diet Interactions

Obesity is associated with many chronic diseases that impair healthy aging and is governed by genetic, epigenetic, and environmental factors and their complex interactions. This study aimed to develop a model that predicts an individual’s risk of obesity by better characterizing these complex relati...

Bibliographic Details
Main authors: Lee, Yu-Chi, Christensen, Jacob J., Parnell, Laurence D., Smith, Caren E., Shao, Jonathan, McKeown, Nicola M., Ordovás, José M., Lai, Chao-Qiang
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8763388/
https://www.ncbi.nlm.nih.gov/pubmed/35047011
http://dx.doi.org/10.3389/fgene.2021.783845
author Lee, Yu-Chi
Christensen, Jacob J.
Parnell, Laurence D.
Smith, Caren E.
Shao, Jonathan
McKeown, Nicola M.
Ordovás, José M.
Lai, Chao-Qiang
author_sort Lee, Yu-Chi
collection PubMed
description Obesity is associated with many chronic diseases that impair healthy aging and is governed by genetic, epigenetic, and environmental factors and their complex interactions. This study aimed to develop a model that predicts an individual’s risk of obesity by better characterizing these complex relations and interactions focusing on dietary factors. For this purpose, we conducted a combined genome-wide and epigenome-wide scan for body mass index (BMI) and up to three-way interactions among 402,793 single nucleotide polymorphisms (SNPs), 415,202 DNA methylation sites (DMSs), and 397 dietary and lifestyle factors using the generalized multifactor dimensionality reduction (GMDR) method. The training set consisted of 1,573 participants in exam 8 of the Framingham Offspring Study (FOS) cohort. After identifying genetic, epigenetic, and dietary factors that passed statistical significance, we applied machine learning (ML) algorithms to predict participants’ obesity status in the test set, taken as a subset of independent samples (n = 394) from the same cohort. The quality and accuracy of prediction models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC). GMDR identified 213 SNPs, 530 DMSs, and 49 dietary and lifestyle factors as significant predictors of obesity. Comparing several ML algorithms, we found that the stochastic gradient boosting model provided the best prediction accuracy for obesity with an overall accuracy of 70%, with ROC-AUC of 0.72 in test set samples. Top predictors of the best-fit model were 21 SNPs, 230 DMSs in genes such as CPT1A, ABCG1, SLC7A11, RNF145, and SREBF1, and 26 dietary factors, including processed meat, diet soda, French fries, high-fat dairy, artificial sweeteners, alcohol intake, and specific nutrients and food components, such as calcium and flavonols. In conclusion, we developed an integrated approach with ML to predict obesity using omics and dietary data. 
This extends our knowledge of the drivers of obesity, which can inform precision nutrition strategies for the prevention and treatment of obesity. Clinical Trial Registration: [www.ClinicalTrials.gov], the Framingham Heart Study (FHS), [NCT00005121].
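The abstract reports two evaluation figures for the test set: overall accuracy and ROC-AUC. As an illustration only, the following minimal Python sketch shows how those two metrics can be computed from true labels and predicted scores. It is not the authors' code; the function names `roc_auc` and `accuracy`, the 0.5 decision threshold, and the toy data are all assumptions made for this example.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC-AUC computed as the fraction of (positive, negative) pairs in which
    the positive sample has the higher score. Tied pairs count as half."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC-AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def accuracy(labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> float:
    """Share of samples whose thresholded score matches the true label.
    The 0.5 threshold is an assumption for this sketch."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    correct = sum(1 for y, y_hat in zip(labels, predicted) if y == y_hat)
    return correct / len(labels)


# Hypothetical toy data (1 = obese, 0 = not obese); not taken from the study.
toy_labels = [1, 0, 1, 1, 0, 0, 1, 0]
toy_scores = [0.9, 0.4, 0.6, 0.3, 0.5, 0.2, 0.8, 0.7]

if __name__ == "__main__":
    print(f"ROC-AUC:  {roc_auc(toy_labels, toy_scores):.3f}")
    print(f"Accuracy: {accuracy(toy_labels, toy_scores):.3f}")
```

The pairwise form is quadratic in the number of samples, which is fine for a test set of a few hundred participants; a rank-based formulation would scale better for larger data.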
format Online
Article
Text
id pubmed-8763388
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8763388 2022-01-18 Front Genet Genetics Frontiers Media S.A. 2022-01-03 /pmc/articles/PMC8763388/ /pubmed/35047011 http://dx.doi.org/10.3389/fgene.2021.783845 Text en Copyright © 2022 Lee, Christensen, Parnell, Smith, Shao, McKeown, Ordovás and Lai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Using Machine Learning to Predict Obesity Based on Genome-Wide and Epigenome-Wide Gene–Gene and Gene–Diet Interactions
topic Genetics