Using Machine Learning to Predict Obesity Based on Genome-Wide and Epigenome-Wide Gene–Gene and Gene–Diet Interactions
Obesity is associated with many chronic diseases that impair healthy aging and is governed by genetic, epigenetic, and environmental factors and their complex interactions. This study aimed to develop a model that predicts an individual’s risk of obesity by better characterizing these complex relati...
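Main authors: | Lee, Yu-Chi; Christensen, Jacob J.; Parnell, Laurence D.; Smith, Caren E.; Shao, Jonathan; McKeown, Nicola M.; Ordovás, José M.; Lai, Chao-Qiang |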
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2022 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8763388/ https://www.ncbi.nlm.nih.gov/pubmed/35047011 http://dx.doi.org/10.3389/fgene.2021.783845 |
_version_ | 1784633924588142592 |
---|---|
author | Lee, Yu-Chi; Christensen, Jacob J.; Parnell, Laurence D.; Smith, Caren E.; Shao, Jonathan; McKeown, Nicola M.; Ordovás, José M.; Lai, Chao-Qiang |
author_sort | Lee, Yu-Chi |
collection | PubMed |
description | Obesity is associated with many chronic diseases that impair healthy aging and is governed by genetic, epigenetic, and environmental factors and their complex interactions. This study aimed to develop a model that predicts an individual’s risk of obesity by better characterizing these complex relations and interactions focusing on dietary factors. For this purpose, we conducted a combined genome-wide and epigenome-wide scan for body mass index (BMI) and up to three-way interactions among 402,793 single nucleotide polymorphisms (SNPs), 415,202 DNA methylation sites (DMSs), and 397 dietary and lifestyle factors using the generalized multifactor dimensionality reduction (GMDR) method. The training set consisted of 1,573 participants in exam 8 of the Framingham Offspring Study (FOS) cohort. After identifying genetic, epigenetic, and dietary factors that passed statistical significance, we applied machine learning (ML) algorithms to predict participants’ obesity status in the test set, taken as a subset of independent samples (n = 394) from the same cohort. The quality and accuracy of prediction models were evaluated using the area under the receiver operating characteristic curve (ROC-AUC). GMDR identified 213 SNPs, 530 DMSs, and 49 dietary and lifestyle factors as significant predictors of obesity. Comparing several ML algorithms, we found that the stochastic gradient boosting model provided the best prediction accuracy for obesity with an overall accuracy of 70%, with ROC-AUC of 0.72 in test set samples. Top predictors of the best-fit model were 21 SNPs, 230 DMSs in genes such as CPT1A, ABCG1, SLC7A11, RNF145, and SREBF1, and 26 dietary factors, including processed meat, diet soda, French fries, high-fat dairy, artificial sweeteners, alcohol intake, and specific nutrients and food components, such as calcium and flavonols. In conclusion, we developed an integrated approach with ML to predict obesity using omics and dietary data. 
This extends our knowledge of the drivers of obesity, which can inform precision nutrition strategies for the prevention and treatment of obesity. Clinical Trial Registration: [www.ClinicalTrials.gov], the Framingham Heart Study (FHS), [NCT00005121]. |
format | Online Article Text |
id | pubmed-8763388 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-8763388 2022-01-18 Front Genet Genetics Frontiers Media S.A. 2022-01-03 /pmc/articles/PMC8763388/ /pubmed/35047011 http://dx.doi.org/10.3389/fgene.2021.783845 Text en Copyright © 2022 Lee, Christensen, Parnell, Smith, Shao, McKeown, Ordovás and Lai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Using Machine Learning to Predict Obesity Based on Genome-Wide and Epigenome-Wide Gene–Gene and Gene–Diet Interactions |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8763388/ https://www.ncbi.nlm.nih.gov/pubmed/35047011 http://dx.doi.org/10.3389/fgene.2021.783845 |
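
The description above reports that prediction models were evaluated by ROC-AUC and overall accuracy on a held-out test set. As a rough illustration of what those two metrics measure, here is a minimal pure-Python sketch. It is not the authors' pipeline: the function names `roc_auc` and `accuracy`, the 0.5 decision threshold, and the toy labels and scores are all assumptions made up for this example.

```python
def roc_auc(labels, scores):
    """ROC-AUC via the rank-sum (Mann-Whitney U) identity.

    labels: 0/1 class labels (1 = obese in this illustration).
    scores: predicted risk scores; tied scores share their average rank.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the run of tied scores starting at position i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("ROC-AUC needs at least one sample of each class")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples classified correctly at a fixed threshold (assumed 0.5)."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


# Toy, made-up data purely for illustration (not from the study).
toy_labels = [0, 0, 1, 1, 0, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8, 0.5, 0.7]

if __name__ == "__main__":
    print("ROC-AUC:", roc_auc(toy_labels, toy_scores))
    print("Accuracy:", accuracy(toy_labels, toy_scores))
```

ROC-AUC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, so it does not depend on any single threshold, whereas accuracy does.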