
Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning


Bibliographic Details
Main authors: Casanova, Ramon; Saldana, Santiago; Simpson, Sean L.; Lacy, Mary E.; Subauste, Angela R.; Blackshear, Chad; Wagenknecht, Lynne; Bertoni, Alain G.
Format: Online Article Text
Language: English
Published: Public Library of Science 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5058485/
https://www.ncbi.nlm.nih.gov/pubmed/27727289
http://dx.doi.org/10.1371/journal.pone.0163942
collection PubMed
description Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline diabetes and no follow-up, leaving 3,633 individuals for analyses. Over a mean 8-year follow-up, 584 participants developed diabetes. The full RF model evaluated 93 variables including demographic, anthropometric, blood biomarker, medical history, and echocardiogram data. We also used RF metrics of variable importance to rank variables according to their contribution to diabetes prediction. We implemented other models based on logistic regression and RF where features were preselected. The RF full model performance was similar (AUC = 0.82) to those more parsimonious models. The top-ranked variables according to RF included hemoglobin A1C, fasting plasma glucose, waist circumference, adiponectin, C-reactive protein, triglycerides, leptin, left ventricular mass, high-density lipoprotein cholesterol, and aldosterone. This work shows the potential of RF for incident diabetes prediction while dealing with high-dimensional data.
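The abstract summarizes model performance as an AUC and gives the counts needed to work out how common incident diabetes was in the analysis sample. The sketch below is illustrative only and is not the authors' code: the function name `auc` and the toy labels and scores are invented here, and only the counts 3,633 and 584 come from the abstract. It computes the AUC as the fraction of (case, non-case) pairs in which the case receives the higher predicted risk, with ties counted as one half.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen case (label 1) gets a
    higher score than a randomly chosen non-case (label 0); ties count 0.5.
    """
    cases = [s for y, s in zip(labels, scores) if y == 1]
    non_cases = [s for y, s in zip(labels, scores) if y == 0]
    if not cases or not non_cases:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for c in cases:
        for n in non_cases:
            if c > n:
                wins += 1.0
            elif c == n:
                wins += 0.5
    return wins / (len(cases) * len(non_cases))


# Counts stated in the abstract.
N_ANALYZED = 3633   # participants free of diabetes at baseline with follow-up
N_INCIDENT = 584    # participants who developed diabetes during follow-up
cumulative_incidence = N_INCIDENT / N_ANALYZED

# Toy data (hypothetical): 1 = developed diabetes, scores = predicted risk.
toy_labels = [1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
toy_auc = auc(toy_labels, toy_scores)

print(f"cumulative incidence: {cumulative_incidence:.1%}")
print(f"toy AUC: {toy_auc:.3f}")
```

Read this way, the reported AUC of 0.82 means that for roughly 82% of such pairs the model assigned the higher risk to the participant who went on to develop diabetes. The pairwise loop is quadratic in sample size; a rank-based formula would be the usual choice for a cohort of this size.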
format Online
Article
Text
id pubmed-5058485
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5058485 2016-10-27 PLoS One Research Article
Public Library of Science 2016-10-11 /pmc/articles/PMC5058485/ /pubmed/27727289 http://dx.doi.org/10.1371/journal.pone.0163942 Text en
© 2016 Casanova et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
topic Research Article