Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning
Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set...
Main authors: | Casanova, Ramon; Saldana, Santiago; Simpson, Sean L.; Lacy, Mary E.; Subauste, Angela R.; Blackshear, Chad; Wagenknecht, Lynne; Bertoni, Alain G. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5058485/ https://www.ncbi.nlm.nih.gov/pubmed/27727289 http://dx.doi.org/10.1371/journal.pone.0163942 |
_version_ | 1782459246526332928 |
---|---|
author | Casanova, Ramon; Saldana, Santiago; Simpson, Sean L.; Lacy, Mary E.; Subauste, Angela R.; Blackshear, Chad; Wagenknecht, Lynne; Bertoni, Alain G. |
author_facet | Casanova, Ramon; Saldana, Santiago; Simpson, Sean L.; Lacy, Mary E.; Subauste, Angela R.; Blackshear, Chad; Wagenknecht, Lynne; Bertoni, Alain G. |
author_sort | Casanova, Ramon |
collection | PubMed |
description | Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline diabetes and no follow-up, leaving 3,633 individuals for analyses. Over a mean 8-year follow-up, 584 participants developed diabetes. The full RF model evaluated 93 variables including demographic, anthropometric, blood biomarker, medical history, and echocardiogram data. We also used RF metrics of variable importance to rank variables according to their contribution to diabetes prediction. We implemented other models based on logistic regression and RF where features were preselected. The performance of the full RF model (AUC = 0.82) was similar to that of the more parsimonious models. The top-ranked variables according to RF included hemoglobin A1C, fasting plasma glucose, waist circumference, adiponectin, C-reactive protein, triglycerides, leptin, left ventricular mass, high-density lipoprotein cholesterol, and aldosterone. This work shows the potential of RF for incident diabetes prediction while dealing with high-dimensional data. |
format | Online Article Text |
id | pubmed-5058485 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-5058485 2016-10-27 Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning Casanova, Ramon; Saldana, Santiago; Simpson, Sean L.; Lacy, Mary E.; Subauste, Angela R.; Blackshear, Chad; Wagenknecht, Lynne; Bertoni, Alain G. PLoS One Research Article Statistical models to predict incident diabetes are often based on limited variables. Here we pursued two main goals: 1) investigate the relative performance of a machine learning method such as Random Forests (RF) for detecting incident diabetes in a high-dimensional setting defined by a large set of observational data, and 2) uncover potential predictors of diabetes. The Jackson Heart Study collected data at baseline and in two follow-up visits from 5,301 African Americans. We excluded those with baseline diabetes and no follow-up, leaving 3,633 individuals for analyses. Over a mean 8-year follow-up, 584 participants developed diabetes. The full RF model evaluated 93 variables including demographic, anthropometric, blood biomarker, medical history, and echocardiogram data. We also used RF metrics of variable importance to rank variables according to their contribution to diabetes prediction. We implemented other models based on logistic regression and RF where features were preselected. The performance of the full RF model (AUC = 0.82) was similar to that of the more parsimonious models. The top-ranked variables according to RF included hemoglobin A1C, fasting plasma glucose, waist circumference, adiponectin, C-reactive protein, triglycerides, leptin, left ventricular mass, high-density lipoprotein cholesterol, and aldosterone. This work shows the potential of RF for incident diabetes prediction while dealing with high-dimensional data. Public Library of Science 2016-10-11 /pmc/articles/PMC5058485/ /pubmed/27727289 http://dx.doi.org/10.1371/journal.pone.0163942 Text en © 2016 Casanova et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article; Casanova, Ramon; Saldana, Santiago; Simpson, Sean L.; Lacy, Mary E.; Subauste, Angela R.; Blackshear, Chad; Wagenknecht, Lynne; Bertoni, Alain G.; Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning |
title | Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning |
title_full | Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning |
title_fullStr | Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning |
title_full_unstemmed | Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning |
title_short | Prediction of Incident Diabetes in the Jackson Heart Study Using High-Dimensional Machine Learning |
title_sort | prediction of incident diabetes in the jackson heart study using high-dimensional machine learning |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5058485/ https://www.ncbi.nlm.nih.gov/pubmed/27727289 http://dx.doi.org/10.1371/journal.pone.0163942 |
work_keys_str_mv | AT casanovaramon predictionofincidentdiabetesinthejacksonheartstudyusinghighdimensionalmachinelearning AT saldanasantiago predictionofincidentdiabetesinthejacksonheartstudyusinghighdimensionalmachinelearning AT simpsonseanl predictionofincidentdiabetesinthejacksonheartstudyusinghighdimensionalmachinelearning AT lacymarye predictionofincidentdiabetesinthejacksonheartstudyusinghighdimensionalmachinelearning AT subausteangelar predictionofincidentdiabetesinthejacksonheartstudyusinghighdimensionalmachinelearning AT blackshearchad predictionofincidentdiabetesinthejacksonheartstudyusinghighdimensionalmachinelearning AT wagenknechtlynne predictionofincidentdiabetesinthejacksonheartstudyusinghighdimensionalmachinelearning AT bertonialaing predictionofincidentdiabetesinthejacksonheartstudyusinghighdimensionalmachinelearning |
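
The abstract in this record summarizes model discrimination as AUC = 0.82. As a reminder of what that figure measures, the following is a minimal Python sketch of the rank-based definition of AUC: the probability that a randomly chosen participant who developed diabetes receives a higher predicted score than a randomly chosen participant who did not. The function name `auc_from_scores` and the toy labels and scores are hypothetical and are not taken from the study; the authors' own code and data are not part of this record.

```python
from itertools import product


def auc_from_scores(labels, scores):
    """Rank-based AUC: share of (case, non-case) pairs in which the case
    has the higher score. Tied pairs count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = 0.0
    for p, n in product(pos, neg):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = developed diabetes, 0 = did not.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.8]
toy_auc = auc_from_scores(labels, scores)
print(round(toy_auc, 3))
```

Under this definition, an AUC of 0.5 corresponds to scores that carry no ranking information and 1.0 to perfect separation. The pairwise loop is quadratic in the number of participants, which is fine for illustration; a sorted-rank implementation would be the usual choice for a cohort-sized dataset.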