
Pairwise Correlation Analysis of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) Dataset Reveals Significant Feature Correlation

Bibliographic Details
Main Authors: Huckvale, Erik D., Hodgman, Matthew W., Greenwood, Brianna B., Stucki, Devorah O., Ward, Katrisa M., Ebbert, Mark T. W., Kauwe, John S. K., Miller, Justin B.
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8619902/
https://www.ncbi.nlm.nih.gov/pubmed/34828267
http://dx.doi.org/10.3390/genes12111661
author Huckvale, Erik D.
Hodgman, Matthew W.
Greenwood, Brianna B.
Stucki, Devorah O.
Ward, Katrisa M.
Ebbert, Mark T. W.
Kauwe, John S. K.
Miller, Justin B.
collection PubMed
description The Alzheimer’s Disease Neuroimaging Initiative (ADNI) contains extensive patient measurements (e.g., magnetic resonance imaging [MRI], biometrics, RNA expression, etc.) from Alzheimer’s disease (AD) cases and controls that have recently been used by machine learning algorithms to evaluate AD onset and progression. While using a variety of biomarkers is essential to AD research, highly correlated input features can significantly decrease machine learning model generalizability and performance. Additionally, redundant features unnecessarily increase computational time and resources necessary to train predictive models. Therefore, we used 49,288 biomarkers and 793,600 extracted MRI features to assess feature correlation within the ADNI dataset to determine the extent to which this issue might impact large scale analyses using these data. We found that 93.457% of biomarkers, 92.549% of the gene expression values, and 100% of MRI features were strongly correlated with at least one other feature in ADNI based on our Bonferroni corrected α (p-value ≤ 1.40754 × 10⁻¹³). We provide a comprehensive mapping of all ADNI biomarkers to highly correlated features within the dataset. Additionally, we show that significant correlation within the ADNI dataset should be resolved before performing bulk data analyses, and we provide recommendations to address these issues. We anticipate that these recommendations and resources will help guide researchers utilizing the ADNI dataset to increase model performance and reduce the cost and complexity of their analyses.
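The Bonferroni-corrected threshold quoted in the description can be checked with a short calculation. The sketch below is not taken from the record: it assumes a family-wise α of 0.05 and one test per unordered pair of the 49,288 + 793,600 features, neither of which the description states. The variable names are illustrative. Under those assumptions the result is consistent with the reported value.

```python
from math import comb

# Feature counts as stated in the description.
n_biomarkers = 49_288
n_mri_features = 793_600
n_features = n_biomarkers + n_mri_features

# Assumption: one correlation test per unordered pair of features.
n_tests = comb(n_features, 2)

# Assumption: a conventional family-wise significance level of 0.05.
alpha_family = 0.05

# Bonferroni correction divides the family-wise level by the number of tests.
alpha_bonferroni = alpha_family / n_tests

print(f"{n_tests:,} pairwise tests")                           # 355,229,668,828 pairwise tests
print(f"Bonferroni-corrected alpha = {alpha_bonferroni:.5e}")  # 1.40754e-13
```

The agreement with 1.40754 × 10⁻¹³ suggests, but does not confirm, that the threshold was derived this way.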
format Online
Article
Text
id pubmed-8619902
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8619902 2021-11-27 Genes (Basel) Article MDPI 2021-10-21 /pmc/articles/PMC8619902/ /pubmed/34828267 http://dx.doi.org/10.3390/genes12111661 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Pairwise Correlation Analysis of the Alzheimer’s Disease Neuroimaging Initiative (ADNI) Dataset Reveals Significant Feature Correlation
topic Article