Resample aggregating improves the generalizability of connectome predictive modeling

Bibliographic Details
Main authors: O’Connor, David; Lake, Evelyn M.R.; Scheinost, Dustin; Constable, R. Todd
Format: Online Article Text
Language: English
Published: 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8282199/
https://www.ncbi.nlm.nih.gov/pubmed/33848621
http://dx.doi.org/10.1016/j.neuroimage.2021.118044
_version_ 1783722968370642944
author O’Connor, David
Lake, Evelyn M.R.
Scheinost, Dustin
Constable, R. Todd
collection PubMed
description A longstanding goal of neuroimaging is to produce reliable, generalizable models of brain-behavior relationships. Recently, data-driven predictive models have become popular. However, overfitting is a common problem with statistical models and impedes model generalization. Cross-validation (CV) is often used to estimate expected model performance within-sample. Yet the best way to generate brain-behavior models and apply them out-of-sample, to an unseen dataset, remains unclear. As a solution, this study proposes an ensemble learning method, resample aggregating, that encompasses both model parameter estimation and feature selection. We investigate resample-aggregated models for estimating fluid intelligence (fIQ) from fMRI-based functional connectivity (FC) data. We take advantage of two large, openly available datasets: the Human Connectome Project (HCP) and the Philadelphia Neurodevelopmental Cohort (PNC). We generate aggregated and non-aggregated models of fIQ in the HCP using the Connectome Prediction Modelling (CPM) framework. Over various test-train splits, these models are evaluated in-sample, on left-out HCP data, and out-of-sample, on PNC data. We find that a resample-aggregated model performs best both within- and out-of-sample. We also find that feature selection can vary substantially within-sample. More robust feature selection methods, as detailed here, are needed to improve the cross-sample performance of CPM-based brain-behavior models.
format Online
Article
Text
id pubmed-8282199
institution National Center for Biotechnology Information
language English
publishDate 2021
record_format MEDLINE/PubMed
spelling pubmed-8282199 2021-08-01 Neuroimage Article
2021-04-10 2021-08-01 Text en This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
title Resample aggregating improves the generalizability of connectome predictive modeling
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8282199/
https://www.ncbi.nlm.nih.gov/pubmed/33848621
http://dx.doi.org/10.1016/j.neuroimage.2021.118044