
oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data

The complexity and high dimensionality of neuroimaging data pose problems for decoding information with machine learning (ML) models because the number of features is often much larger than the number of observations. Feature selection is one of the crucial steps for determining meaningful target fe...


Bibliographic Details
Main authors: Dang, Tung; Fermin, Alan S. R.; Machizawa, Maro G.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10566623/
https://www.ncbi.nlm.nih.gov/pubmed/37829329
http://dx.doi.org/10.3389/fninf.2023.1266713
_version_ 1785118950259949568
author Dang, Tung
Fermin, Alan S. R.
Machizawa, Maro G.
author_facet Dang, Tung
Fermin, Alan S. R.
Machizawa, Maro G.
author_sort Dang, Tung
collection PubMed
description The complexity and high dimensionality of neuroimaging data pose problems for decoding information with machine learning (ML) models because the number of features is often much larger than the number of observations. Feature selection is a crucial step for determining meaningful target features in decoding; however, optimizing feature selection on such high-dimensional neuroimaging data has been challenging with conventional ML models. Here, we introduce an efficient, high-performance decoding package that incorporates a forward variable selection (FVS) algorithm and hyperparameter optimization and automatically identifies the best feature pairs for both classification and regression models; a total of 18 ML models are implemented by default. First, the FVS algorithm evaluates the goodness-of-fit of each model using k-fold cross-validation and identifies the best subset of features based on a predefined criterion for that model. Next, the hyperparameters of each ML model are optimized at each forward iteration. The final outputs report, for each model, the optimized number of selected features (brain regions of interest) together with its accuracy. Furthermore, the toolbox can be executed in a parallel environment for efficient computation on a typical personal computer. With the optimized forward variable selection decoder (oFVSD) pipeline, we verified its effectiveness in sex classification and age range regression on 1,113 structural magnetic resonance imaging (MRI) datasets. We compared oFVSD against ML models without the FVS algorithm and against models using the Boruta algorithm as a variable selection counterpart. Across all ML models, oFVSD significantly outperformed both the counterpart models without FVS (an increase of approximately 0.20 in the correlation coefficient, r, for regression models and 8% for classification models, on average) and the models using the Boruta variable selection algorithm (an improvement of approximately 0.07 for regression models and 4% for classification models). Furthermore, we confirmed that parallel computation considerably reduced the computational burden for the high-dimensional MRI data. Altogether, the oFVSD toolbox efficiently and effectively improves the performance of both classification and regression ML models, as demonstrated in a use case on MRI datasets. With its flexibility, oFVSD has potential for many other neuroimaging modalities. As an open-source, freely available Python package, it is a valuable toolbox for research communities seeking improved decoding accuracy.
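The loop described above (score candidate features by k-fold cross-validation, tune hyperparameters at each forward iteration, stop when the score no longer improves) can be illustrated with a minimal sketch. This is not the oFVSD API: the function names `knn_predict`, `cv_accuracy`, and `forward_select` are hypothetical, a pure-Python k-nearest-neighbour classifier stands in for the package's 18 ML models, a two-value grid over the number of neighbours stands in for its hyperparameter optimization, and the toy data are made up for illustration.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Majority vote among the k training rows closest to x (squared Euclidean)."""
    order = sorted(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
    )
    votes = Counter(train_y[i] for i in order[:k])
    return votes.most_common(1)[0][0]


def cv_accuracy(X, y, features, k_neighbors, n_folds=3):
    """Pooled k-fold cross-validated accuracy using only the given feature columns."""
    sub = [[row[j] for j in features] for row in X]
    correct = 0
    for fold in range(n_folds):
        # Assumption: folds are assigned by index modulo n_folds (no shuffling).
        test_idx = [i for i in range(len(y)) if i % n_folds == fold]
        train_idx = [i for i in range(len(y)) if i % n_folds != fold]
        train_X = [sub[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        correct += sum(
            knn_predict(train_X, train_y, sub[i], k_neighbors) == y[i]
            for i in test_idx
        )
    return correct / len(y)


def forward_select(X, y, k_grid=(1, 3), n_folds=3):
    """Greedy forward selection; stops when no candidate improves the CV score."""
    selected, best_score = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        step_best = None
        for j in remaining:
            # Hyperparameter search at this forward iteration: best score over k_grid.
            score = max(
                cv_accuracy(X, y, selected + [j], k, n_folds) for k in k_grid
            )
            if step_best is None or score > step_best[0]:
                step_best = (score, j)
        if step_best[0] <= best_score:
            break  # no strict improvement: keep the current subset
        best_score = step_best[0]
        selected.append(step_best[1])
        remaining.remove(step_best[1])
    return selected, best_score


# Toy data (hypothetical): column 0 is constant, column 1 separates the two
# classes, column 2 is unrelated to the label.
y = [i % 2 for i in range(12)]
X = [[0.5, 10.0 * y[i] + 0.1 * i, float(i % 3)] for i in range(12)]

selected, score = forward_select(X, y)
print(selected, score)  # prints: [1] 1.0
```

In this sketch the search stops after one feature because adding either remaining column cannot raise the cross-validated accuracy above 1.0. Each candidate evaluation is independent of the others, which is what makes this kind of search amenable to the parallel execution mentioned in the description.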
format Online
Article
Text
id pubmed-10566623
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10566623 2023-10-12 oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data Dang, Tung Fermin, Alan S. R. Machizawa, Maro G. Front Neuroinform Neuroscience Frontiers Media S.A. 2023-09-26 /pmc/articles/PMC10566623/ /pubmed/37829329 http://dx.doi.org/10.3389/fninf.2023.1266713 Text en Copyright © 2023 Dang, Fermin and Machizawa. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Neuroscience
Dang, Tung
Fermin, Alan S. R.
Machizawa, Maro G.
oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data
title oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data
title_full oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data
title_fullStr oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data
title_full_unstemmed oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data
title_short oFVSD: a Python package of optimized forward variable selection decoder for high-dimensional neuroimaging data
title_sort ofvsd: a python package of optimized forward variable selection decoder for high-dimensional neuroimaging data
topic Neuroscience
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10566623/
https://www.ncbi.nlm.nih.gov/pubmed/37829329
http://dx.doi.org/10.3389/fninf.2023.1266713
work_keys_str_mv AT dangtung ofvsdapythonpackageofoptimizedforwardvariableselectiondecoderforhighdimensionalneuroimagingdata
AT ferminalansr ofvsdapythonpackageofoptimizedforwardvariableselectiondecoderforhighdimensionalneuroimagingdata
AT machizawamarog ofvsdapythonpackageofoptimizedforwardvariableselectiondecoderforhighdimensionalneuroimagingdata