
Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables

Bibliographic Details
Main author: Kaneko, Hiromasa
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8237311/
https://www.ncbi.nlm.nih.gov/pubmed/34195450
http://dx.doi.org/10.1016/j.heliyon.2021.e07356
author Kaneko, Hiromasa
collection PubMed
description The selection of a descriptor, X, is crucial for improving the interpretation and prediction accuracy of a regression model. In this study, the prediction accuracy of models constructed using the selected X was determined and the results of variable selection, according to the number of selected X and number of selected variables that are unrelated to an objective variable, such as activities and properties (y), were investigated to evaluate the variable or feature selection methods. Variable selection methods include least absolute shrinkage and selection operator, genetic algorithm-based partial least squares, genetic algorithm-based support vector regression, and Boruta. Several regression analysis methods were used to test the prediction accuracy of the model constructed using the selected X. The characteristics of each variable selection method were analyzed using eight datasets. The results showed that even when variables unrelated to y were selected by variable selection and the number of unrelated variables was the same as the number of the original variables, a regression model with good accuracy, which ignores the influence of such noise variables, can be constructed by applying various regression analysis methods. Additionally, the variables related to y must not be deleted. These findings provide a basis for improving the variable selection methods.
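The evaluation idea described above (append variables unrelated to y, as many as there are original variables, then count how many of them a selection method keeps) can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's code: it covers only the least absolute shrinkage and selection operator (LASSO), implemented here by plain cyclic coordinate descent in NumPy, and leaves out GA-PLS, GA-SVR, and Boruta. The function names, the penalty value `alpha`, and the synthetic data are hypothetical; the study itself used eight real datasets.

```python
import numpy as np


def soft_threshold(z: float, t: float) -> float:
    """Soft-thresholding operator used by the LASSO coordinate update."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_coordinate_descent(X, y, alpha, n_iter=500, tol=1e-8):
    """Minimize (1/(2n))||y - Xb||^2 + alpha * ||b||_1 by cyclic coordinate descent.

    Assumes the columns of X are standardized (mean 0, population variance 1)
    and y is centered, so each coordinate update is a single soft-threshold step.
    """
    n, p = X.shape
    b = np.zeros(p)
    residual = y.copy()  # y - X @ b with b = 0
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            old = b[j]
            # Correlation of column j with the residual, adding back j's own contribution.
            rho = X[:, j] @ residual / n + old
            new = soft_threshold(rho, alpha)
            if new != old:
                residual -= X[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b


def count_selected_with_random_variables(X, y, alpha, seed=0):
    """Hypothetical helper: append as many random (y-unrelated) columns as X has,
    run LASSO, and count the original and random columns with nonzero coefficients."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    X_aug = np.hstack([X, rng.standard_normal((n, p))])
    # Standardize columns and center y, as the solver assumes.
    X_aug = (X_aug - X_aug.mean(axis=0)) / X_aug.std(axis=0)
    y_c = y - y.mean()
    coef = lasso_coordinate_descent(X_aug, y_c, alpha)
    selected = np.flatnonzero(coef)
    n_original = int(np.sum(selected < p))
    n_random = int(np.sum(selected >= p))
    return n_original, n_random, coef


if __name__ == "__main__":
    # Hypothetical synthetic data: 10 descriptors, y depends only on the first three.
    rng = np.random.default_rng(42)
    X = rng.standard_normal((200, 10))
    y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + 1.5 * X[:, 2] + 0.5 * rng.standard_normal(200)
    n_orig, n_rand, coef = count_selected_with_random_variables(X, y, alpha=0.1)
    print(f"selected original variables: {n_orig}, selected random variables: {n_rand}")
```

Counting selected random columns is one simple way to make the abstract's "selected random variables" measurable; in the same spirit, the reduced set of columns could then be passed to several regression methods to compare prediction accuracy, which this sketch does not attempt.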
format Online
Article
Text
id pubmed-8237311
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8237311 2021-06-29 Kaneko, Hiromasa Heliyon Research Article Elsevier 2021-06-18 /pmc/articles/PMC8237311/ /pubmed/34195450 http://dx.doi.org/10.1016/j.heliyon.2021.e07356 Text en © 2021 The Author(s) https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Examining variable selection methods for the predictive performance of regression models and the proportion of selected variables and selected random variables
topic Research Article