Cargando…

Evaluation of Prediction-Oriented Model Selection Metrics for Extended Redundancy Analysis

Extended redundancy analysis (ERA) is a statistical method that relates multiple sets of predictors to response variables. In ERA, the conventional approach of model evaluation tends to overestimate the performance of a model since the performance is assessed using the same sample used for model dev...

Descripción completa

Detalles Bibliográficos
Autores principales: Kim, Sunmee, Hwang, Heungsun
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Frontiers Media S.A. 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9035639/
https://www.ncbi.nlm.nih.gov/pubmed/35478763
http://dx.doi.org/10.3389/fpsyg.2022.821897
Descripción
Sumario:Extended redundancy analysis (ERA) is a statistical method that relates multiple sets of predictors to response variables. In ERA, the conventional approach of model evaluation tends to overestimate the performance of a model since the performance is assessed using the same sample used for model development. To avoid the overly optimistic assessment, we introduce a new model evaluation approach for ERA, which utilizes computer-intensive resampling methods to assess how well a model performs on unseen data. Specifically, we suggest several new model evaluation metrics for ERA that compute a model’s performance on out-of-sample data, i.e., data not used for model development. Although considerable work has been done in machine learning and statistics to examine the utility of cross-validation and bootstrap variants for assessing such out-of-sample predictive performance, to date, no research has been carried out in the context of ERA. We use simulated and real data examples to compare the proposed model evaluation approach with the conventional one. Results show the conventional approach always favor more complex ERA models, thereby failing to prevent the problem of overfitting in model selection. Conversely, the proposed approach can select the true ERA model among many mis-specified (i.e., underfitted and overfitted) models.