Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles

Feature bagging is a well-established ensembling method which aims to reduce prediction variance by combining predictions of many estimators trained on subsets or projections of features. Here, we develop a theory of feature-bagging in noisy least-squares ridge ensembles and simplify the resulting l...

Bibliographic Details
Main Authors: Ruben, Benjamin S., Pehlevan, Cengiz
Format: Online Article Text
Language: English
Published: Cornell University 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10350086/
https://www.ncbi.nlm.nih.gov/pubmed/37461424
author Ruben, Benjamin S.
Pehlevan, Cengiz
collection PubMed
description Feature bagging is a well-established ensembling method which aims to reduce prediction variance by combining predictions of many estimators trained on subsets or projections of features. Here, we develop a theory of feature-bagging in noisy least-squares ridge ensembles and simplify the resulting learning curves in the special case of equicorrelated data. Using analytical learning curves, we demonstrate that subsampling shifts the double-descent peak of a linear predictor. This leads us to introduce heterogeneous feature ensembling, with estimators built on varying numbers of feature dimensions, as a computationally efficient method to mitigate double-descent. Then, we compare the performance of a feature-subsampling ensemble to a single linear predictor, describing a trade-off between noise amplification due to subsampling and noise reduction due to ensembling. Our qualitative insights carry over to linear classifiers applied to image classification tasks with realistic datasets constructed using a state-of-the-art deep learning feature map.
format Online
Article
Text
id pubmed-10350086
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cornell University
record_format MEDLINE/PubMed
spelling pubmed-10350086 2023-07-17 Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles Ruben, Benjamin S. Pehlevan, Cengiz ArXiv Article Cornell University 2023-10-31 /pmc/articles/PMC10350086/ /pubmed/37461424 Text en https://creativecommons.org/licenses/by-sa/4.0/ This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License (https://creativecommons.org/licenses/by-sa/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. If you remix, adapt, or build upon the material, you must license the modified material under identical terms.
title Learning Curves for Noisy Heterogeneous Feature-Subsampled Ridge Ensembles
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10350086/
https://www.ncbi.nlm.nih.gov/pubmed/37461424
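
The description above speaks of combining ridge estimators trained on subsets of features, with subset sizes that may vary across estimators. A minimal sketch of that idea follows, under stated assumptions: the data are synthetic isotropic Gaussian (not the equicorrelated model of the article), the ensemble prediction is an equal-weight average, and the names fit_ridge, fit_feature_bagged_ridge and predict_ensemble are hypothetical helpers introduced only for illustration. It is not the authors' implementation.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Closed-form ridge weights: solve (X^T X + lam * I) w = X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


def fit_feature_bagged_ridge(X, y, subset_sizes, lam, rng):
    """Fit one ridge estimator per random feature subset.

    subset_sizes may differ between members (a heterogeneous ensemble).
    Returns a list of (feature_indices, weights) pairs.
    """
    n_features = X.shape[1]
    members = []
    for k in subset_sizes:
        # Draw k distinct feature indices for this ensemble member.
        idx = rng.choice(n_features, size=k, replace=False)
        members.append((idx, fit_ridge(X[:, idx], y, lam)))
    return members


def predict_ensemble(members, X):
    """Average member predictions with equal weights (an assumption here)."""
    preds = [X[:, idx] @ w for idx, w in members]
    return np.mean(preds, axis=0)


# Synthetic noisy linear regression task (illustrative values only).
rng = np.random.default_rng(0)
n_train, n_test, n_features = 200, 500, 50
w_true = rng.normal(size=n_features) / np.sqrt(n_features)
X_train = rng.normal(size=(n_train, n_features))
X_test = rng.normal(size=(n_test, n_features))
y_train = X_train @ w_true + 0.5 * rng.normal(size=n_train)  # label noise
y_test = X_test @ w_true  # noiseless targets for evaluation

members = fit_feature_bagged_ridge(
    X_train, y_train, subset_sizes=[10, 20, 30, 40], lam=1e-2, rng=rng
)
y_hat = predict_ensemble(members, X_test)
test_mse = float(np.mean((y_hat - y_test) ** 2))
print(f"ensemble test MSE: {test_mse:.4f}")
```

Passing equal entries in subset_sizes gives a homogeneous ensemble, and a single entry equal to the number of features reduces to ordinary ridge regression, which offers a simple baseline for comparison.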