
Is Bagging Effective in the Classification of Small-Sample Genomic and Proteomic Data?

There has been considerable interest recently in the application of bagging in the classification of both gene-expression data and protein-abundance mass spectrometry data. The approach is often justified by the improvement it produces on the performance of unstable, overfitting classification rules...


Bibliographic Details
Main authors: Vu, TT; Braga-Neto, UM
Format: Online, Article, Text
Language: English
Published: Springer 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171418/
https://www.ncbi.nlm.nih.gov/pubmed/19390645
http://dx.doi.org/10.1155/2009/158368
author Vu, TT
Braga-Neto, UM
collection PubMed
description There has been considerable interest recently in the application of bagging in the classification of both gene-expression data and protein-abundance mass spectrometry data. The approach is often justified by the improvement it produces on the performance of unstable, overfitting classification rules under small-sample situations. However, the question of real practical interest is whether the ensemble scheme will improve performance of those classifiers sufficiently to beat the performance of single stable, nonoverfitting classifiers, in the case of small-sample genomic and proteomic data sets. To investigate that question, we conducted a detailed empirical study, using publicly-available data sets from published genomic and proteomic studies. We observed that, under t-test and RELIEF filter-based feature selection, bagging generally does a good job of improving the performance of unstable, overfitting classifiers, such as CART decision trees and neural networks, but that improvement was not sufficient to beat the performance of single stable, nonoverfitting classifiers, such as diagonal and plain linear discriminant analysis, or 3-nearest neighbors. Furthermore, as expected, the ensemble method did not improve the performance of these classifiers significantly. Representative experimental results are presented and discussed in this work.
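Bagging (bootstrap aggregating) trains the same classification rule on many bootstrap resamples of the training set, drawn with replacement, and combines the resulting classifiers by majority vote. The following is a minimal Python sketch of that procedure, not the authors' code. It assumes binary labels coded 0/1 and uses a decision stump as a hypothetical stand-in for the unstable, overfitting rules examined in the study (CART trees, neural networks). The names `fit_stump`, `bagging`, `simulate`, and `accuracy`, the Gaussian toy data, and all parameter values are illustrative assumptions. The t-test/RELIEF feature selection and the stable classifiers (diagonal and plain LDA, 3-nearest neighbors) used in the study are omitted.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence

Sample = Sequence[float]
Classifier = Callable[[Sample], int]
FitFn = Callable[[List[Sample], List[int]], Classifier]


def fit_stump(X: List[Sample], y: List[int]) -> Classifier:
    """Fit a one-feature threshold rule (decision stump) that minimizes
    training errors. Hypothetical stand-in for an unstable base classifier."""
    best = (len(y) + 1, 0, 0.0, 1)  # (errors, feature, threshold, polarity)
    for j in range(len(X[0])):
        for t in sorted({x[j] for x in X}):
            for polarity in (1, -1):
                # Predict class 1 when polarity * (x[j] - t) > 0, else class 0.
                errors = sum(
                    (1 if polarity * (x[j] - t) > 0 else 0) != label
                    for x, label in zip(X, y)
                )
                if errors < best[0]:
                    best = (errors, j, t, polarity)
    _, feat, thr, pol = best
    return lambda x: 1 if pol * (x[feat] - thr) > 0 else 0


def bagging(X: List[Sample], y: List[int], fit: FitFn,
            n_estimators: int = 25, seed: int = 0) -> Classifier:
    """Bootstrap aggregating: fit one model per bootstrap resample
    and predict by majority vote over all fitted models."""
    rng = random.Random(seed)
    n = len(y)
    models = []
    for _ in range(n_estimators):
        # Draw n indices with replacement (one bootstrap resample).
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([X[i] for i in idx], [y[i] for i in idx]))

    def predict(x: Sample) -> int:
        votes = Counter(model(x) for model in models)
        return votes.most_common(1)[0][0]

    return predict


def simulate(n_per_class: int, n_features: int, shift: float,
             rng: random.Random):
    """Toy two-class Gaussian data (hypothetical; not the study's data sets)."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            X.append([rng.gauss(shift * label, 1.0) for _ in range(n_features)])
            y.append(label)
    return X, y


def accuracy(clf: Classifier, X: List[Sample], y: List[int]) -> float:
    """Fraction of samples whose predicted label matches the true label."""
    return sum(clf(x) == label for x, label in zip(X, y)) / len(y)


if __name__ == "__main__":
    rng = random.Random(42)
    X_train, y_train = simulate(10, 5, 1.0, rng)  # small sample: 20 points
    X_test, y_test = simulate(500, 5, 1.0, rng)
    print("single stump :", accuracy(fit_stump(X_train, y_train), X_test, y_test))
    print("bagged stumps:", accuracy(bagging(X_train, y_train, fit_stump),
                                     X_test, y_test))
```

With two classes, an odd `n_estimators` avoids ties in the vote. Whatever accuracies the demo prints depend on the seed and the toy data; they say nothing about the genomic and proteomic results reported in the study.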
format Online
Article
Text
id pubmed-3171418
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Springer
record_format MEDLINE/PubMed
spelling pubmed-3171418 2011-09-13
Is Bagging Effective in the Classification of Small-Sample Genomic and Proteomic Data?
Vu, TT
Braga-Neto, UM
EURASIP J Bioinform Syst Biol
Research Article
Springer 2009-02-24
/pmc/articles/PMC3171418/
/pubmed/19390645
http://dx.doi.org/10.1155/2009/158368
Text en
Copyright © 2009 T. T. Vu and U. M. Braga-Neto.
https://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Is Bagging Effective in the Classification of Small-Sample Genomic and Proteomic Data?
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3171418/
https://www.ncbi.nlm.nih.gov/pubmed/19390645
http://dx.doi.org/10.1155/2009/158368