
To aggregate or not to aggregate high-dimensional classifiers

BACKGROUND: High-throughput functional genomics technologies generate large amounts of data, with hundreds or thousands of measurements per sample. The number of samples is usually much smaller, on the order of tens or hundreds. This poses statistical challenges and calls for appropriate solutions for the...


Bibliographic Details
Main Authors: Xu, Cheng-Jian, Hoefsloot, Huub CJ, Smilde, Age K
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113942/
https://www.ncbi.nlm.nih.gov/pubmed/21569498
http://dx.doi.org/10.1186/1471-2105-12-153
_version_ 1782206008947376128
author Xu, Cheng-Jian
Hoefsloot, Huub CJ
Smilde, Age K
author_facet Xu, Cheng-Jian
Hoefsloot, Huub CJ
Smilde, Age K
author_sort Xu, Cheng-Jian
collection PubMed
description BACKGROUND: High-throughput functional genomics technologies generate large amounts of data, with hundreds or thousands of measurements per sample. The number of samples is usually much smaller, on the order of tens or hundreds. This poses statistical challenges and calls for appropriate solutions for the analysis of this kind of data. RESULTS: Principal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, was selected as an example of a base learner. Multiple versions of PCDA models from repeated double cross-validation were aggregated, and the final classification was performed by majority voting. The performance of this approach was evaluated on simulated, genomics, proteomics and metabolomics data sets. CONCLUSIONS: The aggregating PCDA learner can improve prediction performance, provide more stable results, and help reveal the variability of the models. The disadvantages and limitations of aggregating are also discussed.
format Online
Article
Text
id pubmed-3113942
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3113942 2011-06-14 To aggregate or not to aggregate high-dimensional classifiers Xu, Cheng-Jian Hoefsloot, Huub CJ Smilde, Age K BMC Bioinformatics Research Article BACKGROUND: High-throughput functional genomics technologies generate large amounts of data, with hundreds or thousands of measurements per sample. The number of samples is usually much smaller, on the order of tens or hundreds. This poses statistical challenges and calls for appropriate solutions for the analysis of this kind of data. RESULTS: Principal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, was selected as an example of a base learner. Multiple versions of PCDA models from repeated double cross-validation were aggregated, and the final classification was performed by majority voting. The performance of this approach was evaluated on simulated, genomics, proteomics and metabolomics data sets. CONCLUSIONS: The aggregating PCDA learner can improve prediction performance, provide more stable results, and help reveal the variability of the models. The disadvantages and limitations of aggregating are also discussed. BioMed Central 2011-05-13 /pmc/articles/PMC3113942/ /pubmed/21569498 http://dx.doi.org/10.1186/1471-2105-12-153 Text en Copyright ©2011 Xu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Xu, Cheng-Jian
Hoefsloot, Huub CJ
Smilde, Age K
To aggregate or not to aggregate high-dimensional classifiers
title To aggregate or not to aggregate high-dimensional classifiers
title_full To aggregate or not to aggregate high-dimensional classifiers
title_fullStr To aggregate or not to aggregate high-dimensional classifiers
title_full_unstemmed To aggregate or not to aggregate high-dimensional classifiers
title_short To aggregate or not to aggregate high-dimensional classifiers
title_sort to aggregate or not to aggregate high-dimensional classifiers
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113942/
https://www.ncbi.nlm.nih.gov/pubmed/21569498
http://dx.doi.org/10.1186/1471-2105-12-153