To aggregate or not to aggregate high-dimensional classifiers
BACKGROUND: High-throughput functional genomics technologies generate large amounts of data, with hundreds or thousands of measurements per sample. The number of samples is usually much smaller, on the order of tens or hundreds. This poses statistical challenges and calls for appropriate solutions for the...
Main Authors: | Xu, Cheng-Jian, Hoefsloot, Huub CJ, Smilde, Age K |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113942/ https://www.ncbi.nlm.nih.gov/pubmed/21569498 http://dx.doi.org/10.1186/1471-2105-12-153 |
_version_ | 1782206008947376128 |
---|---|
author | Xu, Cheng-Jian Hoefsloot, Huub CJ Smilde, Age K |
author_facet | Xu, Cheng-Jian Hoefsloot, Huub CJ Smilde, Age K |
author_sort | Xu, Cheng-Jian |
collection | PubMed |
description | BACKGROUND: High-throughput functional genomics technologies generate large amounts of data, with hundreds or thousands of measurements per sample. The number of samples is usually much smaller, on the order of tens or hundreds. This poses statistical challenges and calls for appropriate solutions for the analysis of this kind of data. RESULTS: Principal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, was selected as an example of a base learner. The multiple versions of PCDA models obtained from repeated double cross-validation were aggregated, and the final classification was performed by majority voting. The performance of this approach was evaluated on simulated, genomics, proteomics and metabolomics data sets. CONCLUSIONS: The aggregated PCDA learner can improve prediction performance, provide more stable results, and help to assess the variability of the models. The disadvantages and limitations of aggregating are also discussed. |
format | Online Article Text |
id | pubmed-3113942 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-31139422011-06-14 To aggregate or not to aggregate high-dimensional classifiers Xu, Cheng-Jian Hoefsloot, Huub CJ Smilde, Age K BMC Bioinformatics Research Article BACKGROUND: High-throughput functional genomics technologies generate large amount of data with hundreds or thousands of measurements per sample. The number of sample is usually much smaller in the order of ten or hundred. This poses statistical challenges and calls for appropriate solutions for the analysis of this kind of data. RESULTS: Principal component discriminant analysis (PCDA), an adaptation of classical linear discriminant analysis (LDA) for high-dimensional data, has been selected as an example of a base learner. The multiple versions of PCDA models from repeated double cross-validation were aggregated, and the final classification was performed by majority voting. The performance of this approach was evaluated by simulation, genomics, proteomics and metabolomics data sets. CONCLUSIONS: The aggregating PCDA learner can improve the prediction performance, provide more stable result, and help to know the variability of the models. The disadvantage and limitations of aggregating were also discussed. BioMed Central 2011-05-13 /pmc/articles/PMC3113942/ /pubmed/21569498 http://dx.doi.org/10.1186/1471-2105-12-153 Text en Copyright ©2011 Xu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Xu, Cheng-Jian Hoefsloot, Huub CJ Smilde, Age K To aggregate or not to aggregate high-dimensional classifiers |
title | To aggregate or not to aggregate high-dimensional classifiers |
title_full | To aggregate or not to aggregate high-dimensional classifiers |
title_fullStr | To aggregate or not to aggregate high-dimensional classifiers |
title_full_unstemmed | To aggregate or not to aggregate high-dimensional classifiers |
title_short | To aggregate or not to aggregate high-dimensional classifiers |
title_sort | to aggregate or not to aggregate high-dimensional classifiers |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3113942/ https://www.ncbi.nlm.nih.gov/pubmed/21569498 http://dx.doi.org/10.1186/1471-2105-12-153 |
work_keys_str_mv | AT xuchengjian toaggregateornottoaggregatehighdimensionalclassifiers AT hoefsloothuubcj toaggregateornottoaggregatehighdimensionalclassifiers AT smildeagek toaggregateornottoaggregatehighdimensionalclassifiers |
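
The description in this record says that multiple PCDA models were aggregated and that the final classification was made by majority voting. The following is a minimal sketch of that voting step only, not the authors' implementation. The function name `majority_vote`, the example labels, and the tie-breaking rule are assumptions made for illustration; the record does not specify how ties were handled.

```python
from collections import Counter


def majority_vote(predictions):
    """Aggregate class labels from several models by majority vote.

    predictions: one list of labels per model, each with one label per sample.
    Assumption (not from the record): on a tie, the label that appears
    first in model order wins, which follows from Counter.most_common.
    """
    if not predictions:
        raise ValueError("need predictions from at least one model")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("every model must predict the same number of samples")

    aggregated = []
    # zip(*predictions) yields, for each sample, the labels from all models.
    for sample_labels in zip(*predictions):
        counts = Counter(sample_labels)
        aggregated.append(counts.most_common(1)[0][0])
    return aggregated


# Hypothetical labels from three models for four samples.
model_predictions = [
    ["case", "control", "case", "control"],
    ["case", "case", "case", "control"],
    ["control", "control", "case", "case"],
]
print(majority_vote(model_predictions))
```

With two classes, using an odd number of models avoids ties, so the tie-breaking assumption never comes into play.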