Comparative evaluation of set-level techniques in predictive classification of gene expression samples

BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy ca...

Bibliographic Details
Main Authors:	Holec, Matěj, Kléma, Jiří, Železný, Filip, Tolar, Jakub
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382436/ https://www.ncbi.nlm.nih.gov/pubmed/22759420 http://dx.doi.org/10.1186/1471-2105-13-S10-S15

_version_	1782236499547258880
author	Holec, Matěj Kléma, Jiří Železný, Filip Tolar, Jakub
collection	PubMed
description	BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments. RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate expressions of genes into a feature value, the singular value decomposition (SVD) method as well as the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features, although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step. CONCLUSION: Set-level classifiers do not boost predictive accuracy; however, they do achieve competitive accuracy if learned with the right combination of ingredients. AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT.
format	Online Article Text
id	pubmed-3382436
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-33824362012-06-28 Comparative evaluation of set-level techniques in predictive classification of gene expression samples Holec, Matěj Kléma, Jiří Železný, Filip Tolar, Jakub BMC Bioinformatics Proceedings BACKGROUND: Analysis of gene expression data in terms of a priori-defined gene sets has recently received significant attention as this approach typically yields more compact and interpretable results than those produced by traditional methods that rely on individual genes. The set-level strategy can also be adopted with similar benefits in predictive classification tasks accomplished with machine learning algorithms. Initial studies into the predictive performance of set-level classifiers have yielded rather controversial results. The goal of this study is to provide a more conclusive evaluation by testing various components of the set-level framework within a large collection of machine learning experiments. RESULTS: Genuine curated gene sets constitute better features for classification than sets assembled without biological relevance. For identifying the best gene sets for classification, the Global test outperforms the gene-set methods GSEA and SAM-GS as well as two generic feature selection methods. To aggregate expressions of genes into a feature value, the singular value decomposition (SVD) method as well as the SetSig technique improve on simple arithmetic averaging. Set-level classifiers learned with 10 features constituted by the Global test slightly outperform baseline gene-level classifiers learned with all original data features although they are slightly less accurate than gene-level classifiers learned with a prior feature-selection step. CONCLUSION: Set-level classifiers do not boost predictive accuracy, however, they do achieve competitive accuracy if learned with the right combination of ingredients. AVAILABILITY: Open-source, publicly available software was used for classifier learning and testing. 
The gene expression datasets and the gene set database used are also publicly available. The full tabulation of experimental results is available at http://ida.felk.cvut.cz/CESLT. BioMed Central 2012-06-25 /pmc/articles/PMC3382436/ /pubmed/22759420 http://dx.doi.org/10.1186/1471-2105-13-S10-S15 Text en Copyright ©2012 Holec et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Comparative evaluation of set-level techniques in predictive classification of gene expression samples
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3382436/ https://www.ncbi.nlm.nih.gov/pubmed/22759420 http://dx.doi.org/10.1186/1471-2105-13-S10-S15
