
Gene set enrichment meta-learning analysis: next-generation sequencing versus microarrays

Bibliographic Details
Main authors: Stiglic, Gregor; Bajgot, Mateja; Kokol, Peter
Format: Text
Language: English
Published: BioMed Central, 2010
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098055/
https://www.ncbi.nlm.nih.gov/pubmed/20377890
http://dx.doi.org/10.1186/1471-2105-11-176
author Stiglic, Gregor
Bajgot, Mateja
Kokol, Peter
collection PubMed
description BACKGROUND: Reproducibility of results can strongly influence the acceptance of new technologies in gene expression analysis. With the recent introduction of next-generation sequencing (NGS) alongside established microarrays, researchers can now choose between two fundamentally different platforms for measuring gene expression. This study introduces a novel methodology for gene-ranking stability analysis and applies it to evaluate the reproducibility of gene rankings on NGS and microarray data. RESULTS: The data from the well-known MicroArray Quality Control (MAQC) study were used to compare ranked gene lists from MAQC samples A and B obtained on the Affymetrix HG-U133 Plus 2.0 and Roche 454 Genome Sequencer FLX platforms. An initial evaluation based on the percentage of overlapping genes showed higher reproducibility on microarray data for 10 of the 11 gene-ranking methods. Gene set enrichment analysis showed similar enrichment of the top gene sets for NGS and microarrays at the pathway level. Our novel approach shows that decision trees achieve high accuracy when used to extract knowledge from multiple bootstrapped gene set enrichment analysis runs. A comparison of two sample preparation approaches for high-throughput sequencing shows that alternating decision trees are a better knowledge representation method than classical decision trees. CONCLUSIONS: Conventional reproducibility measures are mostly based on statistical techniques that offer limited biological insight into the studied gene expression data sets. This paper introduces meta-learning-based gene set enrichment analysis, which can complement gene-ranking stability estimation techniques such as the percentage of overlapping genes or classic gene set enrichment analysis. It is particularly useful when assessing the reproducibility of gene-ranking results or comparing different gene selection techniques. The proposed method yields highly accurate descriptive models that capture the co-enrichment of gene sets that are differentially enriched in the compared data sets.
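The percentage of overlapping genes (POG) named in the description is the reproducibility measure used in the initial evaluation. A minimal Python sketch of one simplified form follows, assuming each input is a list of gene identifiers already sorted best-first by some gene-ranking method and free of duplicates. It counts only shared membership in the two top-k lists and ignores the direction of regulation that some POG definitions also require to agree. The function name `percentage_of_overlapping_genes` and the gene lists are hypothetical toy values, not MAQC data and not the authors' implementation.

```python
from typing import Sequence


def percentage_of_overlapping_genes(ranked_a: Sequence[str], ranked_b: Sequence[str], k: int) -> float:
    """Percentage of genes shared by the top-k entries of two ranked gene lists.

    Simplified sketch (assumption): both lists are ranked best-first, contain no
    duplicate identifiers, and the direction of regulation is ignored.
    """
    if k < 1 or k > min(len(ranked_a), len(ranked_b)):
        raise ValueError("k must be between 1 and the length of the shorter list")
    # Genes that appear in the top k of both rankings.
    shared = set(ranked_a[:k]) & set(ranked_b[:k])
    return 100.0 * len(shared) / k


# Hypothetical toy rankings (not MAQC data), as if produced on two platforms.
microarray_ranking = ["TP53", "MYC", "EGFR", "GAPDH", "ACTB"]
ngs_ranking = ["MYC", "TP53", "BRCA1", "EGFR", "KRAS"]

for k in (3, 4):
    pog = percentage_of_overlapping_genes(microarray_ranking, ngs_ranking, k)
    print(f"POG at top-{k}: {pog:.1f}%")
```

Running it prints 66.7% for k = 3 and 75.0% for k = 4. Because the value depends on k, POG is often reported across several list lengths rather than at a single cutoff.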
format Text
id pubmed-3098055
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3098055; 2011-05-20; BMC Bioinformatics; Methodology Article; BioMed Central, published 2010-04-08; Text, en. Copyright ©2010 Stiglic et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Methodology Article