Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline

Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differen...

Bibliographic Details
Main authors: Rahmatallah, Yasir, Emmert-Streib, Frank, Glazko, Galina
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4870397/
https://www.ncbi.nlm.nih.gov/pubmed/26342128
http://dx.doi.org/10.1093/bib/bbv069
author Rahmatallah, Yasir
Emmert-Streib, Frank
Glazko, Galina
collection PubMed
description Transcriptome sequencing (RNA-seq) is gradually replacing microarrays for high-throughput studies of gene expression. The main challenge of analyzing microarray data is not in finding differentially expressed genes, but in gaining insights into the biological processes underlying phenotypic differences. To interpret experimental results from microarrays, gene set analysis (GSA) has become the method of choice, in particular because it incorporates pre-existing biological knowledge (in the form of functionally related gene sets) into the analysis. Here we provide a brief review of several statistically different GSA approaches (competitive and self-contained) that can be adapted from microarray practice as well as those specifically designed for RNA-seq. We evaluate their performance (in terms of Type I error rate, power, robustness to sample size and heterogeneity, as well as sensitivity to different types of selection biases) on simulated and real RNA-seq data. Not surprisingly, the performance of various GSA approaches depends only on the statistical hypothesis they test and does not depend on whether the test was developed for microarrays or RNA-seq data. Interestingly, we found that competitive methods have lower power and lower robustness to sample heterogeneity than self-contained methods, leading to poor reproducibility of results. We also found that the power of unsupervised competitive methods depends on the balance between up- and down-regulated genes in tested gene sets. These properties of competitive methods have been overlooked before. Our evaluation provides a concise guideline for selecting the GSA approaches that perform best under particular experimental settings in the context of RNA-seq.
format Online
Article
Text
id pubmed-4870397
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4870397 2016-05-26 Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline Rahmatallah, Yasir Emmert-Streib, Frank Glazko, Galina Brief Bioinform Papers
Oxford University Press 2016-05 2015-09-04 /pmc/articles/PMC4870397/ /pubmed/26342128 http://dx.doi.org/10.1093/bib/bbv069 Text en © The Author 2015. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Gene set analysis approaches for RNA-seq data: performance evaluation and application guideline
topic Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4870397/
https://www.ncbi.nlm.nih.gov/pubmed/26342128
http://dx.doi.org/10.1093/bib/bbv069