Evaluation of methods for differential expression analysis on multi-group RNA-seq count data
BACKGROUND: RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have also been reported. However, those ev...
Main Authors: | Tang, Min; Sun, Jianqiang; Shimizu, Kentaro; Kadota, Koji |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634584/ https://www.ncbi.nlm.nih.gov/pubmed/26538400 http://dx.doi.org/10.1186/s12859-015-0794-7 |
_version_ | 1782399383888723968 |
---|---|
author | Tang, Min Sun, Jianqiang Shimizu, Kentaro Kadota, Koji |
author_sort | Tang, Min |
collection | PubMed |
description | BACKGROUND: RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have also been reported. However, those evaluations so far have been restricted to two-group comparisons. Accumulations of comparative studies for multi-group data are also desired. METHODS: We compare 12 pipelines available in nine R packages for detecting differential expressions (DE) from multi-group RNA-seq count data, focusing on three-group data with or without replicates. We evaluate those pipelines on the basis of both simulation data and real count data. RESULTS: As a result, the pipelines in the TCC package performed comparably to or better than other pipelines under various simulation scenarios. TCC implements a multi-step normalization strategy (called DEGES) that internally uses functions provided by other representative packages (edgeR, DESeq2, and so on). We found considerably different numbers of identified DEGs (18.5 ~ 45.7 % of all genes) among the pipelines for the same real dataset but similar distributions of the classified expression patterns. We also found that DE results can roughly be estimated by the hierarchical dendrogram of sample clustering for the raw count data. CONCLUSION: We confirmed the DEGES-based pipelines implemented in TCC performed well in a three-group comparison as well as a two-group comparison. We recommend using the DEGES-based pipeline that internally uses edgeR (here called the EEE-E pipeline) for count data with replicates (especially for small sample sizes). For data without replicates, the DEGES-based pipeline with DESeq2 (called SSS-S) can be recommended. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0794-7) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4634584 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4634584 2015-11-06 Evaluation of methods for differential expression analysis on multi-group RNA-seq count data Tang, Min Sun, Jianqiang Shimizu, Kentaro Kadota, Koji BMC Bioinformatics Research Article BioMed Central 2015-11-04 /pmc/articles/PMC4634584/ /pubmed/26538400 http://dx.doi.org/10.1186/s12859-015-0794-7 Text en © Tang et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Tang, Min Sun, Jianqiang Shimizu, Kentaro Kadota, Koji Evaluation of methods for differential expression analysis on multi-group RNA-seq count data |
title | Evaluation of methods for differential expression analysis on multi-group RNA-seq count data |
title_sort | evaluation of methods for differential expression analysis on multi-group rna-seq count data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634584/ https://www.ncbi.nlm.nih.gov/pubmed/26538400 http://dx.doi.org/10.1186/s12859-015-0794-7 |