
Evaluation of methods for differential expression analysis on multi-group RNA-seq count data

BACKGROUND: RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have also been reported. However, those ev...

Bibliographic Details
Main Authors: Tang, Min; Sun, Jianqiang; Shimizu, Kentaro; Kadota, Koji
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634584/
https://www.ncbi.nlm.nih.gov/pubmed/26538400
http://dx.doi.org/10.1186/s12859-015-0794-7
author Tang, Min
Sun, Jianqiang
Shimizu, Kentaro
Kadota, Koji
collection PubMed
description BACKGROUND: RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have also been reported. However, those evaluations so far have been restricted to two-group comparisons. Accumulations of comparative studies for multi-group data are also desired. METHODS: We compare 12 pipelines available in nine R packages for detecting differential expressions (DE) from multi-group RNA-seq count data, focusing on three-group data with or without replicates. We evaluate those pipelines on the basis of both simulation data and real count data. RESULTS: As a result, the pipelines in the TCC package performed comparably to or better than other pipelines under various simulation scenarios. TCC implements a multi-step normalization strategy (called DEGES) that internally uses functions provided by other representative packages (edgeR, DESeq2, and so on). We found considerably different numbers of identified DEGs (18.5 ~ 45.7 % of all genes) among the pipelines for the same real dataset but similar distributions of the classified expression patterns. We also found that DE results can roughly be estimated by the hierarchical dendrogram of sample clustering for the raw count data. CONCLUSION: We confirmed the DEGES-based pipelines implemented in TCC performed well in a three-group comparison as well as a two-group comparison. We recommend using the DEGES-based pipeline that internally uses edgeR (here called the EEE-E pipeline) for count data with replicates (especially for small sample sizes). For data without replicates, the DEGES-based pipeline with DESeq2 (called SSS-S) can be recommended. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0794-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4634584
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-46345842015-11-06 Evaluation of methods for differential expression analysis on multi-group RNA-seq count data Tang, Min Sun, Jianqiang Shimizu, Kentaro Kadota, Koji BMC Bioinformatics Research Article BACKGROUND: RNA-seq is a powerful tool for measuring transcriptomes, especially for identifying differentially expressed genes or transcripts (DEGs) between sample groups. A number of methods have been developed for this task, and several evaluation studies have also been reported. However, those evaluations so far have been restricted to two-group comparisons. Accumulations of comparative studies for multi-group data are also desired. METHODS: We compare 12 pipelines available in nine R packages for detecting differential expressions (DE) from multi-group RNA-seq count data, focusing on three-group data with or without replicates. We evaluate those pipelines on the basis of both simulation data and real count data. RESULTS: As a result, the pipelines in the TCC package performed comparably to or better than other pipelines under various simulation scenarios. TCC implements a multi-step normalization strategy (called DEGES) that internally uses functions provided by other representative packages (edgeR, DESeq2, and so on). We found considerably different numbers of identified DEGs (18.5 ~ 45.7 % of all genes) among the pipelines for the same real dataset but similar distributions of the classified expression patterns. We also found that DE results can roughly be estimated by the hierarchical dendrogram of sample clustering for the raw count data. CONCLUSION: We confirmed the DEGES-based pipelines implemented in TCC performed well in a three-group comparison as well as a two-group comparison. We recommend using the DEGES-based pipeline that internally uses edgeR (here called the EEE-E pipeline) for count data with replicates (especially for small sample sizes). 
For data without replicates, the DEGES-based pipeline with DESeq2 (called SSS-S) can be recommended. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-015-0794-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-11-04 /pmc/articles/PMC4634584/ /pubmed/26538400 http://dx.doi.org/10.1186/s12859-015-0794-7 Text en © Tang et al. 2015 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Evaluation of methods for differential expression analysis on multi-group RNA-seq count data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4634584/
https://www.ncbi.nlm.nih.gov/pubmed/26538400
http://dx.doi.org/10.1186/s12859-015-0794-7