
TCC: an R package for comparing tag count data with robust normalization strategies

BACKGROUND: Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical met...


Bibliographic Details
Main authors: Sun, Jianqiang; Nishiyama, Tomoaki; Shimizu, Kentaro; Kadota, Koji
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716788/
https://www.ncbi.nlm.nih.gov/pubmed/23837715
http://dx.doi.org/10.1186/1471-2105-14-219
_version_ 1782277598928175104
author Sun, Jianqiang
Nishiyama, Tomoaki
Shimizu, Kentaro
Kadota, Koji
author_facet Sun, Jianqiang
Nishiyama, Tomoaki
Shimizu, Kentaro
Kadota, Koji
author_sort Sun, Jianqiang
collection PubMed
description BACKGROUND: Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom-ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy. RESULTS: TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC. CONCLUSION: DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number.
TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression. TCC is available at http://www.iu.a.u-tokyo.ac.jp/~kadota/TCC/ and will appear in Bioconductor (http://bioconductor.org/) from ver. 2.13.
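The DEG elimination strategy described in the abstract (normalize, flag potential DEGs, then normalize again without them) can be illustrated with a minimal sketch. This is not TCC's implementation: plain library-size scaling and a crude fold-change ranking stand in for the edgeR, DESeq, and baySeq functions that TCC combines, and every name here (`size_factors`, `log2_fold_change`, `deges_like_factors`, `frac_deg`) is hypothetical. The toy data are also an assumption: 100 genes, four samples in two groups, with the first 20 genes up-regulated four-fold in group 2 only.

```python
import math


def size_factors(counts, keep):
    """Library-size factors from the genes in `keep`, scaled to mean 1."""
    n_samples = len(counts[0])
    totals = [sum(counts[g][s] for g in keep) for s in range(n_samples)]
    mean_total = sum(totals) / n_samples
    return [t / mean_total for t in totals]


def log2_fold_change(row, factors, groups):
    """log2 ratio of group-2 to group-1 mean normalized counts (pseudocount 1)."""
    means = {}
    for label in (1, 2):
        vals = [row[s] / factors[s] for s in range(len(row)) if groups[s] == label]
        means[label] = sum(vals) / len(vals)
    return math.log2((means[2] + 1) / (means[1] + 1))


def deges_like_factors(counts, groups, frac_deg=0.25, iterations=1):
    """Normalize, drop the most strongly changed genes, and normalize again.

    Assumption: potential DEGs are the top `frac_deg` of genes ranked by
    absolute log fold change, a stand-in for a proper statistical test.
    """
    all_genes = list(range(len(counts)))
    factors = size_factors(counts, all_genes)          # step 1: first-pass normalization
    for _ in range(iterations):
        ranked = sorted(
            all_genes,
            key=lambda g: abs(log2_fold_change(counts[g], factors, groups)),
            reverse=True,
        )
        n_drop = int(frac_deg * len(all_genes))
        keep = sorted(ranked[n_drop:])                  # step 2: remove potential DEGs
        factors = size_factors(counts, keep)            # step 3: re-normalize on the rest
    return factors


# Toy data with one-sided (biased) differential expression.
groups = [1, 1, 2, 2]
counts = []
for i in range(100):
    base = 50 + 10 * (i % 10)
    fold = 4 if i < 20 else 1                           # first 20 genes are true DEGs
    counts.append([base, base, base * fold, base * fold])

naive_factors = size_factors(counts, list(range(len(counts))))
deges_factors = deges_like_factors(counts, groups)
print("naive:", naive_factors)
print("DEGES-like:", deges_factors)
```

In this toy setting all samples have the same underlying depth, so the naive factors are distorted by the one-sided DEGs, while the factors recomputed after eliminating potential DEGs are equal across samples.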
format Online
Article
Text
id pubmed-3716788
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3716788 2013-07-23 TCC: an R package for comparing tag count data with robust normalization strategies Sun, Jianqiang Nishiyama, Tomoaki Shimizu, Kentaro Kadota, Koji BMC Bioinformatics Software BACKGROUND: Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom-ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy. RESULTS: TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC.
CONCLUSION: DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number. TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression. TCC is available at http://www.iu.a.u-tokyo.ac.jp/~kadota/TCC/ and will appear in Bioconductor (http://bioconductor.org/) from ver. 2.13. BioMed Central 2013-07-09 /pmc/articles/PMC3716788/ /pubmed/23837715 http://dx.doi.org/10.1186/1471-2105-14-219 Text en Copyright © 2013 Sun et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Sun, Jianqiang
Nishiyama, Tomoaki
Shimizu, Kentaro
Kadota, Koji
TCC: an R package for comparing tag count data with robust normalization strategies
title TCC: an R package for comparing tag count data with robust normalization strategies
title_full TCC: an R package for comparing tag count data with robust normalization strategies
title_fullStr TCC: an R package for comparing tag count data with robust normalization strategies
title_full_unstemmed TCC: an R package for comparing tag count data with robust normalization strategies
title_short TCC: an R package for comparing tag count data with robust normalization strategies
title_sort tcc: an r package for comparing tag count data with robust normalization strategies
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716788/
https://www.ncbi.nlm.nih.gov/pubmed/23837715
http://dx.doi.org/10.1186/1471-2105-14-219
work_keys_str_mv AT sunjianqiang tccanrpackageforcomparingtagcountdatawithrobustnormalizationstrategies
AT nishiyamatomoaki tccanrpackageforcomparingtagcountdatawithrobustnormalizationstrategies
AT shimizukentaro tccanrpackageforcomparingtagcountdatawithrobustnormalizationstrategies
AT kadotakoji tccanrpackageforcomparingtagcountdatawithrobustnormalizationstrategies