TCC: an R package for comparing tag count data with robust normalization strategies
BACKGROUND: Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical met...
Main authors: | Sun, Jianqiang; Nishiyama, Tomoaki; Shimizu, Kentaro; Kadota, Koji |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2013 |
Subjects: | Software |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716788/ https://www.ncbi.nlm.nih.gov/pubmed/23837715 http://dx.doi.org/10.1186/1471-2105-14-219 |
_version_ | 1782277598928175104 |
---|---|
author | Sun, Jianqiang Nishiyama, Tomoaki Shimizu, Kentaro Kadota, Koji |
author_sort | Sun, Jianqiang |
collection | PubMed |
description | BACKGROUND: Differential expression analysis based on “next-generation” sequencing technologies is a fundamental means of studying RNA expression. We recently developed a multi-step normalization method (called TbT) for two-group RNA-seq data with replicates and demonstrated that the statistical methods available in four R packages (edgeR, DESeq, baySeq, and NBPSeq) together with TbT can produce a well-ranked gene list in which true differentially expressed genes (DEGs) are top-ranked and non-DEGs are bottom-ranked. However, the advantages of the current TbT method come at the cost of a huge computation time. Moreover, the R packages did not have normalization methods based on such a multi-step strategy. RESULTS: TCC (an acronym for Tag Count Comparison) is an R package that provides a series of functions for differential expression analysis of tag count data. The package incorporates multi-step normalization methods, whose strategy is to remove potential DEGs before performing the data normalization. The normalization function based on this DEG elimination strategy (DEGES) includes (i) the original TbT method based on DEGES for two-group data with or without replicates, (ii) much faster methods for two-group data with or without replicates, and (iii) methods for multi-group comparison. TCC provides a simple unified interface to perform such analyses with combinations of functions provided by edgeR, DESeq, and baySeq. Additionally, a function for generating simulation data under various conditions and alternative DEGES procedures consisting of functions in the existing packages are provided. Bioinformatics scientists can use TCC to evaluate their methods, and biologists familiar with other R packages can easily learn what is done in TCC. CONCLUSION: DEGES in TCC is essential for accurate normalization of tag count data, especially when up- and down-regulated DEGs in one of the samples are extremely biased in their number. TCC is useful for analyzing tag count data in various scenarios ranging from unbiased to extremely biased differential expression. TCC is available at http://www.iu.a.u-tokyo.ac.jp/~kadota/TCC/ and will appear in Bioconductor (http://bioconductor.org/) from ver. 2.13. |
format | Online Article Text |
id | pubmed-3716788 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3716788 2013-07-23 TCC: an R package for comparing tag count data with robust normalization strategies Sun, Jianqiang Nishiyama, Tomoaki Shimizu, Kentaro Kadota, Koji BMC Bioinformatics Software BioMed Central 2013-07-09 /pmc/articles/PMC3716788/ /pubmed/23837715 http://dx.doi.org/10.1186/1471-2105-14-219 Text en Copyright © 2013 Sun et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | TCC: an R package for comparing tag count data with robust normalization strategies |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716788/ https://www.ncbi.nlm.nih.gov/pubmed/23837715 http://dx.doi.org/10.1186/1471-2105-14-219 |
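
The DEG elimination strategy (DEGES) summarized in the description can be illustrated with a small toy sketch. This is not TCC's implementation and does not use its API: total-count scaling stands in for the normalization methods the package actually combines (from edgeR, DESeq, and baySeq), and a fixed log2 fold-change cutoff stands in for a statistical DEG test. The function names, the cutoff, and the example counts are all hypothetical, chosen only to show the three steps: normalize, flag potential DEGs, then normalize again on the remaining genes.

```python
from math import log2


def scale_factor(counts_a, counts_b, keep):
    """Total-count scaling of sample B relative to sample A over the kept genes.

    Assumption: a stand-in for a real normalization method, used only for illustration.
    """
    total_a = sum(counts_a[i] for i in keep)
    total_b = sum(counts_b[i] for i in keep)
    return total_b / total_a


def deges_sketch(counts_a, counts_b, lfc_cutoff=1.0):
    """Toy three-step DEG elimination: normalize, flag potential DEGs, re-normalize."""
    genes = list(range(len(counts_a)))

    # Step 1: preliminary normalization using every gene.
    first = scale_factor(counts_a, counts_b, genes)

    # Step 2: flag potential DEGs. A fold-change cutoff is a hypothetical
    # stand-in for a statistical test; genes with a zero count are skipped.
    potential = set()
    for i in genes:
        if counts_a[i] > 0 and counts_b[i] > 0:
            lfc = log2(counts_b[i] / (first * counts_a[i]))
            if abs(lfc) > lfc_cutoff:
                potential.add(i)

    # Step 3: re-normalize using only the genes that were not flagged.
    kept = [i for i in genes if i not in potential]
    final = scale_factor(counts_a, counts_b, kept)
    return first, final, potential


# Hypothetical counts: ten genes, the last two up-regulated fourfold in sample B.
sample_a = [100] * 10
sample_b = [100] * 8 + [400] * 2

first, final, potential = deges_sketch(sample_a, sample_b)
print(first, final, sorted(potential))
```

In this toy data the two up-regulated genes inflate the preliminary factor to 1.6, while the factor recomputed without them is 1.0, which matches the unchanged genes. That is the kind of bias from one-sided differential expression that the description says DEGES is meant to correct.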