A normalization strategy for comparing tag count data

BACKGROUND: High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq da...

Bibliographic Details
Main Authors: Kadota, Koji; Nishiyama, Tomoaki; Shimizu, Kentaro
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3341196/
https://www.ncbi.nlm.nih.gov/pubmed/22475125
http://dx.doi.org/10.1186/1748-7188-7-5
author Kadota, Koji
Nishiyama, Tomoaki
Shimizu, Kentaro
collection PubMed
description BACKGROUND: High-throughput sequencing, such as ribonucleic acid sequencing (RNA-seq) and chromatin immunoprecipitation sequencing (ChIP-seq) analyses, enables various features of organisms to be compared through tag counts. Recent studies have demonstrated that the normalization step for RNA-seq data is critical for a more accurate subsequent analysis of differential gene expression. Development of a more robust normalization method is desirable for identifying the true difference in tag count data. RESULTS: We describe a strategy for normalizing tag count data, focusing on RNA-seq. The key concept is to remove data assigned as potential differentially expressed genes (DEGs) before calculating the normalization factor. Several R packages for identifying DEGs are currently available, and each package uses its own normalization method and gene ranking algorithm. We compared a total of eight package combinations: four R packages (edgeR, DESeq, baySeq, and NBPSeq) with their default normalization settings and with our normalization strategy. Many synthetic datasets under various scenarios were evaluated on the basis of the area under the curve (AUC) as a measure for both sensitivity and specificity. We found that packages using our strategy in the data normalization step overall performed well. This result was also observed for a real experimental dataset. CONCLUSION: Our results showed that the elimination of potential DEGs is essential for more accurate normalization of RNA-seq data. The concept of this normalization strategy can widely be applied to other types of tag count data and to microarray data.
format Online
Article
Text
id pubmed-3341196
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3341196 2012-05-02 A normalization strategy for comparing tag count data Kadota, Koji Nishiyama, Tomoaki Shimizu, Kentaro Algorithms Mol Biol Research BioMed Central 2012-04-05 /pmc/articles/PMC3341196/ /pubmed/22475125 http://dx.doi.org/10.1186/1748-7188-7-5 Text en Copyright ©2012 Kadota et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A normalization strategy for comparing tag count data
topic Research