A statistical normalization method and differential expression analysis for RNA-seq data between different species
BACKGROUND: High-throughput techniques bring novel tools and also statistical challenges to genomic research. Identifying genes with differential expression between different species is an effective way to discover evolutionarily conserved transcriptional responses. To remove systematic variation be...
Main Authors: | Zhou, Yan; Zhu, Jiadi; Tong, Tiejun; Wang, Junhui; Lin, Bingqing; Zhang, Jun |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6441199/ https://www.ncbi.nlm.nih.gov/pubmed/30925894 http://dx.doi.org/10.1186/s12859-019-2745-1 |
_version_ | 1783407514463764480 |
---|---|
author | Zhou, Yan; Zhu, Jiadi; Tong, Tiejun; Wang, Junhui; Lin, Bingqing; Zhang, Jun |
collection | PubMed |
description | BACKGROUND: High-throughput techniques bring novel tools and also statistical challenges to genomic research. Identifying genes with differential expression between different species is an effective way to discover evolutionarily conserved transcriptional responses. To remove systematic variation between different species for a fair comparison, normalization serves as a crucial pre-processing step that adjusts for the varying sample sequencing depths and other confounding technical effects. RESULTS: In this paper, we propose a scale based normalization (SCBN) method by taking into account the available knowledge of conserved orthologous genes and by using the hypothesis testing framework. Considering the different gene lengths and unmapped genes between different species, we formulate the problem from the perspective of hypothesis testing and search for the optimal scaling factor that minimizes the deviation between the empirical and nominal type I errors. CONCLUSIONS: Simulation studies show that the proposed method performs significantly better than the existing competitor in a wide range of settings. An RNA-seq dataset of different species is also analyzed and it coincides with the conclusion that the proposed method outperforms the existing method. For practical applications, we have also developed an R package named “SCBN”, which is freely available at http://www.bioconductor.org/packages/devel/bioc/html/SCBN.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2745-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6441199 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6441199 2019-04-11 A statistical normalization method and differential expression analysis for RNA-seq data between different species. Zhou, Yan; Zhu, Jiadi; Tong, Tiejun; Wang, Junhui; Lin, Bingqing; Zhang, Jun. BMC Bioinformatics. Methodology Article. BioMed Central 2019-03-29 /pmc/articles/PMC6441199/ /pubmed/30925894 http://dx.doi.org/10.1186/s12859-019-2745-1 Text en © The Author(s) 2019. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A statistical normalization method and differential expression analysis for RNA-seq data between different species |
topic | Methodology Article |
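
The description field above says the method searches for the scaling factor that minimizes the deviation between the empirical and nominal type I errors on conserved orthologous genes. The following is a minimal sketch of that idea only, not the SCBN package's implementation. The test statistic (a conditional binomial test with a normal approximation), the grid search, the function names, and the simulated data are all assumptions made for illustration.

```python
import math
import random


def two_sided_p(x, y, len_x, len_y, depth_x, depth_y, c):
    """Two-sided p-value for H0: equal expression in both species.

    Assumed model (not taken from the paper): given n = x + y reads,
    x ~ Binomial(n, p0) with p0 = wx / (wx + wy), where
    wx = len_x * depth_x and wy = c * len_y * depth_y.
    A normal approximation to the binomial is used.
    """
    n = x + y
    if n == 0:
        return 1.0
    wx = len_x * depth_x
    wy = c * len_y * depth_y
    p0 = wx / (wx + wy)
    z = (x - n * p0) / math.sqrt(n * p0 * (1.0 - p0))
    return math.erfc(abs(z) / math.sqrt(2.0))


def empirical_type1(genes, depth_x, depth_y, c, alpha):
    """Fraction of conserved genes rejected at level alpha for scale c."""
    rejected = sum(
        two_sided_p(x, y, lx, ly, depth_x, depth_y, c) < alpha
        for x, y, lx, ly in genes
    )
    return rejected / len(genes)


def search_scale(genes, depth_x, depth_y, alpha=0.05, grid=None):
    """Grid search for the c whose empirical type I error is closest to alpha."""
    if grid is None:
        grid = [0.5 + 0.02 * i for i in range(101)]  # 0.5 .. 2.5
    return min(
        grid,
        key=lambda c: abs(empirical_type1(genes, depth_x, depth_y, c, alpha) - alpha),
    )


def simulate_conserved_genes(n_genes, true_c, depth_x, depth_y):
    """Hypothetical toy data: (x, y, len_x, len_y) tuples for conserved genes.

    Counts are drawn from a rounded normal approximation to the Poisson,
    which is adequate here because all means are at least 100.
    """
    genes = []
    for _ in range(n_genes):
        len_x = random.randint(500, 3000)
        len_y = random.randint(500, 3000)
        rate = random.uniform(0.2, 1.0)
        lam_x = rate * len_x * depth_x
        lam_y = rate * len_y * depth_y * true_c
        x = max(0, round(random.gauss(lam_x, math.sqrt(lam_x))))
        y = max(0, round(random.gauss(lam_y, math.sqrt(lam_y))))
        genes.append((x, y, len_x, len_y))
    return genes
```

Calling `search_scale(simulate_conserved_genes(1000, 1.5, 1.0, 1.2), 1.0, 1.2)` should return a value close to the simulated scale of 1.5. For real analyses, the R package named in the description is the appropriate tool.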