A statistical normalization method and differential expression analysis for RNA-seq data between different species

Bibliographic Details
Main authors: Zhou, Yan; Zhu, Jiadi; Tong, Tiejun; Wang, Junhui; Lin, Bingqing; Zhang, Jun
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6441199/
https://www.ncbi.nlm.nih.gov/pubmed/30925894
http://dx.doi.org/10.1186/s12859-019-2745-1
author Zhou, Yan
Zhu, Jiadi
Tong, Tiejun
Wang, Junhui
Lin, Bingqing
Zhang, Jun
collection PubMed
description BACKGROUND: High-throughput techniques bring novel tools and also statistical challenges to genomic research. Identifying genes with differential expression between different species is an effective way to discover evolutionarily conserved transcriptional responses. To remove systematic variation between different species for a fair comparison, normalization serves as a crucial pre-processing step that adjusts for the varying sample sequencing depths and other confounding technical effects. RESULTS: In this paper, we propose a scale based normalization (SCBN) method by taking into account the available knowledge of conserved orthologous genes and by using the hypothesis testing framework. Considering the different gene lengths and unmapped genes between different species, we formulate the problem from the perspective of hypothesis testing and search for the optimal scaling factor that minimizes the deviation between the empirical and nominal type I errors. CONCLUSIONS: Simulation studies show that the proposed method performs significantly better than the existing competitor in a wide range of settings. An RNA-seq dataset of different species is also analyzed and it coincides with the conclusion that the proposed method outperforms the existing method. For practical applications, we have also developed an R package named “SCBN”, which is freely available at http://www.bioconductor.org/packages/devel/bioc/html/SCBN.html. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2745-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-6441199
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6441199 2019-04-11 A statistical normalization method and differential expression analysis for RNA-seq data between different species Zhou, Yan Zhu, Jiadi Tong, Tiejun Wang, Junhui Lin, Bingqing Zhang, Jun BMC Bioinformatics Methodology Article
BioMed Central 2019-03-29 /pmc/articles/PMC6441199/ /pubmed/30925894 http://dx.doi.org/10.1186/s12859-019-2745-1 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A statistical normalization method and differential expression analysis for RNA-seq data between different species
topic Methodology Article