
Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster

BACKGROUND: A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. We sequenced the mRNA of 726 individuals from the Drosophila Genetic Reference Panel in order to quantify differences in gene expression among single flies. One of our experimental goals was to i...


Bibliographic Details
Main Authors: Lin, Yanzhu; Golovnina, Kseniya; Chen, Zhen-Xia; Lee, Hang Noh; Negron, Yazmin L. Serrano; Sultana, Hina; Oliver, Brian; Harbison, Susan T.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702322/
https://www.ncbi.nlm.nih.gov/pubmed/26732976
http://dx.doi.org/10.1186/s12864-015-2353-z
_version_ 1782408618274979840
author Lin, Yanzhu
Golovnina, Kseniya
Chen, Zhen-Xia
Lee, Hang Noh
Negron, Yazmin L. Serrano
Sultana, Hina
Oliver, Brian
Harbison, Susan T.
author_sort Lin, Yanzhu
collection PubMed
description BACKGROUND: A generally accepted approach to the analysis of RNA-Seq read count data does not yet exist. We sequenced the mRNA of 726 individuals from the Drosophila Genetic Reference Panel in order to quantify differences in gene expression among single flies. One of our experimental goals was to identify the optimal analysis approach for the detection of differential gene expression among the factors we varied in the experiment: genotype, environment, sex, and their interactions. Here we evaluate three different filtering strategies, eight normalization methods, and two statistical approaches using our data set. We assessed differential gene expression among factors and performed a statistical power analysis using the eight biological replicates per genotype, environment, and sex in our data set. RESULTS: We found that the most critical considerations for the analysis of RNA-Seq read count data were the normalization method, underlying data distribution assumption, and numbers of biological replicates, an observation consistent with previous RNA-Seq and microarray analysis comparisons. Some common normalization methods, such as Total Count, Quantile, and RPKM normalization, did not align the data across samples. Furthermore, analyses using the Median, Quantile, and Trimmed Mean of M-values normalization methods were sensitive to the removal of low-expressed genes from the data set. Although it is robust in many types of analysis, the normal data distribution assumption produced results vastly different than the negative binomial distribution. In addition, at least three biological replicates per condition were required in order to have sufficient statistical power to detect expression differences among the three-way interaction of genotype, environment, and sex. 
CONCLUSIONS: The best analysis approach to our data was to normalize the read counts using the DESeq method and apply a generalized linear model assuming a negative binomial distribution using either edgeR or DESeq software. Genes having very low read counts were removed after normalizing the data and fitting it to the negative binomial distribution. We describe the results of this evaluation and include recommended analysis strategies for RNA-Seq read count data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2353-z) contains supplementary material, which is available to authorized users.
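The DESeq normalization recommended in the description is generally described as a median-of-ratios scaling: each sample is divided by a size factor, taken as the median over genes of that sample's count relative to the gene's geometric mean across samples. The following is a minimal sketch of that idea in plain Python, not the authors' pipeline or the DESeq implementation; the function names `size_factors` and `normalize` and the toy count matrix are hypothetical and chosen only for illustration.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors (illustrative sketch).

    counts: list of gene rows, each a list of raw read counts per sample.
    Returns one scaling factor per sample.
    """
    n_samples = len(counts[0])
    # Keep only genes with a nonzero count in every sample so logs are finite.
    log_rows = [[math.log(c) for c in row]
                for row in counts if all(c > 0 for c in row)]
    if not log_rows:
        raise ValueError("no gene has nonzero counts in all samples")
    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count to its geometric mean across samples.
        log_ratios = [row[j] - sum(row) / n_samples for row in log_rows]
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy data: 4 genes x 2 samples; sample 2 was sequenced
# twice as deeply as sample 1, and the last gene is absent in sample 1.
toy = [[10, 20], [100, 200], [50, 100], [0, 3]]
factors = size_factors(toy)
normalized = normalize(toy)
```

In this toy case the second sample's size factor is twice the first, so the three genes detected in both samples end up with equal normalized counts. Fitting the negative binomial generalized linear model described above would then be done with edgeR or DESeq itself rather than with hand-written code.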
format Online
Article
Text
id pubmed-4702322
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4702322 2016-01-07 Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster Lin, Yanzhu Golovnina, Kseniya Chen, Zhen-Xia Lee, Hang Noh Negron, Yazmin L. Serrano Sultana, Hina Oliver, Brian Harbison, Susan T. BMC Genomics Methodology Article BioMed Central 2016-01-05 /pmc/articles/PMC4702322/ /pubmed/26732976 http://dx.doi.org/10.1186/s12864-015-2353-z Text en © Lin et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Comparison of normalization and differential expression analyses using RNA-Seq data from 726 individual Drosophila melanogaster
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4702322/
https://www.ncbi.nlm.nih.gov/pubmed/26732976
http://dx.doi.org/10.1186/s12864-015-2353-z