
Statistical methods on detecting differentially expressed genes for RNA-seq data


Bibliographic Details
Main authors: Chen, Zhongxue; Liu, Jianzhong; Ng, Hon Keung Tony; Nadarajah, Saralees; Kaufman, Howard L; Yang, Jack Y; Deng, Youping
Format: Online article (text)
Language: English
Published: BioMed Central, 2011
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3287564/
https://www.ncbi.nlm.nih.gov/pubmed/22784615
http://dx.doi.org/10.1186/1752-0509-5-S3-S1
Abstract:

BACKGROUND: For RNA-seq data, the aggregated count of the short reads from the same gene is used to approximate that gene's expression level. The count data can be modelled as samples from Poisson distributions with possibly different parameters. To detect genes that are differentially expressed between two conditions, statistical methods for testing the difference between two Poisson means are used. When the expression level of a gene is low, i.e., its counts are small, mean differences are usually harder to detect, so statistical methods with greater power at low expression levels are particularly desirable. Several methods have been proposed in the statistical literature to compare two Poisson means (rates). In this paper, we compare these methods using simulated and real RNA-seq data.

RESULTS: Through a simulation study and real data analysis, we find that the Wald test applied to log-transformed data (the Wald-Log test) is more powerful than the other methods. The likelihood ratio test has power similar to that of the variance stabilizing transformation test, and both are more powerful than the conditional exact test and Fisher's exact test.

CONCLUSIONS: When the count data in RNA-seq can reasonably be modelled as Poisson distributed, the Wald-Log test is more powerful and should be used to detect differentially expressed genes.
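The abstract names five tests for comparing two Poisson means. The following is a minimal sketch in Python (standard library only). It assumes that a gene's counts satisfy X1 ~ Poisson(d1·λ1) and X2 ~ Poisson(d2·λ2), where d1 and d2 are the total read counts (library sizes) of the two samples and the null hypothesis is λ1 = λ2. The specific statistic forms are common textbook choices and are assumptions, not necessarily the exact formulas used in the paper: the 0.5 added to each count in the Wald-Log test, the 3/8 offset in the variance-stabilizing test, and the "sum of outcomes no more likely than the observed one" two-sided rule for the exact tests. All function names are hypothetical.

```python
import math


def _normal_two_sided(z: float) -> float:
    # Two-sided p-value of a standard normal statistic: P(|Z| >= |z|).
    return math.erfc(abs(z) / math.sqrt(2.0))


def _log_comb(n: float, r: float) -> float:
    # log of the binomial coefficient C(n, r), via log-gamma to avoid overflow.
    return math.lgamma(n + 1) - math.lgamma(r + 1) - math.lgamma(n - r + 1)


def _exact_two_sided(logpmf, support, observed: int) -> float:
    # Assumed two-sided rule: sum P(y) over every y that is no more likely than
    # the observed value (a small log-space tolerance absorbs rounding error).
    log_obs = logpmf(observed)
    total = sum(math.exp(lp) for lp in map(logpmf, support)
                if lp <= log_obs + 1e-6)
    return min(1.0, total)


def wald_log_test(x1: int, x2: int, d1: float, d2: float) -> float:
    # Wald test on the log rate ratio; 0.5 is added to each count (assumption)
    # so that zero counts do not make the logarithm undefined.
    log_ratio = math.log((x1 + 0.5) / d1) - math.log((x2 + 0.5) / d2)
    se = math.sqrt(1.0 / (x1 + 0.5) + 1.0 / (x2 + 0.5))
    return _normal_two_sided(log_ratio / se)


def vst_test(x1: int, x2: int, d1: float, d2: float) -> float:
    # Square-root variance-stabilizing transformation: 2*sqrt(X + 3/8) has
    # variance close to 1 for Poisson X; r rescales for unequal library sizes.
    r = d1 / d2
    z = 2.0 * (math.sqrt(x1 + 0.375) - math.sqrt(r * (x2 + 0.375))) / math.sqrt(1.0 + r)
    return _normal_two_sided(z)


def likelihood_ratio_test(x1: int, x2: int, d1: float, d2: float) -> float:
    # G = 2 * sum(x * log(x / e)) with expected counts e under H0, compared
    # with a chi-square distribution on 1 degree of freedom.
    k = x1 + x2
    expected = (k * d1 / (d1 + d2), k * d2 / (d1 + d2))
    g = sum(2.0 * x * math.log(x / e) for x, e in zip((x1, x2), expected) if x > 0)
    # P(chi2_1 >= g) = P(|Z| >= sqrt(g)) = erfc(sqrt(g / 2)).
    return math.erfc(math.sqrt(max(g, 0.0) / 2.0))


def conditional_exact_test(x1: int, x2: int, d1: float, d2: float) -> float:
    # Given X1 + X2 = k, X1 ~ Binomial(k, d1 / (d1 + d2)) under H0.
    k = x1 + x2
    p0 = d1 / (d1 + d2)

    def logpmf(y: int) -> float:
        return _log_comb(k, y) + y * math.log(p0) + (k - y) * math.log1p(-p0)

    return _exact_two_sided(logpmf, range(k + 1), x1)


def fisher_exact_test(x1: int, x2: int, d1: int, d2: int) -> float:
    # Fisher's exact test on the 2x2 table [[x1, d1 - x1], [x2, d2 - x2]]:
    # X1 is hypergeometric given both margins (d1, d2 must be integers).
    k = x1 + x2
    log_total = _log_comb(d1 + d2, k)

    def logpmf(y: int) -> float:
        return _log_comb(d1, y) + _log_comb(d2, k - y) - log_total

    return _exact_two_sided(logpmf, range(max(0, k - d2), min(k, d1) + 1), x1)


if __name__ == "__main__":
    # Hypothetical low-expression gene: 12 vs 3 reads, one million reads per sample.
    counts, libs = (12, 3), (1_000_000, 1_000_000)
    for test in (wald_log_test, likelihood_ratio_test, vst_test,
                 conditional_exact_test, fisher_exact_test):
        print(f"{test.__name__:>24}: p = {test(*counts, *libs):.4g}")
```

Each function returns a two-sided p-value for one gene. In a genome-wide analysis, these p-values would then typically be adjusted for multiple testing. Comparing the printed p-values across tests for the same low counts gives an informal sense of the power differences the abstract reports, but it is not a substitute for the paper's simulation study.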
Record: pubmed-3287564, National Center for Biotechnology Information (MEDLINE/PubMed)
Journal: BMC Systems Biology (BMC Syst Biol), Research Article, published 2011-12-23. Copyright ©2011 Chen et al.
License: This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.