Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data
RNA-seq technology has become an important tool for quantifying the gene and transcript expression in transcriptome study. The two major difficulties for the gene and transcript expression quantification are the read mapping ambiguity and the overdispersion of the read distribution along reference s...
Main authors: | Liu, Xuejun; Zhang, Li; Chen, Songcan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2015 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4598124/ https://www.ncbi.nlm.nih.gov/pubmed/26448625 http://dx.doi.org/10.1371/journal.pone.0140032 |
_version_ | 1782394035951894528 |
---|---|
author | Liu, Xuejun Zhang, Li Chen, Songcan |
author_facet | Liu, Xuejun Zhang, Li Chen, Songcan |
author_sort | Liu, Xuejun |
collection | PubMed |
description | RNA-seq technology has become an important tool for quantifying the gene and transcript expression in transcriptome study. The two major difficulties for the gene and transcript expression quantification are the read mapping ambiguity and the overdispersion of the read distribution along reference sequence. Many approaches have been proposed to deal with these difficulties. A number of existing methods use Poisson distribution to model the read counts and this easily splits the counts into the contributions from multiple transcripts. Meanwhile, various solutions were put forward to account for the overdispersion in the Poisson models. By checking the similarities among the variation patterns of read counts for individual genes, we found that the count variation is exon-specific and has the conserved pattern across the samples for each individual gene. We introduce Gamma-distributed latent variables to model the read sequencing preference for each exon. These variables are embedded to the rate parameter of a Poisson model to account for the overdispersion of read distribution. The model is tractable since the Gamma priors can be integrated out in the maximum likelihood estimation. We evaluate the proposed approach, PGseq, using four real datasets and one simulated dataset, and compare its performance with other popular methods. Results show that PGseq presents competitive performance compared to other alternatives in terms of accuracy in the gene and transcript expression calculation and in the downstream differential expression analysis. Especially, we show the advantage of our method in the analysis of low expression. |
format | Online Article Text |
id | pubmed-4598124 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4598124 2015-10-20 Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data Liu, Xuejun Zhang, Li Chen, Songcan PLoS One Research Article Public Library of Science 2015-10-08 /pmc/articles/PMC4598124/ /pubmed/26448625 http://dx.doi.org/10.1371/journal.pone.0140032 Text en © 2015 Liu et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Liu, Xuejun Zhang, Li Chen, Songcan Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data |
title | Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data |
title_full | Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data |
title_fullStr | Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data |
title_full_unstemmed | Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data |
title_short | Modeling Exon-Specific Bias Distribution Improves the Analysis of RNA-Seq Data |
title_sort | modeling exon-specific bias distribution improves the analysis of rna-seq data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4598124/ https://www.ncbi.nlm.nih.gov/pubmed/26448625 http://dx.doi.org/10.1371/journal.pone.0140032 |
work_keys_str_mv | AT liuxuejun modelingexonspecificbiasdistributionimprovestheanalysisofrnaseqdata AT zhangli modelingexonspecificbiasdistributionimprovestheanalysisofrnaseqdata AT chensongcan modelingexonspecificbiasdistributionimprovestheanalysisofrnaseqdata |
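
The description above says that Gamma-distributed latent variables are placed in the rate of a Poisson model and that the Gamma priors can be integrated out. The following is a minimal sketch of why that integration is tractable, not the PGseq implementation itself. It assumes a single count `y` with `y ~ Poisson(b * mu)` and `b ~ Gamma(shape=alpha, rate=beta)`; the names `mu`, `alpha`, `beta` and both function names are hypothetical and do not come from the record. Under those assumptions the marginal of `y` is a negative binomial, whose variance exceeds its mean, which is the overdispersion the description refers to.

```python
import math


def poisson_gamma_logpmf(y, mu, alpha, beta):
    """Log P(y) for y ~ Poisson(b * mu) with b ~ Gamma(shape=alpha, rate=beta).

    The latent bias b is integrated out analytically, giving
    P(y) = Gamma(y + alpha) / (Gamma(alpha) * y!)
           * (beta / (beta + mu))**alpha * (mu / (beta + mu))**y.
    All parameter names are illustrative assumptions.
    """
    return (
        math.lgamma(y + alpha)
        - math.lgamma(alpha)
        - math.lgamma(y + 1)
        + alpha * math.log(beta / (beta + mu))
        + y * math.log(mu / (beta + mu))
    )


def marginal_moments(mu, alpha, beta, y_max=2000):
    """Sum the marginal pmf over 0..y_max-1 and return (total mass, mean, variance)."""
    probs = [math.exp(poisson_gamma_logpmf(y, mu, alpha, beta)) for y in range(y_max)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


# Illustrative values only: expected bias alpha / beta = 1, base rate mu = 10.
total, mean, var = marginal_moments(mu=10.0, alpha=2.0, beta=2.0)
print(f"mass={total:.6f} mean={mean:.3f} variance={var:.3f}")
```

In closed form the mean is `alpha * mu / beta` and the variance is `alpha * mu / beta + alpha * mu**2 / beta**2`, so the second term is the extra spread contributed by the latent bias; a plain Poisson model would have variance equal to the mean.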