PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution
Main authors: | Hu, Yu; Liu, Yichuan; Mao, Xianyun; Jia, Cheng; Ferguson, Jane F.; Xue, Chenyi; Reilly, Muredach P.; Li, Hongzhe; Li, Mingyao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2014 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3919567/ https://www.ncbi.nlm.nih.gov/pubmed/24362841 http://dx.doi.org/10.1093/nar/gkt1304 |
author | Hu, Yu; Liu, Yichuan; Mao, Xianyun; Jia, Cheng; Ferguson, Jane F.; Xue, Chenyi; Reilly, Muredach P.; Li, Hongzhe; Li, Mingyao |
---|---|
collection | PubMed |
description | Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis and, if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we use a non-parametric approach that gives adequate weight to the underlying data. Our rationale is that regardless of what factors lead to non-uniformity, whether hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown reasons, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach thus maximally reflects the true underlying non-uniform read distribution. We evaluate the performance of PennSeq using both simulated data with known ground truth and two real Illumina RNA-Seq data sets, one of which includes quantitative real-time polymerase chain reaction measurements. Our results indicate superior performance of PennSeq over existing methods, particularly for isoforms demonstrating severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq. |
id | pubmed-3919567 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Nucleic Acids Res (Methods Online) |
publication dates | 2014-02 (issue); 2013-12-20 (online) |
rights | © The Author(s) 2013. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution |
topic | Methods Online |
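The description states that PennSeq lets each isoform have its own non-uniform read distribution, but it does not give the model's likelihood or estimation procedure. As a hedged illustration of why this matters, here is a sketch of a generic expectation-maximization (EM) over fragment-to-isoform assignments. In this sketch, P(fragment | isoform) is an arbitrary, position-dependent value instead of a uniform one. This is not PennSeq's implementation. The function name `em_isoform_proportions`, the input layout, and all probabilities are hypothetical, and the per-isoform probabilities are assumed to have been estimated beforehand (for example, from binned coverage).

```python
from typing import Dict, List, Sequence


def em_isoform_proportions(
    compat: Sequence[Dict[int, float]],
    n_isoforms: int,
    max_iter: int = 500,
    tol: float = 1e-12,
) -> List[float]:
    """Generic EM for isoform fragment proportions (hypothetical sketch).

    compat[f] maps isoform index k -> P(fragment f | isoform k), i.e. the
    probability that isoform k produces a fragment at f's observed position.
    Under a uniform model this is the same everywhere along an isoform; here
    it may be any non-uniform, isoform-specific value (assumed precomputed).
    """
    theta = [1.0 / n_isoforms] * n_isoforms
    for _ in range(max_iter):
        expected = [0.0] * n_isoforms
        for probs in compat:
            # E-step: split each fragment across its compatible isoforms
            # in proportion to theta_k * P(fragment | isoform k).
            weights = {k: theta[k] * p for k, p in probs.items()}
            total = sum(weights.values())
            if total == 0.0:
                continue  # fragment has zero likelihood under every isoform
            for k, w in weights.items():
                expected[k] += w / total
        n = sum(expected)
        if n == 0.0:
            return theta  # no informative fragments; keep the starting point
        # M-step: proportions are the normalized expected fragment counts.
        new_theta = [c / n for c in expected]
        delta = max(abs(a - b) for a, b in zip(theta, new_theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


if __name__ == "__main__":
    # Hypothetical data: 2 fragments unique to each isoform, 4 shared ones.
    unique = [{0: 0.2}] * 2 + [{1: 0.2}] * 2
    # Same shared fragments scored under uniform vs. non-uniform sampling
    # (both isoforms assumed to have equal effective length).
    uniform = unique + [{0: 0.2, 1: 0.2}] * 4
    nonuniform = unique + [{0: 0.4, 1: 0.1}] * 4
    print(em_isoform_proportions(uniform, 2))     # [0.5, 0.5]
    print(em_isoform_proportions(nonuniform, 2))  # isoform 0 near 0.70
```

With identical reads, changing only the assumed sampling probability in the shared region moves isoform 0 from half of the fragments to about 70% of them. This is the kind of sensitivity to non-uniformity that the description targets. Note that `theta` is a fraction of fragments, not a length-normalized expression value; converting it would require effective isoform lengths, which this sketch leaves out.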