PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution

Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data com...

Bibliographic Details
Main Authors: Hu, Yu; Liu, Yichuan; Mao, Xianyun; Jia, Cheng; Ferguson, Jane F.; Xue, Chenyi; Reilly, Muredach P.; Li, Hongzhe; Li, Mingyao
Format: Online Article Text
Language: English
Published: Oxford University Press 2014
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3919567/
https://www.ncbi.nlm.nih.gov/pubmed/24362841
http://dx.doi.org/10.1093/nar/gkt1304
author Hu, Yu
Liu, Yichuan
Mao, Xianyun
Jia, Cheng
Ferguson, Jane F.
Xue, Chenyi
Reilly, Muredach P.
Li, Hongzhe
Li, Mingyao
collection PubMed
description Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis, and if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we give adequate weight to the underlying data by the use of a non-parametric approach. Our rationale is that regardless what factors lead to non-uniformity, whether it is due to hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown reasons, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach thus maximally reflects the true underlying non-uniform read distribution. We evaluate the performance of PennSeq using both simulated data with known ground truth, and using two real Illumina RNA-Seq data sets including one with quantitative real time polymerase chain reaction measurements. Our results indicate superior performance of PennSeq over existing methods, particularly for isoforms demonstrating severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq.
format Online
Article
Text
id pubmed-3919567
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3919567 2014-02-10 PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution Hu, Yu Liu, Yichuan Mao, Xianyun Jia, Cheng Ferguson, Jane F. Xue, Chenyi Reilly, Muredach P. Li, Hongzhe Li, Mingyao Nucleic Acids Res Methods Online Correctly estimating isoform-specific gene expression is important for understanding complicated biological mechanisms and for mapping disease susceptibility genes. However, estimating isoform-specific gene expression is challenging because various biases present in RNA-Seq (RNA sequencing) data complicate the analysis, and if not appropriately corrected, can affect isoform expression estimation and downstream analysis. In this article, we present PennSeq, a statistical method that allows each isoform to have its own non-uniform read distribution. Instead of making parametric assumptions, we give adequate weight to the underlying data by the use of a non-parametric approach. Our rationale is that regardless what factors lead to non-uniformity, whether it is due to hexamer priming bias, local sequence bias, positional bias, RNA degradation, mapping bias or other unknown reasons, the probability that a fragment is sampled from a particular region will be reflected in the aligned data. This empirical approach thus maximally reflects the true underlying non-uniform read distribution. We evaluate the performance of PennSeq using both simulated data with known ground truth, and using two real Illumina RNA-Seq data sets including one with quantitative real time polymerase chain reaction measurements. Our results indicate superior performance of PennSeq over existing methods, particularly for isoforms demonstrating severe non-uniformity. PennSeq is freely available for download at http://sourceforge.net/projects/pennseq. Oxford University Press 2014-02 2013-12-20 /pmc/articles/PMC3919567/ /pubmed/24362841 http://dx.doi.org/10.1093/nar/gkt1304 Text en © The Author(s) 2013. 
Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title PennSeq: accurate isoform-specific gene expression quantification in RNA-Seq by modeling non-uniform read distribution
topic Methods Online
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3919567/
https://www.ncbi.nlm.nih.gov/pubmed/24362841
http://dx.doi.org/10.1093/nar/gkt1304