RNA-Seq gene expression estimation with read mapping uncertainty

Bibliographic details
Main authors: Li, Bo; Ruotti, Victor; Stewart, Ron M.; Thomson, James A.; Dewey, Colin N.
Format: Text
Language: English
Published: Oxford University Press, 2010
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2820677/
https://www.ncbi.nlm.nih.gov/pubmed/20022975
http://dx.doi.org/10.1093/bioinformatics/btp692

Abstract
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than the transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically.
Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed.
Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem.
Contact: cdewey@biostat.wisc.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
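As a rough illustration of how a statistical model can allocate multi-mapping reads instead of discarding them, here is a minimal expectation-maximization (EM) sketch in Python. It is not the RSEM implementation and omits most of the paper's model: it assumes each read aligns to a known set of transcripts, read start positions are uniform along each transcript, and per-transcript effective lengths are given. The function names `em_read_allocation` and `gene_level_sums` and the toy data are hypothetical. Gene-level values are obtained by summing isoform-level values, the approach the abstract credits for part of the accuracy gain.

```python
from collections import defaultdict


def em_read_allocation(read_alignments, lengths, n_iter=200, tol=1e-10):
    """Hypothetical helper: estimate the fraction of reads from each transcript.

    Simplified model (an assumption, not RSEM's full model): each read comes
    from transcript t with probability theta[t], and its start position is
    uniform along the transcript's effective length lengths[t].

    read_alignments: list of lists of transcript IDs; every read must align
        to at least one transcript (multi-mapping reads list several).
    lengths: dict mapping transcript ID -> effective length (> 0).
    """
    transcripts = list(lengths)
    theta = {t: 1.0 / len(transcripts) for t in transcripts}
    n_reads = len(read_alignments)
    for _ in range(n_iter):
        expected = defaultdict(float)
        # E-step: split each read across the transcripts it aligns to, in
        # proportion to theta[t] / lengths[t] (its posterior probability).
        for hits in read_alignments:
            weights = {t: theta[t] / lengths[t] for t in hits}
            total = sum(weights.values())
            for t, w in weights.items():
                expected[t] += w / total
        # M-step: new read fractions are expected read counts / total reads.
        new_theta = {t: expected[t] / n_reads for t in transcripts}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return theta


def gene_level_sums(theta, transcript_to_gene):
    """Hypothetical helper: gene-level value = sum over the gene's isoforms."""
    genes = defaultdict(float)
    for t, value in theta.items():
        genes[transcript_to_gene[t]] += value
    return dict(genes)


if __name__ == "__main__":
    # Toy data: isoforms A1 and A2 of gene A share four reads; B1 is gene B.
    reads = [["A1"]] * 6 + [["A2"]] * 2 + [["A1", "A2"]] * 4 + [["B1"]] * 4
    lengths = {"A1": 1000, "A2": 1000, "B1": 1000}
    theta = em_read_allocation(reads, lengths)
    print({t: round(v, 4) for t, v in theta.items()})
    # {'A1': 0.5625, 'A2': 0.1875, 'B1': 0.25}
    genes = gene_level_sums(theta, {"A1": "A", "A2": "A", "B1": "B"})
    print({g: round(v, 4) for g, v in genes.items()})
    # {'A': 0.75, 'B': 0.25}
```

In the toy data, the four reads shared by isoforms A1 and A2 end up split about 3:1 in favor of A1. This matches the ratio of their uniquely mapping reads (6 vs. 2), rather than the shared reads being discarded or divided evenly. Summing the two isoforms gives the same gene-level value for A no matter how the shared reads are divided between them.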
Journal: Bioinformatics (Original Papers)
Publication date: 2010-02-15 (published online 2009-12-18)
© The Author(s) 2009. Published by Oxford University Press.
License: http://creativecommons.org/licenses/by-nc/2.0/uk/
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.