RNA-Seq gene expression estimation with read mapping uncertainty
Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which the...
Main authors: | Li, Bo; Ruotti, Victor; Stewart, Ron M.; Thomson, James A.; Dewey, Colin N. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press, 2010 |
Subjects: | Original Papers |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2820677/ https://www.ncbi.nlm.nih.gov/pubmed/20022975 http://dx.doi.org/10.1093/bioinformatics/btp692 |
_version_ | 1782177402175094784 |
---|---|
author | Li, Bo Ruotti, Victor Stewart, Ron M. Thomson, James A. Dewey, Colin N. |
author_sort | Li, Bo |
collection | PubMed |
description | Motivation: RNA-Seq is a promising new technology for accurately measuring gene expression levels. Expression estimation with RNA-Seq requires the mapping of relatively short sequencing reads to a reference genome or transcript set. Because reads are generally shorter than transcripts from which they are derived, a single read may map to multiple genes and isoforms, complicating expression analyses. Previous computational methods either discard reads that map to multiple locations or allocate them to genes heuristically. Results: We present a generative statistical model and associated inference methods that handle read mapping uncertainty in a principled manner. Through simulations parameterized by real RNA-Seq data, we show that our method is more accurate than previous methods. Our improved accuracy is the result of handling read mapping uncertainty with a statistical model and the estimation of gene expression levels as the sum of isoform expression levels. Unlike previous methods, our method is capable of modeling non-uniform read distributions. Simulations with our method indicate that a read length of 20–25 bases is optimal for gene-level expression estimation from mouse and maize RNA-Seq data when sequencing throughput is fixed. Availability: An initial C++ implementation of our method that was used for the results presented in this article is available at http://deweylab.biostat.wisc.edu/rsem. Contact: cdewey@biostat.wisc.edu Supplementary information: Supplementary data are available at Bioinformatics on |
format | Text |
id | pubmed-2820677 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-2820677 2010-02-12 RNA-Seq gene expression estimation with read mapping uncertainty Li, Bo Ruotti, Victor Stewart, Ron M. Thomson, James A. Dewey, Colin N. Bioinformatics Original Papers Oxford University Press 2010-02-15 2009-12-18 /pmc/articles/PMC2820677/ /pubmed/20022975 http://dx.doi.org/10.1093/bioinformatics/btp692 Text en © The Author(s) 2009. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.5) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | RNA-Seq gene expression estimation with read mapping uncertainty |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2820677/ https://www.ncbi.nlm.nih.gov/pubmed/20022975 http://dx.doi.org/10.1093/bioinformatics/btp692 |