
RSEM: accurate transcript quantification from RNA-Seq data with or without a reference genome

Bibliographic Details
Main authors: Li, Bo; Dewey, Colin N
Format: Online, Article, Text
Language: English
Published: BioMed Central 2011
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3163565/
https://www.ncbi.nlm.nih.gov/pubmed/21816040
http://dx.doi.org/10.1186/1471-2105-12-323
Collection: PubMed
Abstract:
BACKGROUND: RNA-Seq is revolutionizing the way transcript abundances are measured. A key challenge in transcript quantification from RNA-Seq data is the handling of reads that map to multiple genes or isoforms. This issue is particularly important for quantification with de novo transcriptome assemblies in the absence of sequenced genomes, as it is difficult to determine which transcripts are isoforms of the same gene. A second significant issue is the design of RNA-Seq experiments, in terms of the number of reads, read length, and whether reads come from one or both ends of cDNA fragments.
RESULTS: We present RSEM, a user-friendly software package for quantifying gene and isoform abundances from single-end or paired-end RNA-Seq data. RSEM outputs abundance estimates, 95% credibility intervals, and visualization files, and can also simulate RNA-Seq data. In contrast to other existing tools, the software does not require a reference genome. Thus, in combination with a de novo transcriptome assembler, RSEM enables accurate transcript quantification for species without sequenced genomes. On simulated and real data sets, RSEM has superior or comparable performance to quantification methods that rely on a reference genome. Taking advantage of RSEM's ability to effectively use ambiguously mapping reads, we show that accurate gene-level abundance estimates are best obtained with large numbers of short single-end reads. On the other hand, estimates of the relative frequencies of isoforms within single genes may be improved through the use of paired-end reads, depending on the number of possible splice forms for each gene.
CONCLUSIONS: RSEM is an accurate and user-friendly software tool for quantifying transcript abundances from RNA-Seq data. As it does not rely on the existence of a reference genome, it is particularly useful for quantification with de novo transcriptome assemblies. In addition, RSEM has provided valuable guidance on the cost-efficient design of quantification experiments with RNA-Seq, which is currently relatively expensive.
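The abstract centers on handling reads that map to multiple genes or isoforms. RSEM's name stands for "RNA-Seq by Expectation-Maximization," and the sketch below shows that general idea in its simplest form: each ambiguous read is split fractionally among its compatible transcripts in proportion to the current abundance estimates, and those estimates are then refit from the fractional counts. This is a toy illustration under stated assumptions, not RSEM's actual model. The function name `em_abundance` and the input format (one list of compatible transcript IDs per read) are hypothetical. The model assumes read starts are uniform along each transcript. RSEM's real model also accounts for fragment lengths, read positions, base qualities and a noise component.

```python
"""Toy EM for distributing multi-mapping reads among transcripts.

Simplified illustration only (hypothetical function and input format);
not RSEM's actual statistical model.
"""
from typing import Dict, Sequence


def em_abundance(
    alignments: Sequence[Sequence[str]],
    lengths: Dict[str, int],
    n_iter: int = 500,
    tol: float = 1e-12,
) -> Dict[str, float]:
    """Estimate theta[t], the fraction of reads originating from transcript t.

    alignments: for each read, the IDs of the transcripts it aligns to.
    lengths: transcript lengths; assumes a read starts uniformly along the
    transcript, so P(read | t) = 1 / lengths[t].
    """
    transcripts = list(lengths)
    theta = {t: 1.0 / len(transcripts) for t in transcripts}  # uniform start
    n_reads = len(alignments)
    for _ in range(n_iter):
        counts = {t: 0.0 for t in transcripts}
        # E-step: split each read among its compatible transcripts in
        # proportion to theta[t] * P(read | t) = theta[t] / lengths[t].
        for hits in alignments:
            weights = {t: theta[t] / lengths[t] for t in hits}
            total = sum(weights.values())
            for t, w in weights.items():
                counts[t] += w / total
        # M-step: new read fractions are the expected counts over all reads.
        new_theta = {t: counts[t] / n_reads for t in transcripts}
        delta = max(abs(new_theta[t] - theta[t]) for t in transcripts)
        theta = new_theta
        if delta < tol:
            break
    return theta


if __name__ == "__main__":
    # 6 reads unique to A, 2 unique to B, 4 ambiguous between A and B.
    reads = [["A"]] * 6 + [["B"]] * 2 + [["A", "B"]] * 4
    theta = em_abundance(reads, {"A": 1000, "B": 1000})
    # With equal lengths the 4 shared reads end up split 3:1, matching the
    # unique-read evidence, so theta["A"] ≈ 0.75 and theta["B"] ≈ 0.25.
    print(theta)
```

Discarding the 4 ambiguous reads would give the same 3:1 ratio here but from only 8 of 12 reads. EM keeps all of the data, which matters most when isoforms or de novo assembled contigs share much of their sequence.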
Record ID: pubmed-3163565
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Software article), published 2011-08-04
Copyright © 2011 Li and Dewey; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/2.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.