Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq

Bibliographic Details
Main Authors: Liu, Ruolin, Dickerson, Julie
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5720828/
https://www.ncbi.nlm.nih.gov/pubmed/29176847
http://dx.doi.org/10.1371/journal.pcbi.1005851
author Liu, Ruolin
Dickerson, Julie
collection PubMed
description We propose a novel method and software tool, Strawberry, for transcript reconstruction and quantification from RNA-Seq data under the guidance of genome alignment and independent of gene annotation. Strawberry consists of two modules: assembly and quantification. The novelty of Strawberry is that the two modules use different optimization frameworks but share the same data graph structure, which allows a highly efficient, expandable and accurate algorithm for dealing with large data. The assembly module parses aligned reads into splicing graphs and uses network flow algorithms to select the most likely transcripts. The quantification module uses a latent class model to assign read counts from the nodes of splicing graphs to transcripts. Strawberry simultaneously estimates the transcript abundances and corrects for sequencing bias through an EM algorithm. Based on simulations, Strawberry outperforms Cufflinks and StringTie in terms of both assembly and quantification accuracy. When evaluated on a real data set, the transcript expression estimated by Strawberry has the highest correlation with Nanostring probe counts, an independent experimental measure of transcript expression. Availability: Strawberry is written in C++14 and is available as open source software at https://github.com/ruolin/strawberry under the MIT license.
format Online
Article
Text
id pubmed-5720828
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5720828 2017-12-15 Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq Liu, Ruolin Dickerson, Julie PLoS Comput Biol Research Article
Public Library of Science 2017-11-27 /pmc/articles/PMC5720828/ /pubmed/29176847 http://dx.doi.org/10.1371/journal.pcbi.1005851 Text en © 2017 Liu, Dickerson http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Strawberry: Fast and accurate genome-guided transcript reconstruction and quantification from RNA-Seq
topic Research Article
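
Illustrative note: the description in this record says the quantification module assigns read counts from splicing-graph nodes to transcripts with a latent class model fitted by an EM algorithm. The sketch below is a toy version of that general idea only, not Strawberry's implementation. The function name `em_abundances`, its parameters, and the input layout are hypothetical, and the sketch ignores transcript length and the sequencing-bias correction mentioned in the description.

```python
# Toy EM for splitting node-level read counts among compatible transcripts.
# Hypothetical sketch: names and data layout are assumptions, and this is
# not the model used by Strawberry (no length or bias terms).

def em_abundances(node_counts, compat, n_iter=200, tol=1e-9):
    """Estimate relative transcript abundances.

    node_counts[i] -- number of reads observed at graph node i
    compat[i]      -- list of transcript indices that contain node i
    Returns a list of proportions, one per transcript.
    """
    n_tx = 1 + max(t for txs in compat for t in txs)
    theta = [1.0 / n_tx] * n_tx          # start from uniform abundances
    total = float(sum(node_counts))

    for _ in range(n_iter):
        # E-step: share each node's reads among its compatible transcripts
        # in proportion to the current abundance estimates.
        expected = [0.0] * n_tx
        for count, txs in zip(node_counts, compat):
            denom = sum(theta[t] for t in txs)
            if denom == 0.0:
                continue
            for t in txs:
                expected[t] += count * theta[t] / denom

        # M-step: new abundances are the normalized expected counts.
        new_theta = [e / total for e in expected]
        delta = max(abs(a - b) for a, b in zip(new_theta, theta))
        theta = new_theta
        if delta < tol:
            break
    return theta


if __name__ == "__main__":
    # Node 0 is shared by both transcripts; nodes 1 and 2 are unique.
    print(em_abundances([60, 30, 10], [[0, 1], [0], [1]]))
```

In this toy setting the reads at nodes unique to one transcript pull the shared node's reads toward that transcript over successive iterations, which is the behavior a latent class assignment is meant to capture.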