
A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data

BACKGROUND: The recent advance of high-throughput sequencing makes it feasible to study entire transcriptomes through the application of de novo sequence assembly algorithms. While a popular strategy is to first construct an intermediate de Bruijn graph structure to represent the transcriptome, an a...


Bibliographic Details
Main authors: Sze, Sing-Hoi; Tarone, Aaron M
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120145/
https://www.ncbi.nlm.nih.gov/pubmed/25082000
http://dx.doi.org/10.1186/1471-2164-15-S5-S6
_version_ 1782329045622456320
author Sze, Sing-Hoi
Tarone, Aaron M
collection PubMed
description BACKGROUND: The recent advance of high-throughput sequencing makes it feasible to study entire transcriptomes through the application of de novo sequence assembly algorithms. While a popular strategy is to first construct an intermediate de Bruijn graph structure to represent the transcriptome, an additional step is needed to construct predicted transcripts from the graph. RESULTS: Since the de Bruijn graph contains all branching possibilities, we develop a memory-efficient algorithm to recover alternative splicing information and library-specific expression information directly from the graph without prior genomic knowledge. We implement the algorithm as a postprocessing module of the Velvet assembler. We validate our algorithm by simulating the transcriptome assembly of Drosophila using its known genome, and by performing Drosophila transcriptome assembly using publicly available RNA-Seq libraries. Under a range of conditions, our algorithm recovers sequences and alternative splicing junctions with higher specificity than Oases or Trans-ABySS. CONCLUSIONS: Since our postprocessing algorithm does not consume as much memory as Velvet and is less memory-intensive than Oases, it allows biologists to assemble large libraries with limited computational resources. Our algorithm has been applied to perform transcriptome assembly of the non-model blow fly Lucilia sericata that was reported in a previous article, which shows that the assembly is of high quality and it facilitates comparison of the Lucilia sericata transcriptome to Drosophila and two mosquitoes, prediction and experimental validation of alternative splicing, investigation of differential expression among various developmental stages, and identification of transposable elements.
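The abstract above refers to a de Bruijn graph that "contains all branching possibilities". As a rough illustration only, and not the authors' algorithm or any part of Velvet, the following minimal Python sketch builds a toy de Bruijn graph from short reads and lists the nodes where the graph branches, which is where alternative splicing candidates would show up. The function names `build_de_bruijn` and `branching_nodes`, the toy reads, and the choice of k = 4 are all assumptions made for this example.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Build a toy de Bruijn graph (illustrative helper, not from the paper).

    Nodes are (k-1)-mers; each k-mer in a read adds one edge from its
    prefix to its suffix. The stored integer is the k-mer's occurrence count.
    """
    graph = defaultdict(lambda: defaultdict(int))
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]][kmer[1:]] += 1
    return graph


def branching_nodes(graph):
    """Return, in sorted order, nodes with more than one distinct successor."""
    return sorted(node for node, succ in graph.items() if len(succ) > 1)


# Two hypothetical reads that share a prefix and then diverge,
# mimicking two isoforms that differ after a common exon.
reads = ["ACGTTGCA", "ACGTTCAG"]
graph = build_de_bruijn(reads, k=4)
branches = branching_nodes(graph)
print(branches)
```

In this toy input the shared prefix collapses into a single path and the graph splits at one node; the edge counts give a crude per-edge coverage signal. A real assembler additionally handles reverse complements, sequencing errors, and memory-efficient storage, none of which is attempted here.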
format Online
Article
Text
id pubmed-4120145
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4120145 2014-08-11 A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data Sze, Sing-Hoi Tarone, Aaron M BMC Genomics Research
BioMed Central 2014-07-14 /pmc/articles/PMC4120145/ /pubmed/25082000 http://dx.doi.org/10.1186/1471-2164-15-S5-S6 Text en Copyright © 2014 Sze and Tarone; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data
topic Research