A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data
Main authors: | Sze, Sing-Hoi; Tarone, Aaron M |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120145/ https://www.ncbi.nlm.nih.gov/pubmed/25082000 http://dx.doi.org/10.1186/1471-2164-15-S5-S6 |
author | Sze, Sing-Hoi; Tarone, Aaron M |
---|---|
collection | PubMed |
description | BACKGROUND: The recent advance of high-throughput sequencing makes it feasible to study entire transcriptomes through the application of de novo sequence assembly algorithms. While a popular strategy is to first construct an intermediate de Bruijn graph structure to represent the transcriptome, an additional step is needed to construct predicted transcripts from the graph. RESULTS: Since the de Bruijn graph contains all branching possibilities, we develop a memory-efficient algorithm to recover alternative splicing information and library-specific expression information directly from the graph without prior genomic knowledge. We implement the algorithm as a postprocessing module of the Velvet assembler. We validate our algorithm by simulating the transcriptome assembly of Drosophila using its known genome, and by performing Drosophila transcriptome assembly using publicly available RNA-Seq libraries. Under a range of conditions, our algorithm recovers sequences and alternative splicing junctions with higher specificity than Oases or Trans-ABySS. CONCLUSIONS: Since our postprocessing algorithm does not consume as much memory as Velvet and is less memory-intensive than Oases, it allows biologists to assemble large libraries with limited computational resources. Our algorithm has been applied to perform transcriptome assembly of the non-model blow fly Lucilia sericata that was reported in a previous article, which shows that the assembly is of high quality and it facilitates comparison of the Lucilia sericata transcriptome to Drosophila and two mosquitoes, prediction and experimental validation of alternative splicing, investigation of differential expression among various developmental stages, and identification of transposable elements. |
format | Online Article Text |
id | pubmed-4120145 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4120145, 2014-08-11; BMC Genomics, Research; BioMed Central, 2014-07-14; /pmc/articles/PMC4120145/ /pubmed/25082000 http://dx.doi.org/10.1186/1471-2164-15-S5-S6; Copyright © 2014 Sze and Tarone; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | A memory-efficient algorithm to obtain splicing graphs and de novo expression estimates from de Bruijn graphs of RNA-Seq data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4120145/ https://www.ncbi.nlm.nih.gov/pubmed/25082000 http://dx.doi.org/10.1186/1471-2164-15-S5-S6 |
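The abstract's RESULTS section rests on the observation that a de Bruijn graph already contains every branching possibility, so alternative splicing and per-library expression can be read off the graph itself. The Python snippet below is a minimal toy sketch of that general idea, not the authors' Velvet postprocessing module. Distinct k-mers become edges between (k-1)-mer nodes, each edge keeps a separate count per RNA-Seq library, forks and merges are reported as candidate splice junctions, and a path's mean k-mer count per library serves as a crude expression estimate. The function names (`build_debruijn`, `branch_nodes`, `path_expression`), the toy exons, the library names (`larva`, `adult`) and k = 4 are all hypothetical; reverse complements, error removal and graph compaction are omitted.

```python
from collections import defaultdict

K = 4  # toy k-mer size (hypothetical); real assemblies use much larger k


def build_debruijn(libraries, k):
    """Count every k-mer per library.

    Each distinct k-mer is a directed edge from its (k-1)-mer prefix to its
    (k-1)-mer suffix; the stored dict is that edge's per-library coverage.
    """
    edges = defaultdict(dict)  # k-mer -> {library name: count}
    for lib, reads in libraries.items():
        for read in reads:
            for i in range(len(read) - k + 1):
                kmer = read[i:i + k]
                edges[kmer][lib] = edges[kmer].get(lib, 0) + 1
    return dict(edges)


def branch_nodes(edges):
    """Return (forks, merges): (k-1)-mers with >1 successor / >1 predecessor.

    In a transcriptome graph these are candidate alternative splicing
    junctions; this sketch does not separate them from SNPs or read errors.
    """
    succ, pred = defaultdict(set), defaultdict(set)
    for kmer in edges:
        u, v = kmer[:-1], kmer[1:]
        succ[u].add(v)
        pred[v].add(u)
    forks = {u for u, vs in succ.items() if len(vs) > 1}
    merges = {v for v, us in pred.items() if len(us) > 1}
    return forks, merges


def path_expression(edges, seq, k):
    """Mean k-mer count per library along the path spelled by seq."""
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    if not kmers:
        raise ValueError("sequence is shorter than k")
    libs = sorted({lib for counts in edges.values() for lib in counts})
    return {
        lib: sum(edges.get(km, {}).get(lib, 0) for km in kmers) / len(kmers)
        for lib in libs
    }


# Toy example: two isoforms share exons A and D but use alternative middle exons.
# For simplicity each "read" is a whole isoform sequence.
exon_a, exon_b, exon_c, exon_d = "ACGTA", "CCTT", "GGAA", "TGCA"
iso1 = exon_a + exon_b + exon_d  # A-B-D
iso2 = exon_a + exon_c + exon_d  # A-C-D
libraries = {"larva": [iso1] * 3 + [iso2], "adult": [iso2] * 2}

edges = build_debruijn(libraries, K)
forks, merges = branch_nodes(edges)
print(forks, merges)                    # {'GTA'} {'TGC'}
print(path_expression(edges, iso1, K))  # {'adult': 0.6, 'larva': 3.3}
print(path_expression(edges, iso2, K))  # {'adult': 2.0, 'larva': 1.9}
```

In this toy run the fork at `GTA` and the merge at `TGC` bracket the alternative middle exon. Because k-mers from the shared exons count toward both isoforms, the naive mean overstates each isoform's expression (for example, `adult` shows nonzero coverage for `iso1` even though it contains only `iso2`); apportioning shared coverage between isoforms is the part this sketch deliberately leaves out.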