Cargando…

A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms

BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a...

Descripción completa

Detalles Bibliográficos
Autores principales:	Sze, Sing-Hoi, Parrott, Jonathan J., Tarone, Aaron M.
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2017
Materias:	Research
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731495/ https://www.ncbi.nlm.nih.gov/pubmed/29244008 http://dx.doi.org/10.1186/s12864-017-4270-9

Descripción
Sumario:	BACKGROUND: While the continued development of high-throughput sequencing has facilitated studies of entire transcriptomes in non-model organisms, the incorporation of an increasing amount of RNA-Seq libraries has made de novo transcriptome assembly difficult. Although algorithms that can assemble a large amount of RNA-Seq data are available, they are generally very memory-intensive and can only be used to construct small assemblies. RESULTS: We develop a divide-and-conquer strategy that allows these algorithms to be utilized, by subdividing a large RNA-Seq data set into small libraries. Each individual library is assembled independently by an existing algorithm, and a merging algorithm is developed to combine these assemblies by picking a subset of high quality transcripts to form a large transcriptome. When compared to existing algorithms that return a single assembly directly, this strategy achieves comparable or increased accuracy as memory-efficient algorithms that can be used to process a large amount of RNA-Seq data, and comparable or decreased accuracy as memory-intensive algorithms that can only be used to construct small assemblies. CONCLUSIONS: Our divide-and-conquer strategy allows memory-intensive de novo transcriptome assembly algorithms to be utilized to construct large assemblies.

A divide-and-conquer algorithm for large-scale de novo transcriptome assembly through combining small assemblies from existing algorithms

Ejemplares similares