
Enhancing De Novo Transcriptome Assembly by Incorporating Multiple Overlap Sizes


Bibliographic Details
Main authors: Chen, Chien-Chih; Lin, Wen-Dar; Chang, Yu-Jung; Chen, Chuen-Liang; Ho, Jan-Ming
Format: Online Article Text
Language: English
Published: International Scholarly Research Network 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4417554/
https://www.ncbi.nlm.nih.gov/pubmed/25969752
http://dx.doi.org/10.5402/2012/816402
Description
Summary: Background. The emergence of next-generation sequencing platforms has given rise to a new generation of assembly algorithms. Compared with Sanger sequencing data, next-generation sequence data have shorter reads, higher coverage depth, and different error profiles. These features pose new challenges for de novo transcriptome assembly. Methodology. To explore the influence of these features on assembly algorithms, we studied the relationship between read overlap size, coverage depth, and error rate using simulated data. Based on this relationship, we propose a de novo transcriptome assembly procedure, called Euler-mix, and demonstrate its performance on a real mouse transcriptome dataset. The simulation tool and evaluation tool are freely available as open source. Significance. Euler-mix is a straightforward pipeline that focuses on handling the variation in coverage depth within short-read datasets. The experimental results show that Euler-mix improves the performance of de novo transcriptome assembly.
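
The interplay between overlap size, coverage depth, and error rate mentioned in the summary can be illustrated with a standard back-of-the-envelope model. The sketch below is not the paper's simulation tool and not Euler-mix itself. The function names, the 75-base read length, and the 1% error rate are illustrative assumptions, and sequencing errors are assumed to be independent and uniform across bases.

```python
def kmer_coverage(base_coverage: float, read_len: int, k: int) -> float:
    """Expected k-mer coverage for reads of length read_len.

    Each read of length L contributes L - k + 1 k-mers, so a base
    coverage C corresponds to a k-mer coverage of C * (L - k + 1) / L.
    """
    if not 1 <= k <= read_len:
        raise ValueError("k must be between 1 and read_len")
    return base_coverage * (read_len - k + 1) / read_len


def error_free_fraction(error_rate: float, k: int) -> float:
    """Probability that a k-mer has no error (independent per-base errors)."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be in [0, 1]")
    return (1.0 - error_rate) ** k


def effective_coverage(base_coverage: float, read_len: int, k: int,
                       error_rate: float) -> float:
    """Expected coverage by error-free k-mers only."""
    return kmer_coverage(base_coverage, read_len, k) * error_free_fraction(error_rate, k)


if __name__ == "__main__":
    read_len = 75       # assumed read length, for illustration only
    error_rate = 0.01   # assumed per-base error rate, for illustration only
    for depth in (5, 50):           # a lowly and a highly expressed transcript
        for k in (19, 25, 31):      # candidate overlap (k-mer) sizes
            cov = effective_coverage(depth, read_len, k, error_rate)
            print(f"depth={depth:>3} k={k:>2} error-free k-mer coverage={cov:.2f}")
```

Under these assumptions, a larger overlap size lowers the usable coverage. Transcripts sequenced at low depth therefore tend to favour smaller overlaps, while deeply sequenced transcripts can afford larger ones. This is one way to read the motivation for combining several overlap sizes.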