
De novo meta-assembly of ultra-deep sequencing data

We introduce a new divide-and-conquer approach to de novo genome assembly in the presence of ultra-deep sequencing data (i.e. coverage of 1000x or higher). Our proposed meta-assembler, Slicembler, partitions the input data into optimal-sized ‘slices’ and uses a standard assemb...


Bibliographic Details
Main authors: Mirebrahim, Hamid, Close, Timothy J., Lonardi, Stefano
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765875/
https://www.ncbi.nlm.nih.gov/pubmed/26072514
http://dx.doi.org/10.1093/bioinformatics/btv226
author Mirebrahim, Hamid
Close, Timothy J.
Lonardi, Stefano
author_sort Mirebrahim, Hamid
collection PubMed
description We introduce a new divide-and-conquer approach to de novo genome assembly in the presence of ultra-deep sequencing data (i.e. coverage of 1000x or higher). Our proposed meta-assembler, Slicembler, partitions the input data into optimal-sized ‘slices’ and uses a standard assembly tool (e.g. Velvet, SPAdes, IDBA_UD or Ray) to assemble each slice individually. Slicembler uses majority voting among the individual assemblies to identify long contigs that can be merged into the consensus assembly. To improve efficiency, Slicembler uses a generalized suffix tree to identify these frequent contigs (or fractions thereof). Extensive experimental results on real ultra-deep sequencing data (8000x coverage) and simulated data show that Slicembler significantly improves the quality of the assembly compared with the base assembler. In fact, most of the time, Slicembler generates error-free assemblies. We also show that Slicembler is much more robust to high sequencing error rates than the base assembler. Availability and implementation: Slicembler can be accessed at http://slicembler.cs.ucr.edu/. Contact: hamid.mirebrahim@email.ucr.edu
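The slice-and-vote idea in the description can be illustrated with a minimal Python sketch. It is not Slicembler's code, and several things in it are assumptions made for illustration: the function names (`slice_reads`, `longest_majority_substring`, `consensus_candidate`), the `slice_coverage` parameter standing in for the "optimal" slice size, and the hypothetical `assemble` callable standing in for a base assembler such as Velvet or SPAdes. The sketch also replaces the generalized suffix tree with a brute-force substring search, which is far slower but easier to follow, and it ignores reverse complements.

```python
from typing import Callable, List, Sequence


def slice_reads(reads: Sequence[str], genome_size: int,
                slice_coverage: float) -> List[List[str]]:
    """Split reads into consecutive slices of roughly `slice_coverage`x each.

    Coverage is estimated as total sequenced bases / genome size.
    """
    if not reads:
        return []
    total_bases = sum(len(r) for r in reads)
    coverage = total_bases / genome_size
    n_slices = max(1, round(coverage / slice_coverage))
    size = -(-len(reads) // n_slices)  # ceiling division
    return [list(reads[i:i + size]) for i in range(0, len(reads), size)]


def support(sub: str, assemblies: Sequence[Sequence[str]]) -> int:
    """Count the slice assemblies in which `sub` occurs inside some contig."""
    return sum(any(sub in contig for contig in asm) for asm in assemblies)


def longest_majority_substring(assemblies: Sequence[Sequence[str]],
                               min_len: int) -> str:
    """Longest contig fragment (>= min_len) found in a strict majority of
    slice assemblies; returns "" if there is none.

    Brute force stand-in for the suffix-tree search: support can only drop
    as a substring grows, so each start position is extended until it fails.
    """
    threshold = len(assemblies) // 2 + 1
    best = ""
    for asm in assemblies:
        for contig in asm:
            for start in range(len(contig)):
                end = start + max(min_len, len(best) + 1)
                while (end <= len(contig)
                       and support(contig[start:end], assemblies) >= threshold):
                    best = contig[start:end]
                    end += 1
    return best


def consensus_candidate(reads: Sequence[str], genome_size: int,
                        slice_coverage: float,
                        assemble: Callable[[List[str]], List[str]],
                        min_len: int) -> str:
    """One round: slice, assemble each slice, vote on the longest fragment."""
    slices = slice_reads(reads, genome_size, slice_coverage)
    assemblies = [assemble(s) for s in slices]
    return longest_majority_substring(assemblies, min_len)


if __name__ == "__main__":
    # Three toy "slice assemblies"; two of them share the fragment ACGTACG.
    toy = [["ACGTACGTTT"], ["GGACGTACGA"], ["TTTTCCCC"]]
    print(longest_majority_substring(toy, min_len=4))
```

In this toy run the fragment shared by two of the three assemblies wins the vote, while sequence seen in only one assembly (where slice-specific errors would show up) is left out.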
format Online
Article
Text
id pubmed-4765875
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4765875 2016-03-04 De novo meta-assembly of ultra-deep sequencing data. Mirebrahim, Hamid; Close, Timothy J.; Lonardi, Stefano. Bioinformatics. ISMB/ECCB 2015 Proceedings Papers Committee, July 10 to July 14, 2015, Dublin, Ireland. Oxford University Press 2015-06-15 2015-06-10 /pmc/articles/PMC4765875/ /pubmed/26072514 http://dx.doi.org/10.1093/bioinformatics/btv226 Text en © The Author 2015. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle ISMB/ECCB 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
Mirebrahim, Hamid
Close, Timothy J.
Lonardi, Stefano
De novo meta-assembly of ultra-deep sequencing data
title De novo meta-assembly of ultra-deep sequencing data
title_sort de novo meta-assembly of ultra-deep sequencing data
topic ISMB/ECCB 2015 Proceedings Papers Committee July 10 to July 14, 2015, Dublin, Ireland
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4765875/
https://www.ncbi.nlm.nih.gov/pubmed/26072514
http://dx.doi.org/10.1093/bioinformatics/btv226