SEQuel: improving the accuracy of genome assemblies

Motivation: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled...

Bibliographic Details
Main authors: Ronen, Roy; Boucher, Christina; Chitsaz, Hamidreza; Pevzner, Pavel
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371851/
https://www.ncbi.nlm.nih.gov/pubmed/22689760
http://dx.doi.org/10.1093/bioinformatics/bts219
author Ronen, Roy
Boucher, Christina
Chitsaz, Hamidreza
Pevzner, Pavel
collection PubMed
description Motivation: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model. Results: SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly. Availability: SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/. Contact: ppevzner@cs.ucsd.edu
format Online
Article
Text
id pubmed-3371851
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3371851 2012-06-11 SEQuel: improving the accuracy of genome assemblies Ronen, Roy Boucher, Christina Chitsaz, Hamidreza Pevzner, Pavel Bioinformatics ISMB 2012 Proceedings Papers Committee July 15 to July 19, 2012, Long Beach, CA, USA Oxford University Press 2012-06-15 2012-06-09 /pmc/articles/PMC3371851/ /pubmed/22689760 http://dx.doi.org/10.1093/bioinformatics/bts219 Text en © The Author(s) 2012. Published by Oxford University Press.
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title SEQuel: improving the accuracy of genome assemblies
topic ISMB 2012 Proceedings Papers Committee July 15 to July 19, 2012, Long Beach, CA, USA
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371851/
https://www.ncbi.nlm.nih.gov/pubmed/22689760
http://dx.doi.org/10.1093/bioinformatics/bts219
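
Illustrative note on the positional de Bruijn graph: the description above says only that the graph models k-mers within reads while incorporating the approximate positions of reads. The Python sketch below is one minimal way to picture that idea and is not the SEQuel implementation. The function names, the `delta` merging tolerance, and the greedy merging rule are all assumptions made for illustration: each k-mer is paired with an approximate position, and two occurrences of the same k-mer share a node only when their positions are within `delta` of each other.

```python
from collections import defaultdict


def positional_kmers(read, start, k):
    """Yield (kmer, approximate_position) pairs for a read placed at `start`."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], start + i


def build_positional_graph(aligned_reads, k, delta):
    """Build a toy positional de Bruijn graph (hypothetical helper).

    aligned_reads: iterable of (sequence, approximate_start_position).
    Returns a dict mapping (node, node) edges to the number of reads
    supporting them, where a node is (kmer, representative_position).
    Assumption: occurrences of the same k-mer are merged greedily into the
    first existing node whose position differs by at most `delta`.
    """
    representatives = defaultdict(list)  # kmer -> positions of existing nodes
    edges = defaultdict(int)

    def node_for(kmer, pos):
        for rep in representatives[kmer]:
            if abs(rep - pos) <= delta:
                return (kmer, rep)
        representatives[kmer].append(pos)
        return (kmer, pos)

    for seq, start in aligned_reads:
        prev = None
        for kmer, pos in positional_kmers(seq, start, k):
            node = node_for(kmer, pos)
            if prev is not None:
                edges[(prev, node)] += 1  # consecutive k-mers in one read
            prev = node
    return dict(edges)


# Two overlapping reads near position 0 and one identical read far away.
reads = [("ACGTAC", 0), ("CGTACG", 1), ("ACGTAC", 100)]
graph = build_positional_graph(reads, k=3, delta=2)
```

In this toy input the k-mer ACG occurs at positions 0, 4 and 100. With `delta=2` the three occurrences stay separate nodes, whereas a plain de Bruijn graph keyed on the k-mer alone would collapse them into one.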