SEQuel: improving the accuracy of genome assemblies
Main authors: | Ronen, Roy; Boucher, Christina; Chitsaz, Hamidreza; Pevzner, Pavel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371851/ https://www.ncbi.nlm.nih.gov/pubmed/22689760 http://dx.doi.org/10.1093/bioinformatics/bts219 |
author | Ronen, Roy; Boucher, Christina; Chitsaz, Hamidreza; Pevzner, Pavel |
collection | PubMed |
description | Motivation: Assemblies of next-generation sequencing (NGS) data, although accurate, still contain a substantial number of errors that need to be corrected after the assembly process. We develop SEQuel, a tool that corrects errors (i.e. insertions, deletions and substitution errors) in the assembled contigs. Fundamental to the algorithm behind SEQuel is the positional de Bruijn graph, a graph structure that models k-mers within reads while incorporating the approximate positions of reads into the model. Results: SEQuel reduced the number of small insertions and deletions in the assemblies of standard multi-cell Escherichia coli data by almost half, and corrected between 30% and 94% of the substitution errors. Further, we show SEQuel is imperative to improving single-cell assembly, which is inherently more challenging due to higher error rates and non-uniform coverage; over half of the small indels and substitution errors in the single-cell assemblies were corrected. We apply SEQuel to the recently assembled Deltaproteobacterium SAR324 genome, which is the first bacterial genome with a comprehensive single-cell genome assembly, and make over 800 changes (insertions, deletions and substitutions) to refine this assembly. Availability: SEQuel can be used as a post-processing step in combination with any NGS assembler and is freely available at http://bix.ucsd.edu/SEQuel/. Contact: ppevzner@cs.ucsd.edu |
format | Online Article Text |
id | pubmed-3371851 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-3371851 2012-06-11 SEQuel: improving the accuracy of genome assemblies. Ronen, Roy; Boucher, Christina; Chitsaz, Hamidreza; Pevzner, Pavel. Bioinformatics. ISMB 2012 Proceedings Papers Committee, July 15 to July 19, 2012, Long Beach, CA, USA. Oxford University Press 2012-06-15 2012-06-09 /pmc/articles/PMC3371851/ /pubmed/22689760 http://dx.doi.org/10.1093/bioinformatics/bts219 Text en © The Author(s) 2012. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | SEQuel: improving the accuracy of genome assemblies |
topic | ISMB 2012 Proceedings Papers Committee, July 15 to July 19, 2012, Long Beach, CA, USA |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3371851/ https://www.ncbi.nlm.nih.gov/pubmed/22689760 http://dx.doi.org/10.1093/bioinformatics/bts219 |
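
The description field defines the positional de Bruijn graph in a single sentence: a graph over k-mers from reads that also records the reads' approximate positions. The Python sketch that follows is a simplified illustration under stated assumptions, not SEQuel's implementation. It assumes each read's position is already known from aligning it to a contig (alignment not shown), treats each node as a (k-mer, position) pair, and merges identical k-mers greedily only when their positions differ by at most a tolerance `delta`. The names `positional_kmers`, `PositionalDeBruijnGraph`, `add_read` and `delta` are hypothetical.

```python
from collections import defaultdict

# Simplified sketch (assumption): a node is a (k-mer, approximate position) pair,
# where positions come from aligning each read to a contig (alignment not shown).


def positional_kmers(read, read_pos, k):
    """Yield (k-mer, position) pairs for every k-mer of a read aligned at read_pos."""
    for i in range(len(read) - k + 1):
        yield read[i:i + k], read_pos + i


class PositionalDeBruijnGraph:
    """Hypothetical, simplified positional de Bruijn graph.

    Identical k-mers are merged into one node only if their positions differ
    by at most `delta`; otherwise they stay separate nodes, so the same k-mer
    occurring at distant positions does not collapse into a single node.
    """

    def __init__(self, k, delta):
        self.k = k
        self.delta = delta
        self.nodes = defaultdict(list)  # k-mer -> positions of its nodes
        self.edges = defaultdict(int)   # (node, node) -> number of supporting reads

    def _node(self, kmer, pos):
        # Greedily reuse the first existing node of this k-mer within delta.
        for p in self.nodes[kmer]:
            if abs(p - pos) <= self.delta:
                return (kmer, p)
        self.nodes[kmer].append(pos)
        return (kmer, pos)

    def add_read(self, read, read_pos):
        # Consecutive k-mers of a read overlap by k - 1 bases, so each
        # consecutive pair forms a de Bruijn edge between positional nodes.
        prev = None
        for kmer, pos in positional_kmers(read, read_pos, self.k):
            node = self._node(kmer, pos)
            if prev is not None:
                self.edges[(prev, node)] += 1
            prev = node


if __name__ == "__main__":
    g = PositionalDeBruijnGraph(k=3, delta=1)
    g.add_read("ACGTA", 0)
    g.add_read("GTACG", 2)  # "ACG" recurs at position 4: kept as a separate node
    g.add_read("CGTAC", 2)  # placed 1 base off its true start: merges into existing nodes
    print(sorted(g.nodes.items()))
    print(dict(g.edges))
```

In a standard de Bruijn graph the two occurrences of `ACG` would be a single node, creating a cycle; here the positional information keeps them apart, while the tolerance lets a slightly misplaced read reinforce existing edges instead of creating new ones. The greedy first-match merge is a simplification chosen for brevity.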