
SWALO: scaffolding with assembly likelihood optimization

Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps b...


Bibliographic Details
Main authors: Rahman, Atif, Pachter, Lior
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8599790/
https://www.ncbi.nlm.nih.gov/pubmed/34417615
http://dx.doi.org/10.1093/nar/gkab717
author Rahman, Atif
Pachter, Lior
collection PubMed
description Scaffolding, i.e. ordering and orienting contigs, is an important step in genome assembly. We present a method for scaffolding using second generation sequencing reads based on likelihoods of genome assemblies. A generative model for sequencing is used to obtain maximum likelihood estimates of gaps between contigs and to estimate whether linking contigs into scaffolds would lead to an increase in the likelihood of the assembly. We then link contigs if they can be unambiguously joined or if the corresponding increase in likelihood is substantially greater than that of other possible joins of those contigs. The method is implemented in a tool called Swalo with approximations to make it efficient and applicable to large datasets. Analysis on real and simulated datasets reveals that it consistently makes a similar or greater number of correct joins compared with other scaffolders while linking very few contigs incorrectly, thus outperforming other scaffolders and demonstrating that substantial improvement in genome assembly may be achieved through the use of statistical models. Swalo is freely available for download at https://atifrahman.github.io/SWALO/.
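The two ideas in the description, a maximum likelihood gap estimate and a join rule that requires one candidate to clearly dominate the others, can be illustrated with a minimal sketch. This is not Swalo's implementation. It assumes insert sizes follow a normal distribution with known mean `mu` and standard deviation `sigma`, ignores the bias from only observing read pairs that span the gap, and uses a hypothetical `min_ratio` threshold for "substantially greater". All function names and parameters are illustrative.

```python
import math


def log_normal_pdf(x, mu, sigma):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def gap_log_likelihood(gap, spans, mu, sigma):
    """Log-likelihood of a gap size given the read pairs linking two contigs.

    Each value in `spans` is the part of an insert that lies on the two
    contigs, so the implied insert size is span + gap.
    """
    return sum(log_normal_pdf(s + gap, mu, sigma) for s in spans)


def estimate_gap(spans, mu, sigma, min_gap=-100, max_gap=2000):
    """Grid search over integer gap sizes for the most likely gap (assumed range)."""
    return max(
        range(min_gap, max_gap + 1),
        key=lambda g: gap_log_likelihood(g, spans, mu, sigma),
    )


def choose_join(gains, min_ratio=2.0):
    """Pick a join only if it is unambiguous or clearly better than the rest.

    `gains` maps each candidate neighbour to the likelihood increase of
    joining it. `min_ratio` is a hypothetical threshold, not a Swalo setting.
    """
    positive = {c: g for c, g in gains.items() if g > 0}
    if not positive:
        return None
    ranked = sorted(positive.items(), key=lambda kv: kv[1], reverse=True)
    if len(ranked) == 1:
        return ranked[0][0]  # only one candidate improves the likelihood
    (best, best_gain), (_, second_gain) = ranked[0], ranked[1]
    return best if best_gain >= min_ratio * second_gain else None


if __name__ == "__main__":
    gap = estimate_gap([180, 200, 220], mu=300, sigma=30)
    print("estimated gap:", gap)
    print("chosen join:", choose_join({"contig_B": 12.0, "contig_C": 3.0}))
```

Under this simplified model the grid search reduces to the mean insert size minus the mean span. A fuller model would condition on which inserts can be observed across the gap.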
format Online
Article
Text
id pubmed-8599790
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8599790 2021-11-18 SWALO: scaffolding with assembly likelihood optimization Rahman, Atif Pachter, Lior Nucleic Acids Res Methods Online Oxford University Press 2021-08-20 /pmc/articles/PMC8599790/ /pubmed/34417615 http://dx.doi.org/10.1093/nar/gkab717 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title SWALO: scaffolding with assembly likelihood optimization
topic Methods Online