Kermit: linkage map guided long read assembly

BACKGROUND: With long reads getting even longer and cheaper, large-scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection...

Bibliographic Details
Main Authors: Walve, Riku, Rastas, Pasi, Salmela, Leena
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425630/
https://www.ncbi.nlm.nih.gov/pubmed/30930956
http://dx.doi.org/10.1186/s13015-019-0143-x
author Walve, Riku
Rastas, Pasi
Salmela, Leena
collection PubMed
description BACKGROUND: With long reads getting even longer and cheaper, large-scale sequencing projects can be accomplished without short reads at an affordable cost. Due to the high error rates and less mature tools, de novo assembly of long reads is still challenging and often results in a large collection of contigs. Dense linkage maps are collections of markers whose location on the genome is approximately known. They therefore provide long-range information that has the potential to greatly aid de novo assembly. Previously, linkage maps have been used to detect misassemblies and to manually order contigs. However, no fully automated tools exist to incorporate linkage maps into assembly; instead, a large amount of manual labour is needed to order the contigs into chromosomes. RESULTS: We formulate the genome assembly problem in the presence of linkage maps and present the first method for guided genome assembly using linkage maps. Our method is based on an additional cleaning step added to the assembly. We show that it can simplify the underlying assembly graph, resulting in more contiguous assemblies and reducing the number of misassemblies when compared to de novo assembly. CONCLUSIONS: We present the first method to integrate linkage maps directly into genome assembly. With a modest increase in runtime, our method improves the contiguity and correctness of genome assembly.
format Online
Article
Text
id pubmed-6425630
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6425630 2019-03-29 Kermit: linkage map guided long read assembly Walve, Riku Rastas, Pasi Salmela, Leena Algorithms Mol Biol Research
BioMed Central 2019-03-20 /pmc/articles/PMC6425630/ /pubmed/30930956 http://dx.doi.org/10.1186/s13015-019-0143-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Kermit: linkage map guided long read assembly
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6425630/
https://www.ncbi.nlm.nih.gov/pubmed/30930956
http://dx.doi.org/10.1186/s13015-019-0143-x