
Scaffolding of long read assemblies using long range contact information

Bibliographic Details
Main authors: Ghurye, Jay; Pop, Mihai; Koren, Sergey; Bickhart, Derek; Chin, Chen-Shan
Format: Online article, text
Language: English
Published: BioMed Central 2017
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5508778/
https://www.ncbi.nlm.nih.gov/pubmed/28701198
http://dx.doi.org/10.1186/s12864-017-3879-z
collection PubMed
description BACKGROUND: Long-read technologies have revolutionized de novo genome assembly by generating contigs orders of magnitude longer than those of short-read assemblies. Although assembly contiguity has increased, long-read assemblies usually do not reconstruct a full chromosome or even a chromosome arm, leaving the assembly unfinished at the chromosome level. To raise contiguity to the chromosome level, several strategies exploit long-range contact information across the genome. METHODS: We develop a scalable and computationally efficient scaffolding method that can substantially boost assembly contiguity using genome-wide chromatin interaction data such as Hi-C. RESULTS: We demonstrate an algorithm that uses Hi-C data for longer-range scaffolding of de novo long-read genome assemblies. We tested our method on human and goat genome assemblies and compared our scaffolds with those generated by LACHESIS using various metrics. CONCLUSION: Our new algorithm, SALSA, produces more accurate scaffolds than the existing state-of-the-art method, LACHESIS. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3879-z) contains supplementary material, which is available to authorized users.
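To illustrate the general idea of Hi-C scaffolding summarized in the abstract, here is a minimal toy sketch in Python. It is not SALSA or LACHESIS and makes no claim about how either works: the function name greedy_scaffold, the min_links threshold, and the input format (contig names plus raw Hi-C read-pair counts per contig pair) are hypothetical. Contig orientation is ignored and link counts are not normalized by contig length. The sketch greedily joins the most strongly linked contigs into linear chains, refusing any join that would give a contig more than two neighbours or close a cycle.

```python
from collections import defaultdict


def greedy_scaffold(contigs, links, min_links=1):
    """Toy greedy Hi-C scaffolder (hypothetical; not SALSA or LACHESIS).

    contigs:   list of contig names.
    links:     dict mapping (contig_a, contig_b) to a Hi-C read-pair count.
    min_links: hypothetical threshold; weaker links are ignored.
    Returns a list of scaffolds, each an ordered list of contig names.
    Orientation is ignored and counts are not length-normalized.
    """
    parent = {c: c for c in contigs}  # union-find over contigs

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    degree = defaultdict(int)
    adj = defaultdict(list)

    # Consider the strongest contacts first.
    for (a, b), n in sorted(links.items(), key=lambda kv: kv[1], reverse=True):
        if n < min_links or a == b:
            continue
        if degree[a] >= 2 or degree[b] >= 2:
            continue  # a contig has at most two neighbours in a linear scaffold
        ra, rb = find(a), find(b)
        if ra == rb:
            continue  # already in one chain; joining would close a cycle
        parent[ra] = rb
        degree[a] += 1
        degree[b] += 1
        adj[a].append(b)
        adj[b].append(a)

    # Every chain is a simple path; walk each one from an end (degree 0 or 1).
    scaffolds, seen = [], set()
    for c in contigs:
        if c in seen or degree[c] == 2:
            continue
        path, prev, cur = [], None, c
        while cur is not None:
            path.append(cur)
            seen.add(cur)
            nxt = [m for m in adj[cur] if m != prev]
            prev, cur = cur, (nxt[0] if nxt else None)
        scaffolds.append(path)
    return scaffolds


if __name__ == "__main__":
    contigs = ["c1", "c2", "c3", "c4"]
    links = {("c1", "c2"): 50, ("c2", "c3"): 40, ("c1", "c3"): 30, ("c3", "c4"): 2}
    print(greedy_scaffold(contigs, links, min_links=5))
```

With the example links and min_links=5, the sketch returns [["c1", "c2", "c3"], ["c4"]]: the c1-c3 link is skipped because c1 and c3 are already in the same chain, and the weak c3-c4 link falls below the threshold. Real Hi-C scaffolders work on oriented contig ends and use more careful link scoring; consult the article for the published method.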
id pubmed-5508778
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-5508778 2017-07-17 BMC Genomics, Methodology Article
BioMed Central 2017-07-12. © The Author(s) 2017. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.