Efficient iterative Hi-C scaffolder based on N-best neighbors

Bibliographic Details
Main Authors: Guan, Dengfeng; McCarthy, Shane A.; Ning, Zemin; Wang, Guohua; Wang, Yadong; Durbin, Richard
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Software
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8627104/
https://www.ncbi.nlm.nih.gov/pubmed/34837944
http://dx.doi.org/10.1186/s12859-021-04453-5
author Guan, Dengfeng
McCarthy, Shane A.
Ning, Zemin
Wang, Guohua
Wang, Yadong
Durbin, Richard
collection PubMed
description BACKGROUND: Efficient and effective genome scaffolding tools are still in high demand for generating reference-quality assemblies. While long-read data alone is unlikely to yield a chromosome-scale assembly for most eukaryotic species, inexpensive Hi-C sequencing technology, which captures the chromosomal profile of a genome, is now widely used to complete the task. However, existing Hi-C based scaffolding tools either require the chromosome number a priori as input or lack the ability to build highly continuous scaffolds. RESULTS: We design and develop a novel Hi-C based scaffolding tool, pin_hic, which uses contact information from Hi-C reads to construct a scaffolding graph iteratively, based on the N-best neighbors of contigs. After scaffolding, it identifies potential misjoins and breaks them to preserve scaffolding accuracy. Through tests on three long-read based de novo assemblies from three different species, we demonstrate that pin_hic is more efficient than current state-of-the-art tools and can generate much more continuous scaffolds while achieving higher or comparable accuracy. CONCLUSIONS: Pin_hic is an efficient Hi-C based scaffolding tool that can be useful for building chromosome-scale assemblies. As many sequencing projects have been launched in recent years, we believe pin_hic has the potential to be applied in these projects and to make a meaningful contribution. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04453-5.
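The abstract describes building a scaffolding graph from the N-best neighbors of each contig. The sketch below is only a toy illustration of that general idea, not pin_hic's implementation: the function names, the `contacts` dictionary of normalized link weights, the mutual N-best filter, and the greedy joining step are all assumptions made for this example. It also ignores contig orientation, the iterative rounds, and misjoin breaking.

```python
from collections import defaultdict


def n_best_neighbors(contacts, n):
    """Return, for each contig, its n highest-weight neighbors.

    contacts: dict mapping (contig_a, contig_b) -> Hi-C link weight
    (hypothetical input format, assumed already normalized).
    """
    by_contig = defaultdict(list)
    for (a, b), weight in contacts.items():
        by_contig[a].append((weight, b))
        by_contig[b].append((weight, a))
    best = {}
    for contig, links in by_contig.items():
        # Highest weight first; ties broken by name for determinism.
        links.sort(key=lambda t: (-t[0], t[1]))
        best[contig] = [name for _, name in links[:n]]
    return best


def candidate_edges(contacts, n):
    """Keep only links where each contig is among the other's n best."""
    best = n_best_neighbors(contacts, n)
    edges = [
        (a, b, w)
        for (a, b), w in contacts.items()
        if b in best[a] and a in best[b]
    ]
    edges.sort(key=lambda e: (-e[2], e[0], e[1]))
    return edges


def greedy_paths(edges):
    """Join contigs greedily: at most two links per contig, no cycles."""
    degree = defaultdict(int)
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    kept = []
    for a, b, _ in edges:
        if degree[a] >= 2 or degree[b] >= 2:
            continue  # a contig has only two ends to join
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            continue  # joining would close a cycle
        parent[root_a] = root_b
        degree[a] += 1
        degree[b] += 1
        kept.append((a, b))
    return kept


# Made-up link weights between four contigs.
contacts = {
    ("c1", "c2"): 9.0,
    ("c2", "c3"): 7.0,
    ("c1", "c3"): 1.0,
    ("c3", "c4"): 6.0,
    ("c1", "c4"): 0.5,
}
print(greedy_paths(candidate_edges(contacts, 2)))
```

With these made-up weights and n = 2, the weak c1-c3 and c1-c4 links are filtered out because they are not mutual N-best, leaving a single chain through the four contigs.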
format Online
Article
Text
id pubmed-8627104
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8627104 2021-11-30 Efficient iterative Hi-C scaffolder based on N-best neighbors Guan, Dengfeng McCarthy, Shane A. Ning, Zemin Wang, Guohua Wang, Yadong Durbin, Richard BMC Bioinformatics Software
BioMed Central 2021-11-27 /pmc/articles/PMC8627104/ /pubmed/34837944 http://dx.doi.org/10.1186/s12859-021-04453-5 Text en © The Author(s) 2021, corrected publication 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Efficient iterative Hi-C scaffolder based on N-best neighbors
topic Software