
LDscaff: LD-based scaffolding of de novo genome assemblies

BACKGROUND: Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, which utilizes various sequencing technologies, increases both contiguity and accuracy. While such approaches require extra, costly sequencing efforts, the information provided by millions of existing whole-genome sequenci...


Bibliographic Details
Main authors: Zhao, Zicheng, Zhou, Yingxiao, Wang, Shuai, Zhang, Xiuqing, Wang, Changfa, Li, Shuaicheng
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768660/
https://www.ncbi.nlm.nih.gov/pubmed/33371875
http://dx.doi.org/10.1186/s12859-020-03895-7
_version_ 1783629203003932672
author Zhao, Zicheng
Zhou, Yingxiao
Wang, Shuai
Zhang, Xiuqing
Wang, Changfa
Li, Shuaicheng
author_facet Zhao, Zicheng
Zhou, Yingxiao
Wang, Shuai
Zhang, Xiuqing
Wang, Changfa
Li, Shuaicheng
author_sort Zhao, Zicheng
collection PubMed
description BACKGROUND: Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, which utilizes various sequencing technologies, increases both contiguity and accuracy. While such approaches require extra, costly sequencing efforts, the information provided by millions of existing whole-genome sequencing datasets has not been fully utilized for the task of scaffolding. Genetic recombination patterns in population data, which indicate non-random association among alleles at different loci, can provide physical distance signals to guide scaffolding. RESULTS: In this paper, we propose LDscaff, a method for draft genome assembly that incorporates linkage disequilibrium information from population data. We evaluated the performance of our method with both simulated and real data. We simulated scaffolds by splitting the pig reference genome and then reassembled them. Gaps ranging from 0 to 100 kb were introduced between scaffolds. The genome misassembly rate is 2.43% when there is no gap. We then applied our method to refine the giant panda genome and the donkey genome, both of which were assembled purely from NGS data. After LDscaff treatment, the resulting panda assembly has a scaffold N50 of 3.6 Mb, 2.5 times larger than the original N50 (1.3 Mb). The re-assembled donkey assembly has an improved N50 of 32.1 Mb, up from 23.8 Mb. CONCLUSIONS: Our method effectively improves assemblies using existing re-sequencing data, and is a potential alternative to existing assemblers that require the collection of new data.
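The abstract leans on two quantities: linkage disequilibrium (LD) between loci and the scaffold N50. The Python sketch below illustrates the standard textbook definitions of both, assuming phased biallelic haplotypes coded as 0/1. It is not LDscaff's implementation; the function names `ld_r2` and `n50` are hypothetical and chosen only for this illustration.

```python
from typing import Sequence


def ld_r2(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Squared correlation (r^2) between two biallelic loci.

    Each input holds one allele (0 or 1) per phased haplotype, in the
    same haplotype order. This is the textbook r^2 statistic, used here
    as an assumed stand-in for an LD signal.
    """
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n                                   # frequency of allele 1 at locus A
    p_b = sum(hap_b) / n                                   # frequency of allele 1 at locus B
    p_ab = sum(a * b for a, b in zip(hap_a, hap_b)) / n    # frequency of the 1-1 haplotype
    d = p_ab - p_a * p_b                                   # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0  # a monomorphic locus carries no LD information
    return d * d / denom


def n50(lengths: Sequence[int]) -> int:
    """Largest length L such that scaffolds of length >= L cover at
    least half of the total assembly length."""
    if len(lengths) == 0:
        raise ValueError("need at least one scaffold length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-empty input")


if __name__ == "__main__":
    print(ld_r2([0, 0, 1, 1], [0, 0, 1, 1]))  # perfectly associated loci
    print(ld_r2([0, 0, 1, 1], [0, 1, 0, 1]))  # independent loci
    print(n50([2, 3, 4, 5, 6]))
```

Because LD tends to decay with physical distance, markers near the ends of two adjacent scaffolds would be expected to show a higher r² than markers on unrelated scaffolds. That is the kind of signal the abstract describes as guiding scaffolding.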
format Online
Article
Text
id pubmed-7768660
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7768660 2020-12-29 LDscaff: LD-based scaffolding of de novo genome assemblies Zhao, Zicheng Zhou, Yingxiao Wang, Shuai Zhang, Xiuqing Wang, Changfa Li, Shuaicheng BMC Bioinformatics Software BACKGROUND: Genome assembly is fundamental for de novo genome analysis. Hybrid assembly, which utilizes various sequencing technologies, increases both contiguity and accuracy. While such approaches require extra, costly sequencing efforts, the information provided by millions of existing whole-genome sequencing datasets has not been fully utilized for the task of scaffolding. Genetic recombination patterns in population data, which indicate non-random association among alleles at different loci, can provide physical distance signals to guide scaffolding. RESULTS: In this paper, we propose LDscaff, a method for draft genome assembly that incorporates linkage disequilibrium information from population data. We evaluated the performance of our method with both simulated and real data. We simulated scaffolds by splitting the pig reference genome and then reassembled them. Gaps ranging from 0 to 100 kb were introduced between scaffolds. The genome misassembly rate is 2.43% when there is no gap. We then applied our method to refine the giant panda genome and the donkey genome, both of which were assembled purely from NGS data. After LDscaff treatment, the resulting panda assembly has a scaffold N50 of 3.6 Mb, 2.5 times larger than the original N50 (1.3 Mb). The re-assembled donkey assembly has an improved N50 of 32.1 Mb, up from 23.8 Mb. CONCLUSIONS: Our method effectively improves assemblies using existing re-sequencing data, and is a potential alternative to existing assemblers that require the collection of new data.
BioMed Central 2020-12-28 /pmc/articles/PMC7768660/ /pubmed/33371875 http://dx.doi.org/10.1186/s12859-020-03895-7 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Software
Zhao, Zicheng
Zhou, Yingxiao
Wang, Shuai
Zhang, Xiuqing
Wang, Changfa
Li, Shuaicheng
LDscaff: LD-based scaffolding of de novo genome assemblies
title LDscaff: LD-based scaffolding of de novo genome assemblies
title_full LDscaff: LD-based scaffolding of de novo genome assemblies
title_fullStr LDscaff: LD-based scaffolding of de novo genome assemblies
title_full_unstemmed LDscaff: LD-based scaffolding of de novo genome assemblies
title_short LDscaff: LD-based scaffolding of de novo genome assemblies
title_sort ldscaff: ld-based scaffolding of de novo genome assemblies
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7768660/
https://www.ncbi.nlm.nih.gov/pubmed/33371875
http://dx.doi.org/10.1186/s12859-020-03895-7
work_keys_str_mv AT zhaozicheng ldscaffldbasedscaffoldingofdenovogenomeassemblies
AT zhouyingxiao ldscaffldbasedscaffoldingofdenovogenomeassemblies
AT wangshuai ldscaffldbasedscaffoldingofdenovogenomeassemblies
AT zhangxiuqing ldscaffldbasedscaffoldingofdenovogenomeassemblies
AT wangchangfa ldscaffldbasedscaffoldingofdenovogenomeassemblies
AT lishuaicheng ldscaffldbasedscaffoldingofdenovogenomeassemblies