Identification of errors in draft genome assemblies at single-nucleotide resolution for quality assessment and improvement

Bibliographic details
Main authors: Li, Kunpeng, Xu, Peng, Wang, Jinpeng, Yi, Xin, Jiao, Yuannian
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10582259/
https://www.ncbi.nlm.nih.gov/pubmed/37848433
http://dx.doi.org/10.1038/s41467-023-42336-w

Abstract

Assembly of a high-quality genome is important for downstream comparative and functional genomic studies. However, most tools for genome assembly assessment only give qualitative reports, which do not pinpoint assembly errors at specific regions. Here, we develop a new reference-free tool, Clipping information for Revealing Assembly Quality (CRAQ), which maps raw reads back to assembled sequences to identify regional and structural assembly errors based on effective clipped alignment information. Error counts are transformed into corresponding assembly evaluation indexes to reflect the assembly quality at single-nucleotide resolution. Notably, CRAQ distinguishes assembly errors from heterozygous sites or structural differences between haplotypes. This tool can clearly indicate low-quality regions and potential structural error breakpoints; thus, it can identify misjoined regions that should be split for further scaffold building and improvement of the assembly. We have benchmarked CRAQ on multiple genomes assembled using different strategies, and demonstrated the misjoin correction for improving the constructed pseudomolecules.
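The abstract describes CRAQ only at a high level. As a rough illustration of what "clipped alignment information" can mean, the Python sketch below scans CIGAR strings (as defined in the SAM format) for soft (S) and hard (H) clips, counts how many reads are clipped at each reference boundary, and compares that with the number of reads that cross the boundary unclipped. This is not CRAQ's algorithm: the `Alignment` record, the function names, the `min_clipped`, `error_frac` and `het_frac` thresholds, and the rule that an intermediate clipped fraction suggests a heterozygous site are all hypothetical assumptions chosen for illustration.

```python
import re
from collections import Counter
from dataclasses import dataclass

# CIGAR operations that consume reference bases, per the SAM specification.
REF_CONSUMING = frozenset("MDN=X")
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


@dataclass
class Alignment:
    """Hypothetical minimal read alignment record (not a CRAQ or pysam type)."""
    pos: int    # 0-based reference position of the first aligned base
    cigar: str  # SAM CIGAR string, e.g. "50S100M"


def parse_cigar(cigar: str) -> list[tuple[int, str]]:
    return [(int(n), op) for n, op in CIGAR_RE.findall(cigar)]


def aligned_span(aln: Alignment) -> tuple[int, int]:
    """Half-open reference interval [start, end) covered by the aligned bases."""
    ref_len = sum(n for n, op in parse_cigar(aln.cigar) if op in REF_CONSUMING)
    return aln.pos, aln.pos + ref_len


def clip_sites(aln: Alignment) -> list[int]:
    """Reference boundaries at which the read is soft- or hard-clipped."""
    ops = parse_cigar(aln.cigar)
    start, end = aligned_span(aln)
    sites = []
    if ops and ops[0][1] in "SH":
        sites.append(start)  # clipped sequence hangs off the left boundary
    if ops and ops[-1][1] in "SH":
        sites.append(end)    # clipped sequence hangs off the right boundary
    return sites


def classify_clip_sites(alns, min_clipped=3, error_frac=0.75, het_frac=0.25):
    """Toy classifier: compare clipped reads with reads spanning each boundary.

    All thresholds are hypothetical illustration values, not CRAQ defaults.
    """
    spans = [aligned_span(a) for a in alns]
    clipped = Counter(site for a in alns for site in clip_sites(a))
    calls = {}
    for site, n_clip in sorted(clipped.items()):
        if n_clip < min_clipped:
            continue  # too few clipped reads; treat as noise
        n_span = sum(1 for start, end in spans if start < site < end)
        frac = n_clip / (n_clip + n_span)
        if frac >= error_frac:
            calls[site] = "candidate_error"        # almost no read crosses the site
        elif frac >= het_frac:
            calls[site] = "possible_heterozygous"  # some reads span, others clip
    return calls


# Usage on synthetic reads (made-up positions, not data from the article).
reads = (
    [Alignment(900, "100M50S") for _ in range(5)]     # clipped at 1000
    + [Alignment(1000, "50S100M") for _ in range(5)]  # clipped at 1000
    + [Alignment(4900, "100M50S") for _ in range(3)]  # clipped at 5000
    + [Alignment(4950, "100M") for _ in range(3)]     # cross 5000 unclipped
    + [Alignment(2900, "100M10S")]                    # isolated clip at 3000
)
print(classify_clip_sites(reads))
# {1000: 'candidate_error', 5000: 'possible_heterozygous'}
```

With real data, the same counts could be drawn from a BAM file, for instance via pysam's `AlignedSegment.reference_start` and `cigartuples`; the plain `Alignment` record keeps the sketch self-contained.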

Journal: Nat Commun, Nature Publishing Group UK, published 2023-10-17

© The Author(s) 2023. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.