compleasm: a faster and more accurate reimplementation of BUSCO

MOTIVATION: Evaluating the gene completeness is critical to measuring the quality of a genome assembly. An incomplete assembly can lead to errors in gene predictions, annotation, and other downstream analyses. Benchmarking Universal Single-Copy Orthologs (BUSCO) is a widely used tool for assessing t...

Bibliographic Details
Main Authors: Huang, Neng; Li, Heng
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10558035/
https://www.ncbi.nlm.nih.gov/pubmed/37758247
http://dx.doi.org/10.1093/bioinformatics/btad595
author Huang, Neng
Li, Heng
author_facet Huang, Neng
Li, Heng
author_sort Huang, Neng
collection PubMed
description MOTIVATION: Evaluating gene completeness is critical to measuring the quality of a genome assembly. An incomplete assembly can lead to errors in gene prediction, annotation, and other downstream analyses. Benchmarking Universal Single-Copy Orthologs (BUSCO) is a widely used tool that assesses the completeness of a genome assembly by testing for the presence of a set of single-copy orthologs conserved across a wide range of taxa. However, BUSCO is slow, particularly for large genome assemblies, which makes it cumbersome to apply to a large number of assemblies. RESULTS: Here, we present compleasm, an efficient tool for assessing the completeness of genome assemblies. Compleasm uses the miniprot protein-to-genome aligner and the conserved orthologous genes from BUSCO. For human assemblies it is 14 times faster than BUSCO and reports a completeness of 99.6%, compared with BUSCO’s 95.7%. The compleasm figure is the more accurate of the two: it closely agrees with the annotation completeness of 99.5% for T2T-CHM13. AVAILABILITY AND IMPLEMENTATION: https://github.com/huangnengCSU/compleasm.
format Online
Article
Text
id pubmed-10558035
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10558035 2023-10-07 compleasm: a faster and more accurate reimplementation of BUSCO Huang, Neng Li, Heng Bioinformatics Applications Note Oxford University Press 2023-09-27 /pmc/articles/PMC10558035/ /pubmed/37758247 http://dx.doi.org/10.1093/bioinformatics/btad595 Text en © The Author(s) 2023. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Applications Note
Huang, Neng
Li, Heng
compleasm: a faster and more accurate reimplementation of BUSCO
title compleasm: a faster and more accurate reimplementation of BUSCO
topic Applications Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10558035/
https://www.ncbi.nlm.nih.gov/pubmed/37758247
http://dx.doi.org/10.1093/bioinformatics/btad595