Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes

BACKGROUND: Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and comple...

Bibliographic Details
Main Authors: Feron, Romain, Waterhouse, Robert M
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8881204/
https://www.ncbi.nlm.nih.gov/pubmed/35217859
http://dx.doi.org/10.1093/gigascience/giac006
_version_ 1784659416283348992
author Feron, Romain
Waterhouse, Robert M
author_facet Feron, Romain
Waterhouse, Robert M
author_sort Feron, Romain
collection PubMed
description BACKGROUND: Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. To guide forthcoming genome generation efforts and promote efficient prioritization of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data. FINDINGS: Here we present an automated analysis workflow that surveys genome assemblies from the United States NCBI, assesses their completeness using the relevant BUSCO datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, examine how key assembly metrics relate to gene content completeness, and compare results from using different BUSCO lineage datasets. CONCLUSIONS: These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritizations for ongoing and future sampling, sequencing, and genome generation initiatives.
format Online
Article
Text
id pubmed-8881204
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-88812042022-02-28 Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes Feron, Romain Waterhouse, Robert M Gigascience Technical Note BACKGROUND: Ambitious initiatives to coordinate genome sequencing of Earth's biodiversity mean that the accumulation of genomic data is growing rapidly. In addition to cataloguing biodiversity, these data provide the basis for understanding biological function and evolution. Accurate and complete genome assemblies offer a comprehensive and reliable foundation upon which to advance our understanding of organismal biology at genetic, species, and ecosystem levels. However, ever-changing sequencing technologies and analysis methods mean that available data are often heterogeneous in quality. To guide forthcoming genome generation efforts and promote efficient prioritization of resources, it is thus essential to define and monitor taxonomic coverage and quality of the data. FINDINGS: Here we present an automated analysis workflow that surveys genome assemblies from the United States NCBI, assesses their completeness using the relevant BUSCO datasets, and collates the results into an interactively browsable resource. We apply our workflow to produce a community resource of available assemblies from the phylum Arthropoda, the Arthropoda Assembly Assessment Catalogue. Using this resource, we survey current taxonomic coverage and assembly quality at the NCBI, examine how key assembly metrics relate to gene content completeness, and compare results from using different BUSCO lineage datasets. CONCLUSIONS: These results demonstrate how the workflow can be used to build a community resource that enables large-scale assessments to survey species coverage and data quality of available genome assemblies, and to guide prioritizations for ongoing and future sampling, sequencing, and genome generation initiatives. 
Oxford University Press 2022-02-25 /pmc/articles/PMC8881204/ /pubmed/35217859 http://dx.doi.org/10.1093/gigascience/giac006 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Feron, Romain
Waterhouse, Robert M
Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes
title Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes
title_full Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes
title_fullStr Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes
title_full_unstemmed Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes
title_short Assessing species coverage and assembly quality of rapidly accumulating sequenced genomes
title_sort assessing species coverage and assembly quality of rapidly accumulating sequenced genomes
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8881204/
https://www.ncbi.nlm.nih.gov/pubmed/35217859
http://dx.doi.org/10.1093/gigascience/giac006
work_keys_str_mv AT feronromain assessingspeciescoverageandassemblyqualityofrapidlyaccumulatingsequencedgenomes
AT waterhouserobertm assessingspeciescoverageandassemblyqualityofrapidlyaccumulatingsequencedgenomes