Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life

As genomic data transform our understanding of biodiversity, the Earth BioGenome Project (EBP) has set a goal of generating reference quality genome assemblies for all ~1.9 million described eukaryotic taxa. Meeting this goal requires coordination among many individual regional and taxon-focussed pr...

Bibliographic Details
Main authors: Challis, Richard; Kumar, Sujai; Sotero-Caio, Cibele; Brown, Max; Blaxter, Mark
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9971660/
https://www.ncbi.nlm.nih.gov/pubmed/36864925
http://dx.doi.org/10.12688/wellcomeopenres.18658.1
author Challis, Richard
Kumar, Sujai
Sotero-Caio, Cibele
Brown, Max
Blaxter, Mark
collection PubMed
description As genomic data transform our understanding of biodiversity, the Earth BioGenome Project (EBP) has set a goal of generating reference quality genome assemblies for all ~1.9 million described eukaryotic taxa. Meeting this goal requires coordination among many individual regional and taxon-focussed projects working under the EBP umbrella. Large-scale sequencing projects require ready access to validated genome-relevant metadata, such as genome sizes and karyotypes, but these data are dispersed across the literature, and directly measured values are lacking for most taxa. To meet these needs, we have developed Genomes on a Tree (GoaT), an Elasticsearch-powered datastore and search index for genome-relevant metadata and sequencing project plans and statuses. GoaT indexes publicly available metadata for all eukaryotic species and interpolates missing values through phylogenetic comparison. GoaT also holds target priority and sequencing status information for many projects affiliated to the EBP to aid project coordination. Metadata and status attributes in GoaT can be queried through a mature API, a web front end, and a command line interface. The web front end additionally provides summary visualisations for data exploration and reporting (see https://goat.genomehubs.org). GoaT currently holds direct or estimated values for over 70 taxon attributes and over 30 assembly attributes across 1.5 million eukaryotic species. The depth and breadth of curated data, frequent updates, and a versatile query interface make GoaT a powerful data aggregator and portal to explore and report underlying data for the eukaryotic tree of life. We illustrate this utility through a series of use cases from planning through to completion of a genome-sequencing project.   
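The description above says that missing values are interpolated through phylogenetic comparison but does not spell out the procedure. As a rough illustration only, one simple way to do this is to walk up the tree from a taxon with no direct measurement until an ancestor is found whose descendants have direct values, then summarise those values. The sketch below assumes a median as the summary; the toy taxonomy, the taxon names, and the numbers are all invented for the example, and this is not presented as GoaT's actual implementation.

```python
from statistics import median

# Hypothetical toy taxonomy: child -> parent (None marks the root).
PARENT = {
    "root": None,
    "family_x": "root",
    "genus_a": "family_x",
    "genus_b": "family_x",
    "sp_a1": "genus_a",
    "sp_a2": "genus_a",
    "sp_a3": "genus_a",
    "sp_b1": "genus_b",
    "sp_b2": "genus_b",
}

# Invented direct measurements (for example, genome size in Mb).
# Taxa missing from this mapping have no direct value.
DIRECT = {"sp_a1": 400.0, "sp_a2": 500.0, "sp_b1": 900.0}


def ancestors(taxon):
    """Yield the ancestors of a taxon, from its parent up to the root."""
    node = PARENT[taxon]
    while node is not None:
        yield node
        node = PARENT[node]


def descendant_values(node):
    """Collect the direct values of every taxon at or below `node`."""
    values = []
    for taxon, value in DIRECT.items():
        if taxon == node or node in ancestors(taxon):
            values.append(value)
    return values


def estimate(taxon):
    """Return (value, source), where source is 'direct' or the ancestor used.

    Assumption: the estimate is the median of direct values found under the
    closest ancestor that has any. Returns (None, None) if nothing is found.
    """
    if taxon in DIRECT:
        return DIRECT[taxon], "direct"
    for node in ancestors(taxon):
        values = descendant_values(node)
        if values:
            return median(values), node
    return None, None


if __name__ == "__main__":
    for name in ("sp_a1", "sp_a3", "sp_b2"):
        print(name, estimate(name))
```

Returning the ancestor alongside the value keeps estimated values distinguishable from directly measured ones, which matters when the record reports "direct or estimated values" side by side.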
format Online
Article
Text
id pubmed-9971660
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-9971660 2023-03-01 Wellcome Open Res Software Tool Article F1000 Research Limited 2023-01-17 /pmc/articles/PMC9971660/ /pubmed/36864925 http://dx.doi.org/10.12688/wellcomeopenres.18658.1 Text en Copyright: © 2023 Challis R et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Software Tool Article