Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life
Main authors: | Challis, Richard; Kumar, Sujai; Sotero-Caio, Cibele; Brown, Max; Blaxter, Mark |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited, 2023 |
Subjects: | Software Tool Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9971660/ https://www.ncbi.nlm.nih.gov/pubmed/36864925 http://dx.doi.org/10.12688/wellcomeopenres.18658.1 |
author | Challis, Richard; Kumar, Sujai; Sotero-Caio, Cibele; Brown, Max; Blaxter, Mark |
---|---|
collection | PubMed |
description | As genomic data transform our understanding of biodiversity, the Earth BioGenome Project (EBP) has set a goal of generating reference quality genome assemblies for all ~1.9 million described eukaryotic taxa. Meeting this goal requires coordination among many individual regional and taxon-focussed projects working under the EBP umbrella. Large-scale sequencing projects require ready access to validated genome-relevant metadata, such as genome sizes and karyotypes, but these data are dispersed across the literature, and directly measured values are lacking for most taxa. To meet these needs, we have developed Genomes on a Tree (GoaT), an Elasticsearch-powered datastore and search index for genome-relevant metadata and sequencing project plans and statuses. GoaT indexes publicly available metadata for all eukaryotic species and interpolates missing values through phylogenetic comparison. GoaT also holds target priority and sequencing status information for many projects affiliated to the EBP to aid project coordination. Metadata and status attributes in GoaT can be queried through a mature API, a web front end, and a command line interface. The web front end additionally provides summary visualisations for data exploration and reporting (see https://goat.genomehubs.org). GoaT currently holds direct or estimated values for over 70 taxon attributes and over 30 assembly attributes across 1.5 million eukaryotic species. The depth and breadth of curated data, frequent updates, and a versatile query interface make GoaT a powerful data aggregator and portal to explore and report underlying data for the eukaryotic tree of life. We illustrate this utility through a series of use cases from planning through to completion of a genome-sequencing project. |
format | Online Article Text |
id | pubmed-9971660 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | Wellcome Open Res, Software Tool Article, published 2023-01-17. Copyright: © 2023 Challis R et al. This is an open access article distributed under the terms of the Creative Commons Attribution Licence (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Genomes on a Tree (GoaT): A versatile, scalable search engine for genomic and sequencing project metadata across the eukaryotic tree of life |
topic | Software Tool Article |