Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets in Cyberinfrastructure Systems

Bibliographic Details
Main authors: Blumberg, Kai L., Ponsero, Alise J., Bomhoff, Matthew, Wood-Charlson, Elisha M., DeLong, Edward F., Hurwitz, Bonnie L.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8692764/
https://www.ncbi.nlm.nih.gov/pubmed/34956127
http://dx.doi.org/10.3389/fmicb.2021.765268
author Blumberg, Kai L.
Ponsero, Alise J.
Bomhoff, Matthew
Wood-Charlson, Elisha M.
DeLong, Edward F.
Hurwitz, Bonnie L.
author_sort Blumberg, Kai L.
collection PubMed
description Marine microbial ecology requires the systematic comparison of biogeochemical and sequence data to analyze environmental influences on the distribution and variability of microbial communities. With ever-increasing quantities of metagenomic data, there is a growing need to make datasets Findable, Accessible, Interoperable, and Reusable (FAIR) across diverse ecosystems. FAIR data is essential to developing analytical frameworks that integrate microbiological, genomic, ecological, oceanographic, and computational methods. Although community standards defining the minimal metadata required to accompany sequence data exist, they have not been used consistently across projects, precluding interoperability. Moreover, these data are not machine-actionable or discoverable by cyberinfrastructure systems. By making ’omic and physicochemical datasets FAIR to machine systems, we can enable sequence data discovery and reuse based on machine-readable descriptions of environments or physicochemical gradients. In this work, we developed a novel technical specification for dataset encapsulation for the FAIR reuse of marine metagenomic and physicochemical datasets within cyberinfrastructure systems. This includes using Frictionless Data Packages enriched with terminology from environmental and life-science ontologies to annotate measured variables, their units, and the measurement devices used. This approach was implemented in Planet Microbe, a cyberinfrastructure platform and marine metagenomic web portal. Here, we discuss the data properties built into the specification to make global ocean datasets FAIR within the Planet Microbe portal. We additionally discuss the selection of, and contributions to, marine-science ontologies used within the specification. Finally, we use the system to discover data with which to answer various biological questions about environments, physicochemical gradients, and microbial communities in meta-analyses.
This work represents a future direction in marine metagenomic research by proposing a specification for FAIR dataset encapsulation that, if adopted within cyberinfrastructure systems, would automate the discovery, exchange, and reuse of data needed to answer broader-reaching questions than originally intended.
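The annotation approach described in the abstract can be illustrated with a minimal sketch. It is not the authors' specification: it only assumes a Frictionless-style Data Package descriptor in which each table field carries the standard Table Schema `rdfType` property for the measured variable, plus two hypothetical custom properties, `x_unit_iri` and `x_device_iri`, for the unit and the measurement device. All IRIs, file names, and the helper `find_fields` are placeholders invented for illustration.

```python
import json

# Placeholder namespace (an assumption); real packages would use ontology term IRIs.
EX = "http://example.org/ontology/"

# A Data Package descriptor with one tabular resource.
# "rdfType" is a Table Schema field property; "x_unit_iri" and
# "x_device_iri" are hypothetical custom properties used only in this sketch.
descriptor = {
    "name": "example-cruise-samples",
    "resources": [
        {
            "name": "samples",
            "path": "samples.tsv",
            "schema": {
                "fields": [
                    {
                        "name": "temperature",
                        "type": "number",
                        "rdfType": EX + "water_temperature",
                        "x_unit_iri": EX + "degree_celsius",
                        "x_device_iri": EX + "ctd_sensor",
                    },
                    {
                        "name": "depth",
                        "type": "number",
                        "rdfType": EX + "depth_of_water",
                        "x_unit_iri": EX + "metre",
                    },
                ]
            },
        }
    ],
}


def find_fields(package, variable_iri):
    """Return (resource name, field name) pairs annotated with variable_iri."""
    matches = []
    for resource in package.get("resources", []):
        for field in resource.get("schema", {}).get("fields", []):
            if field.get("rdfType") == variable_iri:
                matches.append((resource["name"], field["name"]))
    return matches


if __name__ == "__main__":
    # The descriptor is plain JSON, so it can be written out as datapackage.json.
    print(json.dumps(descriptor, indent=2))
    print(find_fields(descriptor, EX + "water_temperature"))
```

Because the variable is identified by an IRI rather than by a column name, a search for water temperature would match this field regardless of how a given project labeled the column, which is the kind of machine-actionable discovery the abstract refers to.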
format Online
Article
Text
id pubmed-8692764
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8692764 2021-12-23 Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets in Cyberinfrastructure Systems Blumberg, Kai L. Ponsero, Alise J. Bomhoff, Matthew Wood-Charlson, Elisha M. DeLong, Edward F. Hurwitz, Bonnie L. Front Microbiol Microbiology Frontiers Media S.A. 2021-12-08 /pmc/articles/PMC8692764/ /pubmed/34956127 http://dx.doi.org/10.3389/fmicb.2021.765268 Text en Copyright © 2021 Blumberg, Ponsero, Bomhoff, Wood-Charlson, DeLong and Hurwitz. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Ontology-Enriched Specifications Enabling Findable, Accessible, Interoperable, and Reusable Marine Metagenomic Datasets in Cyberinfrastructure Systems
topic Microbiology