
A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx)



Bibliographic Details
Main Authors: Vaidya, Gaurav; Cellinese, Nico; Lapp, Hilmar
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2022
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8855714/
https://www.ncbi.nlm.nih.gov/pubmed/35186448
http://dx.doi.org/10.7717/peerj.12618
author Vaidya, Gaurav
Cellinese, Nico
Lapp, Hilmar
collection PubMed
description To be computationally reproducible and efficient, integration of disparate data depends on shared entities whose matching meaning (semantics) can be computationally assessed. For biodiversity data one of the most prevalent shared entities for linking data records is the associated taxon concept. Unlike Linnaean taxon names, the traditional way in which taxon concepts are provided, phylogenetic definitions are native to phylogenetic trees and offer well-defined semantics that can be transformed into formal, computationally evaluable logic expressions. These attributes make them highly suitable for phylogeny-driven comparative biology by allowing computationally verifiable and reproducible integration of taxon-linked data against Tree of Life-scale phylogenies. To achieve this, the first step is transforming phylogenetic definitions from the natural language text in which they are published to a structured interoperable data format that maintains strong ties to semantics and lends itself well to sharing, reuse, and long-term archival. To this end, we developed the Phyloreference Exchange Format (Phyx), a JSON-LD-based text format encompassing rich metadata for all elements of a phylogenetic definition, and we created a supporting software library, phyx.js, to streamline computational management of such files. Together they form a foundation layer for digitizing and computing with phylogenetic definitions of clades.
format Online
Article
Text
id pubmed-8855714
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8855714 2022-02-19 PeerJ Bioinformatics PeerJ Inc. 2022-02-15 /pmc/articles/PMC8855714/ /pubmed/35186448 http://dx.doi.org/10.7717/peerj.12618 Text en ©2022 Vaidya et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title A new phylogenetic data standard for computable clade definitions: the Phyloreference Exchange Format (Phyx)
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8855714/
https://www.ncbi.nlm.nih.gov/pubmed/35186448
http://dx.doi.org/10.7717/peerj.12618