
Towards self-describing and FAIR bulk formats for biomedical data

Bibliographic Details
Main authors: Lukowski, Michael; Prokhorenkov, Andrew; Grossman, Robert L.
Format: Online Article Text
Language: English
Published: Public Library of Science, 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035862/
https://www.ncbi.nlm.nih.gov/pubmed/36913405
http://dx.doi.org/10.1371/journal.pcbi.1010944
author Lukowski, Michael
Prokhorenkov, Andrew
Grossman, Robert L.
collection PubMed
description We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third-party controlled vocabularies. In general, each data element in the data dictionary is associated with a third-party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats.
format Online
Article
Text
id pubmed-10035862
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10035862 2023-03-24 Towards self-describing and FAIR bulk formats for biomedical data. Lukowski, Michael; Prokhorenkov, Andrew; Grossman, Robert L. PLoS Comput Biol, Research Article. Public Library of Science, 2023-03-13. /pmc/articles/PMC10035862/ /pubmed/36913405 http://dx.doi.org/10.1371/journal.pcbi.1010944 Text en. © 2023 Lukowski et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Towards self-describing and FAIR bulk formats for biomedical data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035862/
https://www.ncbi.nlm.nih.gov/pubmed/36913405
http://dx.doi.org/10.1371/journal.pcbi.1010944