Towards self-describing and FAIR bulk formats for biomedical data
We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabul...
Main Authors: | Lukowski, Michael; Prokhorenkov, Andrew; Grossman, Robert L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035862/ https://www.ncbi.nlm.nih.gov/pubmed/36913405 http://dx.doi.org/10.1371/journal.pcbi.1010944 |
_version_ | 1784911508988231680 |
---|---|
author | Lukowski, Michael Prokhorenkov, Andrew Grossman, Robert L. |
author_facet | Lukowski, Michael Prokhorenkov, Andrew Grossman, Robert L. |
author_sort | Lukowski, Michael |
collection | PubMed |
description | We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabularies. In general, each data element in the data dictionary is associated with a third party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats. |
format | Online Article Text |
id | pubmed-10035862 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10035862 2023-03-24 Towards self-describing and FAIR bulk formats for biomedical data Lukowski, Michael Prokhorenkov, Andrew Grossman, Robert L. PLoS Comput Biol Research Article We introduce a self-describing serialized format for bulk biomedical data called the Portable Format for Biomedical (PFB) data. The Portable Format for Biomedical data is based upon Avro and encapsulates a data model, a data dictionary, the data itself, and pointers to third party controlled vocabularies. In general, each data element in the data dictionary is associated with a third party controlled vocabulary to make it easier for applications to harmonize two or more PFB files. We also introduce an open source software development kit (SDK) called PyPFB for creating, exploring and modifying PFB files. We describe experimental studies showing the performance improvements when importing and exporting bulk biomedical data in the PFB format versus using JSON and SQL formats. Public Library of Science 2023-03-13 /pmc/articles/PMC10035862/ /pubmed/36913405 http://dx.doi.org/10.1371/journal.pcbi.1010944 Text en © 2023 Lukowski et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Lukowski, Michael Prokhorenkov, Andrew Grossman, Robert L. Towards self-describing and FAIR bulk formats for biomedical data |
title | Towards self-describing and FAIR bulk formats for biomedical data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10035862/ https://www.ncbi.nlm.nih.gov/pubmed/36913405 http://dx.doi.org/10.1371/journal.pcbi.1010944 |