Developing a standardized but extendable framework to increase the findability of infectious disease datasets

Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, enc...

Bibliographic Details
Main Authors: Tsueng, Ginger, Cano, Marco A. Alvarado, Bento, José, Czech, Candice, Kang, Mengjia, Pache, Lars, Rasmussen, Luke V., Savidge, Tor C., Starren, Justin, Wu, Qinglong, Xin, Jiwen, Yeaman, Michael R., Zhou, Xinghua, Su, Andrew I., Wu, Chunlei, Brown, Liliana, Shabman, Reed S., Hughes, Laura D.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9950378/
https://www.ncbi.nlm.nih.gov/pubmed/36823157
http://dx.doi.org/10.1038/s41597-023-01968-9
author Tsueng, Ginger
Cano, Marco A. Alvarado
Bento, José
Czech, Candice
Kang, Mengjia
Pache, Lars
Rasmussen, Luke V.
Savidge, Tor C.
Starren, Justin
Wu, Qinglong
Xin, Jiwen
Yeaman, Michael R.
Zhou, Xinghua
Su, Andrew I.
Wu, Chunlei
Brown, Liliana
Shabman, Reed S.
Hughes, Laura D.
author_sort Tsueng, Ginger
collection PubMed
description Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely-adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We alleviated this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability.
format Online
Article
Text
id pubmed-9950378
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-99503782023-02-25 Developing a standardized but extendable framework to increase the findability of infectious disease datasets Tsueng, Ginger Cano, Marco A. Alvarado Bento, José Czech, Candice Kang, Mengjia Pache, Lars Rasmussen, Luke V. Savidge, Tor C. Starren, Justin Wu, Qinglong Xin, Jiwen Yeaman, Michael R. Zhou, Xinghua Su, Andrew I. Wu, Chunlei Brown, Liliana Shabman, Reed S. Hughes, Laura D. Sci Data Article Biomedical datasets are increasing in size, stored in many repositories, and face challenges in FAIRness (findability, accessibility, interoperability, reusability). As a Consortium of infectious disease researchers from 15 Centers, we aim to adopt open science practices to promote transparency, encourage reproducibility, and accelerate research advances through data reuse. To improve FAIRness of our datasets and computational tools, we evaluated metadata standards across established biomedical data repositories. The vast majority do not adhere to a single standard, such as Schema.org, which is widely-adopted by generalist repositories. Consequently, datasets in these repositories are not findable in aggregation projects like Google Dataset Search. We alleviated this gap by creating a reusable metadata schema based on Schema.org and catalogued nearly 400 datasets and computational tools we collected. The approach is easily reusable to create schemas interoperable with community standards, but customized to a particular context. Our approach enabled data discovery, increased the reusability of datasets from a large research consortium, and accelerated research. Lastly, we discuss ongoing challenges with FAIRness beyond discoverability. 
Nature Publishing Group UK 2023-02-23 /pmc/articles/PMC9950378/ /pubmed/36823157 http://dx.doi.org/10.1038/s41597-023-01968-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Developing a standardized but extendable framework to increase the findability of infectious disease datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9950378/
https://www.ncbi.nlm.nih.gov/pubmed/36823157
http://dx.doi.org/10.1038/s41597-023-01968-9