Local data commons: the sleeping beauty in the community of data commons

BACKGROUND: Public Data Commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. On the other hand, local data commons (LDC), located within an institution or organization, have been underrepresented in the scientific literature, even tho...

Bibliographic Details
Main Authors: Jeong, Jong Cheol, Hands, Isaac, Kolesar, Jill M., Rao, Mahadev, Davis, Bront, Dobyns, York, Hurt-Mueller, Joseph, Levens, Justin, Gregory, Jenny, Williams, John, Witt, Lisa, Kim, Eun Mi, Burton, Carlee, Elbiheary, Amir A., Chang, Mingguang, Durbin, Eric B.
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502580/
https://www.ncbi.nlm.nih.gov/pubmed/36151511
http://dx.doi.org/10.1186/s12859-022-04922-5
_version_ 1784795741424713728
author Jeong, Jong Cheol
Hands, Isaac
Kolesar, Jill M.
Rao, Mahadev
Davis, Bront
Dobyns, York
Hurt-Mueller, Joseph
Levens, Justin
Gregory, Jenny
Williams, John
Witt, Lisa
Kim, Eun Mi
Burton, Carlee
Elbiheary, Amir A.
Chang, Mingguang
Durbin, Eric B.
author_facet Jeong, Jong Cheol
Hands, Isaac
Kolesar, Jill M.
Rao, Mahadev
Davis, Bront
Dobyns, York
Hurt-Mueller, Joseph
Levens, Justin
Gregory, Jenny
Williams, John
Witt, Lisa
Kim, Eun Mi
Burton, Carlee
Elbiheary, Amir A.
Chang, Mingguang
Durbin, Eric B.
author_sort Jeong, Jong Cheol
collection PubMed
description BACKGROUND: Public Data Commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. On the other hand, local data commons (LDC), located within an institution or organization, have been underrepresented in the scientific literature, even though they are a critical part of research infrastructure. Being closest to the sources of data, LDCs can collect and maintain the most up-to-date, high-quality data within an organization. As data providers, LDCs face many challenges in both collecting and standardizing data; moreover, as consumers of PDC, they face problems of data harmonization stemming from the monolithic harmonization pipeline designs commonly adopted by many PDCs. Unfortunately, existing guidelines and resources for building and maintaining data commons focus almost exclusively on PDC and provide very little information on LDC. RESULTS: This article focuses on four important observations. First, there are three different types of LDC service models that are defined based on their roles and requirements. These can be used as guidelines for building new LDC or enhancing the services of existing LDC. Second, the seven core services of LDC are discussed, including cohort identification and facilitation of genomic sequencing, the management of molecular reports and associated infrastructure, quality control, data harmonization, data integration, data sharing, and data access control. Third, instead of commonly developed monolithic systems, we propose a new data sharing method for data harmonization that combines both divide-and-conquer and bottom-up approaches. Finally, an end-to-end LDC implementation is introduced with real-world examples. CONCLUSIONS: Although LDCs are an optimal place to identify and address data quality issues, they have traditionally been relegated to the role of passive data provider for much larger PDC. Indeed, many LDCs limit their functions to routine data storage and transmission tasks due to a lack of information on how to design, develop, and improve their services using limited resources. We hope that this work will be the first small step in raising awareness among LDCs of their expanded utility and in publicizing the importance of LDC to a wider audience.
format Online
Article
Text
id pubmed-9502580
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9502580 2022-09-24 Local data commons: the sleeping beauty in the community of data commons Jeong, Jong Cheol Hands, Isaac Kolesar, Jill M. Rao, Mahadev Davis, Bront Dobyns, York Hurt-Mueller, Joseph Levens, Justin Gregory, Jenny Williams, John Witt, Lisa Kim, Eun Mi Burton, Carlee Elbiheary, Amir A. Chang, Mingguang Durbin, Eric B. BMC Bioinformatics Methodology BACKGROUND: Public Data Commons (PDC) have been highlighted in the scientific literature for their capacity to collect and harmonize big data. On the other hand, local data commons (LDC), located within an institution or organization, have been underrepresented in the scientific literature, even though they are a critical part of research infrastructure. Being closest to the sources of data, LDCs can collect and maintain the most up-to-date, high-quality data within an organization. As data providers, LDCs face many challenges in both collecting and standardizing data; moreover, as consumers of PDC, they face problems of data harmonization stemming from the monolithic harmonization pipeline designs commonly adopted by many PDCs. Unfortunately, existing guidelines and resources for building and maintaining data commons focus almost exclusively on PDC and provide very little information on LDC. RESULTS: This article focuses on four important observations. First, there are three different types of LDC service models that are defined based on their roles and requirements. These can be used as guidelines for building new LDC or enhancing the services of existing LDC. Second, the seven core services of LDC are discussed, including cohort identification and facilitation of genomic sequencing, the management of molecular reports and associated infrastructure, quality control, data harmonization, data integration, data sharing, and data access control. Third, instead of commonly developed monolithic systems, we propose a new data sharing method for data harmonization that combines both divide-and-conquer and bottom-up approaches. Finally, an end-to-end LDC implementation is introduced with real-world examples. CONCLUSIONS: Although LDCs are an optimal place to identify and address data quality issues, they have traditionally been relegated to the role of passive data provider for much larger PDC. Indeed, many LDCs limit their functions to routine data storage and transmission tasks due to a lack of information on how to design, develop, and improve their services using limited resources. We hope that this work will be the first small step in raising awareness among LDCs of their expanded utility and in publicizing the importance of LDC to a wider audience. BioMed Central 2022-09-23 /pmc/articles/PMC9502580/ /pubmed/36151511 http://dx.doi.org/10.1186/s12859-022-04922-5 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology
Jeong, Jong Cheol
Hands, Isaac
Kolesar, Jill M.
Rao, Mahadev
Davis, Bront
Dobyns, York
Hurt-Mueller, Joseph
Levens, Justin
Gregory, Jenny
Williams, John
Witt, Lisa
Kim, Eun Mi
Burton, Carlee
Elbiheary, Amir A.
Chang, Mingguang
Durbin, Eric B.
Local data commons: the sleeping beauty in the community of data commons
title Local data commons: the sleeping beauty in the community of data commons
title_full Local data commons: the sleeping beauty in the community of data commons
title_fullStr Local data commons: the sleeping beauty in the community of data commons
title_full_unstemmed Local data commons: the sleeping beauty in the community of data commons
title_short Local data commons: the sleeping beauty in the community of data commons
title_sort local data commons: the sleeping beauty in the community of data commons
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9502580/
https://www.ncbi.nlm.nih.gov/pubmed/36151511
http://dx.doi.org/10.1186/s12859-022-04922-5
work_keys_str_mv AT jeongjongcheol localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT handsisaac localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT kolesarjillm localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT raomahadev localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT davisbront localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT dobynsyork localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT hurtmuellerjoseph localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT levensjustin localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT gregoryjenny localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT williamsjohn localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT wittlisa localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT kimeunmi localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT burtoncarlee localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT elbihearyamira localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT changmingguang localdatacommonsthesleepingbeautyinthecommunityofdatacommons
AT durbinericb localdatacommonsthesleepingbeautyinthecommunityofdatacommons