CORAL: A framework for rigorous self-validated data modeling and integrative, reproducible data analysis

Bibliographic Details
Main Authors: Novichkov, Pavel S; Chandonia, John-Marc; Arkin, Adam P
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9575582/
https://www.ncbi.nlm.nih.gov/pubmed/36251274
http://dx.doi.org/10.1093/gigascience/giac089
author Novichkov, Pavel S
Chandonia, John-Marc
Arkin, Adam P
collection PubMed
description BACKGROUND: Many organizations face challenges in managing and analyzing data, especially when relevant datasets arise from multiple sources and methods. Analyzing heterogeneous datasets and additional derived data requires rigorous tracking of their interrelationships and provenance. This task has long been a Grand Challenge of data science and has more recently been formalized in the FAIR principles: that all data objects be Findable, Accessible, Interoperable, and Reusable, both for machines and for people. Adherence to these principles is necessary for proper stewardship of information, for testing regulatory compliance, for measuring the efficiency of processes, and for facilitating reuse of data-analytical frameworks. FINDINGS: We present the Contextual Ontology-based Repository Analysis Library (CORAL), a platform that greatly facilitates adherence to all 4 of the FAIR principles, including the especially difficult challenge of making heterogeneous datasets Interoperable and Reusable across all parts of a large, long-lasting organization. To achieve this, CORAL's data model requires that data generators extensively document the context for all data, and our tools maintain that context throughout the entire analysis pipeline. CORAL also features a web interface for data generators to upload and explore data, as well as a Jupyter notebook interface for data analysts, both backed by a common API. CONCLUSIONS: CORAL enables organizations to build FAIR data types on the fly as they are needed, avoiding the expense of bespoke data modeling. CORAL provides a uniquely powerful platform to enable integrative cross-dataset analyses, generating deeper insights than are possible using traditional analysis tools.
format Online
Article
Text
id pubmed-9575582
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9575582 2022-10-19 CORAL: A framework for rigorous self-validated data modeling and integrative, reproducible data analysis Novichkov, Pavel S Chandonia, John-Marc Arkin, Adam P Gigascience Technical Note
Oxford University Press 2022-10-17 /pmc/articles/PMC9575582/ /pubmed/36251274 http://dx.doi.org/10.1093/gigascience/giac089 Text en © The Author(s) 2022. Published by Oxford University Press GigaScience. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CORAL: A framework for rigorous self-validated data modeling and integrative, reproducible data analysis
topic Technical Note