FAIRly big: A framework for computationally reproducible processing of large-scale data
Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data...
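Main Authors: | Wagner, Adina S.; Waite, Laura K.; Wierzba, Małgorzata; Hoffstaedter, Felix; Waite, Alexander Q.; Poldrack, Benjamin; Eickhoff, Simon B.; Hanke, Michael |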
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917149/ https://www.ncbi.nlm.nih.gov/pubmed/35277501 http://dx.doi.org/10.1038/s41597-022-01163-2 |
_version_ | 1784668478112792576 |
---|---|
author | Wagner, Adina S. Waite, Laura K. Wierzba, Małgorzata Hoffstaedter, Felix Waite, Alexander Q. Poldrack, Benjamin Eickhoff, Simon B. Hanke, Michael |
author_sort | Wagner, Adina S. |
collection | PubMed |
description | Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework’s performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset). |
format | Online Article Text |
id | pubmed-8917149 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8917149 2022-03-28 FAIRly big: A framework for computationally reproducible processing of large-scale data Wagner, Adina S. Waite, Laura K. Wierzba, Małgorzata Hoffstaedter, Felix Waite, Alexander Q. Poldrack, Benjamin Eickhoff, Simon B. Hanke, Michael Sci Data Article Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework’s performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset). Nature Publishing Group UK 2022-03-11 /pmc/articles/PMC8917149/ /pubmed/35277501 http://dx.doi.org/10.1038/s41597-022-01163-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. |
title | FAIRly big: A framework for computationally reproducible processing of large-scale data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917149/ https://www.ncbi.nlm.nih.gov/pubmed/35277501 http://dx.doi.org/10.1038/s41597-022-01163-2 |