FAIRly big: A framework for computationally reproducible processing of large-scale data

Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data...

Bibliographic Details
Main authors: Wagner, Adina S., Waite, Laura K., Wierzba, Małgorzata, Hoffstaedter, Felix, Waite, Alexander Q., Poldrack, Benjamin, Eickhoff, Simon B., Hanke, Michael
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917149/
https://www.ncbi.nlm.nih.gov/pubmed/35277501
http://dx.doi.org/10.1038/s41597-022-01163-2
author Wagner, Adina S.
Waite, Laura K.
Wierzba, Małgorzata
Hoffstaedter, Felix
Waite, Alexander Q.
Poldrack, Benjamin
Eickhoff, Simon B.
Hanke, Michael
author_facet Wagner, Adina S.
Waite, Laura K.
Wierzba, Małgorzata
Hoffstaedter, Felix
Waite, Alexander Q.
Poldrack, Benjamin
Eickhoff, Simon B.
Hanke, Michael
author_sort Wagner, Adina S.
collection PubMed
description Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework’s performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset).
format Online
Article
Text
id pubmed-8917149
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8917149 2022-03-28 FAIRly big: A framework for computationally reproducible processing of large-scale data Wagner, Adina S. Waite, Laura K. Wierzba, Małgorzata Hoffstaedter, Felix Waite, Alexander Q. Poldrack, Benjamin Eickhoff, Simon B. Hanke, Michael Sci Data Article Large-scale datasets present unique opportunities to perform scientific investigations with unprecedented breadth. However, they also pose considerable challenges for the findability, accessibility, interoperability, and reusability (FAIR) of research outcomes due to infrastructure limitations, data usage constraints, or software license restrictions. Here we introduce a DataLad-based, domain-agnostic framework suitable for reproducible data processing in compliance with open science mandates. The framework attempts to minimize platform idiosyncrasies and performance-related complexities. It affords the capture of machine-actionable computational provenance records that can be used to retrace and verify the origins of research outcomes, as well as be re-executed independent of the original computing infrastructure. We demonstrate the framework’s performance using two showcases: one highlighting data sharing and transparency (using the studyforrest.org dataset) and another highlighting scalability (using the largest public brain imaging dataset available: the UK Biobank dataset). Nature Publishing Group UK 2022-03-11 /pmc/articles/PMC8917149/ /pubmed/35277501 http://dx.doi.org/10.1038/s41597-022-01163-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Wagner, Adina S.
Waite, Laura K.
Wierzba, Małgorzata
Hoffstaedter, Felix
Waite, Alexander Q.
Poldrack, Benjamin
Eickhoff, Simon B.
Hanke, Michael
FAIRly big: A framework for computationally reproducible processing of large-scale data
title FAIRly big: A framework for computationally reproducible processing of large-scale data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8917149/
https://www.ncbi.nlm.nih.gov/pubmed/35277501
http://dx.doi.org/10.1038/s41597-022-01163-2