File-based localization of numerical perturbations in data analysis pipelines
BACKGROUND: Data analysis pipelines are known to be affected by computational conditions, presumably owing to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path alon...
Main authors: | Salari, Ali; Kiar, Gregory; Lewis, Lindsay; Evans, Alan C; Glatard, Tristan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7710495/ https://www.ncbi.nlm.nih.gov/pubmed/33269388 http://dx.doi.org/10.1093/gigascience/giaa106 |
_version_ | 1783617959659307008 |
---|---|
author | Salari, Ali; Kiar, Gregory; Lewis, Lindsay; Evans, Alan C; Glatard, Tristan |
author_facet | Salari, Ali; Kiar, Gregory; Lewis, Lindsay; Evans, Alan C; Glatard, Tristan |
author_sort | Salari, Ali |
collection | PubMed |
description | BACKGROUND: Data analysis pipelines are known to be affected by computational conditions, presumably owing to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear. METHOD: We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation. RESULTS: By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings. |
format | Online Article Text |
id | pubmed-7710495 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-77104952020-12-09 File-based localization of numerical perturbations in data analysis pipelines Salari, Ali Kiar, Gregory Lewis, Lindsay Evans, Alan C Glatard, Tristan Gigascience Research BACKGROUND: Data analysis pipelines are known to be affected by computational conditions, presumably owing to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear. METHOD: We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation. RESULTS: By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings. Oxford University Press 2020-12-02 /pmc/articles/PMC7710495/ /pubmed/33269388 http://dx.doi.org/10.1093/gigascience/giaa106 Text en © The Author(s) 2020. Published by Oxford University Press GigaScience. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | File-based localization of numerical perturbations in data analysis pipelines |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7710495/ https://www.ncbi.nlm.nih.gov/pubmed/33269388 http://dx.doi.org/10.1093/gigascience/giaa106 |