
File-based localization of numerical perturbations in data analysis pipelines

Bibliographic Details
Main authors: Salari, Ali; Kiar, Gregory; Lewis, Lindsay; Evans, Alan C; Glatard, Tristan
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7710495/
https://www.ncbi.nlm.nih.gov/pubmed/33269388
http://dx.doi.org/10.1093/gigascience/giaa106
Description
Summary: BACKGROUND: Data analysis pipelines are known to be affected by computational conditions, presumably owing to the creation and propagation of numerical errors. While this process could play a major role in the current reproducibility crisis, the precise causes of such instabilities and the path along which they propagate in pipelines are unclear. METHOD: We present Spot, a tool to identify which processes in a pipeline create numerical differences when executed in different computational conditions. Spot leverages system-call interception through ReproZip to reconstruct and compare provenance graphs without pipeline instrumentation. RESULTS: By applying Spot to the structural pre-processing pipelines of the Human Connectome Project, we found that linear and non-linear registration are the cause of most numerical instabilities in these pipelines, which confirms previous findings.
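
The idea of file-based localization described in the summary can be illustrated with a minimal sketch. This is not Spot's code and does not use ReproZip's API; the `ProcessRecord` type, the `classify` function, the process names, and the file contents are all hypothetical. The sketch assumes that, for each condition, we already have a record of which files each process read and wrote, together with checksums of their contents. A process whose inputs match across conditions but whose outputs differ is labeled as creating a difference; one whose inputs already differ is labeled as propagating it.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessRecord:
    """Hypothetical record of one pipeline process in one condition."""
    name: str
    inputs: dict   # file path -> checksum of the content read
    outputs: dict  # file path -> checksum of the content written


def checksum(data: bytes) -> str:
    """Return the SHA-256 hex digest of a file's content."""
    return hashlib.sha256(data).hexdigest()


def classify(run_a, run_b):
    """Label each process by comparing its files across two conditions.

    'identical'  : outputs match across conditions
    'creates'    : inputs match but outputs differ
    'propagates' : inputs differ and outputs differ
    """
    by_name_b = {proc.name: proc for proc in run_b}
    labels = {}
    for proc_a in run_a:
        proc_b = by_name_b[proc_a.name]
        if proc_a.outputs == proc_b.outputs:
            labels[proc_a.name] = "identical"
        elif proc_a.inputs == proc_b.inputs:
            labels[proc_a.name] = "creates"
        else:
            labels[proc_a.name] = "propagates"
    return labels


# Toy data (made up): the same three-step pipeline run in two conditions.
raw = checksum(b"raw image")
brain = checksum(b"extracted brain")
reg_a, reg_b = checksum(b"registered v1"), checksum(b"registered v2")
seg_a, seg_b = checksum(b"segmented v1"), checksum(b"segmented v2")

run_a = [
    ProcessRecord("extract", {"raw.nii": raw}, {"brain.nii": brain}),
    ProcessRecord("register", {"brain.nii": brain}, {"reg.nii": reg_a}),
    ProcessRecord("segment", {"reg.nii": reg_a}, {"seg.nii": seg_a}),
]
run_b = [
    ProcessRecord("extract", {"raw.nii": raw}, {"brain.nii": brain}),
    ProcessRecord("register", {"brain.nii": brain}, {"reg.nii": reg_b}),
    ProcessRecord("segment", {"reg.nii": reg_b}, {"seg.nii": seg_b}),
]

labels = classify(run_a, run_b)
for name, label in labels.items():
    print(f"{name}: {label}")
```

In this toy run the hypothetical `register` step is the one that creates the difference, while `segment` only inherits it. A real tool would additionally need to capture the file accesses themselves (the summary attributes this to system-call interception) and to handle files that are overwritten by several processes, which this sketch ignores.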