FAIR data pipeline: provenance-driven data management for traceable scientific workflows

Bibliographic details
Main authors: Mitchell, Sonia Natalie, Lahiff, Andrew, Cummings, Nathan, Hollocombe, Jonathan, Boskamp, Bram, Field, Ryan, Reddyhoff, Dennis, Zarebski, Kristian, Wilson, Antony, Viola, Bruno, Burke, Martin, Archibald, Blair, Bessell, Paul, Blackwell, Richard, Boden, Lisa A., Brett, Alys, Brett, Sam, Dundas, Ruth, Enright, Jessica, Gonzalez-Beltran, Alejandra N., Harris, Claire, Hinder, Ian, Hughes, Christopher David, Knight, Martin, Mano, Vino, McMonagle, Ciaran, Mellor, Dominic, Mohr, Sibylle, Marion, Glenn, Matthews, Louise, McKendrick, Iain J., Pooley, Christopher Mark, Porphyre, Thibaud, Reeves, Aaron, Townsend, Edward, Turner, Robert, Walton, Jeremy, Reeve, Richard
Format: Online article, text
Language: English
Published: The Royal Society 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9376726/
https://www.ncbi.nlm.nih.gov/pubmed/35965468
http://dx.doi.org/10.1098/rsta.2021.0300

Collection: PubMed
Abstract: Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of ‘following the science’ are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’.
Record ID: pubmed-9376726
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Philos Trans A Math Phys Eng Sci (Articles)
Published: 2022-08-15 (online); 2022-10-03 (issue)
Rights: © 2022 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited.