FAIR data pipeline: provenance-driven data management for traceable scientific workflows
Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecis...
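Main authors: | Mitchell, Sonia Natalie; Lahiff, Andrew; Cummings, Nathan; Hollocombe, Jonathan; Boskamp, Bram; Field, Ryan; Reddyhoff, Dennis; Zarebski, Kristian; Wilson, Antony; Viola, Bruno; Burke, Martin; Archibald, Blair; Bessell, Paul; Blackwell, Richard; Boden, Lisa A.; Brett, Alys; Brett, Sam; Dundas, Ruth; Enright, Jessica; Gonzalez-Beltran, Alejandra N.; Harris, Claire; Hinder, Ian; David Hughes, Christopher; Knight, Martin; Mano, Vino; McMonagle, Ciaran; Mellor, Dominic; Mohr, Sibylle; Marion, Glenn; Matthews, Louise; McKendrick, Iain J.; Mark Pooley, Christopher; Porphyre, Thibaud; Reeves, Aaron; Townsend, Edward; Turner, Robert; Walton, Jeremy; Reeve, Richard |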
---|---|
Format: | Online Article Text |
Language: | English |
Published: | The Royal Society, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9376726/ https://www.ncbi.nlm.nih.gov/pubmed/35965468 http://dx.doi.org/10.1098/rsta.2021.0300 |
_version_ | 1784768195551297536 |
---|---|
author | Mitchell, Sonia Natalie; Lahiff, Andrew; Cummings, Nathan; Hollocombe, Jonathan; Boskamp, Bram; Field, Ryan; Reddyhoff, Dennis; Zarebski, Kristian; Wilson, Antony; Viola, Bruno; Burke, Martin; Archibald, Blair; Bessell, Paul; Blackwell, Richard; Boden, Lisa A.; Brett, Alys; Brett, Sam; Dundas, Ruth; Enright, Jessica; Gonzalez-Beltran, Alejandra N.; Harris, Claire; Hinder, Ian; David Hughes, Christopher; Knight, Martin; Mano, Vino; McMonagle, Ciaran; Mellor, Dominic; Mohr, Sibylle; Marion, Glenn; Matthews, Louise; McKendrick, Iain J.; Mark Pooley, Christopher; Porphyre, Thibaud; Reeves, Aaron; Townsend, Edward; Turner, Robert; Walton, Jeremy; Reeve, Richard |
author_sort | Mitchell, Sonia Natalie |
collection | PubMed |
description | Modern epidemiological analyses to understand and combat the spread of disease depend critically on access to, and use of, data. Rapidly evolving data, such as data streams changing during a disease outbreak, are particularly challenging. Data management is further complicated by data being imprecisely identified when used. Public trust in policy decisions resulting from such analyses is easily damaged and is often low, with cynicism arising where claims of ‘following the science’ are made without accompanying evidence. Tracing the provenance of such decisions back through open software to primary data would clarify this evidence, enhancing the transparency of the decision-making process. Here, we demonstrate a Findable, Accessible, Interoperable and Reusable (FAIR) data pipeline. Although developed during the COVID-19 pandemic, it allows easy annotation of any data as they are consumed by analyses, or conversely traces the provenance of scientific outputs back through the analytical or modelling source code to primary data. Such a tool provides a mechanism for the public, and fellow scientists, to better assess scientific evidence by inspecting its provenance, while allowing scientists to support policymakers in openly justifying their decisions. We believe that such tools should be promoted for use across all areas of policy-facing research. This article is part of the theme issue ‘Technical challenges of modelling real-life epidemics and examples of overcoming these’. |
format | Online Article Text |
id | pubmed-9376726 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | The Royal Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-9376726; 2022-08-22; Philos Trans A Math Phys Eng Sci; Articles; The Royal Society; 2022-10-03; 2022-08-15; /pmc/articles/PMC9376726/ /pubmed/35965468 http://dx.doi.org/10.1098/rsta.2021.0300; Text en; © 2022 The Authors. Published by the Royal Society under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, provided the original author and source are credited. |
title | FAIR data pipeline: provenance-driven data management for traceable scientific workflows |
topic | Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9376726/ https://www.ncbi.nlm.nih.gov/pubmed/35965468 http://dx.doi.org/10.1098/rsta.2021.0300 |