Reproducible big data science: A case study in continuous FAIRness
Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code thr...
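Main authors: | Madduri, Ravi; Chard, Kyle; D’Arcy, Mike; Jung, Segun C.; Rodriguez, Alexis; Sulakhe, Dinanath; Deutsch, Eric; Funk, Cory; Heavner, Ben; Richards, Matthew; Shannon, Paul; Glusman, Gustavo; Price, Nathan; Kesselman, Carl; Foster, Ian |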
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6459504/ https://www.ncbi.nlm.nih.gov/pubmed/30973881 http://dx.doi.org/10.1371/journal.pone.0213013 |
_version_ | 1783410190626848768 |
---|---|
author | Madduri, Ravi; Chard, Kyle; D’Arcy, Mike; Jung, Segun C.; Rodriguez, Alexis; Sulakhe, Dinanath; Deutsch, Eric; Funk, Cory; Heavner, Ben; Richards, Matthew; Shannon, Paul; Glusman, Gustavo; Price, Nathan; Kesselman, Carl; Foster, Ian |
author_sort | Madduri, Ravi |
collection | PubMed |
description | Big biomedical data create exciting opportunities for discovery, but make it difficult to capture analyses and outputs in forms that are findable, accessible, interoperable, and reusable (FAIR). In response, we describe tools that make it easy to capture, and assign identifiers to, data and code throughout the data lifecycle. We illustrate the use of these tools via a case study involving a multi-step analysis that creates an atlas of putative transcription factor binding sites from terabytes of ENCODE DNase I hypersensitive sites sequencing data. We show how the tools automate routine but complex tasks, capture analysis algorithms in understandable and reusable forms, and harness fast networks and powerful cloud computers to process data rapidly, all without sacrificing usability or reproducibility—thus ensuring that big data are not hard-to-(re)use data. We evaluate our approach via a user study, and show that 91% of participants were able to replicate a complex analysis involving considerable data volumes. |
format | Online Article Text |
id | pubmed-6459504 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-6459504 2019-05-03 Reproducible big data science: A case study in continuous FAIRness. Madduri, Ravi; Chard, Kyle; D’Arcy, Mike; Jung, Segun C.; Rodriguez, Alexis; Sulakhe, Dinanath; Deutsch, Eric; Funk, Cory; Heavner, Ben; Richards, Matthew; Shannon, Paul; Glusman, Gustavo; Price, Nathan; Kesselman, Carl; Foster, Ian. PLoS One. Research Article. Public Library of Science 2019-04-11 /pmc/articles/PMC6459504/ /pubmed/30973881 http://dx.doi.org/10.1371/journal.pone.0213013 Text en © 2019 Madduri et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Reproducible big data science: A case study in continuous FAIRness |
topic | Research Article |