
Embedding Analytics within the Curation of Scientific Workflows


Bibliographic Details
Main authors: Weatherby, Gerard; Gryk, Michael R.
Format: Online Article Text
Language: English
Published: 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7990377/
https://www.ncbi.nlm.nih.gov/pubmed/33767737
http://dx.doi.org/10.2218/ijdc.v15i1.709
Description
Summary: This paper reports on the ongoing activities and curation practices of the National Center for Biomolecular NMR Data Processing and Analysis(). Over the past several years, the Center has been developing and extending computational workflow management software for a community of biomolecular NMR spectroscopists. Previous work refactored the workflow system to use the PREMIS framework for reporting retrospective provenance, for sharing workflows between scientists, and for supporting data reuse. In this paper, we report on our recent efforts to embed analytics within workflow execution and provenance tracking. Important metrics for each intermediate dataset are included in the corresponding PREMIS intellectual object, which allows both inspection of the operation of individual actors and visualization of the changes across a full processing workflow. These metrics can be viewed within the workflow management system or through standalone metadata widgets. Our approach supports a hybrid of automated workflow execution with manual intervention and metadata management. In this combination, the workflow system and metadata widgets encourage domain experts to be avid curators of the data they create, fostering both computational reproducibility and scientific data reuse.
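The summary describes attaching metrics for each intermediate dataset to its provenance record so that changes can be traced actor by actor through a workflow. As a minimal sketch only, assuming a simplified in-memory record rather than actual PREMIS XML, the Python below models each intermediate dataset as a hypothetical `IntermediateObject` carrying a metrics dictionary and computes how a chosen metric changes at each workflow step. The class, the `metric_changes` function, the actor names, the metric name, and the values are all illustrative assumptions, not the Center's implementation.

```python
from dataclasses import dataclass, field


@dataclass
class IntermediateObject:
    """Hypothetical, simplified stand-in for a PREMIS object describing one intermediate dataset."""
    identifier: str
    produced_by: str  # name of the (hypothetical) workflow actor that created this dataset
    metrics: dict[str, float] = field(default_factory=dict)


def metric_changes(objects: list[IntermediateObject], metric: str) -> list[tuple[str, float]]:
    """For each consecutive pair of datasets in workflow order, return the actor that
    produced the later dataset and the change in `metric` it introduced.
    Pairs where either dataset lacks the metric are skipped."""
    changes = []
    for prev, curr in zip(objects, objects[1:]):
        if metric in prev.metrics and metric in curr.metrics:
            changes.append((curr.produced_by, curr.metrics[metric] - prev.metrics[metric]))
    return changes


if __name__ == "__main__":
    # Made-up values purely for illustration; actor and metric names are hypothetical.
    workflow = [
        IntermediateObject("ds-1", "convert", {"noise_estimate": 12.0}),
        IntermediateObject("ds-2", "transform", {"noise_estimate": 9.5}),
        IntermediateObject("ds-3", "phase", {"noise_estimate": 9.0}),
    ]
    for actor, delta in metric_changes(workflow, "noise_estimate"):
        print(f"{actor}: {delta:+.2f}")
```

Keeping metrics as a plain dictionary on each record lets every actor report whatever metrics suit its step; the comparison simply skips any step where either neighboring dataset lacks the requested metric.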