Semantic workflows for benchmark challenges: Enhancing comparability, reusability and reproducibility


Bibliographic Details
Main Authors: Srivastava, Arunima, Adusumilli, Ravali, Boyce, Hunter, Garijo, Daniel, Ratnakar, Varun, Mayani, Rajiv, Yu, Thomas, Machiraju, Raghu, Gil, Yolanda, Mallick, Parag
Format: Online Article Text
Language: English
Published: 2019
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6417805/
https://www.ncbi.nlm.nih.gov/pubmed/30864323
Collection: PubMed
Description: Benchmark challenges, such as the Critical Assessment of Structure Prediction (CASP) and the Dialogue for Reverse Engineering Assessments and Methods (DREAM), have been instrumental in driving the development of bioinformatics methods. Typically, a challenge is posted, competitors make predictions on blinded test data, and challengers then submit their answers to a central server where they are scored. Recent efforts to automate these challenges have been enabled by systems in which challengers submit Docker containers (units of software that package code together with all of its dependencies) to be run on the cloud. Despite their value in providing an unbiased test-bed for the bioinformatics community, there remain opportunities to further enhance the impact of benchmark challenges. Specifically, current approaches evaluate only end-to-end performance, making it nearly impossible to directly compare methodologies or parameters. Furthermore, the scientific community cannot easily reuse challengers' approaches, owing to a lack of specifics, ambiguity in tools and parameters, and problems in sharing and maintenance. Lastly, the intuition behind why particular steps are used is not captured, because the proposed workflows are not explicitly defined, making it cumbersome to understand the flow and use of data. Here we introduce an approach to overcome these limitations based upon the WINGS semantic workflow system. Specifically, WINGS enables researchers to submit complete semantic workflows as challenge submissions. By submitting entries as workflows, it becomes possible to compare not just a challenger's results and performance but also the methodology employed. This is particularly important when dozens of challenge entries may use nearly identical tools but with only subtle changes in parameters (and radical differences in results).
WINGS uses a component-driven workflow design and offers intelligent parameter and data selection by reasoning about data characteristics. This proves especially critical in bioinformatics workflows, where using default or incorrect parameter values can drastically alter results. Different challenge entries may be readily compared through the use of abstract workflows, which also facilitate reuse. WINGS is housed in a cloud-based setup that stores data, dependencies, and workflows for easy sharing and use. It can also scale workflow executions with distributed computing through the Pegasus workflow execution system. We demonstrate the application of this architecture to the DREAM proteogenomic challenge.
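The comparability argument in the abstract can be sketched in miniature: when each entry is an explicit workflow (components plus parameter settings) rather than an opaque container, differences in methodology become directly diffable. The sketch below is illustrative only, not WINGS code; the component and parameter names (`peptide_search`, `mass_tol_ppm`, etc.) are hypothetical.

```python
# Illustrative sketch of comparing challenge entries as explicit workflows.
# Each entry maps a component name to its parameter settings; all names here
# are hypothetical, not taken from WINGS or the DREAM challenge itself.

def diff_workflows(a, b):
    """Return the components whose parameter settings differ between two entries."""
    diffs = {}
    for component in sorted(set(a) | set(b)):
        params_a, params_b = a.get(component), b.get(component)
        if params_a != params_b:
            diffs[component] = (params_a, params_b)
    return diffs

# Two entries using nearly identical tools, differing only in one parameter --
# the situation the abstract notes is invisible to end-to-end scoring alone.
entry1 = {"peptide_search": {"tool": "comet", "mass_tol_ppm": 10},
          "fdr_filter": {"threshold": 0.01}}
entry2 = {"peptide_search": {"tool": "comet", "mass_tol_ppm": 20},
          "fdr_filter": {"threshold": 0.01}}

print(diff_workflows(entry1, entry2))  # only "peptide_search" differs
```

With opaque Docker submissions, recovering this single-parameter difference would require inspecting each image by hand; with explicit workflows it is a mechanical comparison.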
ID: pubmed-6417805
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Published in: Pac Symp Biocomput
License: Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License (http://creativecommons.org/licenses/by-nc-nd/4.0/).