
Towards an Internet of Science

Bibliographic Details
Main Author: Allmer, Jens
Format: Online Article Text
Language: English
Published: De Gruyter 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6798852/
https://www.ncbi.nlm.nih.gov/pubmed/31145694
http://dx.doi.org/10.1515/jib-2019-0024
Description
Summary: Big data and complex analysis workflows (pipelines) are common issues in data-driven science such as bioinformatics. A large number of computational tools are available for data analysis, and many workflow management systems have been developed to piece such tools together into data analysis pipelines. For example, more than 50 computational tools for read mapping are available, representing a large amount of duplicated effort. Furthermore, it is unclear whether these tools are correct, and only a few have a user base large enough to have encountered and reported most of the potential problems. Bringing together many largely untested tools in a computational pipeline must lead to unpredictable results. Yet, this is the current state. While data analysis is presently performed on personal computers, workstations, and clusters, the future will see development and analysis shift to the cloud. None of the workflow management systems is ready for this transition. This presents the opportunity to build a new system, which will overcome current duplications of effort, introduce proper testing, allow for development and analysis in public and private clouds, and include reporting features leading to interactive documents.
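
To make concrete what "piecing together such tools into data analysis pipelines" with "proper testing" could look like, here is a minimal sketch, assuming two hypothetical command-line tools (`mapper`, `counter`) and an input file `reads.fastq`; none of these names come from the article, and the sketch is an illustration of the general idea rather than the system the author proposes.

```python
import subprocess
from pathlib import Path

class Step:
    """One pipeline step: a shell command plus a small output check,
    echoing the abstract's call for testing each tool, not just wiring
    untested tools together."""

    def __init__(self, name, command, test):
        self.name = name        # human-readable step name
        self.command = command  # callable(infile, outfile) -> argv list
        self.test = test        # callable(outfile) -> bool

    def run(self, infile, outfile):
        subprocess.run(self.command(infile, outfile), check=True)
        if not self.test(outfile):
            raise RuntimeError(f"step {self.name!r} failed its output check")
        return outfile

def run_pipeline(steps, infile, workdir):
    """Thread a file through each step in order, verifying output as we go."""
    current = Path(infile)
    for i, step in enumerate(steps):
        out = Path(workdir) / f"{i:02d}_{step.name}.out"
        current = step.run(current, out)
    return current

# Hypothetical tools for illustration; a real pipeline would wrap,
# e.g., one of the 50+ read mappers the abstract mentions.
steps = [
    Step("map",
         lambda i, o: ["mapper", "--in", str(i), "--out", str(o)],
         lambda o: Path(o).exists() and Path(o).stat().st_size > 0),
    Step("count",
         lambda i, o: ["counter", str(i), str(o)],
         lambda o: Path(o).exists()),
]

if __name__ == "__main__":
    result = run_pipeline(steps, "reads.fastq", ".")
    print("pipeline finished:", result)
```

Attaching a check to every step is one way to keep a chain of independently developed tools from producing the "unpredictable results" the abstract warns about: a failure surfaces at the step that caused it rather than in the final output.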