Improving data workflow systems with cloud services and use of open data for bioinformatics research

Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each),...

Bibliographic Details
Main Authors: Karim, Md Rezaul, Michel, Audrey, Zappa, Achille, Baranov, Pavel, Sahay, Ratnesh, Rebholz-Schuhmann, Dietrich
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169675/
https://www.ncbi.nlm.nih.gov/pubmed/28419324
http://dx.doi.org/10.1093/bib/bbx039
author Karim, Md Rezaul
Michel, Audrey
Zappa, Achille
Baranov, Pavel
Sahay, Ratnesh
Rebholz-Schuhmann, Dietrich
collection PubMed
description Data workflow systems (DWFSs) enable bioinformatics researchers to combine components for data access and data analytics, and to share the final data analytics approach with their collaborators. Increasingly, such systems have to cope with large-scale data, such as full genomes (about 200 GB each), public fact repositories (about 100 TB of data) and 3D imaging data at even larger scales. As moving the data becomes cumbersome, the DWFS needs to embed its processes into a cloud infrastructure, where the data are already hosted. As the standardized public data play an increasingly important role, the DWFS needs to comply with Semantic Web technologies. This advancement to DWFS would reduce overhead costs and accelerate the progress in bioinformatics research based on large-scale data and public resources, as researchers would require less specialized IT knowledge for the implementation. Furthermore, the high data growth rates in bioinformatics research drive the demand for parallel and distributed computing, which then imposes a need for scalability and high-throughput capabilities onto the DWFS. As a result, requirements for data sharing and access to public knowledge bases suggest that compliance of the DWFS with Semantic Web standards is necessary. In this article, we will analyze the existing DWFS with regard to their capabilities toward public open data use as well as large-scale computational and human interface requirements. We untangle the parameters for selecting a preferable solution for bioinformatics research with particular consideration to using cloud services and Semantic Web technologies. Our analysis leads to research guidelines and recommendations toward the development of future DWFS for the bioinformatics research community.
format Online
Article
Text
id pubmed-6169675
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6169675 2018-10-10 Brief Bioinform Paper Oxford University Press 2017-04-16 /pmc/articles/PMC6169675/ /pubmed/28419324 http://dx.doi.org/10.1093/bib/bbx039 Text en © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Improving data workflow systems with cloud services and use of open data for bioinformatics research
topic Paper
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6169675/
https://www.ncbi.nlm.nih.gov/pubmed/28419324
http://dx.doi.org/10.1093/bib/bbx039