Bioinformatics Application with Kubeflow for Batch Processing in Clouds

Bibliographic Details
Main authors:	Yuan, David Yu; Wildish, Tony
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7571545/ http://dx.doi.org/10.1007/978-3-030-59851-8_24

author	Yuan, David Yu; Wildish, Tony
collection	PubMed
description	Bioinformatics pipelines make extensive use of HPC batch processing. The rapid growth of data volumes and computational complexity, especially for modern applications such as machine learning algorithms, poses significant challenges for local HPC facilities. Many attempts have been made to burst HPC batch processing into clouds with virtual machines. They all suffer from some common issues, for example very high overhead, slow scaling up and down, and the near impossibility of being cloud-agnostic. We have successfully deployed and run several pipelines on Kubernetes in OpenStack, Google Cloud Platform and Amazon Web Services. In particular, we use Kubeflow on top of Kubernetes for more sophisticated job scheduling, workflow management, and first-class support for machine learning. We choose Kubeflow/Kubernetes to avoid the overhead of provisioning virtual machines, to achieve rapid scaling with containers, and to be truly cloud-agnostic in all cloud environments. Kubeflow on Kubernetes also creates some new challenges in deployment, data access, performance monitoring, etc. We will discuss the details of these challenges and provide our solutions. We will demonstrate how our solutions work across all three very different clouds for both classical pipelines and new ones for machine learning.
format	Online Article Text
id	pubmed-7571545
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-7571545 2020-10-20 Bioinformatics Application with Kubeflow for Batch Processing in Clouds Yuan, David Yu Wildish, Tony High Performance Computing Article
2020-09-15 /pmc/articles/PMC7571545/ http://dx.doi.org/10.1007/978-3-030-59851-8_24 Text en © The Author(s) 2020 Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made. The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
title	Bioinformatics Application with Kubeflow for Batch Processing in Clouds
topic	Article

Similar items