An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer

Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing already plays an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides ext...

Bibliographic Details
Main Authors: Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei
Format: Online Article Text
Language: English
Published: MDPI 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6149962/
https://www.ncbi.nlm.nih.gov/pubmed/29194413
http://dx.doi.org/10.3390/molecules22122116
author Yang, Xi
Wu, Chengkun
Lu, Kai
Fang, Lin
Zhang, Yong
Li, Shengkang
Guo, Guixin
Du, YunFei
collection PubMed
description Big data, cloud computing, and high-performance computing (HPC) are on the verge of convergence. Cloud computing already plays an active part in big data processing with the help of big data frameworks like Hadoop and Spark. The recent upsurge of high-performance computing in China provides extra possibilities and capacity to address the challenges associated with big data. In this paper, we propose Orion, a big data interface on the Tianhe-2 supercomputer, to enable big data applications to run on Tianhe-2 via a single command or a shell script. Orion supports multiple users, and each user can launch multiple tasks. Through automated configuration, it minimizes the effort needed to initiate big data applications on the Tianhe-2 supercomputer. Orion follows the “allocate-when-needed” paradigm, which avoids the idle occupation of computational resources. We tested the utility and performance of Orion using a big genomic dataset and achieved satisfactory performance on Tianhe-2 with very few modifications to existing applications that were implemented in Hadoop/Spark. In summary, Orion provides a practical and economical interface for big data processing on Tianhe-2.
format Online
Article
Text
id pubmed-6149962
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6149962 2018-11-13 An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer. Yang, Xi; Wu, Chengkun; Lu, Kai; Fang, Lin; Zhang, Yong; Li, Shengkang; Guo, Guixin; Du, YunFei. Molecules. Article. MDPI 2017-12-01 /pmc/articles/PMC6149962/ /pubmed/29194413 http://dx.doi.org/10.3390/molecules22122116 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title An Interface for Biomedical Big Data Processing on the Tianhe-2 Supercomputer
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6149962/
https://www.ncbi.nlm.nih.gov/pubmed/29194413
http://dx.doi.org/10.3390/molecules22122116