
Hybrid cloud and cluster computing paradigms for life science applications

BACKGROUND: Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability to some areas such as data mining, because MapReduce has poor performance on problems with an ite...


Bibliographic Details
Main Authors: Qiu, Judy; Ekanayake, Jaliya; Gunarathne, Thilina; Choi, Jong Youl; Bae, Seung-Hee; Li, Hui; Zhang, Bingjing; Wu, Tak-Lon; Ruan, Yang; Ekanayake, Saliya; Hughes, Adam; Fox, Geoffrey
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3040529/
https://www.ncbi.nlm.nih.gov/pubmed/21210982
http://dx.doi.org/10.1186/1471-2105-11-S12-S3
collection PubMed
description BACKGROUND: Clouds and MapReduce have shown themselves to be a broadly useful approach to scientific computing, especially for parallel data-intensive applications. However, they have limited applicability to some areas such as data mining, because MapReduce has poor performance on problems with an iterative structure present in the linear algebra that underlies much data analysis. Such problems can be run efficiently on clusters using MPI, leading to a hybrid cloud and cluster environment. This motivates the design and implementation of an open source Iterative MapReduce system, Twister. RESULTS: Comparisons of Amazon, Azure, and traditional Linux and Windows environments on common applications have shown encouraging performance and usability in several important non-iterative cases. These are linked to MPI applications for the final stages of the data analysis. Further, we have released the open source Twister Iterative MapReduce and benchmarked it against basic MapReduce (Hadoop) and MPI in information retrieval and life sciences applications. CONCLUSIONS: The hybrid cloud (MapReduce) and cluster (MPI) approach offers an attractive production environment, while Twister promises a uniform programming environment for many life sciences applications. METHODS: We used the commercial clouds Amazon and Azure and the NSF resource FutureGrid to perform detailed comparisons and evaluations of different approaches to data-intensive computing. Several applications were developed in MPI, MapReduce, and Twister in these different environments.
format Text
id pubmed-3040529
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3040529 2011-02-18 Hybrid cloud and cluster computing paradigms for life science applications. BMC Bioinformatics, Proceedings.
BioMed Central 2010-12-21 /pmc/articles/PMC3040529/ /pubmed/21210982 http://dx.doi.org/10.1186/1471-2105-11-S12-S3 Text en Copyright ©2010 Qiu et al; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Hybrid cloud and cluster computing paradigms for life science applications
topic Proceedings