
Sector and Sphere: the design and implementation of a high-performance data cloud

Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with t...

Bibliographic Details
Main Authors: Gu, Yunhong; Grossman, Robert L.
Format: Online Article Text
Language: English
Published: The Royal Society 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3391065/
https://www.ncbi.nlm.nih.gov/pubmed/19451100
http://dx.doi.org/10.1098/rsta.2009.0053
_version_ 1782237484188434432
author Gu, Yunhong
Grossman, Robert L.
author_facet Gu, Yunhong
Grossman, Robert L.
author_sort Gu, Yunhong
collection PubMed
description Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source.
format Online
Article
Text
id pubmed-3391065
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher The Royal Society
record_format MEDLINE/PubMed
spelling pubmed-33910652012-07-12 Sector and Sphere: the design and implementation of a high-performance data cloud Gu, Yunhong Grossman, Robert L. Philos Trans A Math Phys Eng Sci Articles Cloud computing has demonstrated that processing very large datasets over commodity clusters can be done simply, given the right programming model and infrastructure. In this paper, we describe the design and implementation of the Sector storage cloud and the Sphere compute cloud. By contrast with the existing storage and compute clouds, Sector can manage data not only within a data centre, but also across geographically distributed data centres. Similarly, the Sphere compute cloud supports user-defined functions (UDFs) over data both within and across data centres. As a special case, MapReduce-style programming can be implemented in Sphere by using a Map UDF followed by a Reduce UDF. We describe some experimental studies comparing Sector/Sphere and Hadoop using the Terasort benchmark. In these studies, Sector is approximately twice as fast as Hadoop. Sector/Sphere is open source. The Royal Society 2009-06-28 /pmc/articles/PMC3391065/ /pubmed/19451100 http://dx.doi.org/10.1098/rsta.2009.0053 Text en Copyright © 2009 The Royal Society http://creativecommons.org/licenses/by/2.5/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Articles
Gu, Yunhong
Grossman, Robert L.
Sector and Sphere: the design and implementation of a high-performance data cloud
title Sector and Sphere: the design and implementation of a high-performance data cloud
title_full Sector and Sphere: the design and implementation of a high-performance data cloud
title_fullStr Sector and Sphere: the design and implementation of a high-performance data cloud
title_full_unstemmed Sector and Sphere: the design and implementation of a high-performance data cloud
title_short Sector and Sphere: the design and implementation of a high-performance data cloud
title_sort sector and sphere: the design and implementation of a high-performance data cloud
topic Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3391065/
https://www.ncbi.nlm.nih.gov/pubmed/19451100
http://dx.doi.org/10.1098/rsta.2009.0053
work_keys_str_mv AT guyunhong sectorandspherethedesignandimplementationofahighperformancedatacloud
AT grossmanrobertl sectorandspherethedesignandimplementationofahighperformancedatacloud