
Analyzing petabytes of data with Hadoop


Bibliographic Details
Main Author: Jeff Hammerbacher
Language: eng
Published: 2009
Subjects:
Online Access: http://cds.cern.ch/record/1201649
_version_ 1780917627241103360
author Jeff Hammerbacher
author_facet Jeff Hammerbacher
author_sort Jeff Hammerbacher
collection CERN
description Abstract: The open source Apache Hadoop project provides a powerful suite of tools for storing and analyzing petabytes of data using commodity hardware. After several years of production use inside web companies like Yahoo! and Facebook, and nearly a year of commercial support and development by Cloudera, the technology is spreading rapidly through other disciplines, from financial services and government to life sciences and high energy physics. The talk will motivate the design of Hadoop and discuss some key implementation details in depth. It will also cover the major subprojects in the Hadoop ecosystem, go over some example applications, highlight best practices for deploying Hadoop in your environment, discuss plans for the future of the technology, and provide pointers to the many resources available for learning more. In addition to providing more information about the Hadoop platform, a major goal of this talk is to begin a dialogue with the ATLAS research team on how the tools commonly used in their environment compare to Hadoop, and how Hadoop could be improved to better serve the high energy physics community.
Short Biography: Jeff Hammerbacher is Vice President of Products and Chief Scientist at Cloudera. Jeff was an Entrepreneur in Residence at Accel Partners immediately prior to founding Cloudera. Before Accel, he conceived, built, and led the Data team at Facebook. The Data team was responsible for driving many of the applications of statistics and machine learning at Facebook, as well as building out the infrastructure to support these tasks for massive data sets. The team produced two open source projects: Hive, a system for offline analysis built on top of Hadoop, and Cassandra, a structured storage system on a P2P network. Before joining Facebook, Jeff was a quantitative analyst on Wall Street.
Jeff earned his Bachelor's Degree in Mathematics from Harvard University and recently served as contributing editor to the book "Beautiful Data", published by O'Reilly in July 2009.
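The abstract describes Hadoop as a suite of tools for analyzing petabytes of data with commodity hardware; the core analysis model is MapReduce, where a mapper emits key/value pairs and a reducer aggregates them per key after the shuffle. The following is a minimal local sketch of that contract (not Hadoop's actual API; the word-count task, sample lines, and function names are illustrative assumptions):

```python
from collections import defaultdict

def mapper(line):
    # Map phase: emit a (word, 1) pair for each word,
    # as a Hadoop Streaming mapper would write to stdout.
    for word in line.split():
        yield word.lower(), 1

def reducer(pairs):
    # Reduce phase: sum the values for each key, mimicking
    # what each reducer does with its shuffled partition.
    counts = defaultdict(int)
    for key, value in pairs:
        counts[key] += value
    return dict(counts)

# Hypothetical sample input standing in for a distributed file split.
lines = ["Hadoop stores data", "Hadoop analyzes data"]
pairs = [kv for line in lines for kv in mapper(line)]
print(reducer(pairs))  # {'hadoop': 2, 'stores': 1, 'data': 2, 'analyzes': 1}
```

In a real cluster the map and reduce calls run in parallel across machines and the framework handles the shuffle, partitioning, and fault tolerance; this sketch only shows the programming contract.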
id cern-1201649
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2009
record_format invenio
spelling cern-1201649 2022-11-02T22:20:56Z http://cds.cern.ch/record/1201649 eng Jeff Hammerbacher Analyzing petabytes of data with Hadoop CERN Computing Colloquium oai:cds.cern.ch:1201649 2009
spellingShingle CERN Computing Colloquium
Jeff Hammerbacher
Analyzing petabytes of data with Hadoop
title Analyzing petabytes of data with Hadoop
title_full Analyzing petabytes of data with Hadoop
title_fullStr Analyzing petabytes of data with Hadoop
title_full_unstemmed Analyzing petabytes of data with Hadoop
title_short Analyzing petabytes of data with Hadoop
title_sort analyzing petabytes of data with hadoop
topic CERN Computing Colloquium
url http://cds.cern.ch/record/1201649
work_keys_str_mv AT jeffhammerbacher analyzingpetabytesofdatawithhadoop