Advancements in Big Data Processing
The ever-increasing volumes of scientific data present new challenges for Distributed Computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology, and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became the unit of petascale data processing on the Grid. Splitting a large data-processing task into jobs enables fine-granularity checkpointing, analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic re-tries, achieving reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides a foundation for reliability engineering that scales Grid technologies for data processing beyond the petascale.
| Main author: | Vaniachine, A |
|---|---|
| Language: | eng |
| Published: | 2012 |
| Subjects: | Detectors and Experimental Techniques |
| Online access: | http://cds.cern.ch/record/1462234 |
| _version_ | 1780925298695471104 |
|---|---|
author | Vaniachine, A |
author_facet | Vaniachine, A |
author_sort | Vaniachine, A |
collection | CERN |
description | The ever-increasing volumes of scientific data present new challenges for Distributed Computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology, and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became the unit of petascale data processing on the Grid. Splitting a large data-processing task into jobs enables fine-granularity checkpointing, analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic re-tries, achieving reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides a foundation for reliability engineering that scales Grid technologies for data processing beyond the petascale. |
id | cern-1462234 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2012 |
record_format | invenio |
spelling | cern-1462234 2019-09-30T06:29:59Z http://cds.cern.ch/record/1462234 eng Vaniachine, A Advancements in Big Data Processing Detectors and Experimental Techniques The ever-increasing volumes of scientific data present new challenges for Distributed Computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits enabling massive data analysis in new ways. In petascale data processing scientists deal with datasets, not individual files. As a result, a task (comprised of many jobs) became a unit of petascale data processing on the Grid. Splitting of a large data processing task into jobs enabled fine-granularity checkpointing analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of the dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic re-tries to achieve reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides foundation for reliability engineering scaling up Grid-technologies for data processing beyond the petascale. ATL-SOFT-SLIDE-2012-441 oai:cds.cern.ch:1462234 2012-07-17 |
spellingShingle | Detectors and Experimental Techniques Vaniachine, A Advancements in Big Data Processing |
title | Advancements in Big Data Processing |
title_full | Advancements in Big Data Processing |
title_fullStr | Advancements in Big Data Processing |
title_full_unstemmed | Advancements in Big Data Processing |
title_short | Advancements in Big Data Processing |
title_sort | advancements in big data processing |
topic | Detectors and Experimental Techniques |
url | http://cds.cern.ch/record/1462234 |
work_keys_str_mv | AT vaniachinea advancementsinbigdataprocessing |
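The mechanism the abstract describes, splitting a task into many jobs and recovering transient job failures through automatic re-tries, can be sketched as a minimal simulation. Everything below (the failure rate, the retry budget, and the function names) is an illustrative assumption, not part of the record:

```python
import random

# Assumed parameters for the sketch: a 10% transient per-attempt failure
# probability and a budget of 5 automatic re-tries per job. With these
# numbers a job fails outright only if all 6 attempts fail, a probability
# of 0.1**6 = 1e-6, i.e. roughly the Six Sigma defect level the abstract
# refers to.
TRANSIENT_FAILURE_RATE = 0.10
MAX_RETRIES = 5

def run_job(rng):
    """Simulate one Grid job; returns False on a transient failure."""
    return rng.random() >= TRANSIENT_FAILURE_RATE

def process_task(n_jobs, rng):
    """Split a task into n_jobs jobs and re-try each failed job.

    Each job acts as a fine-granularity checkpoint: only the failed
    job is re-run, never the whole task, just as only a dropped
    TCP/IP packet is re-sent, never the whole file.
    """
    completed = 0
    for _ in range(n_jobs):
        for _attempt in range(1 + MAX_RETRIES):
            if run_job(rng):
                completed += 1
                break
    return completed

rng = random.Random(42)
done = process_task(10_000, rng)
print(f"{done}/10000 jobs completed after automatic re-tries")
```

Without re-tries, about 10% of the jobs in this simulation would fail and the whole task with them; with the retry loop, task-level failures become negligible, which is the reliability-engineering point of the abstract.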