
Advancements in Big Data Processing


Bibliographic Details
Main author: Vaniachine, A
Language: eng
Published: 2012
Subjects:
Online access: http://cds.cern.ch/record/1462234
_version_ 1780925298695471104
author Vaniachine, A
author_facet Vaniachine, A
author_sort Vaniachine, A
collection CERN
description The ever-increasing volumes of scientific data present new challenges for Distributed Computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became the unit of petascale data processing on the Grid. Splitting a large data processing task into jobs enables fine-granularity checkpointing, analogous to splitting a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic retries, achieving reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides a foundation for reliability engineering, scaling Grid technologies up for data processing beyond the petascale.
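The retry mechanism the abstract describes, recovering transient job failures the way TCP recovers dropped packets, can be sketched as below. This is an illustrative simulation, not ATLAS production code; the function names, the 10% transient failure rate, and the retry count are hypothetical assumptions for the sketch.

```python
import random

def run_job(job_id: int, failure_rate: float) -> bool:
    """Simulate one Grid job; returns False on a transient failure."""
    return random.random() >= failure_rate

def process_task(n_jobs: int, max_retries: int = 3,
                 failure_rate: float = 0.1) -> list:
    """Run a task split into n_jobs jobs, retrying transient failures.

    Returns the ids of jobs that failed on every attempt.
    """
    permanently_failed = []
    for job_id in range(n_jobs):
        for _attempt in range(1 + max_retries):
            if run_job(job_id, failure_rate):
                break  # job succeeded, like an acknowledged packet
        else:
            # All attempts exhausted: the rare permanent failure.
            permanently_failed.append(job_id)
    return permanently_failed
```

Under these assumptions a job fails all four attempts with probability 0.1**4 = 1e-4, so a 1000-job task is expected to lose about 0.1 jobs; automatic retries play the same role for Grid jobs that retransmission plays for dropped TCP/IP packets.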
id cern-1462234
institution European Organization for Nuclear Research
language eng
publishDate 2012
record_format invenio
spelling cern-1462234 | 2019-09-30T06:29:59Z | http://cds.cern.ch/record/1462234 | eng | Vaniachine, A | Advancements in Big Data Processing | Detectors and Experimental Techniques | The ever-increasing volumes of scientific data present new challenges for Distributed Computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became the unit of petascale data processing on the Grid. Splitting a large data processing task into jobs enables fine-granularity checkpointing, analogous to splitting a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic retries, achieving reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides a foundation for reliability engineering, scaling Grid technologies up for data processing beyond the petascale. | ATL-SOFT-SLIDE-2012-441 | oai:cds.cern.ch:1462234 | 2012-07-17
spellingShingle Detectors and Experimental Techniques
Vaniachine, A
Advancements in Big Data Processing
title Advancements in Big Data Processing
title_full Advancements in Big Data Processing
title_fullStr Advancements in Big Data Processing
title_full_unstemmed Advancements in Big Data Processing
title_short Advancements in Big Data Processing
title_sort advancements in big data processing
topic Detectors and Experimental Techniques
url http://cds.cern.ch/record/1462234
work_keys_str_mv AT vaniachinea advancementsinbigdataprocessing