Advancements in Big Data Processing
The ever-increasing volumes of scientific data present new challenges for Distributed Computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology, and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became the unit of petascale data processing on the Grid. Splitting a large data-processing task into jobs enables fine-granularity checkpointing, analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic re-tries, achieving reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides a foundation for reliability engineering that scales Grid technologies for data processing beyond the petascale.
| Main author: | Vaniachine, A |
|---|---|
| Language: | eng |
| Published: | 2012 |
| Subjects: | Detectors and Experimental Techniques |
| Online access: | http://cds.cern.ch/record/1462234 |
| _version_ | 1780925298695471104 |
|---|---|
author | Vaniachine, A |
author_facet | Vaniachine, A |
author_sort | Vaniachine, A |
collection | CERN |
description | The ever-increasing volumes of scientific data present new challenges for Distributed Computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology, and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became the unit of petascale data processing on the Grid. Splitting a large data-processing task into jobs enables fine-granularity checkpointing, analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic re-tries, achieving reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides a foundation for reliability engineering that scales Grid technologies for data processing beyond the petascale. |
id | cern-1462234 |
institution | European Organization for Nuclear Research (CERN) |
language | eng |
publishDate | 2012 |
record_format | invenio |
spelling | cern-1462234 2019-09-30T06:29:59Z http://cds.cern.ch/record/1462234 eng Vaniachine, A Advancements in Big Data Processing Detectors and Experimental Techniques The ever-increasing volumes of scientific data present new challenges for Distributed Computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits enabling massive data analysis in new ways. In petascale data processing scientists deal with datasets, not individual files. As a result, a task (comprised of many jobs) became a unit of petascale data processing on the Grid. Splitting of a large data processing task into jobs enabled fine-granularity checkpointing analogous to the splitting of a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of the dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic re-tries to achieve reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides foundation for reliability engineering scaling up Grid-technologies for data processing beyond the petascale. ATL-SOFT-SLIDE-2012-441 oai:cds.cern.ch:1462234 2012-07-17 |
spellingShingle | Detectors and Experimental Techniques Vaniachine, A Advancements in Big Data Processing |
title | Advancements in Big Data Processing |
title_full | Advancements in Big Data Processing |
title_fullStr | Advancements in Big Data Processing |
title_full_unstemmed | Advancements in Big Data Processing |
title_short | Advancements in Big Data Processing |
title_sort | advancements in big data processing |
topic | Detectors and Experimental Techniques |
url | http://cds.cern.ch/record/1462234 |
work_keys_str_mv | AT vaniachinea advancementsinbigdataprocessing |
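The mechanism the abstract describes, splitting a task into many jobs and recovering transient job failures through automatic re-tries, can be sketched as a minimal simulation. Everything below (the failure rate, the retry budget, and the function names) is an illustrative assumption, not part of the record:

```python
import random

# Assumed parameters for the sketch: a 10% transient per-attempt failure
# probability and a budget of 5 automatic re-tries per job. With these
# numbers a job fails outright only if all 6 attempts fail, a probability
# of 0.1**6 = 1e-6, i.e. roughly the Six Sigma defect level the abstract
# refers to.
TRANSIENT_FAILURE_RATE = 0.10
MAX_RETRIES = 5

def run_job(rng):
    """Simulate one Grid job; returns False on a transient failure."""
    return rng.random() >= TRANSIENT_FAILURE_RATE

def process_task(n_jobs, rng):
    """Split a task into n_jobs jobs and re-try each failed job.

    Each job acts as a fine-granularity checkpoint: only the failed
    job is re-run, never the whole task, just as only a dropped
    TCP/IP packet is re-sent, never the whole file.
    """
    completed = 0
    for _ in range(n_jobs):
        for _attempt in range(1 + MAX_RETRIES):
            if run_job(rng):
                completed += 1
                break
    return completed

rng = random.Random(42)
done = process_task(10_000, rng)
print(f"{done}/10000 jobs completed after automatic re-tries")
```

Without re-tries, about 10% of the jobs in this simulation would fail and the whole task with them; with the retry loop, task-level failures become negligible, which is the reliability-engineering point of the abstract.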