Advancements in Big Data Processing

Bibliographic Details
Main author: Vaniachine, A
Language: eng
Published: 2012
Subjects:
Online access: http://cds.cern.ch/record/1462234
Description
Summary: The ever-increasing volumes of scientific data present new challenges for distributed computing and Grid technologies. The emerging Big Data revolution drives new discoveries in scientific fields including nanotechnology, astrophysics, high-energy physics, biology and medicine. New initiatives are transforming data-driven scientific fields by pushing Big Data limits, enabling massive data analysis in new ways. In petascale data processing, scientists deal with datasets, not individual files. As a result, a task (comprising many jobs) became the unit of petascale data processing on the Grid. Splitting a large data processing task into jobs enables fine-granularity checkpointing, analogous to splitting a large file into smaller TCP/IP packets during data transfers. Transferring large data in small packets achieves reliability through automatic re-sending of dropped TCP/IP packets. Similarly, transient job failures on the Grid can be recovered by automatic retries to achieve reliable Six Sigma production quality in petascale data processing on the Grid. The LHC computing experience provides a foundation for reliability engineering that scales up Grid technologies for data processing beyond the petascale.
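
The retry argument in the summary can be illustrated with a short calculation. The following is a minimal sketch, not the author's method: it assumes that each job attempt fails independently with the same transient failure probability, and it takes the conventional Six Sigma figure of 3.4 defects per million opportunities as the target. The function names (`attempts_needed`, `run_task`) and the failure rates used are hypothetical and do not come from the record.

```python
import math
import random

# Conventional Six Sigma target: 3.4 defects per million opportunities.
SIX_SIGMA_DEFECT_RATE = 3.4e-6


def attempts_needed(p_fail: float, target: float = SIX_SIGMA_DEFECT_RATE) -> int:
    """Smallest number of attempts n such that p_fail**n <= target.

    Assumes every attempt fails independently with probability p_fail,
    so a job is lost only if all n attempts fail.
    """
    if not 0.0 < p_fail < 1.0:
        raise ValueError("p_fail must be strictly between 0 and 1")
    n = max(1, math.ceil(math.log(target) / math.log(p_fail)))
    # Guard against floating-point rounding in the logarithm ratio.
    while p_fail ** n > target:
        n += 1
    return n


def run_task(num_jobs: int, p_fail: float, max_attempts: int,
             rng: random.Random) -> list[int]:
    """Simulate one task split into num_jobs jobs with automatic retries.

    Returns the ids of jobs that failed on every attempt. Only those jobs
    would need attention; completed jobs are never re-run, which is the
    fine-granularity checkpointing effect of splitting the task.
    """
    lost = []
    for job_id in range(num_jobs):
        for _ in range(max_attempts):
            if rng.random() >= p_fail:  # this attempt succeeded
                break
        else:
            lost.append(job_id)  # every attempt failed
    return lost


if __name__ == "__main__":
    p = 0.1  # hypothetical 10% transient failure rate per attempt
    n = attempts_needed(p)
    lost = run_task(num_jobs=100_000, p_fail=p, max_attempts=n,
                    rng=random.Random(0))
    print(f"attempts per job: {n}, jobs lost: {len(lost)}")
```

Under these assumptions, the number of attempts required grows only logarithmically as the target failure rate tightens, which is why a modest retry limit can turn an unreliable per-attempt success rate into a very high per-job success rate. Correlated failures, such as a site outage affecting many jobs at once, would break the independence assumption and are not modeled here.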