
Disk failures in the EOS setup at CERN - A first systematic look at 1 year of collected data

The EOS deployment at CERN is a core service used both for scientific data processing and analysis and as a back-end for general end-user storage (e.g. home directories/CERNBOX). The collected disk failure metrics over a period of 1 year from a deployment size of some 70k disks allow a first systematic an...


Bibliographic Details
Main authors: Duellmann, Dirk, Portabales, Alfonso
Language: eng
Published: 2019
Subjects: Computing and Computers
Online access: https://dx.doi.org/10.1051/epjconf/201921404046
http://cds.cern.ch/record/2701408
_version_ 1780964602296664064
author Duellmann, Dirk
Portabales, Alfonso
author_facet Duellmann, Dirk
Portabales, Alfonso
author_sort Duellmann, Dirk
collection CERN
description The EOS deployment at CERN is a core service used both for scientific data processing and analysis and as a back-end for general end-user storage (e.g. home directories/CERNBOX). The collected disk failure metrics over a period of 1 year from a deployment size of some 70k disks allow a first systematic analysis of the behaviour of different hard disk types for the large CERN use-cases. In this contribution we describe the data collection and analysis, summarise the measured rates and compare them with other large disk deployments. We further describe initial steps to use the collected failure and SMART metrics to develop a machine learning model that predicts imminent failures and hence helps avoid service degradation and repair costs.
id oai-inspirehep.net-1760998
institution Organización Europea para la Investigación Nuclear
language eng
publishDate 2019
record_format invenio
spelling oai-inspirehep.net-1760998
2022-08-10T12:20:33Z
doi:10.1051/epjconf/201921404046
http://cds.cern.ch/record/2701408
eng
Duellmann, Dirk
Portabales, Alfonso
Disk failures in the EOS setup at CERN - A first systematic look at 1 year of collected data
Computing and Computers
The EOS deployment at CERN is a core service used both for scientific data processing and analysis and as a back-end for general end-user storage (e.g. home directories/CERNBOX). The collected disk failure metrics over a period of 1 year from a deployment size of some 70k disks allow a first systematic analysis of the behaviour of different hard disk types for the large CERN use-cases. In this contribution we describe the data collection and analysis, summarise the measured rates and compare them with other large disk deployments. We further describe initial steps to use the collected failure and SMART metrics to develop a machine learning model that predicts imminent failures and hence helps avoid service degradation and repair costs.
oai:inspirehep.net:1760998
2019
spellingShingle Computing and Computers
Duellmann, Dirk
Portabales, Alfonso
Disk failures in the EOS setup at CERN - A first systematic look at 1 year of collected data
title Disk failures in the EOS setup at CERN - A first systematic look at 1 year of collected data
title_full Disk failures in the EOS setup at CERN - A first systematic look at 1 year of collected data
title_fullStr Disk failures in the EOS setup at CERN - A first systematic look at 1 year of collected data
title_full_unstemmed Disk failures in the EOS setup at CERN - A first systematic look at 1 year of collected data
title_short Disk failures in the EOS setup at CERN - A first systematic look at 1 year of collected data
title_sort disk failures in the eos setup at cern - a first systematic look at 1 year of collected data
topic Computing and Computers
url https://dx.doi.org/10.1051/epjconf/201921404046
http://cds.cern.ch/record/2701408
work_keys_str_mv AT duellmanndirk diskfailuresintheeossetupatcernafirstsystematiclookat1yearofcollecteddata
AT portabalesalfonso diskfailuresintheeossetupatcernafirstsystematiclookat1yearofcollecteddata