Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach

Bibliographic Details
Main authors: Gargiulo, Federico, Duellmann, Dirk, Arpaia, Pasquale, Lo Moriello, Rosario Schiano
Language: eng
Published: 2021
Online access: https://dx.doi.org/10.3390/app11188293
http://cds.cern.ch/record/2783635
_version_ 1780972063389908992
author Gargiulo, Federico
Duellmann, Dirk
Arpaia, Pasquale
Lo Moriello, Rosario Schiano
author_facet Gargiulo, Federico
Duellmann, Dirk
Arpaia, Pasquale
Lo Moriello, Rosario Schiano
author_sort Gargiulo, Federico
collection CERN
description Today, cloud systems provide many key services to development and production environments; reliable storage services are crucial for a multitude of applications, ranging from commercial manufacturing, distribution, and sales to scientific research, which is often at the forefront of computing resource demands. In large-scale computer centers, the storage system requires particular attention and investment; usually, a large number of diverse storage devices must be deployed to match the varying performance and volume requirements of changing user applications. As of today, magnetic drives still play a dominant role in terms of both deployed storage volume and service outages due to device failure. In this paper, we study methods to facilitate automated proactive disk replacement. We propose a method to identify disks with media failures in a production environment and describe an application of supervised machine learning to predict disk failures. In particular, we present a stage that automatically labels disks as healthy or at-risk during training and validation, along with a tuning strategy to optimize the hyperparameters of the associated machine learning classifier. The approach is trained and validated against a large set of 65,000 hard drives in the CERN computer center, and the achieved results are discussed.
id cern-2783635
institution European Organization for Nuclear Research
language eng
publishDate 2021
record_format invenio
spelling cern-2783635 2021-10-09T20:37:20Z doi:10.3390/app11188293 http://cds.cern.ch/record/2783635 eng Gargiulo, Federico; Duellmann, Dirk; Arpaia, Pasquale; Lo Moriello, Rosario Schiano. Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach. oai:cds.cern.ch:2783635 2021
spellingShingle Gargiulo, Federico
Duellmann, Dirk
Arpaia, Pasquale
Lo Moriello, Rosario Schiano
Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach
title Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach
title_full Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach
title_fullStr Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach
title_full_unstemmed Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach
title_short Predicting Hard Disk Failure by Means of Automatized Labeling and Machine Learning Approach
title_sort predicting hard disk failure by means of automatized labeling and machine learning approach
url https://dx.doi.org/10.3390/app11188293
http://cds.cern.ch/record/2783635
work_keys_str_mv AT gargiulofederico predictingharddiskfailurebymeansofautomatizedlabelingandmachinelearningapproach
AT duellmanndirk predictingharddiskfailurebymeansofautomatizedlabelingandmachinelearningapproach
AT arpaiapasquale predictingharddiskfailurebymeansofautomatizedlabelingandmachinelearningapproach
AT lomoriellorosarioschiano predictingharddiskfailurebymeansofautomatizedlabelingandmachinelearningapproach