An alternative approach to dimension reduction for pareto distributed data: a case study

Deep learning models are tools for data analysis suitable for approximating (non-linear) relationships among variables for the best prediction of an outcome. While these models can be used to answer many important questions, their utility is still harshly criticized, being extremely challenging to i...

Bibliographic Details
Main authors:	Roccetti, Marco, Delnevo, Giovanni, Casini, Luca, Mirri, Silvia
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2021
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7905765/ https://www.ncbi.nlm.nih.gov/pubmed/33649714 http://dx.doi.org/10.1186/s40537-021-00428-8

author	Roccetti, Marco Delnevo, Giovanni Casini, Luca Mirri, Silvia
collection	PubMed
description	Deep learning models are tools for data analysis suitable for approximating (non-linear) relationships among variables for the best prediction of an outcome. While these models can be used to answer many important questions, their utility is still harshly criticized, since it is extremely challenging to identify which data descriptors are the most adequate to represent a given phenomenon of interest. From recent experience developing a deep learning model designed to detect failures in mechanical water meter devices, we have learnt that a considerable deterioration of the prediction accuracy can occur if one tries to train a deep learning model by adding specific device descriptors based on categorical data. This can happen because of an excessive increase in the dimensions of the data, with a corresponding loss of statistical significance. After several unsuccessful experiments with alternative methodologies that either reduce the dimensionality of the data space or employ more traditional machine learning algorithms, we changed the training strategy, reconsidering that categorical data in the light of a Pareto analysis. In essence, we used those categorical descriptors not as an input on which to train our deep learning model, but as a tool to give a new shape to the dataset, based on the Pareto rule. With this data adjustment, we trained a better-performing deep learning model able to detect defective water meter devices with a prediction accuracy in the range 87–90%, even in the presence of categorical descriptors.
format	Online Article Text
id	pubmed-7905765
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-7905765 2021-02-25 An alternative approach to dimension reduction for pareto distributed data: a case study Roccetti, Marco Delnevo, Giovanni Casini, Luca Mirri, Silvia J Big Data Research
Springer International Publishing 2021-02-25 2021 /pmc/articles/PMC7905765/ /pubmed/33649714 http://dx.doi.org/10.1186/s40537-021-00428-8 Text en © The Author(s) 2021 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	An alternative approach to dimension reduction for pareto distributed data: a case study
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7905765/ https://www.ncbi.nlm.nih.gov/pubmed/33649714 http://dx.doi.org/10.1186/s40537-021-00428-8
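
The description in this record says the categorical descriptors were used to reshape the dataset according to the Pareto rule rather than fed to the model as inputs. The record does not give the authors' actual procedure, so the following is only a minimal sketch of one generic reading of that idea: keep the rows belonging to the most frequent categories that together cover a chosen share of the data, then drop the categorical column. The function names, the `coverage` threshold of 0.8, and the toy `model`/`reading` fields are all assumptions made for illustration.

```python
from collections import Counter


def pareto_categories(values, coverage=0.8):
    """Return the most frequent categories that together cover at least
    `coverage` of all observations (a generic Pareto-style cut, assumed here)."""
    counts = Counter(values)
    total = sum(counts.values())
    kept, covered = [], 0
    # most_common() yields categories in descending order of frequency.
    for category, count in counts.most_common():
        if total and covered / total >= coverage:
            break
        kept.append(category)
        covered += count
    return kept


def reshape_dataset(rows, column, coverage=0.8):
    """Keep only rows whose `column` value is in the Pareto head, and drop
    the categorical column so it is not used as a model input."""
    head = set(pareto_categories([row[column] for row in rows], coverage))
    return [
        {k: v for k, v in row.items() if k != column}
        for row in rows
        if row[column] in head
    ]


# Hypothetical toy data: "model" plays the role of a categorical descriptor.
rows = (
    [{"model": "A", "reading": 10.0}] * 6
    + [{"model": "B", "reading": 12.5}] * 2
    + [{"model": "C", "reading": 9.1}]
    + [{"model": "D", "reading": 7.3}]
)
head = pareto_categories([r["model"] for r in rows])
reshaped = reshape_dataset(rows, "model")
```

In this toy case the two most frequent categories account for 8 of the 10 rows, so only those rows remain and the categorical column no longer adds dimensions to the training data. Whether the published method filters, reweights, or partitions the data in this way would need to be checked against the article itself.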

Similar items