
Leakage Prediction in Machine Learning Models When Using Data from Sports Wearable Sensors

One of the major problems in machine learning is data leakage, which can be directly related to adversarial type attacks, raising serious concerns about the validity and reliability of artificial intelligence. Data leakage occurs when the independent variables used to teach the machine learning algo...


Bibliographic Details
Main author:	Dong, Qizheng
Format:	Online Article Text
Language:	English
Published:	Hindawi 2022
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9129943/ https://www.ncbi.nlm.nih.gov/pubmed/35619770 http://dx.doi.org/10.1155/2022/5314671

author	Dong, Qizheng
collection	PubMed
description	One of the major problems in machine learning is data leakage, which can be directly related to adversarial type attacks, raising serious concerns about the validity and reliability of artificial intelligence. Data leakage occurs when the independent variables used to teach the machine learning algorithm include either the dependent variable itself or a variable that contains clear information that the model is trying to predict. This data leakage results in unreliable and poor predictive results after the development and use of the model. It prevents the model from generalizing, which is required in a machine learning problem and thus causes false assumptions about its performance. To have a solid and generalized forecasting model, which will be able to produce remarkable forecasting results, we must pay great attention to detecting and preventing data leakage. This study presents an innovative system of leakage prediction in machine learning models, which is based on Bayesian inference to produce a thorough approach to calculating the reverse probability of unseen variables in order to make statistical conclusions about the relevant correlated variables and to calculate accordingly a lower limit on the marginal likelihood of the observed variables being derived from some coupling method. The main notion is that a higher marginal probability for a set of variables suggests a better fit of the data and thus a greater likelihood of a data leak in the model. The methodology is evaluated in a specialized dataset derived from sports wearable sensors.
format	Online Article Text
id	pubmed-9129943
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-9129943 2022-05-25 Leakage Prediction in Machine Learning Models When Using Data from Sports Wearable Sensors Dong, Qizheng Comput Intell Neurosci Research Article Hindawi 2022-05-17 /pmc/articles/PMC9129943/ /pubmed/35619770 http://dx.doi.org/10.1155/2022/5314671 Text en Copyright © 2022 Qizheng Dong.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Dong, Qizheng Leakage Prediction in Machine Learning Models When Using Data from Sports Wearable Sensors
title	Leakage Prediction in Machine Learning Models When Using Data from Sports Wearable Sensors
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9129943/ https://www.ncbi.nlm.nih.gov/pubmed/35619770 http://dx.doi.org/10.1155/2022/5314671
