
Bayesian network-based over-sampling method (BOSME) with application to indirect cost-sensitive learning

Traditional supervised learning algorithms do not satisfactorily solve the classification problem on imbalanced data sets, since they tend to assign the majority class, to the detriment of the minority class classification. In this paper, we introduce the Bayesian network-based over-sampling method...

Bibliographic Details
Main authors: Delgado, Rosario; Núñez-González, J. David
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9130330/
https://www.ncbi.nlm.nih.gov/pubmed/35610323
http://dx.doi.org/10.1038/s41598-022-12682-8
author Delgado, Rosario
Núñez-González, J. David
collection PubMed
description Traditional supervised learning algorithms do not satisfactorily solve the classification problem on imbalanced data sets, since they tend to assign the majority class, to the detriment of the minority class classification. In this paper, we introduce the Bayesian network-based over-sampling method (BOSME), which is a new over-sampling methodology based on Bayesian networks. Over-sampling methods handle imbalanced data by generating synthetic minority instances, with the benefit that classifiers learned from a more balanced data set have a better ability to predict the minority class. What makes BOSME different is that it relies on a new approach, generating artificial instances of the minority class following the probability distribution of a Bayesian network that is learned from the original minority classes by likelihood maximization. We compare BOSME with the benchmark synthetic minority over-sampling technique (SMOTE) through a series of experiments in the context of indirect cost-sensitive learning, with some state-of-the-art classifiers and various data sets, showing statistical evidence in favor of BOSME, with respect to the expected (misclassification) cost.
format Online
Article
Text
id pubmed-9130330
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-9130330 2022-05-26 Bayesian network-based over-sampling method (BOSME) with application to indirect cost-sensitive learning Delgado, Rosario Núñez-González, J. David Sci Rep Article Nature Publishing Group UK 2022-05-24 /pmc/articles/PMC9130330/ /pubmed/35610323 http://dx.doi.org/10.1038/s41598-022-12682-8 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Bayesian network-based over-sampling method (BOSME) with application to indirect cost-sensitive learning
topic Article
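The description in this record says that BOSME generates synthetic minority instances by sampling from a Bayesian network fitted to the minority class by likelihood maximization. The Python sketch below illustrates that general idea only and is not the authors' implementation. It assumes discrete features and a network structure supplied by hand through a hypothetical `parents` mapping (the record does not say how the structure is obtained). The function names `fit_cpts`, `topological_order`, and `sample_instances` are likewise hypothetical.

```python
import random
from collections import Counter, defaultdict


def fit_cpts(rows, parents):
    """Count-based maximum-likelihood estimates for a discrete Bayesian network.

    rows: list of dicts {variable: value}, minority-class instances only.
    parents: dict {variable: tuple of parent variables}; assumed acyclic
             and given by the user (structure learning is out of scope here).
    Returns (cpts, marginals); relative frequencies of the counts are the
    maximum-likelihood parameters for the fixed structure.
    """
    cpts = {v: defaultdict(Counter) for v in parents}
    marginals = {v: Counter() for v in parents}
    for row in rows:
        for v, pa in parents.items():
            cpts[v][tuple(row[p] for p in pa)][row[v]] += 1
            marginals[v][row[v]] += 1
    return cpts, marginals


def topological_order(parents):
    """Order variables so that every parent comes before its children."""
    order, seen = [], set()

    def visit(v):
        if v in seen:
            return
        for p in parents[v]:
            visit(p)
        seen.add(v)
        order.append(v)

    for v in parents:
        visit(v)
    return order


def sample_instances(cpts, marginals, parents, n, rng):
    """Draw n synthetic instances by ancestral sampling from the fitted network."""
    order = topological_order(parents)
    synthetic = []
    for _ in range(n):
        row = {}
        for v in order:
            key = tuple(row[p] for p in parents[v])
            # Assumption: if this parent configuration was never observed,
            # fall back to the variable's marginal counts.
            dist = cpts[v].get(key) or marginals[v]
            values = list(dist)
            row[v] = rng.choices(values, weights=[dist[x] for x in values])[0]
        synthetic.append(row)
    return synthetic


if __name__ == "__main__":
    # Toy minority-class data with two discrete features (made up for illustration).
    minority = [
        {"smoker": "yes", "cough": "yes"},
        {"smoker": "yes", "cough": "yes"},
        {"smoker": "no", "cough": "no"},
    ]
    structure = {"smoker": (), "cough": ("smoker",)}
    cpts, marginals = fit_cpts(minority, structure)
    for instance in sample_instances(cpts, marginals, structure, 5, random.Random(0)):
        print(instance)
```

The synthetic rows would then be appended to the training set until the desired class balance is reached. With a chain-shaped structure like the one above, every sampled row reuses a feature combination that already occurs in the minority data; richer structures can produce new combinations, which is where the marginal fallback comes into play.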