
DISGROU: an algorithm for discontinuous subgroup discovery

In this paper, we address the problem of searching for subgroups in numerical data. Subgroup discovery aims to identify subsets of objects, called subgroups, that exhibit interesting characteristics compared to the average, according to a quality measure calculated on a target variable. In this ar...


Bibliographic Details
Main authors: Eugenie, Reynald; Stattner, Erick
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Algorithms and Analysis of Algorithms
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8093955/
https://www.ncbi.nlm.nih.gov/pubmed/33987462
http://dx.doi.org/10.7717/peerj-cs.512
author Eugenie, Reynald
Stattner, Erick
collection PubMed
description In this paper, we address the problem of searching for subgroups in numerical data. Subgroup discovery aims to identify subsets of objects, called subgroups, that exhibit interesting characteristics compared to the average, according to a quality measure calculated on a target variable. We present DISGROU, a new approach that identifies subgroups whose attribute intervals may be discontinuous. The originality of our proposal, compared with the main algorithms in the field, lies in the way it breaks down the attribute intervals during the subgroup search phase. The basic assumption of our approach is that allowing the attribute ranges that define a subgroup to be disjoint can improve the quality of the identified subgroups. Indeed, traditional methods in the field search for subgroups only over continuous intervals, which leads to subgroups defined over wider intervals that contain irrelevant objects and therefore degrade the quality function. Another advantage of our approach is that it does not require prior discretization of the attributes, since it works directly on the numerical attributes. The efficiency of our proposal is demonstrated first by comparing its results with those of two reference algorithms in the field and then by applying it to a case study.
format Online
Article
Text
id pubmed-8093955
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8093955 2021-05-12 DISGROU: an algorithm for discontinuous subgroup discovery Eugenie, Reynald Stattner, Erick PeerJ Comput Sci Algorithms and Analysis of Algorithms PeerJ Inc. 2021-04-27 /pmc/articles/PMC8093955/ /pubmed/33987462 http://dx.doi.org/10.7717/peerj-cs.512 Text en © 2021 Eugenie and Stattner https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title DISGROU: an algorithm for discontinuous subgroup discovery
topic Algorithms and Analysis of Algorithms
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8093955/
https://www.ncbi.nlm.nih.gov/pubmed/33987462
http://dx.doi.org/10.7717/peerj-cs.512
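
The central claim of the description above, that letting a subgroup's attribute range be disjoint can raise its quality, can be illustrated with a small sketch. This is not the DISGROU algorithm and does not search for intervals. It only scores two hand-picked subgroup descriptions on toy data. The quality measure used here, n**a * (subgroup mean - overall mean), is a common mean-based choice in subgroup discovery and is an assumption, since the record does not state which measure the paper uses. The names covers, quality, rows and the exponent a are hypothetical.

```python
from statistics import mean


def covers(x, intervals):
    """Return True if x falls inside any of the closed intervals."""
    return any(lo <= x <= hi for lo, hi in intervals)


def quality(rows, intervals, a=0.5):
    """Assumed quality measure: n**a * (subgroup mean - overall mean).

    rows is a list of (attribute_value, target_value) pairs.
    intervals is a list of (low, high) pairs; more than one pair
    describes a discontinuous subgroup.
    """
    targets = [t for x, t in rows if covers(x, intervals)]
    if not targets:
        return 0.0
    overall = mean(t for _, t in rows)
    return len(targets) ** a * (mean(targets) - overall)


# Toy data (made up for illustration): the target is high for x in 1..3
# and 6..8, and low in the gap at x = 4 and x = 5.
rows = [(0, 1.0), (1, 9.0), (2, 9.0), (3, 9.0), (4, 1.0),
        (5, 1.0), (6, 9.0), (7, 9.0), (8, 9.0), (9, 1.0)]

continuous = [(1, 8)]             # one wide interval, includes the gap
discontinuous = [(1, 3), (6, 8)]  # two disjoint intervals, skips the gap

q_cont = quality(rows, continuous)
q_disc = quality(rows, discontinuous)

print(f"continuous    {continuous}: {q_cont:.3f}")
print(f"discontinuous {discontinuous}: {q_disc:.3f}")
```

On this toy data the disjoint description scores higher, because the single wide interval has to include the two low-target objects in the gap, which pulls the subgroup mean down more than the extra coverage helps.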