DISGROU: an algorithm for discontinuous subgroup discovery
Main authors: | Eugenie, Reynald; Stattner, Erick |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc., 2021 |
Subjects: | Algorithms and Analysis of Algorithms |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8093955/ https://www.ncbi.nlm.nih.gov/pubmed/33987462 http://dx.doi.org/10.7717/peerj-cs.512 |
author | Eugenie, Reynald; Stattner, Erick |
---|---|
collection | PubMed |
description | In this paper, we focus on the problem of subgroup discovery in numerical data. This task aims to identify subsets of objects, called subgroups, that exhibit interesting characteristics compared to the average, according to a quality measure calculated on a target variable. In this article, we present DISGROU, a new approach that identifies subgroups whose attribute intervals may be discontinuous. Unlike the main algorithms in the field, the originality of our proposal lies in the way it breaks down attribute intervals during the subgroup search phase. The basic assumption of our approach is that allowing the attribute ranges that define a subgroup to be disjoint can improve the quality of the identified subgroups. Traditional methods in the field search for subgroups only over continuous intervals, which yields subgroups defined over wider intervals that contain irrelevant objects and thus lower the quality measure. Another advantage of our approach is that it does not require prior discretization of the attributes, since it works directly on numerical attributes. The efficiency of our proposal is demonstrated first by comparing its results with those of two reference algorithms in the field and then by applying it to a case study. |
id | pubmed-8093955 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | PeerJ Comput Sci, 2021-04-27 |
rights | © 2021 Eugenie and Stattner. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
topic | Algorithms and Analysis of Algorithms |
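The abstract's central assumption, that letting a subgroup's attribute range be disjoint can raise its quality compared with a single continuous interval, can be illustrated on a toy example. The sketch below is not DISGROU and does not reproduce its interval-splitting strategy. It assumes a single numeric attribute, made-up data, and a common mean-based quality measure q = n^a (mean of the target in the subgroup minus the overall target mean) with a = 0.5; the helper names `covers`, `quality` and `best_continuous` are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# A closed interval [low, high] on one numeric attribute.
Interval = Tuple[float, float]


def covers(x: float, intervals: Sequence[Interval]) -> bool:
    """True if x lies in at least one of the (possibly disjoint) intervals."""
    return any(low <= x <= high for low, high in intervals)


def quality(xs: Sequence[float], ys: Sequence[float],
            intervals: Sequence[Interval], a: float = 0.5) -> float:
    """Assumed mean-based quality: n**a * (subgroup target mean - overall mean).

    Illustrative choice only; not necessarily the measure used by DISGROU.
    """
    selected = [y for x, y in zip(xs, ys) if covers(x, intervals)]
    if not selected:
        return 0.0
    return len(selected) ** a * (mean(selected) - mean(ys))


def best_continuous(xs: Sequence[float], ys: Sequence[float],
                    a: float = 0.5) -> Tuple[float, Optional[Interval]]:
    """Exhaustively score every single interval [low, high] over observed values."""
    points = sorted(set(xs))
    best_q, best_iv = float("-inf"), None
    for i, low in enumerate(points):
        for high in points[i:]:
            q = quality(xs, ys, [(low, high)], a)
            if q > best_q:  # strict '>' keeps the first interval among ties
                best_q, best_iv = q, (low, high)
    return best_q, best_iv


# Hypothetical toy data: the target is high at both ends of the attribute range.
xs = [1, 2, 3, 4, 5, 6]
ys = [10, 10, 0, 0, 10, 10]

q_cont, iv_cont = best_continuous(xs, ys)
q_disc = quality(xs, ys, [(1, 2), (5, 6)])
print("best continuous:", iv_cont, round(q_cont, 3))
print("disjoint [1,2] U [5,6]:", round(q_disc, 3))
```

On this toy data the best single interval is [1, 2], with a quality of about 4.71, while the disjoint condition [1, 2] ∪ [5, 6] scores about 6.67: it gathers every high-target object without the low-target objects in between, whereas any continuous interval covering both ends must also include them.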