Optimal Subgroup Discovery in Purely Numerical Data
Subgroup discovery in labeled data is the task of discovering patterns in the description space of objects to find subsets of objects whose labels show an interesting distribution, for example the disproportionate representation of a label value. Discovering interesting subgroups in purely numerical...
Main authors: | Millot, Alexandre; Cazabet, Rémy; Boulicaut, Jean-François |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206336/ http://dx.doi.org/10.1007/978-3-030-47436-2_9 |
_version_ | 1783530395288993792 |
---|---|
author | Millot, Alexandre; Cazabet, Rémy; Boulicaut, Jean-François |
author_facet | Millot, Alexandre; Cazabet, Rémy; Boulicaut, Jean-François |
author_sort | Millot, Alexandre |
collection | PubMed |
description | Subgroup discovery in labeled data is the task of discovering patterns in the description space of objects to find subsets of objects whose labels show an interesting distribution, for example the disproportionate representation of a label value. Discovering interesting subgroups in purely numerical data - attributes and target label - has received little attention so far. Existing methods make use of discretization methods that lead to a loss of information and suboptimal results. This is the case for the reference algorithm SD-Map*. We consider here the discovery of optimal subgroups according to an interestingness measure in purely numerical data. We leverage the concept of closed interval patterns and advanced enumeration and pruning techniques. The performances of our algorithm are studied empirically and its added-value w.r.t. SD-Map* is illustrated. |
format | Online Article Text |
id | pubmed-7206336 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7206336 2020-05-08 Optimal Subgroup Discovery in Purely Numerical Data Millot, Alexandre; Cazabet, Rémy; Boulicaut, Jean-François Advances in Knowledge Discovery and Data Mining Article Subgroup discovery in labeled data is the task of discovering patterns in the description space of objects to find subsets of objects whose labels show an interesting distribution, for example the disproportionate representation of a label value. Discovering interesting subgroups in purely numerical data - attributes and target label - has received little attention so far. Existing methods make use of discretization methods that lead to a loss of information and suboptimal results. This is the case for the reference algorithm SD-Map*. We consider here the discovery of optimal subgroups according to an interestingness measure in purely numerical data. We leverage the concept of closed interval patterns and advanced enumeration and pruning techniques. The performances of our algorithm are studied empirically and its added-value w.r.t. SD-Map* is illustrated. 2020-04-17 /pmc/articles/PMC7206336/ http://dx.doi.org/10.1007/978-3-030-47436-2_9 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Optimal Subgroup Discovery in Purely Numerical Data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206336/ http://dx.doi.org/10.1007/978-3-030-47436-2_9 |