
Optimal Subgroup Discovery in Purely Numerical Data

Subgroup discovery in labeled data is the task of discovering patterns in the description space of objects to find subsets of objects whose labels show an interesting distribution, for example the disproportionate representation of a label value. Discovering interesting subgroups in purely numerical...


Bibliographic Details
Main authors: Millot, Alexandre, Cazabet, Rémy, Boulicaut, Jean-François
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206336/
http://dx.doi.org/10.1007/978-3-030-47436-2_9
_version_ 1783530395288993792
author Millot, Alexandre
Cazabet, Rémy
Boulicaut, Jean-François
author_sort Millot, Alexandre
collection PubMed
description Subgroup discovery in labeled data is the task of discovering patterns in the description space of objects to find subsets of objects whose labels show an interesting distribution, for example the disproportionate representation of a label value. Discovering interesting subgroups in purely numerical data - attributes and target label - has received little attention so far. Existing methods make use of discretization methods that lead to a loss of information and suboptimal results. This is the case for the reference algorithm SD-Map*. We consider here the discovery of optimal subgroups according to an interestingness measure in purely numerical data. We leverage the concept of closed interval patterns and advanced enumeration and pruning techniques. The performances of our algorithm are studied empirically and its added-value w.r.t. SD-Map* is illustrated.
format Online
Article
Text
id pubmed-7206336
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-72063362020-05-08 Optimal Subgroup Discovery in Purely Numerical Data Millot, Alexandre Cazabet, Rémy Boulicaut, Jean-François Advances in Knowledge Discovery and Data Mining Article Subgroup discovery in labeled data is the task of discovering patterns in the description space of objects to find subsets of objects whose labels show an interesting distribution, for example the disproportionate representation of a label value. Discovering interesting subgroups in purely numerical data - attributes and target label - has received little attention so far. Existing methods make use of discretization methods that lead to a loss of information and suboptimal results. This is the case for the reference algorithm SD-Map*. We consider here the discovery of optimal subgroups according to an interestingness measure in purely numerical data. We leverage the concept of closed interval patterns and advanced enumeration and pruning techniques. The performances of our algorithm are studied empirically and its added-value w.r.t. SD-Map* is illustrated. 2020-04-17 /pmc/articles/PMC7206336/ http://dx.doi.org/10.1007/978-3-030-47436-2_9 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Optimal Subgroup Discovery in Purely Numerical Data
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7206336/
http://dx.doi.org/10.1007/978-3-030-47436-2_9