Approximate Decision Tree Induction over Approximately Engineered Data Features

We propose a simple SQL-based decision tree induction algorithm which makes its heuristic choices about how to split the data based on the results of automatically generated analytical queries. We run this algorithm using standard SQL and the approximate SQL engine which works on granulated data summarie...

Bibliographic Details
Main authors: Ślęzak, Dominik; Chądzyńska-Krasowska, Agnieszka
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338164/
http://dx.doi.org/10.1007/978-3-030-52705-1_28
author Ślęzak, Dominik
Chądzyńska-Krasowska, Agnieszka
collection PubMed
description We propose a simple SQL-based decision tree induction algorithm which makes its heuristic choices about how to split the data based on the results of automatically generated analytical queries. We run this algorithm using standard SQL and the approximate SQL engine which works on granulated data summaries. We compare the accuracy of trees obtained in these two modes on the real-world dataset provided to participants of the Suspicious Network Event Recognition competition organized at IEEE BigData 2019. We investigate whether trees induced using approximate SQL queries, whose execution is incomparably faster, yield poorer accuracy than in the standard scenario. Next, we investigate features (inputs to the decision tree induction algorithm) derived using SQL from a bigger associated data table which was also provided in the aforementioned competition. As before, we run both standard and approximate SQL; again, the latter mode needs to be checked with respect to the accuracy of trees learnt over data with approximately extracted features.
format Online
Article
Text
id pubmed-7338164
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7338164 2020-07-07 Approximate Decision Tree Induction over Approximately Engineered Data Features Ślęzak, Dominik Chądzyńska-Krasowska, Agnieszka Rough Sets Article We propose a simple SQL-based decision tree induction algorithm which makes its heuristic choices about how to split the data based on the results of automatically generated analytical queries. We run this algorithm using standard SQL and the approximate SQL engine which works on granulated data summaries. We compare the accuracy of trees obtained in these two modes on the real-world dataset provided to participants of the Suspicious Network Event Recognition competition organized at IEEE BigData 2019. We investigate whether trees induced using approximate SQL queries, whose execution is incomparably faster, yield poorer accuracy than in the standard scenario. Next, we investigate features (inputs to the decision tree induction algorithm) derived using SQL from a bigger associated data table which was also provided in the aforementioned competition. As before, we run both standard and approximate SQL; again, the latter mode needs to be checked with respect to the accuracy of trees learnt over data with approximately extracted features. 2020-06-10 /pmc/articles/PMC7338164/ http://dx.doi.org/10.1007/978-3-030-52705-1_28 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Approximate Decision Tree Induction over Approximately Engineered Data Features
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338164/
http://dx.doi.org/10.1007/978-3-030-52705-1_28