Approximate Decision Tree Induction over Approximately Engineered Data Features
We propose a simple SQL-based decision tree induction algorithm that makes its heuristic choices about how to split the data based on the results of automatically generated analytical queries. We run this algorithm using standard SQL and an approximate SQL engine that works on granulated data summarie...
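The split-selection idea in the abstract can be illustrated with a minimal sketch. It assumes weighted label entropy as the split heuristic and uses Python's built-in `sqlite3` module as a stand-in for a standard SQL engine; the record does not state which heuristic or engine the authors use. The table `events`, its columns, and the helper names `entropy` and `best_split` are hypothetical. An approximate engine is not modeled here; in principle the same generated queries would simply be sent to it instead.

```python
import math
import sqlite3
from collections import defaultdict


def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return sum(c / total * math.log2(total / c) for c in counts if c > 0)


def best_split(conn, table, features, label, where="1=1"):
    """Return (feature, score) with the lowest weighted label entropy.

    One aggregate query is generated per candidate feature; only the
    returned counts are used, never the raw rows. Identifiers are
    interpolated directly, so they are assumed to be trusted.
    """
    best = None
    for feat in features:
        # Automatically generated analytical query for this candidate split.
        sql = (
            f"SELECT {feat}, {label}, COUNT(*) FROM {table} "
            f"WHERE {where} GROUP BY {feat}, {label}"
        )
        per_value = defaultdict(list)
        for value, _cls, n in conn.execute(sql):
            per_value[value].append(n)
        total = sum(sum(ns) for ns in per_value.values())
        if total == 0:
            continue
        # Entropy of each branch, weighted by the share of rows it receives.
        score = sum(sum(ns) / total * entropy(ns) for ns in per_value.values())
        if best is None or score < best[1]:
            best = (feat, score)
    return best


# Hypothetical toy data: six events with two categorical features and a label.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (proto TEXT, severity TEXT, notified INTEGER)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [
        ("tcp", "high", 1),
        ("tcp", "low", 0),
        ("udp", "high", 1),
        ("udp", "low", 0),
        ("tcp", "high", 1),
        ("udp", "low", 0),
    ],
)
feature, score = best_split(conn, "events", ["proto", "severity"], "notified")
print(feature, score)
```

Because the heuristic only consumes aggregate counts, an engine that returns approximate counts could be swapped in without changing the tree-building logic, which is the comparison the abstract sets out to evaluate.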
Main Authors: | Ślęzak, Dominik; Chądzyńska-Krasowska, Agnieszka |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338164/ http://dx.doi.org/10.1007/978-3-030-52705-1_28 |
_version_ | 1783554623569657856 |
---|---|
author | Ślęzak, Dominik Chądzyńska-Krasowska, Agnieszka |
author_sort | Ślęzak, Dominik |
collection | PubMed |
description | We propose a simple SQL-based decision tree induction algorithm that makes its heuristic choices about how to split the data based on the results of automatically generated analytical queries. We run this algorithm using standard SQL and an approximate SQL engine that works on granulated data summaries. We compare the accuracy of the trees obtained in these two modes on the real-world dataset provided to participants of the Suspicious Network Event Recognition competition organized at IEEE BigData 2019. We investigate whether trees induced using approximate SQL queries, whose execution is incomparably faster, yield poorer accuracy than in the standard scenario. Next, we investigate features (inputs to the decision tree induction algorithm) derived using SQL from a larger associated data table that was also provided in the aforementioned competition. As before, we run both standard and approximate SQL, and the latter mode again needs to be checked with respect to the accuracy of trees learnt over data with approximately extracted features. |
format | Online Article Text |
id | pubmed-7338164 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7338164 2020-07-07 Approximate Decision Tree Induction over Approximately Engineered Data Features Ślęzak, Dominik Chądzyńska-Krasowska, Agnieszka Rough Sets Article We propose a simple SQL-based decision tree induction algorithm that makes its heuristic choices about how to split the data based on the results of automatically generated analytical queries. We run this algorithm using standard SQL and an approximate SQL engine that works on granulated data summaries. We compare the accuracy of the trees obtained in these two modes on the real-world dataset provided to participants of the Suspicious Network Event Recognition competition organized at IEEE BigData 2019. We investigate whether trees induced using approximate SQL queries, whose execution is incomparably faster, yield poorer accuracy than in the standard scenario. Next, we investigate features (inputs to the decision tree induction algorithm) derived using SQL from a larger associated data table that was also provided in the aforementioned competition. As before, we run both standard and approximate SQL, and the latter mode again needs to be checked with respect to the accuracy of trees learnt over data with approximately extracted features. 2020-06-10 /pmc/articles/PMC7338164/ http://dx.doi.org/10.1007/978-3-030-52705-1_28 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Approximate Decision Tree Induction over Approximately Engineered Data Features |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7338164/ http://dx.doi.org/10.1007/978-3-030-52705-1_28 |