Comprehensive Decision Tree Models in Bioinformatics

PURPOSE: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model a...

Bibliographic Details
Main authors: Stiglic, Gregor; Kocbek, Simon; Pernek, Igor; Kokol, Peter
Format: Online Article Text
Language: English
Published: Public Library of Science 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3316502/
https://www.ncbi.nlm.nih.gov/pubmed/22479449
http://dx.doi.org/10.1371/journal.pone.0033812
author Stiglic, Gregor
Kocbek, Simon
Pernek, Igor
Kokol, Peter
collection PubMed
description PURPOSE: Classification is an important and widely used machine learning technique in bioinformatics. Researchers and other end-users of machine learning software often prefer to work with comprehensible models where knowledge extraction and explanation of reasoning behind the classification model are possible. METHODS: This paper presents an extension to an existing machine learning environment and a study on visual tuning of decision tree classifiers. The motivation for this research comes from the need to build effective and easily interpretable decision tree models by a so-called one-button data mining approach where no parameter tuning is needed. To avoid bias in classification, no classification performance measure is used during the tuning of the model, which is constrained exclusively by the dimensions of the produced decision tree. RESULTS: The proposed visual tuning of decision trees was evaluated on 40 datasets containing classical machine learning problems and 31 datasets from the field of bioinformatics. Although we did not expect significant differences in classification performance, the results demonstrate a significant increase in accuracy in less complex visually tuned decision trees. In contrast to classical machine learning benchmarking datasets, we observe higher accuracy gains in bioinformatics datasets. Additionally, a user study was carried out to confirm the assumption that the tree tuning times are significantly lower for the proposed method in comparison to manual tuning of the decision tree. CONCLUSIONS: The empirical results demonstrate that by building simple models constrained by predefined visual boundaries, one not only achieves good comprehensibility, but also very good classification performance that does not differ from usually more complex models built using default settings of the classical decision tree algorithm.
In addition, our study demonstrates the suitability of visually tuned decision trees for datasets with binary class attributes and a high number of possibly redundant attributes that are very common in bioinformatics.
format Online
Article
Text
id pubmed-3316502
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3316502 2012-04-04 Comprehensive Decision Tree Models in Bioinformatics Stiglic, Gregor Kocbek, Simon Pernek, Igor Kokol, Peter PLoS One Research Article Public Library of Science 2012-03-30 /pmc/articles/PMC3316502/ /pubmed/22479449 http://dx.doi.org/10.1371/journal.pone.0033812 Text en Stiglic et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Comprehensive Decision Tree Models in Bioinformatics
topic Research Article