Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics

Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost...

Bibliographic Details
Main Authors:	Rácz, Anita; Bajusz, Dávid; Héberger, Károly
Format:	Online Article Text
Language:	English
Published:	MDPI 2019
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6695655/ https://www.ncbi.nlm.nih.gov/pubmed/31374986 http://dx.doi.org/10.3390/molecules24152811

_version_	1783444086259187712
author	Rácz, Anita Bajusz, Dávid Héberger, Károly
author_facet	Rácz, Anita Bajusz, Dávid Héberger, Károly
author_sort	Rácz, Anita
collection	PubMed
description	Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost drawbacks as well. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities) and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset.
format	Online Article Text
id	pubmed-6695655
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-6695655 2019-09-05 Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics Rácz, Anita Bajusz, Dávid Héberger, Károly Molecules Article Machine learning classification algorithms are widely used for the prediction and classification of the different properties of molecules such as toxicity or biological activity. The prediction of toxic vs. non-toxic molecules is important due to testing on living animals, which has ethical and cost drawbacks as well. The quality of classification models can be determined with several performance parameters, which often give conflicting results. In this study, we performed a multi-level comparison with the use of different performance metrics and machine learning classification methods. Well-established and standardized protocols for the machine learning tasks were used in each case. The comparison was applied to three datasets (acute and aquatic toxicities) and the robust, yet sensitive, sum of ranking differences (SRD) and analysis of variance (ANOVA) were applied for evaluation. The effect of dataset composition (balanced vs. imbalanced) and 2-class vs. multiclass classification scenarios was also studied. Most of the performance metrics are sensitive to dataset composition, especially in 2-class classification problems. The optimal machine learning algorithm also depends significantly on the composition of the dataset. MDPI 2019-08-01 /pmc/articles/PMC6695655/ /pubmed/31374986 http://dx.doi.org/10.3390/molecules24152811 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle	Article Rácz, Anita Bajusz, Dávid Héberger, Károly Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics
title	Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics
title_full	Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics
title_fullStr	Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics
title_full_unstemmed	Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics
title_short	Multi-Level Comparison of Machine Learning Classifiers and Their Performance Metrics
title_sort	multi-level comparison of machine learning classifiers and their performance metrics
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6695655/ https://www.ncbi.nlm.nih.gov/pubmed/31374986 http://dx.doi.org/10.3390/molecules24152811
work_keys_str_mv	AT raczanita multilevelcomparisonofmachinelearningclassifiersandtheirperformancemetrics AT bajuszdavid multilevelcomparisonofmachinelearningclassifiersandtheirperformancemetrics AT hebergerkaroly multilevelcomparisonofmachinelearningclassifiersandtheirperformancemetrics

Similar Items