On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles

It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch conference on machine learning, pp 56–64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference...

Bibliographic Details
Main Authors:	Marques, Henrique O., Swersky, Lorne, Sander, Jörg, Campello, Ricardo J. G. B., Zimek, Arthur
Format:	Online Article Text
Language:	English
Published:	Springer US 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10326160/ https://www.ncbi.nlm.nih.gov/pubmed/37424877 http://dx.doi.org/10.1007/s10618-023-00931-x

author	Marques, Henrique O.; Swersky, Lorne; Sander, Jörg; Campello, Ricardo J. G. B.; Zimek, Arthur
author_sort	Marques, Henrique O.
collection	PubMed
description	It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch conference on machine learning, pp 56–64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147–153, 2009. https://doi.org/10.1109/ICMLA.2009.16). In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. In contrast to previous comparison studies, where the models (algorithms, parameters) are selected by using examples from both classes (outlier and inlier), here we also study and compare different approaches for model selection in the absence of examples from the outlier class, which is more realistic for practical applications since labeled outliers are rarely available. Our results showed that, overall, SVDD and GMM are top performers, regardless of whether the ground truth is used for parameter selection or not. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles showed better performance than individual methods in terms of accuracy, as long as the ensemble members are properly selected. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at https://doi.org/10.1007/s10618-023-00931-x.
format	Online Article Text
id	pubmed-10326160
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-10326160 2023-07-08 On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles. Marques, Henrique O.; Swersky, Lorne; Sander, Jörg; Campello, Ricardo J. G. B.; Zimek, Arthur. Data Min Knowl Discov. Article. Springer US 2023-05-16 2023 /pmc/articles/PMC10326160/ /pubmed/37424877 http://dx.doi.org/10.1007/s10618-023-00931-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10326160/ https://www.ncbi.nlm.nih.gov/pubmed/37424877 http://dx.doi.org/10.1007/s10618-023-00931-x

Similar Items