On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles

Bibliographic Details
Main Authors: Marques, Henrique O., Swersky, Lorne, Sander, Jörg, Campello, Ricardo J. G. B., Zimek, Arthur
Format: Online Article Text
Language: English
Published: Springer US 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10326160/
https://www.ncbi.nlm.nih.gov/pubmed/37424877
http://dx.doi.org/10.1007/s10618-023-00931-x
author Marques, Henrique O.
Swersky, Lorne
Sander, Jörg
Campello, Ricardo J. G. B.
Zimek, Arthur
author_sort Marques, Henrique O.
collection PubMed
description It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch conference on machine learning, pp 56–64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147–153, 2009. 10.1109/ICMLA.2009.16). In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. In contrast to previous comparison studies, where the models (algorithms, parameters) are selected by using examples from both classes (outlier and inlier), here we also study and compare different approaches for model selection in the absence of examples from the outlier class, which is more realistic for practical applications since labeled outliers are rarely available. Our results showed that, overall, SVDD and GMM are top performers, regardless of whether the ground truth is used for parameter selection or not. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles showed better performance than individual methods in terms of accuracy, as long as the ensemble members are properly selected. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10618-023-00931-x.
format Online
Article
Text
id pubmed-10326160
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-10326160 2023-07-08
On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles
Marques, Henrique O.; Swersky, Lorne; Sander, Jörg; Campello, Ricardo J. G. B.; Zimek, Arthur
Data Min Knowl Discov
Article
Springer US 2023-05-16 2023
/pmc/articles/PMC10326160/ /pubmed/37424877 http://dx.doi.org/10.1007/s10618-023-00931-x
Text en
© The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles
topic Article