On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles
It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch on machine learning, pp 56–64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference...
Main authors: | Marques, Henrique O.; Swersky, Lorne; Sander, Jörg; Campello, Ricardo J. G. B.; Zimek, Arthur |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10326160/ https://www.ncbi.nlm.nih.gov/pubmed/37424877 http://dx.doi.org/10.1007/s10618-023-00931-x |
_version_ | 1785069370109591552 |
---|---|
author | Marques, Henrique O.; Swersky, Lorne; Sander, Jörg; Campello, Ricardo J. G. B.; Zimek, Arthur |
author_sort | Marques, Henrique O. |
collection | PubMed |
description | It has been shown that unsupervised outlier detection methods can be adapted to the one-class classification problem (Janssens and Postma, in: Proceedings of the 18th annual Belgian-Dutch on machine learning, pp 56–64, 2009; Janssens et al. in: Proceedings of the 2009 ICMLA international conference on machine learning and applications, IEEE Computer Society, pp 147–153, 2009. 10.1109/ICMLA.2009.16). In this paper, we focus on the comparison of one-class classification algorithms with such adapted unsupervised outlier detection methods, improving on previous comparison studies in several important aspects. We study a number of one-class classification and unsupervised outlier detection methods in a rigorous experimental setup, comparing them on a large number of datasets with different characteristics, using different performance measures. In contrast to previous comparison studies, where the models (algorithms, parameters) are selected by using examples from both classes (outlier and inlier), here we also study and compare different approaches for model selection in the absence of examples from the outlier class, which is more realistic for practical applications since labeled outliers are rarely available. Our results showed that, overall, SVDD and GMM are top-performers, regardless of whether the ground truth is used for parameter selection or not. However, in specific application scenarios, other methods exhibited better performance. Combining one-class classifiers into ensembles showed better performance than individual methods in terms of accuracy, as long as the ensemble members are properly selected. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10618-023-00931-x. |
format | Online Article Text |
id | pubmed-10326160 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-10326160 2023-07-08 Data Min Knowl Discov Article Springer US 2023-05-16 2023 /pmc/articles/PMC10326160/ /pubmed/37424877 http://dx.doi.org/10.1007/s10618-023-00931-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | On the evaluation of outlier detection and one-class classification: a comparative study of algorithms, model selection, and ensembles |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10326160/ https://www.ncbi.nlm.nih.gov/pubmed/37424877 http://dx.doi.org/10.1007/s10618-023-00931-x |