Cluster-Based Improved Isolation Forest

Outlier detection is an important research direction in the field of data mining. Aiming at the problem of unstable detection results and low efficiency caused by randomly dividing features of the data set in the Isolation Forest algorithm in outlier detection, an algorithm CIIF (Cluster-based Impro...

Bibliographic Details
Main authors: Shao, Chen; Du, Xusheng; Yu, Jiong; Chen, Jiaying
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9141139/
https://www.ncbi.nlm.nih.gov/pubmed/35626495
http://dx.doi.org/10.3390/e24050611
_version_ 1784715271092568064
author Shao, Chen
Du, Xusheng
Yu, Jiong
Chen, Jiaying
author_facet Shao, Chen
Du, Xusheng
Yu, Jiong
Chen, Jiaying
author_sort Shao, Chen
collection PubMed
description Outlier detection is an important research direction in the field of data mining. Aiming at the problem of unstable detection results and low efficiency caused by randomly dividing features of the data set in the Isolation Forest algorithm in outlier detection, an algorithm CIIF (Cluster-based Improved Isolation Forest) that combines clustering and Isolation Forest is proposed. CIIF first uses the k-means method to cluster the data set, selects a specific cluster to construct a selection matrix based on the results of the clustering, and implements the selection mechanism of the algorithm through the selection matrix; then builds multiple isolation trees. Finally, the outliers are calculated according to the average search length of each sample in different isolation trees, and the Top-n objects with the highest outlier scores are regarded as outliers. Through comparative experiments with six algorithms in eleven real data sets, the results show that the CIIF algorithm has better performance. Compared to the Isolation Forest algorithm, the average AUC (Area under the Curve of ROC) value of our proposed CIIF algorithm is improved by 7%.
format Online
Article
Text
id pubmed-9141139
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-91411392022-05-28 Cluster-Based Improved Isolation Forest Shao, Chen Du, Xusheng Yu, Jiong Chen, Jiaying Entropy (Basel) Article Outlier detection is an important research direction in the field of data mining. Aiming at the problem of unstable detection results and low efficiency caused by randomly dividing features of the data set in the Isolation Forest algorithm in outlier detection, an algorithm CIIF (Cluster-based Improved Isolation Forest) that combines clustering and Isolation Forest is proposed. CIIF first uses the k-means method to cluster the data set, selects a specific cluster to construct a selection matrix based on the results of the clustering, and implements the selection mechanism of the algorithm through the selection matrix; then builds multiple isolation trees. Finally, the outliers are calculated according to the average search length of each sample in different isolation trees, and the Top-n objects with the highest outlier scores are regarded as outliers. Through comparative experiments with six algorithms in eleven real data sets, the results show that the CIIF algorithm has better performance. Compared to the Isolation Forest algorithm, the average AUC (Area under the Curve of ROC) value of our proposed CIIF algorithm is improved by 7%. MDPI 2022-04-27 /pmc/articles/PMC9141139/ /pubmed/35626495 http://dx.doi.org/10.3390/e24050611 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Shao, Chen
Du, Xusheng
Yu, Jiong
Chen, Jiaying
Cluster-Based Improved Isolation Forest
title Cluster-Based Improved Isolation Forest
title_full Cluster-Based Improved Isolation Forest
title_fullStr Cluster-Based Improved Isolation Forest
title_full_unstemmed Cluster-Based Improved Isolation Forest
title_short Cluster-Based Improved Isolation Forest
title_sort cluster-based improved isolation forest
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9141139/
https://www.ncbi.nlm.nih.gov/pubmed/35626495
http://dx.doi.org/10.3390/e24050611
work_keys_str_mv AT shaochen clusterbasedimprovedisolationforest
AT duxusheng clusterbasedimprovedisolationforest
AT yujiong clusterbasedimprovedisolationforest
AT chenjiaying clusterbasedimprovedisolationforest
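
The description above says outliers are ranked by the average search length of each sample across the isolation trees, and that the Top-n highest-scoring objects are reported. The sketch below illustrates only that scoring and Top-n step, using the standard Isolation Forest score 2^(-E[h(x)] / c(m)). It is not the CIIF algorithm: the record does not specify the k-means selection matrix, so features are chosen uniformly at random here. All function names, parameters, and defaults (`anomaly_scores`, `top_n_outliers`, `n_trees`, `sample_size`, `seed`) are assumptions made for illustration.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def c_factor(n):
    """Expected path length of an unsuccessful binary-search-tree lookup over n points."""
    if n <= 1:
        return 0.0
    if n == 2:
        return 1.0
    return 2.0 * (math.log(n - 1) + EULER_GAMMA) - 2.0 * (n - 1) / n


def build_tree(points, depth, max_depth, rng):
    """Split on a random feature at a random threshold until points are isolated."""
    if depth >= max_depth or len(points) <= 1:
        return {"size": len(points)}
    dim = rng.randrange(len(points[0]))  # plain random choice, NOT CIIF's selection matrix
    lo = min(p[dim] for p in points)
    hi = max(p[dim] for p in points)
    if lo == hi:
        return {"size": len(points)}
    split = rng.uniform(lo, hi)
    left = [p for p in points if p[dim] < split]
    right = [p for p in points if p[dim] >= split]
    return {
        "dim": dim,
        "split": split,
        "left": build_tree(left, depth + 1, max_depth, rng),
        "right": build_tree(right, depth + 1, max_depth, rng),
    }


def path_length(tree, x):
    """Search length of x in one tree, adjusted for points left unsplit in the leaf."""
    depth = 0
    while "size" not in tree:
        tree = tree["left"] if x[tree["dim"]] < tree["split"] else tree["right"]
        depth += 1
    return depth + c_factor(tree["size"])


def anomaly_scores(points, n_trees=100, sample_size=256, seed=0):
    """Return one score per point; values closer to 1 indicate likelier outliers."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    rng = random.Random(seed)
    m = min(sample_size, len(points))
    max_depth = math.ceil(math.log2(m))
    trees = [build_tree(rng.sample(points, m), 0, max_depth, rng) for _ in range(n_trees)]
    norm = c_factor(m)
    scores = []
    for x in points:
        mean_len = sum(path_length(t, x) for t in trees) / n_trees
        scores.append(2.0 ** (-mean_len / norm))
    return scores


def top_n_outliers(points, n, **kwargs):
    """Indices of the n points with the highest anomaly scores."""
    scores = anomaly_scores(points, **kwargs)
    order = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    return order[:n]


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(rng.random(), rng.random()) for _ in range(40)] + [(10.0, 10.0)]
    print(top_n_outliers(data, 1, n_trees=50, seed=1))
```

A cluster-guided variant in the spirit of the description would replace the uniform `rng.randrange` feature choice with a choice driven by the clustering result; how that is done is not stated in this record.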