Detecting Group Anomalies in Tera-Scale Multi-Aspect Data via Dense-Subtensor Mining

How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense subtensors in real-world tensors (e.g., social media, Wikipedia, TCP dumps, etc.) signal an...

Bibliographic Details
Main Authors: Shin, Kijung, Hooi, Bryan, Kim, Jisu, Faloutsos, Christos
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Big Data
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8118605/
https://www.ncbi.nlm.nih.gov/pubmed/33997776
http://dx.doi.org/10.3389/fdata.2020.594302
collection PubMed
description How can we detect fraudulent lockstep behavior in large-scale multi-aspect data (i.e., tensors)? Can we detect it when data are too large to fit in memory or even on a disk? Past studies have shown that dense subtensors in real-world tensors (e.g., social media, Wikipedia, TCP dumps, etc.) signal anomalous or fraudulent behavior such as retweet boosting, bot activities, and network attacks. Thus, various approaches, including tensor decomposition and search, have been proposed for detecting dense subtensors rapidly and accurately. However, existing methods suffer from low accuracy, or they assume that tensors are small enough to fit in main memory, which is unrealistic in many real-world applications such as social media and web. To overcome these limitations, we propose D-Cube, a disk-based dense-subtensor detection method, which also can run in a distributed manner across multiple machines. Compared to state-of-the-art methods, D-Cube is (1) Memory Efficient: requires up to 1,561× less memory and handles 1,000× larger data (2.6TB), (2) Fast: up to 7× faster due to its near-linear scalability, (3) Provably Accurate: gives a guarantee on the densities of the detected subtensors, and (4) Effective: spotted network attacks from TCP dumps and synchronized behavior in rating data most accurately.
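The idea of searching for a dense subtensor can be illustrated with a small in-memory sketch. This is not D-Cube itself, which the description says is disk-based and can run distributed; it is a simplified greedy peeling search under stated assumptions. The function names `density` and `greedy_dense_block` are hypothetical, the input is assumed to be a non-empty dictionary of non-negative counts, and the density measure (mass divided by the average number of attribute values per mode) is one common choice assumed here, not necessarily the one the article guarantees.

```python
from collections import defaultdict


def density(mass, sizes):
    """Assumed measure: mass divided by the mean number of values per mode."""
    total = sum(sizes)
    return 0.0 if total == 0 else mass * len(sizes) / total


def greedy_dense_block(entries):
    """Greedy peeling sketch (hypothetical helper, not the D-Cube algorithm).

    entries: non-empty dict mapping an N-tuple of attribute values to a
    non-negative count. Returns (list of value sets per mode, best density).
    """
    n_modes = len(next(iter(entries)))
    # attr_mass[n][v] = total count of remaining entries whose mode-n value is v
    attr_mass = [defaultdict(int) for _ in range(n_modes)]
    for key, count in entries.items():
        for n, v in enumerate(key):
            attr_mass[n][v] += count

    block = [set(m) for m in attr_mass]  # start from the whole tensor
    alive = dict(entries)
    mass = sum(alive.values())
    removed = []  # order in which (mode, value) pairs are peeled off
    best_density = density(mass, [len(m) for m in attr_mass])
    best_step = 0

    while any(attr_mass):
        # Peel the attribute value that currently carries the least mass.
        n, v = min(
            ((n, v) for n in range(n_modes) for v in attr_mass[n]),
            key=lambda nv: attr_mass[nv[0]][nv[1]],
        )
        for key in [k for k in alive if k[n] == v]:
            count = alive.pop(key)
            mass -= count
            for m, u in enumerate(key):
                attr_mass[m][u] -= count
        del attr_mass[n][v]
        removed.append((n, v))

        d = density(mass, [len(m) for m in attr_mass])
        if d > best_density:
            best_density, best_step = d, len(removed)

    # Rebuild the densest snapshot seen during peeling.
    for n, v in removed[:best_step]:
        block[n].discard(v)
    return block, best_density


if __name__ == "__main__":
    # Toy (user, item, time) tensor: a 3 x 3 x 1 lockstep block plus two noise entries.
    toy = {(u, i, "t0"): 1 for u in "abc" for i in "xyz"}
    toy[("d", "w", "t1")] = 1
    toy[("e", "v", "t2")] = 1
    print(greedy_dense_block(toy))
```

The sketch rescans all remaining entries at every step, so it only suits small inputs; the memory and speed figures quoted in the description concern the article's method, not this illustration.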
id pubmed-8118605
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8118605 2021-05-14 Front Big Data Big Data Frontiers Media S.A. 2021-04-29 Text en
Copyright © 2021 Shin, Hooi, Kim and Faloutsos.
https://creativecommons.org/licenses/by/4.0/
This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Big Data