
An Analysis of Android Malware Classification Services

The growing volume of Android malware has forced antivirus (AV) companies to rely on automated classification techniques to determine the family and class of suspicious samples. The research community relies heavily on such labels to carry out prevalence studies of the threat ecosystem and to build d...


Bibliographic Details
Main Authors:	Rashed, Mohammed, Suarez-Tangil, Guillermo
Format:	Online Article Text
Language:	English
Published:	MDPI 2021
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8402456/ https://www.ncbi.nlm.nih.gov/pubmed/34451112 http://dx.doi.org/10.3390/s21165671

collection	PubMed
description	The growing volume of Android malware has forced antivirus (AV) companies to rely on automated classification techniques to determine the family and class of suspicious samples. The research community relies heavily on such labels to carry out prevalence studies of the threat ecosystem and to build datasets that are used to validate and benchmark novel detection and classification methods. In this work, we carry out an extensive study of the Android malware ecosystem by surveying white papers and reports from 6 key players in the industry, as well as 81 papers from 8 top security conferences, to understand how malware datasets are used by both. We then explore the limitations associated with the use of available malware classification services, namely VirusTotal (VT) engines, for determining the family of an Android sample. Using a dataset of 2.47 M Android malware samples, we find that the detection coverage of VT’s AVs is generally very low, that the percentage of samples flagged by any 2 AV engines does not go beyond 52%, and that the proportion of families common to any pair of AV engines is at best 29%. We rely on clustering to determine the extent to which different AV engine pairs agree on which samples belong to the same family (regardless of the actual family name) and find discrepancies that can introduce noise in automatic label unification schemes. We also observe the use of generic labels and inconsistencies within the labels of top AV engines, suggesting that their efforts are directed towards accurate detection rather than classification. Our results contribute to a better understanding of the limitations of using Android malware family labels as supplied by common AV engines.
format	Online Article Text
id	pubmed-8402456
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-8402456 2021-08-29 An Analysis of Android Malware Classification Services Rashed, Mohammed Suarez-Tangil, Guillermo Sensors (Basel) Article
MDPI 2021-08-23 /pmc/articles/PMC8402456/ /pubmed/34451112 http://dx.doi.org/10.3390/s21165671 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
