
A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning


Bibliographic Details
Main Authors:	Mvula, Paul K., Branco, Paula, Jourdan, Guy-Vincent, Viktor, Herna L.
Format:	Online Article Text
Language:	English
Published:	Springer International Publishing 2023
Subjects:	Review
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079755/ https://www.ncbi.nlm.nih.gov/pubmed/37038388 http://dx.doi.org/10.1007/s44248-023-00003-x

author	Mvula, Paul K. Branco, Paula Jourdan, Guy-Vincent Viktor, Herna L.
collection	PubMed
description	In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous, including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. Therefore, for these applications in particular, the datasets used to build the models must be carefully selected so that they are representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and datasets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them.
format	Online Article Text
id	pubmed-10079755
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Springer International Publishing
record_format	MEDLINE/PubMed
spelling	pubmed-10079755 2023-04-08 A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning Mvula, Paul K. Branco, Paula Jourdan, Guy-Vincent Viktor, Herna L. Discov Data Review Springer International Publishing 2023-04-06 2023 /pmc/articles/PMC10079755/ /pubmed/37038388 http://dx.doi.org/10.1007/s44248-023-00003-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning
topic	Review
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079755/ https://www.ncbi.nlm.nih.gov/pubmed/37038388 http://dx.doi.org/10.1007/s44248-023-00003-x
