A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning
In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous including cyber threat mitigation and security in...
Main authors: | Mvula, Paul K., Branco, Paula, Jourdan, Guy-Vincent, Viktor, Herna L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer International Publishing 2023 |
Subjects: | Review |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079755/ https://www.ncbi.nlm.nih.gov/pubmed/37038388 http://dx.doi.org/10.1007/s44248-023-00003-x |
_version_ | 1785020777788080128 |
---|---|
author | Mvula, Paul K. Branco, Paula Jourdan, Guy-Vincent Viktor, Herna L. |
author_facet | Mvula, Paul K. Branco, Paula Jourdan, Guy-Vincent Viktor, Herna L. |
author_sort | Mvula, Paul K. |
collection | PubMed |
description | In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. Therefore, for these applications in particular, the datasets used to build the models must be carefully thought to be representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and sets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them. |
format | Online Article Text |
id | pubmed-10079755 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer International Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-100797552023-04-08 A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning Mvula, Paul K. Branco, Paula Jourdan, Guy-Vincent Viktor, Herna L. Discov Data Review In Machine Learning, the datasets used to build models are one of the main factors limiting what these models can achieve and how good their predictive performance is. Machine Learning applications for cyber-security or computer security are numerous including cyber threat mitigation and security infrastructure enhancement through pattern recognition, real-time attack detection, and in-depth penetration testing. Therefore, for these applications in particular, the datasets used to build the models must be carefully thought to be representative of real-world data. However, because of the scarcity of labelled data and the cost of manually labelling positive examples, there is a growing corpus of literature utilizing Semi-Supervised Learning with cyber-security data repositories. In this work, we provide a comprehensive overview of publicly available data repositories and datasets used for building computer security or cyber-security systems based on Semi-Supervised Learning, where only a few labels are necessary or available for building strong models. We highlight the strengths and limitations of the data repositories and sets and provide an analysis of the performance assessment metrics used to evaluate the built models. Finally, we discuss open challenges and provide future research directions for using cyber-security datasets and evaluating models built upon them. 
Springer International Publishing 2023-04-06 2023 /pmc/articles/PMC10079755/ /pubmed/37038388 http://dx.doi.org/10.1007/s44248-023-00003-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Review Mvula, Paul K. Branco, Paula Jourdan, Guy-Vincent Viktor, Herna L. A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning |
title | A systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning |
title_sort | systematic literature review of cyber-security data repositories and performance assessment metrics for semi-supervised learning |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10079755/ https://www.ncbi.nlm.nih.gov/pubmed/37038388 http://dx.doi.org/10.1007/s44248-023-00003-x |