Using Embedded Feature Selection and CNN for Classification on CCD-INID-V1—A New IoT Dataset
Main authors: | Liu, Zhipeng; Thapa, Niraj; Shaver, Addison; Roy, Kaushik; Siddula, Madhuri; Yuan, Xiaohong; Yu, Anna |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309834/ https://www.ncbi.nlm.nih.gov/pubmed/34300574 http://dx.doi.org/10.3390/s21144834 |

author | Liu, Zhipeng; Thapa, Niraj; Shaver, Addison; Roy, Kaushik; Siddula, Madhuri; Yuan, Xiaohong; Yu, Anna |
---|---|
collection | PubMed |
description | As Internet of Things (IoT) networks expand globally, with the number of active devices increasing every year, safeguarding them against threats is becoming more important. An intrusion detection system (IDS) is the most viable solution for mitigating the threats of cyberattacks. Given the many constraints of the ever-changing network environment of IoT devices, an effective yet lightweight IDS is required to detect cyber anomalies and categorize various cyberattacks. Additionally, most publicly available datasets used for research neither reflect recent network behaviors nor are collected from IoT networks. To address these issues, this paper makes the following contributions: (1) we create a dataset from IoT networks, namely, the Center for Cyber Defense (CCD) IoT Network Intrusion Dataset V1 (CCD-INID-V1); (2) we propose a hybrid lightweight IDS that combines an embedded model (EM) for feature selection with a convolutional neural network (CNN) for attack detection and classification. The proposed method has two models: (a) RCNN, which combines Random Forest (RF) with a CNN, and (b) XCNN, which combines eXtreme Gradient Boosting (XGBoost) with a CNN. RF and XGBoost serve as the embedded models that remove less impactful features. (3) We perform anomaly (binary) and attack-based (multiclass) classification on CCD-INID-V1 and two other IoT datasets, the detection_of_IoT_botnet_attacks_N_BaIoT dataset (Balot) and the CIRA-CIC-DoHBrw-2020 dataset (DoH20), to explore the effectiveness of these learning-based security models. Using RCNN, we achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) score of 0.956 with a runtime of 32.28 s on CCD-INID-V1, 0.999 with a runtime of 71.46 s on Balot, and 0.986 with a runtime of 35.45 s on DoH20. Using XCNN, we achieved an AUC score of 0.998 with a runtime of 51.38 s on CCD-INID-V1, 0.999 with a runtime of 72.12 s on Balot, and 0.999 with a runtime of 72.91 s on DoH20. Compared to KNN, XCNN required 86.98% less computational time and RCNN required 91.74% less computational time to achieve equal or better anomaly-detection accuracy. We find that XCNN and RCNN are consistently efficient and scale well; in particular, they are 1000 times faster than KNN on the relatively larger Balot dataset. Finally, we highlight the ability of RCNN and XCNN to detect anomalies accurately with a significant reduction in computational time. This advantage gives flexibility in the IDS placement strategy: our IDS can be placed on a central server as well as on resource-constrained edge devices. Because our lightweight IDS requires little training time, it shortens the reaction time to zero-day attacks. |
format | Online Article Text |
id | pubmed-8309834 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8309834 2021-07-25 Sensors (Basel) Article MDPI 2021-07-15 /pmc/articles/PMC8309834/ /pubmed/34300574 http://dx.doi.org/10.3390/s21144834 Text en © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Using Embedded Feature Selection and CNN for Classification on CCD-INID-V1—A New IoT Dataset |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309834/ https://www.ncbi.nlm.nih.gov/pubmed/34300574 http://dx.doi.org/10.3390/s21144834 |
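
The abstract names an embedded-model step for feature selection but does not state the selection rule. The following is a minimal plain-Python sketch of embedded feature selection under explicit assumptions, not the authors' implementation. It assumes the per-feature importance scores are already available. In practice they might come from the `feature_importances_` attribute of a fitted scikit-learn `RandomForestClassifier` or `xgboost.XGBClassifier`. Features are kept when their importance is at least the mean importance, which mirrors the default `threshold="mean"` behavior of scikit-learn's `SelectFromModel` for such estimators. The mean-threshold rule, the helper names `select_features`, `project_rows`, and `as_cnn_input`, the feature names, and the importance values are all hypothetical.

```python
from statistics import mean


def select_features(importances, feature_names, threshold=None):
    """Return the names of features whose importance is >= threshold.

    Assumption: threshold defaults to the mean importance, a hypothetical
    rule chosen for illustration (the abstract does not state the paper's rule).
    """
    if len(importances) != len(feature_names):
        raise ValueError("importances and feature_names must have equal length")
    cutoff = mean(importances) if threshold is None else threshold
    return [name for name, score in zip(feature_names, importances) if score >= cutoff]


def project_rows(rows, feature_names, selected):
    """Keep only the selected columns of each record, in the selected order."""
    position = {name: i for i, name in enumerate(feature_names)}
    columns = [position[name] for name in selected]
    return [[row[c] for c in columns] for row in rows]


def as_cnn_input(reduced_rows):
    """Add a trailing channel axis: (samples, features) -> (samples, features, 1)."""
    return [[[value] for value in row] for row in reduced_rows]


if __name__ == "__main__":
    # Hypothetical feature names and importance scores, standing in for the
    # output of a fitted tree ensemble; the values are invented for illustration.
    names = ["duration", "src_bytes", "dst_bytes", "pkt_count", "ttl"]
    importances = [0.05, 0.40, 0.30, 0.15, 0.10]
    rows = [[1.2, 300, 40, 5, 64], [0.4, 1500, 20, 12, 128]]

    kept = select_features(importances, names)
    reduced = project_rows(rows, names, kept)
    print(kept)                   # ['src_bytes', 'dst_bytes']
    print(as_cnn_input(reduced))  # [[[300], [40]], [[1500], [20]]]
```

With these invented scores the mean importance is about 0.2, so only `src_bytes` and `dst_bytes` survive and each record shrinks from five values to two before it reaches the classifier. The trailing singleton axis added by `as_cnn_input` follows the common (samples, length, channels) input layout of 1D convolution layers. Whether RCNN and XCNN arrange their inputs this way is an assumption.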