
Using Embedded Feature Selection and CNN for Classification on CCD-INID-V1—A New IoT Dataset


Bibliographic Details
Main Authors: Liu, Zhipeng, Thapa, Niraj, Shaver, Addison, Roy, Kaushik, Siddula, Madhuri, Yuan, Xiaohong, Yu, Anna
Format: Online, Article, Text
Language: English
Published: MDPI 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8309834/
https://www.ncbi.nlm.nih.gov/pubmed/34300574
http://dx.doi.org/10.3390/s21144834
Collection: PubMed
Abstract: As Internet of Things (IoT) networks expand globally and the number of active devices grows each year, providing better safeguards against threats is becoming increasingly important. An intrusion detection system (IDS) is the most viable solution for mitigating the threats of cyberattacks. Given the many constraints of the ever-changing network environment of IoT devices, an effective yet lightweight IDS is required to detect cyber anomalies and categorize various cyberattacks. Additionally, most publicly available datasets used for research neither reflect recent network behavior nor are collected from IoT networks. To address these issues, in this paper we make the following contributions: (1) we create a dataset from IoT networks, namely, the Center for Cyber Defense (CCD) IoT Network Intrusion Dataset V1 (CCD-INID-V1); (2) we propose a hybrid lightweight IDS that combines an embedded model (EM) for feature selection with a convolutional neural network (CNN) for attack detection and classification. The proposed method has two variants: (a) RCNN, which combines Random Forest (RF) with a CNN, and (b) XCNN, which combines eXtreme Gradient Boosting (XGBoost) with a CNN. RF and XGBoost serve as the embedded models that remove less impactful features. (3) We perform anomaly (binary) classification and attack-based (multiclass) classification on CCD-INID-V1 and two other IoT datasets, the detection_of_IoT_botnet_attacks_N_BaIoT dataset (Balot) and the CIRA-CIC-DoHBrw-2020 dataset (DoH20), to explore the effectiveness of these learning-based security models. Using RCNN, we achieved an Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) score of 0.956 with a runtime of 32.28 s on CCD-INID-V1, 0.999 with a runtime of 71.46 s on Balot, and 0.986 with a runtime of 35.45 s on DoH20. Using XCNN, we achieved an AUC score of 0.998 with a runtime of 51.38 s on CCD-INID-V1, 0.999 with a runtime of 72.12 s on Balot, and 0.999 with a runtime of 72.91 s on DoH20.
Compared to k-nearest neighbors (KNN), XCNN required 86.98% less computational time and RCNN required 91.74% less computational time while achieving equal or better anomaly-detection accuracy. We find that XCNN and RCNN are consistently efficient and scale well; in particular, they are 1000 times faster than KNN on the relatively larger Balot dataset. Finally, we highlight the ability of RCNN and XCNN to detect anomalies accurately with a significant reduction in computational time. This advantage gives flexibility in IDS placement: our IDS can be deployed on a central server as well as on resource-constrained edge devices. Because our lightweight IDS requires little training time, it shortens the reaction time to zero-day attacks.
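The abstract names Random Forest and XGBoost as embedded models whose role is to drop less impactful features before the CNN sees the data. The sketch below illustrates only that selection step, under stated assumptions: the importance scores are taken as given (in practice they would come from a fitted tree ensemble, for example the `feature_importances_` attribute of scikit-learn's `RandomForestClassifier`), the cut-off defaults to the mean importance (the default scikit-learn's `SelectFromModel` applies to tree models, not necessarily the paper's choice), and the feature names, scores, and the helpers `select_by_importance` and `project` are hypothetical. It is not the authors' implementation, and the CNN stage is omitted.

```python
from statistics import mean


def select_by_importance(importances, threshold=None):
    """Return feature names whose importance is >= threshold, most important first.

    importances: dict of feature name -> importance score, e.g. the scores a
        fitted Random Forest or XGBoost model reports (model fitting not shown).
    threshold: cut-off score; defaults to the mean importance (an assumption,
        the abstract does not state the paper's actual cut-off).
    """
    if not importances:
        return []
    cutoff = mean(importances.values()) if threshold is None else threshold
    # Rank by descending score, breaking ties by name so the output is deterministic.
    ranked = sorted(importances.items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, score in ranked if score >= cutoff]


def project(record, selected):
    """Reduce one record (dict of feature -> value) to the selected features, in order."""
    return [record[name] for name in selected]


# Hypothetical feature names and importance scores, for illustration only.
importances = {
    "pkt_len_mean": 0.35,
    "flow_duration": 0.30,
    "dst_port": 0.15,
    "ttl": 0.12,
    "proto": 0.08,
}
selected = select_by_importance(importances)  # mean importance is 0.2
record = {"pkt_len_mean": 512.0, "flow_duration": 1.7, "dst_port": 443,
          "ttl": 64, "proto": 6}
reduced = project(record, selected)
```

Here only two of the five hypothetical columns survive, and only those would be reshaped into the CNN's input; shrinking the input this way is how the embedded model is meant to keep the downstream network lightweight.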
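Every result in the abstract is reported as an AUC score. As a reminder of what that number measures, here is a minimal, dependency-free sketch (a hypothetical helper `roc_auc`, not the authors' evaluation code) that computes AUC in its pairwise Mann-Whitney form: the probability that a randomly chosen attack sample receives a higher score than a randomly chosen benign one, with ties counted as one half. It assumes binary labels with 1 marking the attack class.

```python
def roc_auc(labels, scores):
    """ROC AUC computed as the normalized Mann-Whitney U statistic.

    labels: sequence of 0/1 ground-truth labels (1 = attack/anomaly).
    scores: classifier scores of the same length; higher means "more likely attack".
    Returns the fraction of (positive, negative) pairs in which the positive
    example scores higher, counting ties as one half.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    # O(P * N) pairwise comparison: simple and exact, fine for small examples.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 3 of 4 pairs ranked correctly
```

Under this reading, RCNN's 0.956 on CCD-INID-V1 means the model ranks an attack sample above a benign sample in roughly 95.6% of attack/benign pairs. Libraries such as scikit-learn (`sklearn.metrics.roc_auc_score`) compute the same quantity more efficiently by sorting the scores instead of comparing every pair.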
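The runtime comparisons with KNN can also be read as speedup factors. Assuming each percentage is a reduction r relative to the same KNN runtime t_KNN (the abstract does not say whether the figures are per dataset or aggregated), a model's runtime is (1 - r) t_KNN, which gives:

```latex
\begin{aligned}
\text{speedup} &= \frac{t_{\mathrm{KNN}}}{t_{\mathrm{model}}} = \frac{1}{1 - r} \\
\text{RCNN: } \frac{1}{1 - 0.9174} &= \frac{1}{0.0826} \approx 12.1 \\
\text{XCNN: } \frac{1}{1 - 0.8698} &= \frac{1}{0.1302} \approx 7.7
\end{aligned}
```

By the same formula, the 1000-fold speedup reported on Balot corresponds to r = 0.999, that is, 99.9% less computational time than KNN.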
Record ID: pubmed-8309834
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: Sensors (Basel)
Publication Date: 2021-07-15
License: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).