kahramankostas/IoTDevIDv2

IoTDevID: A Behavior-Based Device Identification Method for the IoT

Overview

In this repository you will find a Python implementation of the methods in the paper IoTDevID: A Behavior-Based Device Identification Method for the IoT.

Kahraman Kostas, Mike Just, and Michael A. Lones. IoTDevID: A Behavior-Based Device Identification Method for the IoT, IEEE Internet of Things Journal, 2022.

What is IoTDevID?

Device identification is one way to secure a network of IoT devices, whereby devices identified as suspicious can subsequently be isolated from a network. In this study, we present a machine learning-based method, IoTDevID, that recognises devices through characteristics of their network packets. As a result of using a rigorous feature analysis and selection process, our study offers a generalizable and realistic approach to modelling device behavior, achieving high predictive accuracy across two public datasets. The model's underlying feature set is shown to be more predictive than existing feature sets used for device identification, and is shown to generalise to data unseen during the feature selection process. Unlike most existing approaches to IoT device identification, IoTDevID is able to detect devices using non-IP and low-energy protocols.

Fig 1 - A brief overview of the IoTDevID methodology.

Requirements and Infrastructure:

The application files were created with Wireshark and Python 3.6. Before running them, make sure that Wireshark, Python 3.6+, and the following libraries are installed.

| Library | Task |
|---|---|
| Scapy | Packet (pcap) crafting |
| tshark | Packet (pcap) crafting |
| Sklearn | Machine learning and data preparation |
| xverse | Feature importance/voting |
| Numpy | Mathematical operations |
| Pandas | Data analysis |
| Matplotlib | Graphics and visualization |
| Seaborn | Graphics and visualization |
| graphviz | Graphics and visualization |
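
A quick way to confirm the prerequisites before running anything is a small check script. The following is a minimal sketch, not part of the repository: the import names (for example `sklearn` for Sklearn) are assumptions derived from the library names above, and `check_environment` is a hypothetical helper.

```python
import importlib.util
import shutil
import sys

# Import names are assumptions: the README lists library names, and the
# importable module name can differ (e.g. Sklearn is imported as "sklearn").
REQUIRED_MODULES = ["scapy", "sklearn", "xverse", "numpy", "pandas",
                    "matplotlib", "seaborn", "graphviz"]
REQUIRED_TOOLS = ["tshark"]  # command-line tool shipped with Wireshark


def check_environment(modules=REQUIRED_MODULES, tools=REQUIRED_TOOLS):
    """Return {name: True/False} for the Python version, modules, and tools."""
    status = {"python>=3.6": sys.version_info >= (3, 6)}
    for name in modules:
        # find_spec returns None when a top-level module cannot be found.
        status[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which returns None when the executable is not on PATH.
        status[name] = shutil.which(name) is not None
    return status


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name:12s} {'OK' if ok else 'MISSING'}")
```

Anything reported as MISSING needs to be installed (or added to PATH, in the case of tshark) before the notebooks are run.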

The technical specifications of the computer used for experiments are given below.

Central Processing Unit : Intel(R) Core(TM) i7-7500U CPU @ 2.70GHz 2.90 GHz
Random Access Memory : 8 GB (7.74 GB usable)
Operating System : Windows 10 Pro 64-bit
Graphics Processing Unit : AMD Radeon(TM) 530

Implementation:

The implementation phase consists of 5 steps, which are:

  • Feature Extraction
  • Feature Selection
  • Algorithm Selection
  • Performance Evaluation
  • Comparison with Previous Work

Each of these steps is implemented in one or more Python files. Every file is provided with both the "py" and "ipynb" extensions, and the code in the two versions is identical. The ipynb version additionally stores the state and screen output of its last run, so the output can be viewed without re-running the file. Files with the ipynb extension can be run using Jupyter Notebook.

01 Feature Extraction (PCAP2CSV)

Section III.C in the article

There are four files relevant to this section:

These files convert the pcap files into CSV fingerprint files in which each row describes a single packet (the IoT Sentinel, IoTSense, and IoTDevID individual-packet-based feature sets) and create the labels.
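
To make the pcap-to-CSV idea concrete, here is a minimal standard-library sketch. It is not the repository's Scapy/tshark-based code: it handles only classic little-endian pcap files with Ethernet frames, and the column names (`pck_size`, `ether_type`, `payload_len`, `entropy`, `Label`) and the helper functions are illustrative assumptions rather than the actual IoTDevID feature set.

```python
import csv
import io
import math
import struct
from collections import Counter

# Hypothetical column names chosen for illustration.
FIELDS = ["pck_size", "ether_type", "payload_len", "entropy", "Label"]


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of a byte string, in bits per byte."""
    if not data:
        return 0.0
    total = len(data)
    return -sum(n / total * math.log2(n / total) for n in Counter(data).values())


def pcap_to_rows(raw: bytes, label: str):
    """Yield one feature dict per packet from classic little-endian pcap bytes."""
    if struct.unpack("<I", raw[:4])[0] != 0xA1B2C3D4:
        raise ValueError("this sketch only handles little-endian classic pcap")
    offset = 24  # skip the 24-byte global header
    while offset + 16 <= len(raw):
        # Per-packet record header: ts_sec, ts_usec, captured length, original length.
        _sec, _usec, incl_len, orig_len = struct.unpack("<IIII", raw[offset:offset + 16])
        frame = raw[offset + 16:offset + 16 + incl_len]
        offset += 16 + incl_len
        # Assumes Ethernet: EtherType sits in bytes 12-13, payload follows.
        ether_type = struct.unpack("!H", frame[12:14])[0] if len(frame) >= 14 else 0
        payload = frame[14:]
        yield {
            "pck_size": orig_len,
            "ether_type": ether_type,
            "payload_len": len(payload),
            "entropy": round(shannon_entropy(payload), 4),
            "Label": label,  # every packet in the file gets the device label
        }


def rows_to_csv(rows) -> str:
    """Serialize feature rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Build a one-packet pcap in memory so the sketch runs without any capture file.
frame = bytes(6) + bytes(6) + b"\x08\x00" + b"\x00\x01\x02\x03"
DEMO_PCAP = (struct.pack("<IHHiIII", 0xA1B2C3D4, 2, 4, 0, 0, 65535, 1)
             + struct.pack("<IIII", 0, 0, len(frame), len(frame)) + frame)
demo_rows = list(pcap_to_rows(DEMO_PCAP, "demo_device"))
print(rows_to_csv(demo_rows))
```

Each output row is one packet, which mirrors the single-packet fingerprints described above; real captures would additionally need protocol-level parsing of the kind Scapy and tshark provide.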

The processed datasets are shared in the repository. However, raw versions of the datasets used in the study and their addresses are given below.

| Dataset | Capture year | Number of devices | Type |
|---|---|---|---|
| Aalto University | 2016 | 31 | Benign |
| UNSW-Sydney IEEE TMC | 2016 | 31 | Benign |
| UNSW-Sydney ACM SOSR | 2018 | 28 | Benign & Malicious |
| CIC-IoT-22* | 2022 | 60 | Benign & Malicious |
| LSIF** | 2020 | 22 | Benign |

*: The IoTDevID method was applied to this dataset as part of another study [Code]-[Paper]

**: The IoTDevID method was applied to this dataset as part of another study [Code]-[Paper-see Chapter 4]

Because the UNSW data are very large, we filtered them by device and session. The pcap files obtained from this filtering process are available at this link (Used Pcap Files).

In addition, the CSVs.zip file contains the feature sets that are the output of this step and that we used in our experiments. These files are:

  • Aalto_test_IoTDevID.csv
  • Aalto_train_IoTDevID.csv
  • Aalto_IoTSense_Test.csv
  • Aalto_IoTSense_Train.csv
  • Aalto_IoTSentinel_Test.csv
  • Aalto_IoTSentinel_Train.csv
  • UNSW_test_IoTDevID.csv
  • UNSW_train_IoTDevID.csv
  • UNSW_IoTSense_Test.csv
  • UNSW_IoTSense_Train.csv
  • UNSW_IoTSentinel_Test.csv
  • UNSW_IoTSentinel_Train.csv

02 Feature Selection

Section IV.A in the article

There are three files relevant to this section.

  • 02.1 Feature importance voting and pre-assessment of features: This file calculates the importance scores for each feature using six feature score calculation methods. It then votes for features using these scores. It lists the feature scores and the votes they have received and shows them on a plot. The six feature importance score calculation methods used are as follows.

    • Information Value using Weight of evidence.
    • Variable Importance using Random Forest.
    • Recursive Feature Elimination.
    • Variable Importance using Extra trees classifier.
    • Chi-Square best variables.
    • L1-based feature selection.
  • 02.2 Comparison of isolated data and CV methods: In this file, the results of the isolated test-training data and the cross-validated data are compared.

  • 02.3 Feature selection process using genetic algorithm: In this file, feature selection is performed by using a genetic algorithm.
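
The voting step in 02.1 can be pictured as follows. This is a simplified sketch, not the repository's xverse-based code: it assumes each method votes for its `top_k` highest-scoring features, and the feature names and scores are made up for illustration (only three of the six methods are shown).

```python
from collections import Counter


def vote_for_features(scores_by_method, top_k):
    """Each method votes for its top_k highest-scoring features; return vote counts."""
    votes = Counter()
    for scores in scores_by_method.values():
        # Rank this method's features from highest to lowest score.
        ranked = sorted(scores, key=scores.get, reverse=True)
        votes.update(ranked[:top_k])
    return votes


# Made-up scores for three hypothetical features and three of the six methods.
example_scores = {
    "random_forest": {"pck_size": 0.50, "entropy": 0.30, "ip_ttl": 0.20},
    "extra_trees":   {"pck_size": 0.45, "entropy": 0.15, "ip_ttl": 0.40},
    "chi_square":    {"pck_size": 12.0, "entropy": 9.5,  "ip_ttl": 1.2},
}
example_votes = vote_for_features(example_scores, top_k=2)
for feature, count in example_votes.most_common():
    print(feature, count)
```

Because only ranks within each method matter, methods whose scores live on different scales (such as chi-square statistics and tree-based importances) can be combined without normalization.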

03 Algorithm Selection

Section IV.B in the article

There are two files relevant to this section.

04 Performance Evaluation

Section V in the article

There are four files relevant to this section. In the experiments above, we found that the decision tree (DT) offers the best balance between predictive performance and inference time among the machine learning methods we compared. Therefore, only DT is used in all subsequent experiments.
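
The trade-off between predictive performance and inference time can be measured with a small harness like the one below. It is a sketch under stated assumptions: `evaluate` is a hypothetical helper, `toy_tree` is a hand-written rule standing in for a trained decision tree, and the feature names, device labels, and samples are invented.

```python
import time


def evaluate(predict, samples, labels):
    """Return (accuracy, seconds_per_sample) for a predict(sample) -> label callable."""
    start = time.perf_counter()
    predictions = [predict(sample) for sample in samples]
    elapsed = time.perf_counter() - start
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels), elapsed / len(samples)


def toy_tree(sample):
    """Hand-written two-level decision rule standing in for a trained DT."""
    if sample["pck_size"] > 500:
        return "camera"
    return "plug" if sample["entropy"] < 3.0 else "hub"


# Invented samples and labels, used only to exercise the harness.
samples = [
    {"pck_size": 900, "entropy": 7.1},
    {"pck_size": 60, "entropy": 1.2},
    {"pck_size": 120, "entropy": 5.4},
    {"pck_size": 80, "entropy": 4.0},
]
labels = ["camera", "plug", "hub", "plug"]
accuracy, seconds_per_sample = evaluate(toy_tree, samples, labels)
print(f"accuracy={accuracy:.2f}, inference={seconds_per_sample:.2e} s/sample")
```

Running the same harness over several candidate models gives one accuracy figure and one per-sample latency figure per model, which is the kind of comparison the algorithm selection step relies on.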

05 Comparison with Previous Work

Section VI in the article

There are two files relevant to this section.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Citations

If you use the source code, please cite the following paper:

@article{kostas2022iot,
title = "IoTDevID: A Behavior-Based Device Identification Method for the IoT",
author = "Kahraman Kostas and Mike Just and Lones, {Michael Adam}",
year = "2022",
month = dec,
day = "1",
doi = "10.1109/JIOT.2022.3191951",
language = "English",
volume = "9",
pages = "23741--23749",
journal = "IEEE Internet of Things Journal",
issn = "2327-4662",
publisher = "IEEE",
number = "23",
}

Contact: Kahraman Kostas [email protected]