DetGen is a set of scripts and conventions for capturing network traffic and logging information from self-contained "scenarios" that run inside Docker containers. The scenarios can easily be re-run to regenerate traffic, with or without variation. In contrast to capturing traffic from general VMs or actual workstations, the captures are repeatable and more deterministic. We have used this framework for research on synthetic data for Network Intrusion Detection.
We recommend setting up DetGen on an Ubuntu instance:
- Install Docker on your Ubuntu machine following these instructions:
https://docs.docker.com/install/linux/docker-ce/ubuntu/#set-up-the-repository
- Install docker-compose following:
https://docs.docker.com/desktop/install/linux-install/
- Clone this repository into a working directory:
git clone https://github.com/detlearsom/DetGen
- Build all the images:
sudo make
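Before running the build, it can help to confirm that the required tools are on the PATH. The following is a minimal sketch and not part of DetGen: the helper name require_tool is hypothetical, and the tool list simply mirrors the setup steps above.

```shell
#!/bin/sh
# Hypothetical helper (not part of DetGen): report whether a command is on PATH.
require_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        printf 'found: %s\n' "$1"
    else
        printf 'missing: %s\n' "$1" >&2
        return 1
    fi
}

# Check the tools used by the setup steps and count how many are missing.
missing=0
for tool in git docker docker-compose make; do
    require_tool "$tool" || missing=$((missing + 1))
done
printf '%s tool(s) missing\n' "$missing"
```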
Note that building all of the images takes up a large amount of storage and, depending on your use-case, it may be better to build a smaller number of images manually. For a given capture, the necessary images can be read from the docker-compose.yml file. We follow a standard naming procedure for our images: for a Dockerfile defined in containers/docker-{NAME}, the corresponding image should be tagged detlearsom/{NAME} when built.
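To build only the images that one capture needs, the image names can be pulled out of its docker-compose.yml and mapped back to Dockerfile directories using the naming convention. This is a rough sketch under stated assumptions: the function names are hypothetical, the parse is line-based and only handles plain "image: name" entries, and example-server and example-client are placeholder names rather than real images in the repository. It prints the build commands instead of running them, so they can be reviewed first.

```shell
#!/bin/sh
# Hypothetical helpers (not part of DetGen).

# Print every image named in a compose file. Line-based parse: assumes each
# image appears as a plain "image: name" entry, optionally quoted.
images_in_compose() {
    sed -n 's/^[[:space:]]*image:[[:space:]]*//p' "$1" | tr -d "\"'" | sort -u
}

# Map detlearsom/{NAME} back to containers/docker-{NAME} and print the
# matching build command. Images outside the detlearsom/ namespace are skipped.
build_command_for_image() {
    case "$1" in
        detlearsom/*)
            name=${1#detlearsom/}
            name=${name%%:*}    # drop an optional ":tag" suffix
            printf 'docker build -t detlearsom/%s containers/docker-%s\n' "$name" "$name"
            ;;
    esac
}

# Demo on a throwaway compose file; both image names are placeholders.
demo_file=$(mktemp)
cat > "$demo_file" <<'EOF'
services:
  server:
    image: detlearsom/example-server
  client:
    image: "detlearsom/example-client"
EOF
images_in_compose "$demo_file" | while read -r image; do
    build_command_for_image "$image"
done
rm -f "$demo_file"
```

Against a real capture, the same loop would be pointed at that capture's docker-compose.yml, and the printed commands run from the repository root.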
To run scenarios, look at the capture.sh scripts inside the captures directory.
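A quick way to see which scenarios are available is to list their capture.sh scripts. This is a small sketch, assuming it is run from the repository root; the function name is hypothetical. The arguments each script expects are not shown here, so check the script itself before running it.

```shell
#!/bin/sh
# Hypothetical helper (not part of DetGen): list every capture.sh below a
# directory, sorted by path. Defaults to ./captures.
list_capture_scripts() {
    find "${1:-captures}" -type f -name capture.sh 2>/dev/null | sort
}

list_capture_scripts captures
```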
To introduce variation into the traffic, we use a series of scripts contained within the captures/Controlfunctions directory. In particular, the tc-netem scripts, container_tc.sh and container_tc_local_bandwidth.sh, are useful to introduce network variations. By default, however, we have disabled these variations (simply by commenting them out of the capture.sh scripts) as these have a tendency to occasionally cause the docker network to break, requiring you to restart docker.
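For orientation, tc-netem shapes traffic by attaching a queueing discipline to a network interface. The sketch below is a generic illustration of that mechanism and does not reproduce the contents or arguments of container_tc.sh; the function names, the interface eth0, and the values are assumptions chosen for the example. It only prints the commands, which would need root privileges inside the target container's network namespace to take effect.

```shell
#!/bin/sh
# Hypothetical helper: print a tc command that adds delay, jitter and packet
# loss on an interface via netem.
# Arguments: interface, delay in ms, jitter in ms, loss in percent.
netem_add_command() {
    printf 'tc qdisc add dev %s root netem delay %sms %sms loss %s%%\n' "$1" "$2" "$3" "$4"
}

# Print the command that removes the root qdisc again, restoring defaults.
netem_clear_command() {
    printf 'tc qdisc del dev %s root\n' "$1"
}

netem_add_command eth0 50 10 1
netem_clear_command eth0
```

Removing the qdisc after a capture returns the interface to its default behaviour, so later runs are not shaped by leftover rules.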
For some datasets used for publications in the Detlearsom project, please look for instructions here
This release contains all scenarios that are in a well-maintained state, namely:
- HTTP
- FTP
- SSH
- File-Sync
- BitTorrent
- SQL
- IRC
- NTP
- Music and Video streaming
and the following attack scenarios:
- SQL-injections
- Heartbleed
- XXE attacks
We have a number of additional scenarios and supporting software in other working repositories. Please contact us if you are interested.
- Minor updates to documentation and to scripts for compatibility
- Deprecation of prototype GUI interface
- Added the rapidreset scenario
- Fixed a lot of bugs! Although we attempted to make these scenarios as resilient as possible, issues with deprecated package repositories or base images led to a number of scenarios being broken. We have fixed most of them (marking those that remain untested).
Please cite the following paper when using DetGen in your research:
Traffic generation using containerization for machine learning. Henry Clausen, Robert Flood, and David Aspinall. Workshop on DYnamic and Novel Advances in Machine Learning and Intelligent Cyber Security (pp. 1-12), December 2019.
DetGen is also used in work including:
Evading stepping-stone detection with enough chaff. Henry Clausen, Michael Gibson, David Aspinall. Conference on Network and System Security, November 2020.
Examining traffic microstructures to improve model development. Henry Clausen and David Aspinall. 2021 IEEE Security and Privacy Workshops (SPW), May 2021.
CBAM: A Contextual Model for Network Anomaly Detection. Henry Clausen, Gudmund Grov, and David Aspinall. Computers (MDPI), June 2021.
Controlling network traffic microstructures for machine-learning model probing. Henry Clausen, Robert Flood, and David Aspinall. 17th EAI International Conference on Security and Privacy in Communication Networks (SecureComm 2021), September 2021.
Bad Design Smells in Benchmark NIDS Datasets. Robert Flood, Gints Engelen, David Aspinall, Lieven Desmet. 9th IEEE European Symposium on Security and Privacy, EuroS&P, July 2024. (winner of the Distinguished Paper Award)
The main authors are Henry Clausen, Robert Flood and David Aspinall. Thanks to further contributors for feedback, testing and extensions:
- Gudmund Grov
- Anthos Makris
Further contributions or notes of how you are using DetGen would be very welcome. Please contact David Aspinall in the first instance.
This project is licensed under the terms of the MIT license.
Copyright (c) 2019-2024 University of Edinburgh and contributors.
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.