Threat categorization
IP addresses are divided into categories based on the type of threat they pose. Primarily, each address is labeled as either `src` or `dst`, depending on whether it is an active attacker or a part of some malicious infrastructure. Source addresses are then classified based on the kind of attack (e.g. brute force or DDoS attacks) and destination addresses are further divided by their type (C&C servers, phishing sites, etc.). These categories can have additional subcategories (depending on the configuration), such as targeted ports or protocols. An IP may belong to multiple categories and each category label has a confidence value based on the number of related reports.
category | role | subcategories | description |
---|---|---|---|
`bruteforce` | `src` | port, protocol | The IP performs dictionary (or brute-force) attacks on password-protected services. Usually accompanied by scanning, i.e. searching for the targeted service. |
`botnet_drone` | `src` | malware_family | The IP is acting as a bot/drone of a botnet. |
`cc` | `dst` | malware_family | The IP is used as a Command & Control server for a botnet/malware. |
`ddos` | `src` | - | The IP was observed as a source of volumetric (D)DoS attacks. |
`ddos-amplifier` | `dst` | protocol | The IP runs a service which can be (and often is) misused as an amplifier for DDoS attacks, e.g. open DNS resolvers, NTP servers, memcached, etc. |
`exploit` | `src` | protocol | The IP is attempting to exploit known vulnerabilities. |
`malware_distribution` | `dst` | malware_family | The IP is used to distribute malware, e.g. it hosts an HTTP URL from which malware is being downloaded. |
`phishing_site` | `dst` | - | The IP is hosting a phishing website. |
`scan` | `src` | port | The IP performs network scanning, i.e. it tries to connect to various targets to search for open ports/services. |
`spam` | `src` | - | The IP is sending spam. |
`unknown` | `src` | - | The IP was reported as a source of malicious/rogue/unexpected packets, but without any further specification. |
When an IP address is seen in a new event/pulse/blacklist, it is assigned a threat category by the corresponding source module. All modules use the same taxonomy (defined above) and the classification method is largely the same, but can differ slightly based on how the module operates and what kind of information it has about each IP.
The classification is based on a system of rules. Rules are evaluated using Python (by the built-in `eval()` function), i.e. each rule must be a valid Python expression that resolves to either `True` or `False` when evaluated. When a rule evaluates to `True`, the IP address is assigned the corresponding category label. Optionally, it is also possible to assign subcategory values directly within a specific rule.
Within each rule the programmer can access objects and functions visible in the context of the `classify_ip()` function defined in `/nerd/common/threat_categorization.py`, mainly the `event` object and its attributes (which contain information about the new event that is currently being classified) and the regex library (`re`), which can be used for easier string matching.
Here is a list of event attributes that can be used for classification:
attribute name | type | description |
---|---|---|
`date` | string | Time of detection |
`description` | string | Event description |
`ip_info` | string | IP description / additional info |
`protocols` | list[string] | List of protocols used by the IP |
`target_ports` | list[int] | List of ports targeted by the IP |
`categories` | list[string] | List of event categories (specific to Warden) |
`tags` | list[string] | List of tags related to the event and/or IP address (specific to MISP) |
`ip_role` | string | IP role (src/dst, specific to MISP) |
`indicator_role` | string | IP category (specific to OTX) |
`blacklist_id` | string | Blacklist ID (specific to blacklists) |
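
To make the rule mechanism concrete, here is a minimal sketch of how rules could be evaluated against an event carrying the attributes listed above. It is not the code of `classify_ip()`: the `event` object is a stand-in built with `SimpleNamespace`, all attribute values and the three rules are made up for illustration, and the evaluation context is assumed to contain just `event` and `re`.

```python
import re
from types import SimpleNamespace

# Hypothetical event; attribute names follow the table above, values are invented.
event = SimpleNamespace(
    date="2024-01-01T00:00:00Z",
    description="SSH login attempts with invalid credentials",
    ip_info="",
    protocols=["ssh"],
    target_ports=[22],
    categories=["Attempt.Login"],
    tags=[],
    ip_role="src",
    indicator_role="",
    blacklist_id="",
)

# Illustrative rules: each one is a Python expression that yields True or False.
rules = [
    '"Attempt.Login" in event.categories',
    're.search(r"login attempts?", event.description, re.I) is not None',
    '23 in event.target_ports',
]

# Assumed evaluation context: only the names a rule is expected to use.
context = {"event": event, "re": re}
results = [bool(eval(rule, context)) for rule in rules]
print(results)
```

Because rules are passed to `eval()`, they run as ordinary Python code, so the configuration file holding them should be treated as trusted input.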
For each IP address there is a history of category records which contain the category id, date of detection and the number of reports from individual source modules. These records are then aggregated by a secondary module `threat_category_summary` and each category from the final summary is assigned a confidence value based on the number of times the IP was reported.
The confidence for each category is computed as follows:
- For each of the last 14 days, compute:
  - `n_events(d)` - Number of times the IP address was reported within the day
  - `n_sources(d)` - Number of distinct source modules that reported those events
  - Daily confidence `confidence(d) = (1 - 1/2^n_events(d)) * (1 - 1/2^n_sources(d))`
- Final confidence is the weighted average of the 14 daily values with linearly decreasing weight (most recent day has the highest weight):
  - `confidence = SUM[d=0..13](confidence(d) * (14-d)/14) / 7.5` (where `d` is the number of days before today; `7.5` is just the sum of the weights)
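
The confidence formula can be sketched in a few lines of Python. This is an illustration of the arithmetic only, not the code of `threat_category_summary`: the function names and the input shape (a dict mapping `d` to a pair `(n_events, n_sources)`) are assumptions made for this example.

```python
def daily_confidence(n_events: int, n_sources: int) -> float:
    """confidence(d) = (1 - 1/2^n_events(d)) * (1 - 1/2^n_sources(d))"""
    return (1 - 0.5 ** n_events) * (1 - 0.5 ** n_sources)


def category_confidence(daily_counts: dict, window: int = 14) -> float:
    """Weighted average of daily confidences over the last `window` days.

    daily_counts maps d (days before today, 0 = today) to a tuple
    (n_events, n_sources); days without reports may be omitted.
    """
    # Linearly decreasing weights (14-d)/14; their sum is 7.5 for window=14.
    weights = [(window - d) / window for d in range(window)]
    total = 0.0
    for d, weight in enumerate(weights):
        n_events, n_sources = daily_counts.get(d, (0, 0))
        total += daily_confidence(n_events, n_sources) * weight
    return total / sum(weights)


# One report from one module today: 0.5 * 0.5 * 1 / 7.5
print(category_confidence({0: (1, 1)}))
```

A day with no reports contributes zero, and a single report from a single module yields a daily confidence of only 0.25, so the final value approaches 1 only when an IP is reported repeatedly by several modules across the whole window.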
Configuration is specified in a YAML-formatted file (`/etc/nerd/threat_categorization.yml` by default). It contains the definition of individual categories and their parameters:
threat_categorization:
  category_id:
    label: "Full name"
    description: "Category description"
    role: "src"
    subcategories:
      - "port"
      - "protocol"
      - "malware_family"
    triggers:
      module_name: |-
        "indicator1" in event.description
        "indicator2" in event.ip_info -> {protocol: ['proto1']}
      another_module_name: |-
        ...
  another_category_id:
    ...
A category is defined by specifying a new item under `threat_categorization` with the key set to the name of the category. Each category must have a label (full name), a description and a role (`src` or `dst`).
`subcategories` specifies which subcategories the source modules should try to classify. Currently there are 3 possible subcategories: target port, protocol and malware family.
The `triggers` field contains a set of rules that are used for classification, divided into sections for individual source modules (it is possible to define a common set of rules for all modules under the name `general`). Rules for each module are written as a single multiline string (a block scalar with one rule per line) so that special characters like quotes do not have to be escaped. Each rule may have 2 parts divided by the `->` symbol: a statement used for classification (mandatory) and a subcategory assignment (optional). Both have to be valid Python expressions that can be evaluated by `eval()`. The statement should resolve to either `True` or `False` and the assignment should be a valid dictionary (key : set of values).
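
The two-part rule format can be illustrated with a short sketch. The helpers `parse_rule()` and `apply_rule()` are hypothetical names introduced for this example and are not taken from the project's code. The example rule also quotes the dictionary key (`"protocol"`) so that it evaluates on its own; how the real evaluation context resolves an unquoted key such as `protocol` is not shown here.

```python
from types import SimpleNamespace


def parse_rule(rule: str):
    """Split a rule into (statement, assignment); assignment is None if absent."""
    statement, sep, assignment = rule.partition("->")
    return statement.strip(), (assignment.strip() if sep else None)


def apply_rule(rule: str, event):
    """Return (matched, subcategories) for one rule evaluated against an event."""
    statement, assignment = parse_rule(rule)
    context = {"event": event}  # assumed minimal evaluation context
    if not eval(statement, context):
        return False, {}
    # The optional second part must evaluate to a dictionary of subcategory values.
    subcategories = eval(assignment, context) if assignment else {}
    return True, subcategories


# Stand-in event with invented values.
event = SimpleNamespace(description="indicator1 seen", ip_info="indicator2 over dns")

print(apply_rule('"indicator1" in event.description', event))
print(apply_rule('"indicator2" in event.ip_info -> {"protocol": ["dns"]}', event))
```

Splitting on the first `->` keeps the sketch short, but it would misparse a statement whose string literals themselves contain `->`.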