Threat categorization

IP addresses are divided into categories based on the type of threat they pose. Each address is first labeled as either src or dst: src means the IP is an active attacker, dst means it is part of some malicious infrastructure. Source addresses are then classified by the kind of attack (e.g. brute force or DDoS attacks), and destination addresses by their type (C&C servers, phishing sites, etc.). Depending on the configuration, these categories can have additional subcategories, such as targeted ports or protocols. An IP may belong to multiple categories, and each category label has a confidence value based on the number of related reports.

Taxonomy

category	role	subcategories	description
`bruteforce`	src	port, protocol	The IP performs dictionary (or brute-force) attacks on password-protected services. Usually accompanied by scanning, i.e. searching for the targeted service.
`botnet_drone`	src	malware_family	The IP is acting as a bot/drone of a botnet.
`cc`	dst	malware_family	The IP is used as a Command & Control (C&C) server for a botnet/malware.
`ddos`	src	-	The IP was observed as a source of volumetric (D)DoS attacks.
`ddos-amplifier`	dst	protocol	The IP runs a service which can be (and often is) misused as an amplifier for DDoS attacks, e.g. open DNS resolvers, NTP servers, memcached, etc.
`exploit`	src	protocol	The IP is attempting to exploit known vulnerabilities.
`malware_distribution`	dst	malware_family	The IP is used to distribute malware, e.g. it hosts an HTTP URL from which malware is downloaded.
`phishing_site`	dst	-	The IP is hosting a phishing website.
`scan`	src	port	The IP address performs network scanning, i.e. it tries to connect to various targets to search for open ports/services.
`spam`	src	-	The IP is sending spam.
`unknown`	src	-	The IP was reported as a source of malicious/rogue/unexpected packets, but without any further specification.

How it works

Classification

When an IP address is seen in a new event/pulse/blacklist, it is assigned a threat category by the corresponding source module. All modules use the same taxonomy (defined above) and the classification method is largely the same, but can differ slightly based on how the module operates and what kind of information it has about each IP.

The classification is based on a system of rules. Rules are evaluated using Python's built-in eval() function, i.e. each rule must be a valid Python expression that resolves to either True or False. When a rule evaluates to True, the IP address is assigned the corresponding category label. Optionally, subcategory values can be assigned directly within a specific rule.

Within each rule, the programmer can access objects and functions visible in the context of the classify_ip() function defined in /nerd/common/threat_categorization.py. The most important are the event object, whose attributes contain information about the new event that is currently being classified, and the regex library (re), which can be used for easier string matching.

Here is a list of event attributes that can be used for classification:

attribute name	type	description
`date`	string	Time of detection
`description`	string	Event description
`ip_info`	string	IP description / additional info
`protocols`	list[string]	List of protocols used by the IP
`target_ports`	list[int]	List of ports targeted by the IP
`categories`	list[string]	List of event categories (specific to Warden)
`tags`	list[string]	List of tags related to the event and/or IP address (specific to MISP)
`ip_role`	string	IP role (src/dst, specific to MISP)
`indicator_role`	string	IP category (specific to OTX)
`blacklist_id`	string	Blacklist ID (specific to blacklists)
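
As a rough illustration of the rule evaluation described above, the following sketch evaluates two made-up rules against a made-up event. It is not the actual classify_ip() implementation: the event object is mimicked with SimpleNamespace, the rules and the helper matching_categories() are hypothetical, and only the attribute names come from the table above.

```python
import re
from types import SimpleNamespace

# Hypothetical event; attribute names follow the table above, values are invented.
event = SimpleNamespace(
    description="SSH login attempts with common passwords",
    ip_info="",
    protocols=["ssh"],
    target_ports=[22],
    categories=["Attempt.Login"],
)

# Illustrative rules only, not taken from the real configuration.
rules = {
    "bruteforce": '"Attempt.Login" in event.categories',
    "scan": r're.search(r"\bscan", event.description, re.I) is not None',
}


def matching_categories(event, rules):
    """Return the categories whose rule evaluates to True for this event."""
    # Names that a rule is allowed to reference (assumed: event and re).
    context = {"event": event, "re": re}
    return [cat for cat, rule in rules.items() if eval(rule, context) is True]


labels = matching_categories(event, rules)
print(labels)
```

Because eval() executes arbitrary Python, rules of this kind should only ever come from a trusted configuration file.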

Summary module

For each IP address there is a history of category records which contain the category id, date of detection and the number of reports from individual source modules. These records are then aggregated by a secondary module threat_category_summary and each category from the final summary is assigned a confidence value based on the number of times the IP was reported.

The confidence for each category is computed as follows:

For each of the last 14 days, compute:
- n_events(d) - Number of times the IP address was reported within the day
- n_sources(d) - Number of distinct source modules that reported those events
- confidence(d) - Daily confidence, computed as confidence(d) = (1 - 1/2^n_events(d)) * (1 - 1/2^n_sources(d))

Final confidence is the weighted average of the 14 daily values with linearly decreasing weight (most recent day has the highest weight):
- confidence = SUM[d=0..13](confidence(d) * (14-d)/14) / 7.5 (where d is the number of days before today; 7.5 is just the sum of the weights)
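
The formula above can be written out as a short Python sketch. The function names and the input format (a list of 14 (n_events, n_sources) pairs indexed by days before today) are assumptions made for illustration; the threat_category_summary module may organize the computation differently.

```python
DAYS = 14


def daily_confidence(n_events, n_sources):
    """confidence(d) = (1 - 1/2^n_events(d)) * (1 - 1/2^n_sources(d))"""
    return (1 - 1 / 2 ** n_events) * (1 - 1 / 2 ** n_sources)


def category_confidence(daily_counts):
    """Weighted average of daily confidences.

    daily_counts[d] is an (n_events, n_sources) pair for the day d days
    before today, with d = 0..13 (assumed input format).
    """
    weights = [(DAYS - d) / DAYS for d in range(DAYS)]  # 1, 13/14, ..., 1/14
    total = sum(
        daily_confidence(*daily_counts[d]) * weights[d] for d in range(DAYS)
    )
    return total / sum(weights)  # the weights sum to 7.5


# One report from one source today, nothing on the previous 13 days:
# the daily confidence is 0.5 * 0.5 = 0.25 with weight 1, so the result is 0.25 / 7.5.
only_today = category_confidence([(1, 1)] + [(0, 0)] * 13)

# One report from one source on each of the 14 days: every daily value is 0.25,
# so the weighted average is also 0.25.
every_day = category_confidence([(1, 1)] * 14)
```

A day with no reports contributes 0, and each daily value approaches 1 only when both the number of events and the number of distinct sources are large, so a single noisy source cannot push the confidence high on its own.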

Configuration

Configuration is specified in a YAML-formatted file (/etc/nerd/threat_categorization.yml by default). It contains the definition of individual categories and their parameters:

threat_categorization:
    category_id:
        label: "Full name"
        description: "Category description"
        role: "src"
        subcategories:
            - "port"
            - "protocol"
            - "malware_family"
        triggers:
            module_name: |-
                "indicator1" in event.description
                "indicator2" in event.ip_info -> {protocol: ['proto1']}
            another_module_name: |-
                ...
    another_category_id:
        ...

A category is defined by specifying a new item under threat_categorization with key set to the name of the category. Each category must have a label (full name), a description and a role (src or dst).

The subcategories field specifies which subcategories the source modules should try to classify. Currently there are 3 possible subcategories: target port, protocol and malware family.

The triggers field contains a set of rules that are used for classification, divided into sections for individual source modules (a common set of rules for all modules can be defined under the name general). Rules for each module are written as a single multiline string (a block scalar with one rule per line) so that special characters like quotes do not have to be escaped. Each rule has up to 2 parts separated by the "->" symbol: a statement used for classification (mandatory) and a subcategory assignment (optional). Both have to be valid Python expressions that can be evaluated by eval(). The statement should resolve to either True or False and the assignment should be a valid dictionary (key : set of values).
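
To make the rule format more concrete, here is a minimal sketch of how one module's trigger string could be split into statement/assignment pairs and applied to an event. The helpers parse_rules() and apply_rules() are hypothetical and not part of NERD. The sketch also assumes that "->" never appears inside the statement itself, and it quotes the dictionary key ("protocol"), because a bare name would have to be defined in the evaluation context.

```python
import re
from types import SimpleNamespace


def parse_rules(block):
    """Split a trigger block into (statement, assignment) pairs, one per line."""
    rules = []
    for line in block.splitlines():
        line = line.strip()
        if not line:
            continue
        # Assumption: the first "->" separates the statement from the assignment.
        statement, sep, assignment = line.partition("->")
        rules.append((statement.strip(), assignment.strip() if sep else None))
    return rules


def apply_rules(rules, event):
    """Return (matched, subcategories) for a single category."""
    context = {"event": event, "re": re}
    matched = False
    subcategories = {}
    for statement, assignment in rules:
        if eval(statement, context):
            matched = True
            if assignment:
                # Merge values from every matching rule into one set per key.
                for key, values in eval(assignment, context).items():
                    subcategories.setdefault(key, set()).update(values)
    return matched, subcategories


block = '''
"indicator1" in event.description
"indicator2" in event.ip_info -> {"protocol": ["proto1"]}
'''

# Made-up event that matches only the second rule.
event = SimpleNamespace(description="something else", ip_info="contains indicator2")
rules = parse_rules(block)
matched, subcategories = apply_rules(rules, event)
```

Collecting the values into sets means that several matching rules can contribute to the same subcategory without producing duplicates.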
