Configuration Guide
The Alert Manager app's main purpose is to extend Splunk's core alerting functionality with sophisticated incident workflows and reporting.
Alert Manager can also be used to replace existing workflow solutions (e.g. Incident Review in Enterprise Security).
Alert Manager is built on top of Splunk's core alerting functionality. Instead of treating an alert as a "fire and forget" action, Alert Manager stores the state of each alert as an incident in a KV store.
Alert Manager was designed to integrate easily into existing environments: add an alert script to the alerts that should be managed, and add the alert_manager role to the users that use the app or send alerts to it. Existing alert scripts can be integrated through Alert Manager's pass-through capability.
It is important to distinguish between the terms alert and incident.
The term alert is used for alerts triggered by a Splunk scheduled search. Alert metadata is indexed by default into an index named alerts.
The term incident is used for enriched metadata around the alert. The data is stored in a KV store and some metadata is enriched using lookup tables (for dynamic customizations).
Incidents are stored with metadata such as alert_time, job_id, owner, status, priority, ttl, etc.
To define which alerts should create incidents within Alert Manager, select the item Incident Settings under the Settings menu.
Categorization is used to group incidents. It can be used to filter incidents on the Incident Posture dashboard and to run category statistics. Two attributes can be used: category and subcategory.
For more complex environments, incidents can be tagged with an arbitrary number of tags. Incidents can then be filtered by tag on the Incident Posture dashboard.
The incident's urgency is calculated using the alert's severity and the incident's priority setting. This is based on a lookup table named alert_urgencies. A sample lookup table has been provided.
$APP_HOME/lookup/alert_urgencies.csv.sample:
severity,priority,urgency
unknown,unknown,low
unknown,low,low
unknown,medium,low
...
informational,high,informational
informational,critical,informational
...
low,high,medium
low,critical,medium
...
fatal,high,critical
fatal,critical,critical
To adjust the urgencies, create a new lookup table $APP_HOME/lookup/alert_urgencies.csv and edit $APP_HOME/local/transforms.conf to point to this new lookup table.
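The urgency calculation described above amounts to a two-key table lookup. The following is a minimal Python sketch of that idea, not the app's actual implementation. It uses only the rows shown in the sample (the elided rows are left out), and the function names and the fallback value are assumptions made for illustration.

```python
import csv
import io

# Only the rows shown in the sample above; the elided rows ("...") are not reproduced.
SAMPLE_CSV = """severity,priority,urgency
unknown,unknown,low
unknown,low,low
unknown,medium,low
informational,high,informational
informational,critical,informational
low,high,medium
low,critical,medium
fatal,high,critical
fatal,critical,critical
"""


def load_urgencies(csv_text):
    """Build a {(severity, priority): urgency} mapping from lookup CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {(row["severity"], row["priority"]): row["urgency"] for row in reader}


def lookup_urgency(table, severity, priority, default="unknown"):
    """Return the urgency for a severity/priority pair.

    The fallback value is a hypothetical choice for pairs missing from the table.
    """
    return table.get((severity, priority), default)


table = load_urgencies(SAMPLE_CSV)
print(lookup_urgency(table, "low", "critical"))
print(lookup_urgency(table, "fatal", "high"))
```

A customized alert_urgencies.csv would change the result of the lookup simply by changing the rows, which is why the app only needs to be pointed at the new file.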
Alert Manager uses Splunk's built-in alert script facility. So that additional or existing alert scripts can still run, Alert Manager passes all shell options through to an optional alert script.
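The pass-through idea can be sketched in Python as follows. This is an illustration under stated assumptions rather than the code used by alert_handler.py: the function names, the script name notify.sh, and the sample arguments are hypothetical, and the scripts directory follows the $SPLUNK_HOME/bin/scripts/ location mentioned in this guide.

```python
import os
import subprocess


def build_passthrough_command(script_name, alert_args, scripts_dir):
    """Return the command line for the follow-up script, forwarding all arguments unchanged."""
    # basename() keeps a configured name from pointing outside scripts_dir.
    script_path = os.path.join(scripts_dir, os.path.basename(script_name))
    return [script_path] + list(alert_args)


def run_passthrough(script_name, alert_args, scripts_dir):
    """Run the follow-up script and return its exit code."""
    return subprocess.call(build_passthrough_command(script_name, alert_args, scripts_dir))


# Hypothetical values, for illustration only.
scripts_dir = os.path.join(os.environ.get("SPLUNK_HOME", "/opt/splunk"), "bin", "scripts")
received_args = ["1", "index=main error", "My Alert"]
print(build_passthrough_command("notify.sh", received_args, scripts_dir))
```

Forwarding the argument list untouched means an existing script sees the same options it would have received had Splunk called it directly.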
Alert Manager allows incidents to be automatically assigned to owners. If no owner is selected, a default owner (defined under Global Settings) is assigned. Owners can be selected from the users defined under User Settings.
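The owner selection rule can be written as a small function. This is a sketch only: the parameter names mirror the auto_assign and auto_assign_owner columns of the incident settings, while the function name and the example default owner are assumptions.

```python
def resolve_owner(auto_assign, auto_assign_owner, default_owner):
    """Pick the incident owner: the configured owner if auto-assignment is on, else the default."""
    if auto_assign and auto_assign_owner:
        return auto_assign_owner
    return default_owner


# "unassigned" stands in for whatever default owner is set under Global Settings.
print(resolve_owner(True, "alice", "unassigned"))
print(resolve_owner(False, "alice", "unassigned"))
```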
Splunk's alerting facility triggers on search results. Sometimes an incident can be considered resolved once the search returns no further results. In this case, the Auto TTL Resolve function can be used.
In another scenario, an alert keeps recurring many times before an incident owner can find the root cause and fix the problem. This may leave a lot of incidents in the "new" state. To close these previously opened incidents, the Auto Previous Resolve function can be used.
To use the Auto TTL Resolve feature, the expiration time of the triggered alert should be set. E.g. if an alert search runs every 15 minutes, the expiration time should also be set to 15 minutes.
E.g. the first alert fires at 1:00am and creates an incident. The next scheduled alert runs at 1:15am without results. The first alert from 1:00am will expire at 1:15am and the incident will be automatically resolved with status auto_ttl_resolve.
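The timing in this example can be modelled with a short Python sketch. It assumes an incident is a dictionary with the alert_time, ttl (in seconds), and status fields mentioned earlier plus an auto_ttl_resolve flag. The function names are hypothetical, and any additional conditions the real app may check are not modelled.

```python
from datetime import datetime, timedelta


def ttl_expired(alert_time, ttl_seconds, now):
    """True once the triggered alert has reached its expiration time."""
    return now >= alert_time + timedelta(seconds=ttl_seconds)


def apply_auto_ttl_resolve(incident, now):
    """Return the incident, resolved if the feature is enabled and the alert has expired."""
    if incident["auto_ttl_resolve"] and ttl_expired(incident["alert_time"], incident["ttl"], now):
        return dict(incident, status="auto_ttl_resolve")
    return incident


# The 1:00am alert with a 15 minute (900 second) expiration time.
incident = {
    "alert_time": datetime(2024, 1, 1, 1, 0),
    "ttl": 900,
    "status": "new",
    "auto_ttl_resolve": True,
}
print(apply_auto_ttl_resolve(incident, datetime(2024, 1, 1, 1, 10))["status"])
print(apply_auto_ttl_resolve(incident, datetime(2024, 1, 1, 1, 15))["status"])
```

Matching the expiration time to the schedule interval is what makes the incident resolve exactly when the next run would have fired.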
The Auto Previous Resolve feature closes previous incidents that are still in status "new".
E.g. the first alert fires at 1:00am and creates an incident. The next scheduled alert fires at 1:15am and opens a new incident. If the first incident from 1:00am is still in status "new", it will be automatically resolved with status auto_previous_resolve. If the first incident's status has been changed in the meantime, it will not be resolved and its status will be preserved.
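A minimal Python sketch of this behaviour follows, assuming incidents are dictionaries with an alert name and a status. The key name "alert", the function name, and the sample data are hypothetical.

```python
def auto_previous_resolve(incidents, new_incident):
    """Resolve earlier "new" incidents of the same alert, then append the new incident."""
    updated = []
    for inc in incidents:
        if inc["alert"] == new_incident["alert"] and inc["status"] == "new":
            # Untouched earlier incident: close it in favour of the new one.
            inc = dict(inc, status="auto_previous_resolve")
        updated.append(inc)
    updated.append(new_incident)
    return updated


existing = [
    {"alert": "Disk full", "status": "new"},       # nobody has worked on this yet
    {"alert": "Disk full", "status": "assigned"},  # status was changed, so it is preserved
]
result = auto_previous_resolve(existing, {"alert": "Disk full", "status": "new"})
print([inc["status"] for inc in result])
```

Only incidents still in status "new" are touched, so work already in progress on an earlier incident is not lost.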
For alerts to be managed by Alert Manager, a few prerequisites have to be fulfilled.
The scheduled alert has to run the alert script "alert_handler.py". Enable "Run a script" on the Splunk Saved Searches configuration page and enter the script name in the text field.
The alert has to be run by a user with the alert_manager role. This is needed so that the alert_handler.py script can ingest alert metadata into the index "alerts".
By default, the table shows all alerts that are managed by Alert Manager (indicated by the _key column). Depending on the App context drop-down selection, the alerts that are readable by the logged-in user's role are displayed. Unmanaged alerts do not yet have a _key set.
To configure an unmanaged alert to be managed, select the App context in which the alert resides. All alerts in that app context will be displayed in the table. Superfluous alerts can be deleted by right-clicking on the table and selecting Remove row.
To store the new incident configuration, select Save settings. Further customization of the incident can be applied before or after saving.
A category and a subcategory can be defined for every incident.
Tags have to be entered as a space-separated list.
A priority can be assigned to an alert. The following values are available: unknown, low, medium, high, critical.
An optional alert script, placed under $SPLUNK_HOME/bin/scripts/, to which Alert Manager passes the alert through.
Checkbox to enable/disable auto-assignment. If auto-assignment is enabled in column auto_assign, the incident will be assigned to the owner given in column auto_assign_owner.
If auto_ttl_resolve is selected, the incident will be closed after the alert has expired.
If auto_previous_resolve is selected, previously opened incidents with the same alert name and status new will be automatically resolved.