feat: Add ability to get alertmanager to expire alerts (#364)
* Add functionality to resend an aged alert to alertmanager when a user clicks close

* Add the code to allow OpsGenie to update Alerta

* Fix config example in README.md

* No need for backticks

* Update the OpsGenie screenshot

* Fix some links in README.md

* Fix up some of the links to OpsGenie docs

* Add a known configurable source: value  to the payload from the OpsGenie plugin

* Add some troubleshooting around alert source usage

Co-authored-by: Cody Stevens <[email protected]>
dakotacody and costevens authored Oct 23, 2021
1 parent 22109ac commit 69d271e
Showing 8 changed files with 445 additions and 9 deletions.
216 changes: 216 additions & 0 deletions integrations/opsgenie/README.md
@@ -0,0 +1,216 @@
Set up OpsGenie with an OpsGenie Edge Connector integration.
==================


While the OpsGenie plugin provided by Alerta can send alerts to OpsGenie, it does not allow OpsGenie to update Alerta.
Fortunately, OpsGenie provides the OpsGenie Edge Connector (OEC), which we can install and configure to run a script that sends those updates back to Alerta.


Set up OpsGenie Edge Connector (OEC)
------------------

Log in to OpsGenie.
In Settings, under "Integration List", search for Edge.
Select OEC from the results and click "Add". This will generate an API key for the integration. Name the integration whatever you'd like.

The currently supported actions from OEC to Alerta are mapped as follows:

- from OEC "alert is acknowledged" -> Alerta will ack the alert
- from OEC "alert is closed" -> Alerta will close the alert
- from OEC "alert is unacknowledged" -> Alerta will unack the alert
- from OEC "a note is added" -> Alerta will add a note to the alert
- from OEC "A user executes assign ownership" -> Alerta will assign the alert
- from OEC "A user takes ownership" -> Alerta will assign the alert
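
The mapping above can be sketched as a small lookup table. This is a minimal illustration rather than the integration's actual code: the helper name `alerta_action_for` is hypothetical, and the OEC action names used as keys are assumed to match the `actionMappings` keys in the full config shown later in this README.

```python
from typing import Optional

# OEC action name -> Alerta action, following the list above.
# The key names are an assumption: they mirror the "actionMappings" keys
# used in the OEC config further down in this README.
OEC_TO_ALERTA = {
    "Acknowledge": "ack",
    "Close": "close",
    "UnAcknowledge": "unack",
    "AddNote": "note",
    "AssignOwnership": "assign",
    "TakeOwnership": "assign",
}


def alerta_action_for(oec_action: str) -> Optional[str]:
    """Return the Alerta action for an OEC action, or None if it is not mapped."""
    return OEC_TO_ALERTA.get(oec_action)


if __name__ == "__main__":
    for name in ("Acknowledge", "TakeOwnership", "Escalate"):
        print(name, "->", alerta_action_for(name))
```

Returning `None` for unmapped actions lets the caller ignore anything OEC sends that Alerta has no equivalent for.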


Click "Add new action" and add whichever actions you want from OEC.
Copy the API key and set it aside for later. This is the API key that OEC needs to communicate with OpsGenie.
Click "Save Integration" once you have added all the actions you want OEC to send to Alerta.


![Configuring OpsGenie Edge Connector for Alerta](./images/2.png)

Set up an API user and key for Alerta. This is the key that OEC needs to authenticate to Alerta.

This integration currently uses the API as a single user. In our setup we chose a local Alerta user 'opsgenie' and assigned it an API key.

[Alerta API key docs](https://docs.alerta.io/en/latest/webui/apikeys.html#webui-api-keys)

Set the Alerta API key aside for configuration later.

![Configuring OpsGenie Edge Connector API key for Alerta](./images/3.png)


As mentioned, all actions will be shown as executed by the user that owns the API key. Notes will include the user name from OpsGenie. This could be addressed in Alerta in the future if a permission allowing impersonation of an Alerta user were added and assigned to the API key, so that another field passed to the API could associate the action with an existing Alerta user.


Install and configure OpsGenie Edge Connector on a host in your network
------------------

Alerta has been tested with OEC version 1.1.3.

Some links to OpsGenie OEC documentation:

[Installation docs for OEC provided by Atlassian](https://support.atlassian.com/opsgenie/docs/opsgenie-edge-connector-installation-packs/)

[Basic OEC configuration information is provided by Atlassian](https://support.atlassian.com/opsgenie/docs/configure-opsgenie-edge-connector/)

By default OEC installs into /home/opsgenie. Ensure that the following directories are created and owned by opsgenie:opsgenie:

    /home/opsgenie
    /home/opsgenie/oec
    /home/opsgenie/oec/conf
    /home/opsgenie/oec/output
    /home/opsgenie/oec/scripts


Ensure python3 is available on the opsgenie user's PATH.
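
As a rough sketch, the directory layout above could be created and checked with a short script. The helper names `ensure_oec_dirs` and `python3_on_path` are hypothetical. The script only creates directories and does not change ownership, so the directories would still need to be owned by opsgenie:opsgenie afterwards (for example via `chown`, or `shutil.chown` when running as root).

```python
import shutil
from pathlib import Path

# Sub-directories listed above, relative to the OEC home directory.
OEC_SUBDIRS = ("oec", "oec/conf", "oec/output", "oec/scripts")


def ensure_oec_dirs(home: str = "/home/opsgenie") -> list:
    """Create the OEC directories under `home` if missing and return their paths."""
    base = Path(home)
    paths = []
    for sub in OEC_SUBDIRS:
        path = base / sub
        # parents=True also creates `home` itself when it does not exist yet
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
    return paths


def python3_on_path() -> bool:
    """Report whether a python3 executable is found on the current user's PATH."""
    return shutil.which("python3") is not None
```

Running the PATH check as the opsgenie user matters, since it reflects the environment of whoever executes it.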

Install and edit the OEC config for Alerta
------------------

Edit config.json and install it as /home/opsgenie/oec/conf/config.json.

Set apiKey to the API key you generated in OpsGenie. The double quotes are necessary because the value is a string. This is the key that lets OEC communicate with OpsGenie.

    "apiKey": "your_alerta_oec_api_key_goes_here",


Add alertaApiKey and alertaApiUrl for Alerta to the globalArgs portion of the config. This is the key that lets OEC communicate with Alerta at the alertaApiUrl.

    "globalArgs": ["--alertaApiUrl", "{{ alerta_api_url_goes_here }}",
                   "--alertaApiKey", "{{ alerta_opsgenie_api_key_goes_here}}" ],
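
Before restarting OEC it can help to sanity-check the edited file. The sketch below is an illustration under stated assumptions, not part of OEC: `check_oec_config` is a hypothetical helper that only verifies that the keys discussed above are present and that no `{{ ... }}` template placeholders were left unrendered.

```python
import json


def check_oec_config(text: str) -> list:
    """Return a list of problems found in the config.json text (empty if none)."""
    problems = []
    if "{{" in text:
        problems.append("unrendered template placeholder found")
    config = json.loads(text)
    # apiKey must be a quoted, non-empty string
    if not isinstance(config.get("apiKey"), str) or not config["apiKey"]:
        problems.append("apiKey must be a non-empty string")
    global_args = config.get("globalArgs", [])
    for flag in ("--alertaApiUrl", "--alertaApiKey"):
        # each flag must be followed by its value in the globalArgs list
        if flag not in global_args or global_args.index(flag) + 1 >= len(global_args):
            problems.append("globalArgs is missing {} and its value".format(flag))
    return problems


if __name__ == "__main__":
    # The URL and keys here are placeholders, not real values.
    sample = json.dumps({
        "apiKey": "example-oec-key",
        "globalArgs": ["--alertaApiUrl", "https://alerta.example.com/api/alert",
                       "--alertaApiKey", "example-alerta-key"],
    })
    print(check_oec_config(sample))
```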


Change the stderr and stdout paths if you want the logs somewhere else or don't want any logging.
Install the newly edited config.json file to /home/opsgenie/oec/conf/config.json.

Install the script that will be run by OEC to interact with Alerta
------------------

Install the oecAlertaExecutor.py script to /home/opsgenie/oec/scripts.

Ensure that the permissions are:

    owner: opsgenie
    group: opsgenie
    mode: 0755
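
The mode requirement above can be verified with a few lines of Python. This is a minimal sketch: `script_mode_ok` is a hypothetical helper name, and it checks only the permission bits, not the owner or group.

```python
import os
import stat


def script_mode_ok(path: str, expected: int = 0o755) -> bool:
    """Return True if the file's permission bits equal `expected`."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    return mode == expected
```

For example, `script_mode_ok("/home/opsgenie/oec/scripts/oecAlertaExecutor.py")` should return `True` once the script is installed with mode 0755.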



Remove some things that OEC installs by default
------------------

The following seemed to cause issues on our install. Removing them resolved our issues:

    /home/opsgenie/oec/scripts/http.py
    /home/opsgenie/oec/scripts/actionExecutor.py
    /home/opsgenie/oec/scripts/__pycache__


Restart OEC on your system.

If Alerta is configured to send alerts to OpsGenie, then OEC should receive updates and be able to update alerts in Alerta from any of the OpsGenie interfaces (web, phone, etc.).

Troubleshooting
------------------
If alerts are not firing, it could be because the alert source is not set. This requires an update to the OpsGenie plugin that hasn't been accepted yet.
Including a line in the plugin that sets the source from the config, or falls back to a reasonable default, should address this.

```OPSGENIE_ALERT_SOURCE = os.environ.get('OPSGENIE_ALERT_SOURCE') or app.config.get('OPSGENIE_ALERT_SOURCE', 'Alerta')```

Later in the plugin, include that value in your payload:

```
payload = {
    "alias": alert.id,
    "message": "{}: {} -> {}".format(alert.severity, alert.text, alert.value),
    "entity": "{}-{}".format("-".join(alert.service), alert.environment),
    "responders": teams,
    "tags": tags,
    "source": "{}".format(OPSGENIE_ALERT_SOURCE),
    "details": details
}
```

Setting a known source lets you filter in OpsGenie so that updates are not sent to ALL Edge Connector integrations when you have more than one running in your environment. Updates are sent to Alerta only when the alert's source was Alerta, so if JIRA is also integrated through OEC, updates for those alerts won't be sent to Alerta.
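
The source handling can be sketched in two small functions. Both names are hypothetical. `resolve_alert_source` mirrors the precedence of the plugin line above (environment variable, then app config, then the default `'Alerta'`), and `should_update_alerta` mirrors the `origin == 'Alerta'` check in oecAlertaExecutor.py. Plain dicts stand in for `os.environ` and `app.config` so the example is self-contained.

```python
def resolve_alert_source(environ: dict, config: dict) -> str:
    """Environment variable wins, then the app config, then the default 'Alerta'."""
    return environ.get("OPSGENIE_ALERT_SOURCE") or config.get("OPSGENIE_ALERT_SOURCE", "Alerta")


def should_update_alerta(queue_message: dict, expected_source: str = "Alerta") -> bool:
    """Only act on queue messages whose alert source matches the expected source."""
    return queue_message.get("alert", {}).get("source") == expected_source


if __name__ == "__main__":
    print(resolve_alert_source({}, {}))
    print(should_update_alerta({"alert": {"source": "JIRA"}}))
```

Note that an empty environment variable falls through to the config value because of the `or`.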

![Limiting which integrations can update Alerta in OpsGenie](./images/alert-filter.png)





Here is the full config we use in prod (templatized):


    {
        "apiKey": "{{ alerta_oec_api_key }}",
        "baseUrl": "https://api.opsgenie.com",
        "logLevel": "WARN",
        "globalArgs": ["--alertaApiUrl", "{{ alerta_api_url }}",
                       "--alertaApiKey", "{{ alerta_stg_opsgenie_api_key}}" ],
        "globalFlags": {},
        "actionMappings": {
            "Acknowledge": {
                "filepath": "/home/opsgenie/oec/scripts/oecAlertaExecutor.py",
                "sourceType": "local",
                "env": [],
                "stderr": "/var/log/opsgenie/oecAlertaExecutor-errors.log",
                "stdout": "/var/log/opsgenie/oecAlertaExecutor.log"
            },
            "AddNote": {
                "filepath": "/home/opsgenie/oec/scripts/oecAlertaExecutor.py",
                "sourceType": "local",
                "env": [],
                "stderr": "/var/log/opsgenie/oecAlertaExecutor-errors.log",
                "stdout": "/var/log/opsgenie/oecAlertaExecutor.log"
            },
            "Close": {
                "filepath": "/home/opsgenie/oec/scripts/oecAlertaExecutor.py",
                "sourceType": "local",
                "env": [],
                "stderr": "/var/log/opsgenie/oecAlertaExecutor-errors.log",
                "stdout": "/var/log/opsgenie/oecAlertaExecutor.log"
            },
            "AssignOwnership": {
                "filepath": "/home/opsgenie/oec/scripts/oecAlertaExecutor.py",
                "sourceType": "local",
                "env": [],
                "stderr": "/var/log/opsgenie/oecAlertaExecutor-errors.log",
                "stdout": "/var/log/opsgenie/oecAlertaExecutor.log"
            },
            "Snooze": {
                "filepath": "/home/opsgenie/oec/scripts/oecAlertaExecutor.py",
                "sourceType": "local",
                "env": [],
                "stderr": "/var/log/opsgenie/oecAlertaExecutor-errors.log",
                "stdout": "/var/log/opsgenie/oecAlertaExecutor.log"
            },
            "TakeOwnership": {
                "filepath": "/home/opsgenie/oec/scripts/oecAlertaExecutor.py",
                "sourceType": "local",
                "env": [],
                "stderr": "/var/log/opsgenie/oecAlertaExecutor-errors.log",
                "stdout": "/var/log/opsgenie/oecAlertaExecutor.log"
            },
            "UnAcknowledge": {
                "filepath": "/home/opsgenie/oec/scripts/oecAlertaExecutor.py",
                "sourceType": "local",
                "env": [],
                "stderr": "/var/log/opsgenie/oecAlertaExecutor-errors.log",
                "stdout": "/var/log/opsgenie/oecAlertaExecutor.log"
            }
        },
        "pollerConf": {
            "pollingWaitIntervalInMillis": 100,
            "visibilityTimeoutInSec": 30,
            "maxNumberOfMessages": 10
        },
        "poolConf": {
            "maxNumberOfWorker": 12,
            "minNumberOfWorker": 4,
            "monitoringPeriodInMillis": 15000,
            "keepAliveTimeInMillis": 6000,
            "queueSize": 0
        }
    }
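
Every entry under `actionMappings` in the config above is identical apart from its key, so the section could be generated rather than written by hand. The following is a sketch only: `build_action_mappings` is a hypothetical helper, and its default paths are simply the ones used in the config above.

```python
import json

SCRIPT = "/home/opsgenie/oec/scripts/oecAlertaExecutor.py"
LOG_DIR = "/var/log/opsgenie"
ACTIONS = ("Acknowledge", "AddNote", "Close", "AssignOwnership",
           "Snooze", "TakeOwnership", "UnAcknowledge")


def build_action_mappings(actions=ACTIONS, script=SCRIPT, log_dir=LOG_DIR) -> dict:
    """Build the actionMappings section: one identical entry per action."""
    return {
        action: {
            "filepath": script,
            "sourceType": "local",
            "env": [],
            "stderr": "{}/oecAlertaExecutor-errors.log".format(log_dir),
            "stdout": "{}/oecAlertaExecutor.log".format(log_dir),
        }
        for action in actions
    }


if __name__ == "__main__":
    print(json.dumps({"actionMappings": build_action_mappings()}, indent=4))
```

Generating the section keeps the log paths consistent if you later move them to another directory.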
Binary file added integrations/opsgenie/images/2.png
Binary file added integrations/opsgenie/images/3.png
Binary file added integrations/opsgenie/images/alert-filter.png
156 changes: 156 additions & 0 deletions integrations/opsgenie/oecAlertaExecutor.py
@@ -0,0 +1,156 @@
#!/usr/bin/env python3
import argparse
import json
import logging
import sys
import requests

parser = argparse.ArgumentParser()
parser.add_argument('-payload', '--queuePayload', help='Payload from queue', required=True)
parser.add_argument('-apiKey', '--apiKey', help='The apiKey of the integration', required=True)
parser.add_argument('-opsgenieUrl', '--opsgenieUrl', help='The url', required=True)
parser.add_argument('-logLevel', '--logLevel', help='Log level', required=True)
parser.add_argument('-alertaApiUrl', '--alertaApiUrl', help='The url to do alerta api operations', required=True)
parser.add_argument('-alertaApiKey', '--alertaApiKey', help='The api key to do alerta api operations', required=True)
args = vars(parser.parse_args())

logging.basicConfig(stream=sys.stdout, level=args['logLevel'])
LOG_PREFIX = 'oec_action'


def do_alerta_things(alerta_api_target, alerta_headers, payload):
    """PUT the payload to an Alerta API endpoint; return the status code, or None on failure."""
    try:
        r = requests.put(alerta_api_target, json=payload, headers=alerta_headers, timeout=2)
    except Exception as e:
        logging.error("{} - Error updating {}. Error: {}".format(LOG_PREFIX, alerta_api_target, e))
        # r is undefined when the request itself fails, so return early
        return None

    logging.info('{} - Call to {} return status code: {}'.format(LOG_PREFIX, alerta_api_target, r.status_code))
    return r.status_code


def get_alert_status(alerta_api_target, alerta_headers):
    """Fetch an alert from Alerta and return its status, or None if the request fails."""
    try:
        r = requests.get(alerta_api_target, headers=alerta_headers, timeout=2)
    except Exception as e:
        logging.error("{} - Error getting {} : {}".format(LOG_PREFIX, alerta_api_target, e))
        # r is undefined when the request itself fails, so return early
        return None

    contents = json.loads(r.content)
    cur_status = contents['alert'].get('status', None)

    return cur_status


def main():

    queue_message_string = args['queuePayload']
    queue_message = json.loads(queue_message_string)

    action = queue_message["action"]
    LOG_PREFIX = "[ {} ]".format(action)

    alert_id = queue_message["alert"]["alertId"]
    origin = queue_message["alert"]["source"]
    username = queue_message["alert"]["username"]
    logging.debug("{} - Username is: {}, Origin is: {}".format(LOG_PREFIX, username, origin))
    alerta_url = args['alertaApiUrl']
    alerta_headers = {'Content-type': 'application/json', 'Authorization': 'Key {}'.format(args['alertaApiKey'])}

    logging.info("{} - Using Alerta URL : {}".format(LOG_PREFIX, alerta_url))
    logging.debug("{} - Message: {}".format(LOG_PREFIX, queue_message))
    timeout = 300  # default timeout for connections to opsgenie api
    action_timeout = 7200  # default alerta action timeout

    logging.info("{} - Will execute {} for alertId {}".format(LOG_PREFIX, action, alert_id))

    action_map = {"Acknowledge": "ack",
                  "AddNote": "note",
                  "AssignOwnership": "assign",
                  "TakeOwnership": "assign",
                  "UnAcknowledge": "unack",
                  "Close": "close",
                  "Snooze": "shelve"}

    if alert_id:
        alert_api_url = "{}/v2/alerts/{}".format(args['opsgenieUrl'], alert_id)
        headers = {
            "Content-Type": "application/json",
            "Accept-Language": "application/json",
            "Authorization": "GenieKey {}".format(args['apiKey'])
        }
        alert_response = requests.get(alert_api_url, headers=headers, timeout=timeout)
        if alert_response.status_code < 299 and alert_response.json()['data']:
            if action in action_map.keys() and origin == 'Alerta':

                alias = queue_message["alert"]["alias"]
                logging.info("{} - {} {} from {}".format(LOG_PREFIX, action, alias, username))
                alerta_action = action_map[action]

                # set default target and payload
                alerta_api_target = "{}/{}/action".format(alerta_url, alias)
                payload = {"action": alerta_action, "text": "{}d by {}.".format(action, username), "timeout": action_timeout}

                # payload will change according to action and then fall through to the
                # default api call unless the alerta_api_target is set to None on its way down
                if action == 'Snooze':
                    # snooze_end = queue_message["alert"]["snoozedUntil"]
                    snooze_end = queue_message["alert"]["snoozeEndDate"]
                    # snooze_end = dt.fromtimestamp(int("{}".format(snooze_end)[:-3])) # < - datetime object
                    # now = dt.fromtimestamp(dt.timestamp(dt.utcnow()))
                    # snooze_seconds = int((snooze_end - now).total_seconds())
                    # if snooze_seconds > 0:
                    # logging.info("{} - Snoozing for {} seconds".format(LOG_PREFIX, snooze_seconds))
                    payload["text"] = "Shelved until: {} by {}".format(snooze_end, username)

                elif action == 'AddNote':
                    # payload and target for notes is different than actions
                    alerta_api_target = "{}/{}/note".format(alerta_url, alias)

                    # since we have one api key assigned to a default 'opsgenie' user
                    # include the username with the note so we know who wrote it
                    payload = {"note": "{} Added by {}".format(queue_message["alert"]["note"], username)}

                elif action == 'AssignOwnership':
                    owner = queue_message["alert"]["owner"]
                    # update the payload
                    payload["text"] = "Assigned to {} by {}".format(owner, username)
                elif action == 'TakeOwnership':
                    # open_payload = { "action": "open", "text": "transisition to open for assignment", "timeout": action_timeout }
                    # do_alerta_things(alerta_api_target,open_payload)

                    # update the payload
                    payload["text"] = "{} took ownership".format(username)
                elif action == 'Acknowledge':  # update the acked-by attribute too..
                    # opsgenie does not send an action when an alert comes out of snooze
                    # we will check the alert and if it has a 'shelved' status unshelve it
                    # this is silly but the tags opgsgenie has are NOT the alert tags.
                    # or I would just look at those
                    # Get the alert so we can check the status
                    alert_url = "{}/{}".format(alerta_url, alias)
                    status = get_alert_status(alert_url, alerta_headers)
                    if status == 'shelved':
                        # unshelve the thing (default)
                        # and then the normal action can run
                        unshelve_payload = {"action": "unshelve", "text": "Unshelved by {}.".format(username), "timeout": action_timeout}

                        do_alerta_things(alerta_api_target, alerta_headers, unshelve_payload)
                        # update the api target to None unshelving will put it back to Ack
                        alerta_api_target = None

                    # update the acked-by attribute
                    ack_by_payload = {"attributes": {"acked-by": username}}
                    ack_by_target = "{}/{}/attributes".format(alerta_url, alias)
                    do_alerta_things(ack_by_target, alerta_headers, ack_by_payload)

                if alerta_api_target:
                    # as long as none of the above set the
                    # alerta_api_target to None we should do the original action
                    do_alerta_things(alerta_api_target, alerta_headers, payload)

        else:
            logging.warning("{} - Alert with id [ {} ] does not exist in Opsgenie. It is probably deleted.".format(LOG_PREFIX, alert_id))
    else:
        logging.warning("{} - Alert id was not sent in the payload. Ignoring.".format(LOG_PREFIX))


if __name__ == '__main__':
    main()
4 changes: 4 additions & 0 deletions plugins/opsgenie/alerta_opsgenie.py
@@ -22,6 +22,9 @@
DASHBOARD_URL = os.environ.get('DASHBOARD_URL') or app.config.get('DASHBOARD_URL', '')
LOG.info('Initialized: %s key, %s matchers' % (OPSGENIE_SERVICE_KEY, SERVICE_KEY_MATCHERS))

# when using with OpsGenie Edge connector setting a known source is useful
OPSGENIE_ALERT_SOURCE = os.environ.get('OPSGENIE_ALERT_SOURCE') or app.config.get('OPSGENIE_ALERT_SOURCE', 'Alerta')

class TriggerEvent(PluginBase):

def opsgenie_service_key(self, resource):
@@ -105,6 +108,7 @@ def post_receive(self, alert):
"entity": alert.environment,
"responders" : self.get_opsgenie_teams(),
"tags": [alert.environment, alert.resource, alert.service[0], alert.event],
"source": "{}".format(OPSGENIE_ALERT_SOURCE),
"details": details
}
