All services, applications, and consumers SHOULD have a CloudWatch Alarm set up.
Multiple alarms MAY be set up.
Owners SHOULD identify and monitor other key success metrics.
Alarms MUST be named consistently:
- The name MUST be in upper camel case and MUST be prefixed with the component name. For example: `BibServiceErrorAlarm`.
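The naming rule can be checked mechanically. The sketch below is one possible check, not an official validator: the function name `is_valid_component_name` is hypothetical, and it treats "upper camel case" loosely as "starts with an uppercase letter and contains only letters and digits".

```python
import re

# Loose approximation of upper camel case: starts with an uppercase letter,
# followed only by letters and digits (no underscores, hyphens, or spaces).
_UPPER_CAMEL = re.compile(r"^[A-Z][A-Za-z0-9]*$")


def is_valid_component_name(name: str, component: str) -> bool:
    """Return True if `name` looks upper camel case and is prefixed by `component`.

    The name must also be longer than the component, so that something
    (for example "ErrorAlarm") follows the prefix.
    """
    return (
        bool(_UPPER_CAMEL.match(name))
        and name.startswith(component)
        and len(name) > len(component)
    )


if __name__ == "__main__":
    print(is_valid_component_name("BibServiceErrorAlarm", "BibService"))
    print(is_valid_component_name("bib_service_error_alarm", "BibService"))
```

The same check applies to any name in this document that follows the "upper camel case, component prefix" rule.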
Alarms SHOULD generally use the sum of a metric to trigger an alarm.
Alarms SHOULD notify an SNS topic when triggered.
Alarms SHOULD be added to an alarms dashboard.
Alarms SHOULD be tested as part of the Production Readiness process:
- This can often be done during a CHAOS session. For example:
  - Turn off an upstream service, or lower or raise the alarm threshold, to trigger the alarm artificially.
- Verify that everyone who should receive an email does, and that the dashboard updates appropriately.
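Besides the approaches listed above, CloudWatch offers a `SetAlarmState` API that temporarily forces an alarm into a given state for testing. This is a different technique from the ones this document describes, and is shown only as an optional sketch. The helper names are hypothetical, and `boto3` with valid AWS credentials is assumed for the function that actually calls AWS.

```python
def build_set_alarm_state_params(alarm_name: str,
                                 reason: str = "Manual alarm test") -> dict:
    """Build keyword arguments for CloudWatch's SetAlarmState API call."""
    return {
        "AlarmName": alarm_name,
        "StateValue": "ALARM",   # force the alarm into the ALARM state
        "StateReason": reason,   # free-text reason recorded in the alarm history
    }


def trigger_alarm_for_test(alarm_name: str) -> None:
    """Force the alarm into ALARM so its notification actions fire.

    boto3 is imported lazily (assumption: it is installed and credentials are
    configured) so that the builder above can be used without AWS access.
    """
    import boto3

    cloudwatch = boto3.client("cloudwatch")
    cloudwatch.set_alarm_state(**build_set_alarm_state_params(alarm_name))
```

The forced state is temporary: CloudWatch returns the alarm to its real state at the next evaluation, so this exercises the notification path without changing the threshold.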
All services, applications, and consumers MUST create a metric filter for their log messages of severity ERROR and greater.
The metric filter MUST be named and namespaced consistently:
- The namespace MUST be `LogMetrics`.
- The name MUST be in upper camel case and MUST be prefixed with the component name. For example: `BibServiceError`.
The RECOMMENDED search for a metric filter is `{ $.levelCode <= 3 }`.
The metric created from the metric filter MUST be used to trigger an alarm.
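The metric filter requirements above can also be expressed as parameters for the CloudWatch Logs `PutMetricFilter` API. The following is a minimal sketch under assumptions rather than the team's actual tooling. The helper names are hypothetical, using the same name for the filter and the metric is an assumption, and `would_match` is only a local approximation of the recommended pattern for JSON log lines.

```python
import json

LOG_METRICS_NAMESPACE = "LogMetrics"
RECOMMENDED_PATTERN = "{ $.levelCode <= 3 }"


def build_metric_filter_params(component: str, log_group: str) -> dict:
    """Build keyword arguments for boto3's logs.put_metric_filter.

    Assumption: the filter and the metric share the name "<Component>Error".
    """
    name = f"{component}Error"
    return {
        "logGroupName": log_group,
        "filterName": name,
        "filterPattern": RECOMMENDED_PATTERN,
        "metricTransformations": [
            {
                "metricName": name,
                "metricNamespace": LOG_METRICS_NAMESPACE,
                "metricValue": "1",  # count one per matching log event
            }
        ],
    }


def would_match(log_line: str) -> bool:
    """Approximate the recommended pattern locally, for illustration only."""
    try:
        record = json.loads(log_line)
    except ValueError:
        return False  # non-JSON lines never match a JSON filter pattern
    if not isinstance(record, dict):
        return False
    level = record.get("levelCode")
    is_number = isinstance(level, (int, float)) and not isinstance(level, bool)
    return is_number and level <= 3
```

With `boto3` installed and credentials configured, the parameters could be passed as `boto3.client("logs").put_metric_filter(**params)`.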
- Log in to AWS with your credentials.
- Go to `CloudWatch` by clicking the `Services` dropdown menu in the top navigation and choosing `CloudWatch`.
- Go to `Logs` in the left navigation and find the log group you want to build the metric on in the `Log Groups` list.
  - Warning: Be careful to choose the correct log group, especially for Elastic Beanstalk log groups (see a previous post-mortem that had metric filters improperly configured).
- Select the circle to the left of the log group, then go to the top of the list and click `Create Metric Filter`.
- On the `Define Logs Metric Filter` page, enter your filter in the `Filter Pattern` field, such as `{ $.levelCode <= 3 }`. For more details on `Filter Pattern`, see here. Then click `Assign Metric`.
- On the `Create Metric Filter and Assign a Metric` page, enter your filter name and metric name following the naming conventions above. Note that all custom metrics SHOULD be assigned to the `Metric Namespace` of `LogMetrics`. Click `Create Filter` to finish creating the metric.
Additional alarms MAY be set up for other metrics. For example:
- HTTP errors (5xx, 4xx status codes)
- Lambda errors
- Kinesis errors
- You SHOULD set up an SNS topic before you set up the alarm, so that the alarm has somewhere to send its notifications.
- Go to `CloudWatch` by clicking the `Services` dropdown menu in the top navigation and choosing `CloudWatch`.
- If this is the first alarm on the metric, there is a good chance the metric has no data yet, so you will not be able to find it by searching. To set up the alarm anyway, go to `Logs` in the left navigation and find the log group that has your metric in `Log Groups`.
- The `Metric Filters` column shows how many filters the log group has. Click the link, such as `1 filter`.
- You will now be on the page that lists all of the log group's filters. On the filter you want, click the `Create Alarm` link in the top right corner of the filter block.
- In the `Create Alarm` pop-up window, first enter the name and the threshold of the alarm under the `Alarm Threshold` section. The name SHOULD follow the naming conventions. In the `Whenever` section, `is:` is usually set to greater than or equal to 1.
- In the `Additional settings` section, set `Treat missing data as:` to good.
- In the `Actions` section, set `Whenever this alarm:` to `State is ALARM`, and `Send notification to:` to the SNS topic you have already created.
- In the `Alarm Preview` section, choose your preferred period. The Statistic SHOULD be `Standard` and `Sum`.
- Click `Create Alarm` to finish.
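The console settings above map onto the parameters of CloudWatch's `PutMetricAlarm` API. The sketch below shows that mapping under assumptions: the helper name is hypothetical, the 300-second period and single evaluation period are placeholder choices (the text leaves the period to your preference), and the topic ARN is whatever your existing SNS topic returns.

```python
def build_error_alarm_params(component: str,
                             sns_topic_arn: str,
                             period_seconds: int = 300) -> dict:
    """Build keyword arguments for boto3's cloudwatch.put_metric_alarm."""
    return {
        "AlarmName": f"{component}ErrorAlarm",   # upper camel case, component prefix
        "Namespace": "LogMetrics",                # namespace of the metric filter
        "MetricName": f"{component}Error",        # metric created by the filter
        "Statistic": "Sum",                       # alarm on the sum of the metric
        "Period": period_seconds,                 # assumption: 5-minute period
        "EvaluationPeriods": 1,                   # assumption: alarm on one period
        "Threshold": 1.0,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",  # "is: >= 1"
        "TreatMissingData": "notBreaching",       # console: treat missing data as good
        "AlarmActions": [sns_topic_arn],          # notify the SNS topic on ALARM
    }
```

With `boto3` installed and credentials configured, the alarm could be created with `boto3.client("cloudwatch").put_metric_alarm(**params)`. Because this API references the metric by namespace and name, it does not depend on the metric already having data.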
The SNS topic SHOULD be set up before creating an alarm.
The SNS topic MUST be named consistently and SHOULD be reused for all the component alarms.
- The name MUST be in upper camel case and MUST be prefixed with the component name. For example: `BibServiceErrorAlarm`.
The SNS topic SHOULD notify the component owner(s) by email or other method.
- Click the `Services` dropdown menu in the top navigation and choose `Simple Notification Service` under the `Messaging` category.
- Click `Create topic` on the page, name your topic following the naming conventions above, and then create the topic.
- You should now be on the `Topic details` page. In the `Subscription` section, click `Create subscription`. In the `Protocol` dropdown menu, choose the method by which you want to receive the notifications. Generally we use `Email`. Then, in the `Endpoint` field, enter your email address. Finally, click `Create subscription`.
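The same topic and subscription can be created through the SNS API. The sketch below is illustrative only: the helper names are hypothetical, the topic name suffix follows the example given in the conventions above, and `boto3` with valid credentials is assumed for the function that actually calls AWS.

```python
def build_topic_params(component: str) -> dict:
    """Build keyword arguments for boto3's sns.create_topic."""
    return {"Name": f"{component}ErrorAlarm"}


def build_email_subscription_params(topic_arn: str, email: str) -> dict:
    """Build keyword arguments for boto3's sns.subscribe using email delivery."""
    return {
        "TopicArn": topic_arn,
        "Protocol": "email",
        "Endpoint": email,
    }


def create_topic_with_email(component: str, email: str) -> str:
    """Create the topic, subscribe one email address, and return the topic ARN.

    boto3 is imported lazily (assumption: it is installed and credentials are
    configured) so the builders above can be used without AWS access.
    """
    import boto3

    sns = boto3.client("sns")
    topic_arn = sns.create_topic(**build_topic_params(component))["TopicArn"]
    sns.subscribe(**build_email_subscription_params(topic_arn, email))
    return topic_arn
```

Email subscriptions stay pending until the recipient confirms them from the message SNS sends, so notifications will not arrive until that confirmation is done.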