Updating cloudwatch stuff to reflect new pod and deployment names #1691

ben851 · 2024-12-03T19:33:57Z

Summary

In preparation for the helmfile migration.

Also removes the GitHub ARC resources that are no longer used.

Related Issues

https://app.zenhub.com/workspaces/notify-planning-core-6411dfb7c95fb80014e0cab0/issues/gh/cds-snc/notification-planning-core/296

Before merging this PR

Read the code suggestions left by the cds-ai-codereviewer bot. Address the valid
suggestions and briefly write down the reasons for not addressing the others. To help
classify the comments, please use these reactions on each comment made by the AI review:

Classification	Reaction	Emoticon
Useful	+1	👍
Noisy	eyes	👀
Hallucination	confused	😕
Wrong but teachable	rocket	🚀
Wrong and incorrect	-1	👎

The classifications will be extracted and summarized into an analysis of how helpful
or not the AI code review really is.

Test instructions

After the Helm migration, verify the queries and dashboards.
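
One possible way to do part of that check offline is to scan a dashboard body for metric entries that still use an old name. The sketch below is only an illustration: the rename pairs are the ones shown for the Kubernetes dashboard in the staging plan on this PR, while the helper name `find_legacy_names` and the sample dashboard body are hypothetical. It assumes the body has already been fetched as a JSON string by whatever means you prefer.

```python
import json

# Old -> new names, as shown for the Kubernetes dashboard in the staging plan.
RENAMES = {
    "admin": "notify-admin",
    "api": "notify-api",
    "celery": "notify-celery",
    "document-download-api": "notify-document-download",
}


def find_legacy_names(dashboard_body: str) -> list[tuple[int, str]]:
    """Return (widget index, old name) for each metric entry that still uses an old name."""
    body = json.loads(dashboard_body)
    hits = []
    for index, widget in enumerate(body.get("widgets", [])):
        for row in widget.get("properties", {}).get("metrics", []):
            if not isinstance(row, list):
                continue  # explorer widgets describe metrics as objects, not arrays
            for item in row:
                if isinstance(item, str) and item in RENAMES:
                    hits.append((index, item))
    return hits


# Hypothetical sample body, made up for illustration only.
sample = json.dumps({
    "widgets": [
        {
            "type": "metric",
            "properties": {
                "metrics": [
                    ["ContainerInsights", "pod_cpu_utilization", "PodName", "notify-admin"],
                    ["...", "api"],
                ]
            },
        }
    ]
})

legacy = find_legacy_names(sample)
print(legacy)  # [(0, 'api')]
```

An empty list only means no exact old name was found in the metric arrays; log queries embedded in the dashboards would still need to be checked by hand.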

[review]

github-actions · 2024-12-04T13:33:37Z

aws/eks/dashboards.tf

@@ -314,7 +314,7 @@ resource "aws_cloudwatch_dashboard" "notify_system" {
            "x": 0,
            "type": "log",
            "properties": {
-                "query": "SOURCE '/aws/containerinsights/${aws_eks_cluster.notification-canada-ca-eks-cluster.name}/application' | fields @timestamp, log, kubernetes.container_name as app, kubernetes.pod_name as pod_name, @logStream\n| filter kubernetes.container_name not like /^celery/\n| fields @message like /HTTP\\/\\d+\\.\\d+\\\\\" 50\\d/ as is_error\n| stats sum(is_error) as errors by bin(1m)\n",
+                "query": "SOURCE '/aws/containerinsights/${aws_eks_cluster.notification-canada-ca-eks-cluster.name}/application' | fields @timestamp, log, kubernetes.container_name as app, kubernetes.pod_name as pod_name, @logStream\n| filter kubernetes.container_name not like /^notify-celery/\n| fields @message like /HTTP\\/\\d+\\.\\d+\\\\\" 50\\d/ as is_error\n| stats sum(is_error) as errors by bin(1m)\n",


Consider using a more specific regex pattern to match notify-celery container names to avoid potential mismatches with other container names that might start with notify-celery.
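
To weigh this suggestion, it may help to compare the prefix match from the diff with a stricter variant. The sketch below uses Python's `re` module as a stand-in for the Logs Insights `like /.../` operator, on the assumption that the two behave the same for a simple anchored pattern like this one. The container names and the stricter pattern are hypothetical and are not taken from the cluster.

```python
import re

# Pattern from the updated query: any name starting with "notify-celery".
PREFIX = re.compile(r"^notify-celery")

# Hypothetical stricter variant: "notify-celery" must be the whole name
# or be followed by a "-" separator.
STRICT = re.compile(r"^notify-celery(-|$)")

# Hypothetical container names, for illustration only.
names = [
    "notify-celery",
    "notify-celery-primary",
    "notify-celeryx",  # made-up name that merely shares the prefix
    "notify-api",
]


def excluded(pattern, candidates):
    """Return the names that a `not like /pattern/` filter would drop."""
    return [name for name in candidates if pattern.search(name)]


prefix_hits = excluded(PREFIX, names)
strict_hits = excluded(STRICT, names)
print(prefix_hits)
print(strict_hits)
```

With these made-up names the two patterns differ only on `notify-celeryx`, so the stricter form matters only if some unrelated container can share the `notify-celery` prefix.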

github-actions · 2024-12-04T13:33:37Z

aws/eks/dashboards.tf

@@ -343,7 +343,7 @@
            "type": "metric",
            "properties": {
                "metrics": [
-                    [ "ContainerInsights/Prometheus", "kube_deployment_status_replicas_available", "namespace", "notification-canada-ca", "ClusterName", "${aws_eks_cluster.notification-canada-ca-eks-cluster.name}", "deployment", "celery-primary", { "region": "${var.region}", "label": "celery-primary" } ],
+                    [ "ContainerInsights/Prometheus", "kube_deployment_status_replicas_available", "namespace", "notification-canada-ca", "ClusterName", "${aws_eks_cluster.notification-canada-ca-eks-cluster.name}", "deployment", "notify-celery-primary", { "region": "${var.region}", "label": "notify-celery-primary" } ],
                    [ "...", "celery-scalable", { "region": "${var.region}", "label": "celery-scalable" } ]


The metric label 'celery-scalable' should be updated to 'notify-celery-scalable' to reflect the new deployment name, similar to the change made on line 346.

So close, but I'm changing primary, not scalable.

github-actions · 2024-12-04T13:33:37Z

aws/eks/dashboards.tf

@@ -362,7 +362,7 @@
            "type": "metric",
            "properties": {
                "metrics": [
-                    [ "ContainerInsights/Prometheus", "kube_deployment_status_replicas_available", "namespace", "notification-canada-ca", "ClusterName", "${aws_eks_cluster.notification-canada-ca-eks-cluster.name}", "deployment", "celery-sms-send-primary", { "region": "${var.region}", "label": "celery-sms-send-primary" } ],
+                    [ "ContainerInsights/Prometheus", "kube_deployment_status_replicas_available", "namespace", "notification-canada-ca", "ClusterName", "${aws_eks_cluster.notification-canada-ca-eks-cluster.name}", "deployment", "notify-celery-sms-send-primary", { "region": "${var.region}", "label": "notify-celery-sms-send-primary" } ],
                    [ "...", "celery-sms-send-scalable", { "region": "${var.region}", "label": "celery-sms-send-scalable" } ]


The metric name 'celery-sms-send-scalable' should be updated to 'notify-celery-sms-send-scalable' to reflect the new deployment name, similar to the change made on line 365.

Same reasoning as above.

…rm into helm-alarms

…k on staging first only for a while [review]

…m to work on staging first only for a while" This reverts commit 621d0fe.

github-actions · 2024-12-12T15:47:41Z

staging: pinpoint_to_sqs_sms_callbacks

✅ Terraform Init: success
✅ Terraform Validate: success
✅ Terraform Format: success
✅ Terraform Plan: success
✅ Conftest: success

Plan: 0 to add, 1 to change, 0 to destroy

Show summary

CHANGE	NAME
update	`aws_cloudwatch_dashboard.sms-send-rate[0]`

Show plan

Resource actions are indicated with the following symbols:
  ~ update in-place

Terraform will perform the following actions:

  # aws_cloudwatch_dashboard.sms-send-rate[0] will be updated in-place
  ~ resource "aws_cloudwatch_dashboard" "sms-send-rate" {
      ~ dashboard_body = (sensitive value)
        id             = "Specialized-sms-send-rate"
        # (2 unchanged attributes hidden)
    }

Plan: 0 to add, 1 to change, 0 to destroy.

─────────────────────────────────────────────────────────────────────────────

Saved the plan to: plan.tfplan

To perform exactly these actions, run the following command to apply:
    terraform apply "plan.tfplan"

Show Conftest results

WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_deliveries"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_deliveries_failures"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.pinpoint_to_sqs_sms_callbacks_log_group[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.lambda-image-pinpoint-delivery-receipts-errors-critical[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.lambda-image-pinpoint-delivery-receipts-errors-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.logs-1-500-error-1-minute-warning-pinpoint_to_sqs_sms_callbacks-api[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.logs-10-500-error-5-minutes-critical-pinpoint_to_sqs_sms_callbacks-api[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-blocked-as-spam-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-failures-bell-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-failures-bragg-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-failures-freedom-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-failures-iristel-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-failures-maritime-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-failures-mts-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-failures-rogers-warning[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.pinpoint-sms-failures-telus-warning[0]"]
WARN - plan.json - main - Missing...

github-actions · 2024-12-12T15:50:22Z

staging: eks

✅ Terraform Init: success
✅ Terraform Validate: success
✅ Terraform Format: success
✅ Terraform Plan: success
✅ Conftest: success

⚠️ Warning: resources will be destroyed by this change!

Plan: 0 to add, 47 to change, 6 to destroy

Show summary

CHANGE	NAME
delete	`aws_cloudwatch_log_metric_filter.github-arc-runner-alarm[0]`
	`aws_cloudwatch_metric_alarm.github-arc-runner-error-alarm[0]`
	`aws_cloudwatch_query_definition.gh-arc-errors[0]`
	`aws_iam_role.secrets_csi_github`
	`aws_iam_role_policy_attachment.parameters_csi_github`
	`aws_iam_role_policy_attachment.secrets_csi_github`
update	`aws_cloudwatch_dashboard.errors[0]`
	`aws_cloudwatch_dashboard.kubernetes[0]`
	`aws_cloudwatch_dashboard.notify_system[0]`
	`aws_cloudwatch_log_metric_filter.admin-evicted-pods[0]`
	`aws_cloudwatch_log_metric_filter.api-evicted-pods[0]`
	`aws_cloudwatch_log_metric_filter.celery-evicted-pods[0]`
	`aws_cloudwatch_log_metric_filter.document-download-evicted-pods[0]`
	`aws_cloudwatch_log_metric_filter.documentation-evicted-pods[0]`
	`aws_cloudwatch_metric_alarm.admin-pods-high-cpu-warning[0]`
	`aws_cloudwatch_metric_alarm.admin-pods-high-memory-warning[0]`
	`aws_cloudwatch_metric_alarm.admin-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.api-pods-high-cpu-warning[0]`
	`aws_cloudwatch_metric_alarm.api-pods-high-memory-warning[0]`
	`aws_cloudwatch_metric_alarm.api-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.celery-beat-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.celery-email-send-primary-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.celery-email-send-scalable-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.celery-primary-pods-high-cpu-warning[0]`
	`aws_cloudwatch_metric_alarm.celery-primary-pods-high-memory-warning[0]`
	`aws_cloudwatch_metric_alarm.celery-primary-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.celery-scalable-pods-high-cpu-warning[0]`
	`aws_cloudwatch_metric_alarm.celery-scalable-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.celery-sms-pods-high-cpu-warning[0]`
	`aws_cloudwatch_metric_alarm.celery-sms-pods-high-memory-warning[0]`
	`aws_cloudwatch_metric_alarm.celery-sms-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.celery-sms-send-primary-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.celery-sms-send-scalable-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.document-download-api-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.documentation-replicas-unavailable[0]`
	`aws_cloudwatch_metric_alarm.karpenter-replicas-unavailable[0]`
	`aws_cloudwatch_query_definition.admin-50X-errors[0]`
	`aws_cloudwatch_query_definition.api-50X-errors[0]`
	`aws_cloudwatch_query_definition.bounce-rate-critical[0]`
	`aws_cloudwatch_query_definition.bounce-rate-warning[0]`
	`aws_cloudwatch_query_definition.bounce-rate-warnings-and-criticals[0]`
	`aws_cloudwatch_query_definition.callback-errors-by-url[0]`
	`aws_cloudwatch_query_definition.callback-failures[0]`
	`aws_cloudwatch_query_definition.callback-max-retry-failures-by-service[0]`
	`aws_cloudwatch_query_definition.celery-errors[0]`
	`aws_cloudwatch_query_definition.celery-filter-by-job[0]`
	`aws_cloudwatch_query_definition.celery-filter-by-notification-id[0]`
	`aws_cloudwatch_query_definition.celery-queues[0]`
	`aws_cloudwatch_query_definition.celery-starts[0]`
	`aws_cloudwatch_query_definition.celery-worker-exited-normally[0]`
	`aws_cloudwatch_query_definition.celery-worker-exited-prematurely[0]`
	`aws_cloudwatch_query_definition.celery-worker-exits-cold-vs-warm[0]`
	`aws_cloudwatch_query_definition.retry-attemps-by-duration[0]`

✂ Warning: plan has been truncated! See the full plan in the logs.

Show plan

Resource actions are indicated with the following symbols:
  ~ update in-place
  - destroy

Terraform will perform the following actions:

  # aws_cloudwatch_dashboard.errors[0] will be updated in-place
  ~ resource "aws_cloudwatch_dashboard" "errors" {
      ~ dashboard_body = (sensitive value)
        id             = "Errors"
        # (2 unchanged attributes hidden)
    }

  # aws_cloudwatch_dashboard.kubernetes[0] will be updated in-place
  ~ resource "aws_cloudwatch_dashboard" "kubernetes" {
      ~ dashboard_body = jsonencode(
          ~ {
              ~ widgets = [
                    {
                        height     = 15
                        properties = {
                            aggregateBy   = {
                                func = "MAX"
                                key  = "Name"
                            }
                            labels        = [
                                {
                                    key   = "Name"
                                    value = "notification-canada-ca"
                                },
                            ]
                            metrics       = [
                                {
                                    metricName   = "node_cpu_limit"
                                    resourceType = "AWS::EKS::Cluster"
                                    stat         = "Maximum"
                                },
                                {
                                    metricName   = "node_cpu_usage_total"
                                    resourceType = "AWS::EKS::Cluster"
                                    stat         = "Maximum"
                                },
                                {
                                    metricName   = "node_memory_limit"
                                    resourceType = "AWS::EKS::Cluster"
                                    stat         = "Maximum"
                                },
                                {
                                    metricName   = "node_memory_working_set"
                                    resourceType = "AWS::EKS::Cluster"
                                    stat         = "Maximum"
                                },
                                {
                                    metricName   = "cluster_failed_node_count"
                                    resourceType = "AWS::EKS::Cluster"
                                    stat         = "Sum"
                                },
                                {
                                    metricName   = "cluster_node_count"
                                    resourceType = "AWS::EKS::Cluster"
                                    stat         = "Sum"
                                },
                            ]
                            period        = 300
                            region        = "ca-central-1"
                            splitBy       = "Name"
                            widgetOptions = {
                                legend        = {
                                    position = "bottom"
                                }
                                rowsPerPage   = 50
                                stacked       = false
                                view          = "timeSeries"
                                widgetsPerRow = 2
                            }
                        }
                        type       = "explorer"
                        width      = 24
                        x          = 0
                        y          = 14
                    },
                  ~ {
                      ~ properties = {
                          ~ metrics = [
                              ~ [
                                    # (2 unchanged elements hidden)
                                    "PodName",
                                  ~ "admin" -> "notify-admin",
                                    "ClusterName",
                                    # (3 unchanged elements hidden)
                                ],
                              ~ [
                                    "...",
                                  ~ "api" -> "notify-api",
                                    ".",
                                    # (3 unchanged elements hidden)
                                ],
                              ~ [
                                    "...",
                                  ~ "celery" -> "notify-celery",
                                    ".",
                                    # (3 unchanged elements hidden)
                                ],
                              ~ [
                                    "...",
                                  ~ "document-download-api" -> "notify-document-download",
                                    ".",
                                    # (3 unchanged elements hidden)
                                ],
                            ]
                            # (7 unchanged attributes hidden)
                        }
                        # (5 unchanged attributes hidden)
                    },
                  ~ {
                      ~ properties = {
                          ~ metrics = [
                              ~ [
                                    # (4 unchanged elements hidden)
                                    "Service",
                                  ~ "api" -> "notify-api",
                                    "Namespace",
                                    # (2 unchanged elements hidden)
                                ],
                              ~ [
                                    "...",
                                  ~ "celery" -> "notify-celery",
                                    ".",
                                    # (2 unchanged elements hidden)
                                ],
                              ~ [
                                    "...",
                                  ~ "admin" -> "notify-admin",
                                    ".",
                                    # (2 unchanged elements hidden)
                                ],
                              ~ [
                                    # (4 unchanged elements hidden)
                                    ".",
                                  ~ "celery" -> "notify-celery",
                                    ".",
                                    # (2 unchanged elements hidden)
                                ],
                              ~ [
                                    "...",
                                  ~ "api" -> "notify-api",
                                    ".",
                                    # (2 unchanged elements hidden)
                                ],
                              ~ [
                                    "...",
                                  ~ "admin" -> "notify-admin",
                                    ".",
                                    # (1 unchanged element hidden)
                                ],
                            ]
                            # (7 unchanged attributes hidden)
                        }
                        # (5 unchanged attributes hidden)
                    },
                ]
            }
        )
        id             = "Kubernetes"
        # (2 unchanged attributes hidden)
    }

  # aws_cloudwatch_dashboard.notify_system[0] will be updated in-place
  ~ resource "aws_cloudwatch_dashboard" "notify_system" {
      ~ dashboard_body = (sensitive value)
        id             = "Notify-System-Overview"
        # (2 unchanged attributes hidden)
    }

  # aws_cloudwatch_log_metric_filter.admin-evicted-pods[0] will be updated in-place
  ~ resource "aws_cloudwatch_log_metric_filter" "admin-evicted-pods" {
        id             = "admin-evicted-pods"
        name           = "admin-evicted-pods"
      ~ pattern        = "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"admin-*\") }" -> "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"notify-admin-*\") }"
        # (1 unchanged attribute hidden)

        # (1 unchanged block hidden)
    }

  # aws_cloudwatch_log_metric_filter.api-evicted-pods[0] will be updated in-place
  ~ resource "aws_cloudwatch_log_metric_filter" "api-evicted-pods" {
        id             = "api-evicted-pods"
        name           = "api-evicted-pods"
      ~ pattern        = "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"api-*\") }" -> "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"notify-api-*\") }"
        # (1 unchanged attribute hidden)

        # (1 unchanged block hidden)
    }

  # aws_cloudwatch_log_metric_filter.celery-evicted-pods[0] will be updated in-place
  ~ resource "aws_cloudwatch_log_metric_filter" "celery-evicted-pods" {
        id             = "celery-evicted-pods"
        name           = "celery-evicted-pods"
      ~ pattern        = "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"celery-*\") }" -> "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"notify-celery-*\") }"
        # (1 unchanged attribute hidden)

        # (1 unchanged block hidden)
    }

  # aws_cloudwatch_log_metric_filter.document-download-evicted-pods[0] will be updated in-place
  ~ resource "aws_cloudwatch_log_metric_filter" "document-download-evicted-pods" {
        id             = "document-download-evicted-pods"
        name           = "document-download-evicted-pods"
      ~ pattern        = "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"document-download-*\") }" -> "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"notify-document-download-*\") }"
        # (1 unchanged attribute hidden)

        # (1 unchanged block hidden)
    }

  # aws_cloudwatch_log_metric_filter.documentation-evicted-pods[0] will be updated in-place
  ~ resource "aws_cloudwatch_log_metric_filter" "documentation-evicted-pods" {
        id             = "documentation-evicted-pods"
        name           = "documentation-evicted-pods"
      ~ pattern        = "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"documentation-*\") }" -> "{ ($.reason = \"Evicted\") && ($.kube_pod_status_reason = 1) && ($.pod = \"notify-documentation-*\") }"
        # (1 unchanged attribute hidden)

        # (1 unchanged block hidden)
    }

  # aws_cloudwatch_log_metric_filter.github-arc-runner-alarm[0] will be destroyed
  # (because aws_cloudwatch_log_metric_filter.github-arc-runner-alarm is not in configuration)
  - resource "aws_cloudwatch_log_metric_filter" "github-arc-runner-alarm" {
      - id             = "GitHub ARC Runners Write Alarm" -> null
      - log_group_name = "/aws/containerinsights/notification-canada-ca-staging-eks-cluster/application" -> null
      - name           = "GitHub ARC Runners Write Alarm" -> null
      - pattern        = "{ $.kubernetes.pod_name = \"github-arc-ss-staging-*-runner-*\"  && $.log = \"*ERROR*\" }" -> null

      - metric_transformation {
          - dimensions    = {} -> null
          - name          = "aggregating-github-arc-runner-alarm" -> null
          - namespace     = "LogMetrics" -> null
          - unit          = "None" -> null
          - value         = "1" -> null
            # (1 unchanged attribute hidden)
        }
    }

  # aws_cloudwatch_metric_alarm.admin-pods-high-cpu-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "admin-pods-high-cpu-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "admin" -> "notify-admin"
            # (2 unchanged elements hidden)
        }
        id                                    = "admin-pods-high-cpu-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.admin-pods-high-memory-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "admin-pods-high-memory-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "admin" -> "notify-admin"
            # (2 unchanged elements hidden)
        }
        id                                    = "admin-pods-high-memory-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.admin-replicas-unavailable[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "admin-replicas-unavailable" {
        id                                    = "admin-replicas-unavailable"
        tags                                  = {}
      ~ treat_missing_data                    = "notBreaching" -> "breaching"
        # (21 unchanged attributes hidden)

      - metric_query {
          - id          = "m1" -> null
          - period      = 0 -> null
          - return_data = true -> null
            # (3 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  - "deployment"  = "admin"
                  - "namespace"   = "notification-canada-ca"
                } -> null
              - metric_name = "kube_deployment_status_replicas_unavailable" -> null
              - namespace   = "ContainerInsights/Prometheus" -> null
              - period      = 300 -> null
              - stat        = "Average" -> null
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + id          = "m1"
          + return_data = true
            # (3 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  + "deployment"  = "notify-admin"
                  + "namespace"   = "notification-canada-ca"
                }
              + metric_name = "kube_deployment_status_replicas_unavailable"
              + namespace   = "ContainerInsights/Prometheus"
              + period      = 300
              + stat        = "Average"
                # (1 unchanged attribute hidden)
            }
        }
    }

  # aws_cloudwatch_metric_alarm.api-pods-high-cpu-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "api-pods-high-cpu-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "api" -> "notify-api"
            # (2 unchanged elements hidden)
        }
        id                                    = "api-pods-high-cpu-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.api-pods-high-memory-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "api-pods-high-memory-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "api" -> "notify-api"
            # (2 unchanged elements hidden)
        }
        id                                    = "api-pods-high-memory-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.api-replicas-unavailable[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "api-replicas-unavailable" {
        id                                    = "api-replicas-unavailable"
        tags                                  = {}
      ~ treat_missing_data                    = "notBreaching" -> "breaching"
        # (21 unchanged attributes hidden)

      - metric_query {
          - id          = "m1" -> null
          - period      = 0 -> null
          - return_data = true -> null
            # (3 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  - "deployment"  = "api"
                  - "namespace"   = "notification-canada-ca"
                } -> null
              - metric_name = "kube_deployment_status_replicas_unavailable" -> null
              - namespace   = "ContainerInsights/Prometheus" -> null
              - period      = 300 -> null
              - stat        = "Average" -> null
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + id          = "m1"
          + return_data = true
            # (3 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  + "deployment"  = "notify-api"
                  + "namespace"   = "notification-canada-ca"
                }
              + metric_name = "kube_deployment_status_replicas_unavailable"
              + namespace   = "ContainerInsights/Prometheus"
              + period      = 300
              + stat        = "Average"
                # (1 unchanged attribute hidden)
            }
        }
    }

  # aws_cloudwatch_metric_alarm.celery-beat-replicas-unavailable[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-beat-replicas-unavailable" {
        id                                    = "celery-beat-replicas-unavailable"
        tags                                  = {}
      ~ treat_missing_data                    = "notBreaching" -> "breaching"
        # (21 unchanged attributes hidden)

      - metric_query {
          - id          = "m1" -> null
          - period      = 0 -> null
          - return_data = true -> null
            # (3 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  - "deployment"  = "celery-beat"
                  - "namespace"   = "notification-canada-ca"
                } -> null
              - metric_name = "kube_deployment_status_replicas_unavailable" -> null
              - namespace   = "ContainerInsights/Prometheus" -> null
              - period      = 300 -> null
              - stat        = "Average" -> null
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + id          = "m1"
          + return_data = true
            # (3 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  + "deployment"  = "notify-celery-beat"
                  + "namespace"   = "notification-canada-ca"
                }
              + metric_name = "kube_deployment_status_replicas_unavailable"
              + namespace   = "ContainerInsights/Prometheus"
              + period      = 300
              + stat        = "Average"
                # (1 unchanged attribute hidden)
            }
        }
    }

  # aws_cloudwatch_metric_alarm.celery-email-send-primary-replicas-unavailable[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-email-send-primary-replicas-unavailable" {
        id                                    = "celery-email-send-primary-replicas-unavailable"
        tags                                  = {}
      ~ treat_missing_data                    = "notBreaching" -> "breaching"
        # (21 unchanged attributes hidden)

      - metric_query {
          - id          = "m1" -> null
          - period      = 0 -> null
          - return_data = true -> null
            # (3 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  - "deployment"  = "celery-email-send-primary"
                  - "namespace"   = "notification-canada-ca"
                } -> null
              - metric_name = "kube_deployment_status_replicas_unavailable" -> null
              - namespace   = "ContainerInsights/Prometheus" -> null
              - period      = 300 -> null
              - stat        = "Minimum" -> null
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + id          = "m1"
          + return_data = true
            # (3 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  + "deployment"  = "notify-celery-email-send-primary"
                  + "namespace"   = "notification-canada-ca"
                }
              + metric_name = "kube_deployment_status_replicas_unavailable"
              + namespace   = "ContainerInsights/Prometheus"
              + period      = 300
              + stat        = "Minimum"
                # (1 unchanged attribute hidden)
            }
        }
    }

  # aws_cloudwatch_metric_alarm.celery-email-send-scalable-replicas-unavailable[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-email-send-scalable-replicas-unavailable" {
        id                                    = "celery-email-send-scalable-replicas-unavailable"
        tags                                  = {}
      ~ treat_missing_data                    = "notBreaching" -> "breaching"
        # (21 unchanged attributes hidden)

      - metric_query {
          - id          = "m1" -> null
          - period      = 0 -> null
          - return_data = true -> null
            # (3 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  - "deployment"  = "celery-email-send-scalable"
                  - "namespace"   = "notification-canada-ca"
                } -> null
              - metric_name = "kube_deployment_status_replicas_unavailable" -> null
              - namespace   = "ContainerInsights/Prometheus" -> null
              - period      = 300 -> null
              - stat        = "Minimum" -> null
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + id          = "m1"
          + return_data = true
            # (3 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  + "deployment"  = "notify-celery-email-send-scalable"
                  + "namespace"   = "notification-canada-ca"
                }
              + metric_name = "kube_deployment_status_replicas_unavailable"
              + namespace   = "ContainerInsights/Prometheus"
              + period      = 300
              + stat        = "Minimum"
                # (1 unchanged attribute hidden)
            }
        }
    }

  # aws_cloudwatch_metric_alarm.celery-primary-pods-high-cpu-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-primary-pods-high-cpu-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "celery-primary" -> "notify-celery-primary"
            # (2 unchanged elements hidden)
        }
        id                                    = "celery-primary-pods-high-cpu-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.celery-primary-pods-high-memory-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-primary-pods-high-memory-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "celery-primary" -> "notify-celery-primary"
            # (2 unchanged elements hidden)
        }
        id                                    = "celery-primary-pods-high-memory-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.celery-primary-replicas-unavailable[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-primary-replicas-unavailable" {
        id                                    = "celery-primary-replicas-unavailable"
        tags                                  = {}
      ~ treat_missing_data                    = "notBreaching" -> "breaching"
        # (21 unchanged attributes hidden)

      - metric_query {
          - id          = "m1" -> null
          - period      = 0 -> null
          - return_data = true -> null
            # (3 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  - "deployment"  = "celery-primary"
                  - "namespace"   = "notification-canada-ca"
                } -> null
              - metric_name = "kube_deployment_status_replicas_unavailable" -> null
              - namespace   = "ContainerInsights/Prometheus" -> null
              - period      = 300 -> null
              - stat        = "Minimum" -> null
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + id          = "m1"
          + return_data = true
            # (3 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  + "deployment"  = "notify-celery-primary"
                  + "namespace"   = "notification-canada-ca"
                }
              + metric_name = "kube_deployment_status_replicas_unavailable"
              + namespace   = "ContainerInsights/Prometheus"
              + period      = 300
              + stat        = "Minimum"
                # (1 unchanged attribute hidden)
            }
        }
    }

  # aws_cloudwatch_metric_alarm.celery-scalable-pods-high-cpu-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-scalable-pods-high-cpu-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "celery-scalable" -> "notify-celery-scalable"
            # (2 unchanged elements hidden)
        }
        id                                    = "celery-scalable-pods-high-cpu-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.celery-scalable-replicas-unavailable[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-scalable-replicas-unavailable" {
        id                                    = "celery-scalable-replicas-unavailable"
        tags                                  = {}
      ~ treat_missing_data                    = "notBreaching" -> "breaching"
        # (21 unchanged attributes hidden)

      - metric_query {
          - id          = "m1" -> null
          - period      = 0 -> null
          - return_data = true -> null
            # (3 unchanged attributes hidden)

          - metric {
              - dimensions  = {
                  - "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  - "deployment"  = "celery-scalable"
                  - "namespace"   = "notification-canada-ca"
                } -> null
              - metric_name = "kube_deployment_status_replicas_unavailable" -> null
              - namespace   = "ContainerInsights/Prometheus" -> null
              - period      = 300 -> null
              - stat        = "Minimum" -> null
                # (1 unchanged attribute hidden)
            }
        }
      + metric_query {
          + id          = "m1"
          + return_data = true
            # (3 unchanged attributes hidden)

          + metric {
              + dimensions  = {
                  + "ClusterName" = "notification-canada-ca-staging-eks-cluster"
                  + "deployment"  = "notify-celery-scalable"
                  + "namespace"   = "notification-canada-ca"
                }
              + metric_name = "kube_deployment_status_replicas_unavailable"
              + namespace   = "ContainerInsights/Prometheus"
              + period      = 300
              + stat        = "Minimum"
                # (1 unchanged attribute hidden)
            }
        }
    }

  # aws_cloudwatch_metric_alarm.celery-sms-pods-high-cpu-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-sms-pods-high-cpu-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "celery-sms" -> "notify-celery-sms"
            # (2 unchanged elements hidden)
        }
        id                                    = "celery-sms-pods-high-cpu-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.celery-sms-pods-high-memory-warning[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-sms-pods-high-memory-warning" {
      ~ dimensions                            = {
          ~ "Service"     = "celery-sms" -> "notify-celery-sms"
            # (2 unchanged elements hidden)
        }
        id                                    = "celery-sms-pods-high-memory-warning"
        tags                                  = {}
        # (21 unchanged attributes hidden)
    }

  # aws_cloudwatch_metric_alarm.celery-sms-replicas-unavailable[0] will be updated in-place
  ~ resource "aws_cloudwatch_metric_alarm" "celery-sms-replicas-unavailable" {
        id                                    = "celery-sms-replicas-unavailable"
        tags                                  = {}
      ~ treat_missing_data                    = "notBreaching" -> "breaching"
        #...
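
For reference, the `replicas-unavailable` changes in the plan above correspond roughly to a resource of the following shape. This is a minimal sketch rather than the repository's actual definition: the `count`, `comparison_operator`, `evaluation_periods`, and `threshold` values are assumptions, since the plan hides unchanged attributes. Only the `metric_query` block and `treat_missing_data` are taken from the plan output.

```hcl
# Hypothetical sketch of one renamed alarm. Attributes marked "assumption"
# do not appear in the plan output and are placeholders only.
resource "aws_cloudwatch_metric_alarm" "celery-primary-replicas-unavailable" {
  count               = 1 # assumption: the plan address ends in [0], so count is in use
  alarm_name          = "celery-primary-replicas-unavailable"
  comparison_operator = "GreaterThanOrEqualToThreshold" # assumption
  evaluation_periods  = 2                               # assumption
  threshold           = 1                               # assumption

  # From the plan: missing data now counts as breaching, so the alarm fires
  # if the metric stops reporting (for example, if the deployment name in
  # the dimensions no longer matches anything in the cluster).
  treat_missing_data = "breaching"

  metric_query {
    id          = "m1"
    return_data = true

    metric {
      metric_name = "kube_deployment_status_replicas_unavailable"
      namespace   = "ContainerInsights/Prometheus"
      period      = 300
      stat        = "Minimum"

      dimensions = {
        ClusterName = "notification-canada-ca-staging-eks-cluster"
        deployment  = "notify-celery-primary" # new Helm-based deployment name
        namespace   = "notification-canada-ca"
      }
    }
  }
}
```

Because `treat_missing_data` is `breaching`, an alarm pointed at a deployment name that does not exist yet would be expected to go into alarm, which is consistent with the test instruction to verify the alarms after the Helm migration.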

Show Conftest results

WARN - plan.json - main - Cloudwatch log metric pattern is invalid: ["aws_cloudwatch_log_metric_filter.celery-error[0]"]
WARN - plan.json - main - Cloudwatch log metric pattern is invalid: ["aws_cloudwatch_log_metric_filter.scanfiles-timeout[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.client_vpn"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_acm_certificate.notification-canada-ca-alt[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_listener.internal_alb_tls"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_listener.notification-canada-ca"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.internal_nginx_http"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-admin"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-api"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-document"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-document-api"]
WARN - plan.json - main - Missing Common Tags: ["aws_alb_target_group.notification-canada-ca-documentation"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.blazer[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-application-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-cluster-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_log_group.notification-canada-ca-eks-prometheus-logs[0]"]
WARN - plan.json - main - Missing Common Tags: ["aws_cloudwatch_metric_alarm.admin-evicted-pods[0]"]
WARN - plan.json - main - Missing Common Tags:...

github-actions · 2024-12-12T16:21:05Z

Updating alarms ⏰? Great! Please update the Google Sheet and add a 👍 to this message after 🙏

P0NDER0SA

Looks good. Let's do this

Updating cloudwatch stuff to reflect new pod and deployment names

0cdff2d

[review]

github-actions bot reviewed Dec 4, 2024

ben851 and others added 11 commits December 5, 2024 10:47

Merge remote-tracking branch 'origin/main' into helm-alarms

616e900

switch to breaching

ebb5836

Merge remote-tracking branch 'origin/main' into helm-alarms

76e562d

Merge branch 'main' into helm-alarms

1089283

alarm fixes

173d0b5

Merge branch 'helm-alarms' of github.com:cds-snc/notification-terraform into helm-alarms

97b97e6

making these conditional for our production rollout. Need them to work on staging first only for a while [review]

621d0fe

Revert "making these conditional for our production rollout. Need them to work on staging first only for a while" This reverts commit 621d0fe.

db3683f

formatting

bf38f1d

self referencing whoops

ed5f60e

Merge branch 'main' into helm-alarms

4e66850

ben851 marked this pull request as ready for review December 12, 2024 16:20

ben851 requested a review from jimleroyer as a code owner December 12, 2024 16:20

P0NDER0SA approved these changes Dec 12, 2024

ben851 merged commit b9c72ef into main Dec 12, 2024
30 checks passed

ben851 deleted the helm-alarms branch December 12, 2024 16:21

ben851 mentioned this pull request Dec 17, 2024

tf release 2.17.36 #1701

Merged

5 tasks
