MVP - Endpoint Monitoring and Updates: IaC initiative #94846

Open
5 tasks
Tracked by #92005
jennb33 opened this issue Oct 11, 2024 · 0 comments
Labels
2024 alert updates Update UI and/or Content for Alerts devops practice area categorization -- NOT a team assignment endpoint Used to identify endpoints that will be kept in the endpoint-library board in zenhub. engineering Engineering topics needs-refinement Identifies tickets that need to be refined platform-product-team terraform

Comments


jennb33 commented Oct 11, 2024

User Story

As the Managers and Developers of the VA Platform,
We need to ensure the proper storage and management of monitoring configurations,
So that the right Terraform configurations are applied to endpoints and properly managed.

Issue Description

The objective is to maintain continuous awareness of how http://va.gov/ systems are performing and to be notified promptly of any system behavior that negatively impacts Veterans. That means tracking traffic levels, error rates, system latency, and similar signals. To address issues in a timely manner, teams should implement dashboards, monitors, and alerts for their systems. High-priority applications may need to be integrated into the Watchtower program, which provides oversight for critical systems. Teams can mark specific monitors for Watch Officer review by using the watchtower tag, and every monitor included in Watchtower must have clear, actionable instructions for the Watch Officer to follow, so that the information needed to resolve a problem is readily available.
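
The Watchtower requirement above lends itself to an automated check. The following is a minimal sketch, not an existing tool: it assumes each monitor is available as a dictionary with `name`, `tags`, and `message` fields (hypothetical field names), and it only detects empty instructions. Whether the instructions are clear and actionable still needs human review.

```python
from typing import Iterable

WATCHTOWER_TAG = "watchtower"  # tag name taken from the issue description


def monitors_missing_instructions(monitors: Iterable[dict]) -> list[str]:
    """Return names of watchtower-tagged monitors whose message is empty."""
    missing = []
    for monitor in monitors:
        tags = monitor.get("tags", [])
        # Treat None and whitespace-only messages as missing instructions.
        message = (monitor.get("message") or "").strip()
        if WATCHTOWER_TAG in tags and not message:
            missing.append(monitor.get("name", "<unnamed>"))
    return missing


# Hypothetical monitor definitions, for illustration only.
example_monitors = [
    {"name": "example-api-5xx", "tags": ["watchtower"], "message": "Page on-call; see runbook."},
    {"name": "example-latency", "tags": ["watchtower"], "message": "   "},
    {"name": "example-traffic", "tags": ["team:example"], "message": ""},
]

print(monitors_missing_instructions(example_monitors))
```

Only `example-latency` is reported here: it carries the watchtower tag but has no usable message, while `example-traffic` is ignored because it is not part of Watchtower.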

See the discovery document that was stood up for this effort.

The vision for this work is to leverage the discovery work that was done for the Zero Silent Failures effort. There should be lists of the endpoints and APIs in these tickets (92876, 92926).

Tasks

  • All monitoring configurations, including dashboards, alerts, and monitors, must be stored and managed via Terraform to prevent configuration drift.
  • Ensure that any manual changes made through the UI are reflected in Terraform so that they are not overwritten during Terraform applies. This should be automated to track and capture any changes made outside of Terraform. Note: Hook this into our TF drift action.
  • Implement checks to ensure PRs submitted with TF changes comply with the monitoring mandate, particularly with regard to including new monitors or alerts for newly added endpoints.
  • Document the process of establishing and executing this initiative.
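
The PR compliance check in the tasks above could be sketched as a simple set comparison. This is an illustration under stated assumptions, not the team's implementation: it assumes some earlier step has already extracted the endpoints a PR adds and the endpoints covered by monitors or alerts in the same PR. The function names and endpoint paths are hypothetical.

```python
from typing import Iterable


def endpoints_without_monitors(
    new_endpoints: Iterable[str], monitored_endpoints: Iterable[str]
) -> list[str]:
    """Return new endpoints that no monitor or alert covers, sorted."""
    return sorted(set(new_endpoints) - set(monitored_endpoints))


def check_pr(
    new_endpoints: Iterable[str], monitored_endpoints: Iterable[str]
) -> tuple[bool, str]:
    """Return (passed, report) for a PR under the monitoring mandate."""
    uncovered = endpoints_without_monitors(new_endpoints, monitored_endpoints)
    if not uncovered:
        return True, "All new endpoints have a monitor or alert."
    return False, "Missing monitors for: " + ", ".join(uncovered)


# Hypothetical inputs, for illustration only.
passed, report = check_pr(
    new_endpoints=["/v0/example_b", "/v0/example_a"],
    monitored_endpoints=["/v0/example_a"],
)
print(passed, report)
```

A CI job could fail the PR when `passed` is false and post `report` as a comment. How endpoints and monitors are extracted from the Terraform changes is left open here.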

Success Metrics

Proper Terraform configurations are applied to all monitoring configurations.

Acceptance Criteria

  • The right Terraform configurations are applied to all monitoring configurations, preventing configuration drift.
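
One way to verify the drift criterion is to run `terraform plan -detailed-exitcode`, which exits with 0 when there is no diff, 1 on error, and 2 when the plan contains changes. The sketch below wraps that call. The status labels are hypothetical names, and how this would connect to the existing TF drift action is not specified in this issue. Note that exit code 2 means only that configuration and real infrastructure differ, so it indicates drift only when run against configuration that has already been applied.

```python
import subprocess

# Exit codes documented for `terraform plan -detailed-exitcode`.
PLAN_STATUS = {0: "in_sync", 1: "error", 2: "drift"}


def classify_plan_exit_code(code: int) -> str:
    """Map a plan exit code to a status label; unknown codes count as errors."""
    return PLAN_STATUS.get(code, "error")


def detect_drift(workdir: str) -> str:
    """Run a plan in `workdir` (an initialized Terraform directory) and classify it."""
    result = subprocess.run(
        ["terraform", "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit_code(result.returncode)
```

A scheduled job could call `detect_drift` on the directory that holds the monitoring configuration and open a ticket or alert whenever the result is `"drift"`.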

Validation

Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.

@jennb33 jennb33 added platform-product-team devops practice area categorization -- NOT a team assignment needs-grooming Use this to designate any issues that need grooming from the team terraform engineering Engineering topics needs-refinement Identifies tickets that need to be refined endpoint Used to identify endpoints that will be kept in the endpoint-library board in zenhub. 2024 labels Oct 11, 2024
@jennb33 jennb33 added the alert updates Update UI and/or Content for Alerts label Oct 11, 2024
@AshleyGuerrant AshleyGuerrant removed the needs-grooming Use this to designate any issues that need grooming from the team label Nov 21, 2024