MVP - Endpoint Monitoring and Updates: Create dashboard monitors #94816
Labels
2024
alert updates
Update UI and/or Content for Alerts
backend
endpoint
Used to identify endpoints that will be kept in the endpoint-library board in zenhub.
engineering
Engineering topics
monitoring
needs-refinement
Identifies tickets that need to be refined
platform-product-team
User Story
As the managers and engineers responsible for the VA Platform,
We need to create endpoint dashboard monitors,
So that endpoint monitoring is established in line with OCTO's effort to improve notifications when a Veteran is blocked.
Issue Description
The objective is to ensure continuous awareness of the performance of http://va.gov/ systems and to be promptly notified of any system behavior that negatively impacts Veterans. Teams need visibility into traffic levels, error rates, system latency, and similar signals. To address issues in a timely manner, teams should implement dashboards, monitors, and alerts for their systems. High-priority applications may need to be integrated into the Watchtower program, which provides oversight for critical systems. Teams can mark specific monitors for Watch Officer review by using the watchtower tag, and every monitor included in Watchtower must have clear, actionable instructions for the Watch Officer to follow, so that the information needed to resolve a problem is readily available.
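The Watchtower rule described above (a monitor tagged watchtower must carry instructions for the Watch Officer) could be checked with something like the sketch below. The monitor shape (a dict with `name`, `tags`, and `message` keys) and the function name are assumptions made for illustration; they do not reflect the format of any specific monitoring tool.

```python
from typing import Dict, List

WATCHTOWER_TAG = "watchtower"  # tag name taken from the issue description


def monitors_missing_instructions(monitors: List[Dict]) -> List[str]:
    """Return names of watchtower-tagged monitors that have no instructions.

    Each monitor is assumed (hypothetically) to be a dict with the keys
    'name' (str), 'tags' (list of str), and 'message' (str).
    """
    missing = []
    for monitor in monitors:
        tags = monitor.get("tags", [])
        # Treat a missing or whitespace-only message as "no instructions".
        message = (monitor.get("message") or "").strip()
        if WATCHTOWER_TAG in tags and not message:
            missing.append(monitor["name"])
    return missing
```

A check like this only confirms that instructions exist; whether they are clear and actionable still needs human review.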
See the discovery document that was stood up for this effort.
The vision for this work is to leverage the discovery work done for the Zero Silent Failures effort. Lists of the endpoints and APIs should be available in tickets 92876 and 92926. Create the appropriate dashboard monitors for the endpoints validated as owned by the Platform Product team.
Tasks
Success Metrics
There is an updated list of the endpoints owned by the Platform Product team, and each of those endpoints has a monitor, so that the team has a clearer view of its own inventory and the endpoints feed into OCTO's Zero Silent Failures objective.
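One possible way to verify this metric is to compare the endpoint inventory against the set of endpoints that already have a monitor. The sketch below assumes both lists are available as plain strings; the function name and the example paths are hypothetical.

```python
from typing import Iterable, List


def endpoints_without_monitors(
    endpoints: Iterable[str], monitored: Iterable[str]
) -> List[str]:
    """Return the inventory endpoints that have no monitor, sorted by name."""
    covered = set(monitored)
    # De-duplicate the inventory before sorting so each gap is listed once.
    return sorted({e for e in endpoints if e not in covered})


# Hypothetical paths, for illustration only.
inventory = ["/example/a", "/example/b", "/example/c"]
with_monitors = ["/example/a", "/example/c"]
gaps = endpoints_without_monitors(inventory, with_monitors)
```

An empty result would indicate that every listed endpoint has at least one monitor.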
Acceptance Criteria
Validation
Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.