You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
As the Managers and Developers of the VA Platform,
We need to implement a process for performance and scalability,
So that we can understand the cost constraints and opportunities when using Datadog's features and not go over budget.
Issue Description
The objective is to ensure continuous awareness of the performance of http://va.gov/ systems and to be promptly notified of any system behavior that negatively impacts Veterans. It is important to ensure there is an awareness around show traffic levels, error rates, system latency, etc. To effectively address issues in a timely manner, teams should implement dashboards, monitors, and alerts for their systems. High priority applications may need to be integrated into the Watchtower program, which provides oversight for critical systems. Teams can mark specific monitors for Watch Officer review by using the watchtower tag, and all monitors included in Watchtower must have clear, actionable instructions for the Watch Officer to follow. This ensures that the necessary information is readily available to address any problems quickly and efficiently.
The vision for this work would be to leverage discovery work that was done for the Zero Silent Failures effort. There should be lists of the endpoints and APIs in these tickets (92876, 92926)
The scope of this ticket is to establish the costs for Datadog and how the costs change depending on how dashboards, monitors and alerts are used.
Tasks
Establish Datadog cost associated with the new dashboards, monitors and alerts
Implement a process for performance and scalability
Document findings
Share with leadership and, after they have approved this, share with other teams.
Success Metrics
There will be documents providing best practice guidance regarding the use of new Datadog dashboards, monitors and alerts for performance and scalability.
Acceptance Criteria
Best practice documentation on using DataDog features without impacting the budget.
Validation
Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.
The text was updated successfully, but these errors were encountered:
User Story
As the Managers and Developers of the VA Platform,
We need to implement a process for performance and scalability,
So that we can understand the cost constraints and opportunities when using Datadog's features and not go over budget.
Issue Description
The objective is to ensure continuous awareness of the performance of http://va.gov/ systems and to be promptly notified of any system behavior that negatively impacts Veterans. It is important to ensure there is an awareness around show traffic levels, error rates, system latency, etc. To effectively address issues in a timely manner, teams should implement dashboards, monitors, and alerts for their systems. High priority applications may need to be integrated into the Watchtower program, which provides oversight for critical systems. Teams can mark specific monitors for Watch Officer review by using the watchtower tag, and all monitors included in Watchtower must have clear, actionable instructions for the Watch Officer to follow. This ensures that the necessary information is readily available to address any problems quickly and efficiently.
See the discovery document that was stood up for this effort.
The vision for this work would be to leverage discovery work that was done for the Zero Silent Failures effort. There should be lists of the endpoints and APIs in these tickets (92876, 92926)
The scope of this ticket is to establish the costs for Datadog and how the costs change depending on how dashboards, monitors and alerts are used.
Tasks
Success Metrics
There will be documents providing best practice guidance regarding the use of new Datadog dashboards, monitors and alerts for performance and scalability.
Acceptance Criteria
Validation
Assignee to add steps to this section. List the actions that need to be taken to confirm this issue is complete. Include any necessary links or context. State the expected outcome.
The text was updated successfully, but these errors were encountered: