Monitor celery tasks in cloudwatch (PP-1150) #1813

Merged
jonathangreen merged 9 commits into main from feature/celery-cloudwatch-monitoring
Apr 30, 2024

jonathangreen
Member

Description

Add new custom metrics for Celery tasks and queues in Cloudwatch:

Tasks:

  • TaskFailed
  • TaskSucceeded
  • TaskRuntime

Queues:

  • QueueWaiting
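
As a rough illustration of the metrics listed above, here is a minimal sketch of how they might be shaped as CloudWatch metric datums. It is not the code from this PR. The helper names (`metric_datum`, `task_datums`, `queue_datum`), the dimension names `TaskName` and `QueueName`, and the choice of units are assumptions made for the example. The dictionaries follow the general layout of the `MetricData` entries accepted by boto3's `put_metric_data`, but nothing is sent to AWS here.

```python
from __future__ import annotations

from typing import Any


def metric_datum(
    name: str, value: float, unit: str, dimensions: dict[str, str]
) -> dict[str, Any]:
    """Build one datum in the general shape of a CloudWatch MetricData entry."""
    return {
        "MetricName": name,
        "Dimensions": [
            {"Name": key, "Value": val} for key, val in sorted(dimensions.items())
        ],
        "Value": value,
        "Unit": unit,
    }


def task_datums(
    task_name: str, succeeded: bool, runtime_seconds: float | None = None
) -> list[dict[str, Any]]:
    """Datums for one finished task (hypothetical helper).

    Emits a TaskSucceeded or TaskFailed count, plus TaskRuntime when a
    runtime is known for a successful task.
    """
    dims = {"TaskName": task_name}  # assumed dimension name
    outcome = "TaskSucceeded" if succeeded else "TaskFailed"
    datums = [metric_datum(outcome, 1, "Count", dims)]
    if succeeded and runtime_seconds is not None:
        datums.append(metric_datum("TaskRuntime", runtime_seconds, "Seconds", dims))
    return datums


def queue_datum(queue_name: str, waiting: int) -> dict[str, Any]:
    """Datum for the number of messages waiting in a queue (hypothetical helper)."""
    return metric_datum("QueueWaiting", waiting, "Count", {"QueueName": queue_name})
```

A list built this way could, in principle, be passed as the `MetricData` argument of a `put_metric_data` call together with a namespace of your choosing.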

Motivation and Context

Allow us to monitor our Celery queues via Cloudwatch.

How Has This Been Tested?

  • Tested locally
  • Tested in CI

This one is tough to test fully locally, so it may need some iteration after actually being deployed to AWS and fully pushing metrics into CloudWatch.

Checklist

  • I have updated the documentation accordingly.
  • All new and existing tests passed.

codecov bot commented Apr 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 90.04%. Comparing base (d6c4019) to head (75e0a6b).

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1813      +/-   ##
==========================================
+ Coverage   90.01%   90.04%   +0.02%     
==========================================
  Files         299      300       +1     
  Lines       39643    39742      +99     
  Branches     8596     8615      +19     
==========================================
+ Hits        35686    35784      +98     
- Misses       2626     2627       +1     
  Partials     1331     1331              
Flag Coverage Δ
manager 89.84% <100.00%> (+0.02%) ⬆️
migration 24.56% <15.74%> (-0.04%) ⬇️


@jonathangreen jonathangreen requested a review from a team April 30, 2024 17:39
@jonathangreen jonathangreen force-pushed the feature/celery-cloudwatch-monitoring branch from bdec0af to 679401c Compare April 30, 2024 18:06
Contributor

@dbernstein dbernstein left a comment

Looks good to me.

@dbernstein
Contributor

Assuming you fix the mypy issues first.

@dbernstein
Contributor

dbernstein commented Apr 30, 2024

@jonathangreen : would it be helpful also to aggregate stats based on the type of task? It wasn't completely clear to me if the task type is being captured. If it is not being captured, it would be helpful to get a sense of the task run time based on the type of task. That way we would be able to monitor performance in a more granular way if necessary.

For example, if a change was introduced to a query within a job that started to cause a bottleneck we would be in a better position to troubleshoot it.

@jonathangreen
Member Author

> @jonathangreen : would it be helpful also to aggregate stats based on the type of task? It wasn't completely clear to me if the task type is being captured. If it is not being captured, it would be helpful to get a sense of the task run time based on the type of task. That way we would be able to monitor performance in a more granular way if necessary.

@dbernstein The stats we push have a dimension for the task name. So per task name we get the number of tasks that ran, failed and the task runtime. I think this is what you are asking for. If not, what is it that you would use to group tasks into "type"?
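
To make the per-task-name grouping concrete, here is a small sketch of what keying the stats on task name amounts to. It is an illustration only, not the aggregation CloudWatch or this PR performs. The `TaskStats` class, the `aggregate` function, and the `(task_name, succeeded, runtime_seconds)` event tuples are all hypothetical.

```python
from __future__ import annotations

from collections import defaultdict
from dataclasses import dataclass, field
from typing import Iterable, Optional, Tuple

# Hypothetical event shape: (task_name, succeeded, runtime_seconds or None)
TaskEvent = Tuple[str, bool, Optional[float]]


@dataclass
class TaskStats:
    """Counts and runtimes collected for a single task name."""

    succeeded: int = 0
    failed: int = 0
    runtimes: list[float] = field(default_factory=list)

    @property
    def mean_runtime(self) -> float | None:
        """Average runtime in seconds, or None if no runtime was recorded."""
        if not self.runtimes:
            return None
        return sum(self.runtimes) / len(self.runtimes)


def aggregate(events: Iterable[TaskEvent]) -> dict[str, TaskStats]:
    """Group task outcomes by task name, which plays the role of the dimension."""
    stats: dict[str, TaskStats] = defaultdict(TaskStats)
    for task_name, succeeded, runtime in events:
        entry = stats[task_name]
        if succeeded:
            entry.succeeded += 1
        else:
            entry.failed += 1
        if runtime is not None:
            entry.runtimes.append(runtime)
    return dict(stats)
```

With stats keyed this way, a slowdown in one task (for example after a query change) would show up in that task's runtime rather than being averaged away across all tasks.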

@jonathangreen
Member Author

I'm going to merge this one since the tests are passing, so I can test it out on Minotaur tomorrow. @dbernstein if there are additional stats you want captured or dimensions added to the stats, let's discuss on PP-1150. We can add them as a follow-up on that ticket.

@jonathangreen jonathangreen merged commit 27c020a into main Apr 30, 2024
21 checks passed
@jonathangreen jonathangreen deleted the feature/celery-cloudwatch-monitoring branch April 30, 2024 20:02