
feat(uptime): Make consumer able to run in parallel #81409

Merged
wedamija merged 5 commits into master from danf/uptime-parallel-consumer on Dec 18, 2024

Conversation

wedamija (Member) commented Nov 27, 2024

This adds thread pool parallelization to the uptime consumer. We should potentially consider process parallelization as well, so that we can take advantage of all cores. Alternatively, it could be worth experimenting with disabling the GIL so we can avoid the complexity of figuring out processes.
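
For context, the shape of the change is roughly the following. This is a minimal sketch, not the PR's actual code; `process_group`, `handle_result`, and the dict-shaped results are illustrative assumptions:

```python
# Sketch: group a batch of results by subscription so related items stay
# ordered, fan the groups out to a thread pool, and wait for the whole
# batch to finish before starting the next one.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait


def handle_result(result: dict) -> None:
    ...  # stand-in for the real per-result processing


def process_group(group: list[dict]) -> None:
    # Each group is processed serially inside a single task, preserving
    # per-subscription ordering.
    for result in group:
        handle_result(result)


def process_batch(results: list[dict], executor: ThreadPoolExecutor) -> None:
    groups: dict[str, list[dict]] = defaultdict(list)
    for result in results:
        groups[result["subscription_id"]].append(result)

    futures = [executor.submit(process_group, group) for group in groups.values()]
    # Barrier: the next batch may contain more results for the same
    # subscriptions, so don't start it until this one is fully done.
    wait(futures)
```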

github-actions bot added the Scope: Backend label Nov 27, 2024
codecov bot commented Nov 27, 2024

Codecov Report

Attention: Patch coverage is 91.66667% with 6 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../remote_subscriptions/consumers/result_consumer.py | 90.90% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #81409      +/-   ##
==========================================
+ Coverage   80.36%   80.42%   +0.05%     
==========================================
  Files        7273     7284      +11     
  Lines      321302   321490     +188     
  Branches    20948    20948              
==========================================
+ Hits       258227   258550     +323     
+ Misses      62673    62538     -135     
  Partials      402      402              

This adds thread pool parallelization to the uptime consumer. We should potentially consider process parallelization as well, so that we can take advantage of all cores.

This is a rough draft and isn't tested; it's based on the issue occurrence consumer and monitor consumer.
wedamija marked this pull request as ready for review December 14, 2024 00:49
wedamija requested a review from a team as a code owner December 14, 2024 00:49
```python
        return self.create_serial_worker(commit)

    def create_serial_worker(self, commit: Commit) -> ProcessingStrategy[KafkaPayload]:
        return RunTask(
```
Member commented:
You're already using RunTask. Wouldn't it be easier to conditionally swap it for RunTaskInThreads instead of maintaining your own thread pool?
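
For reference, the suggested swap would look something like this. This is a hedged sketch: it assumes arroyo's `RunTaskInThreads` takes a processing function, a concurrency level, a bound on pending futures, and the next step, and `process_result` is a stand-in for the consumer's handler:

```python
# Sketch of conditionally swapping RunTask for RunTaskInThreads (parameter
# names for RunTaskInThreads are assumed from arroyo's API).
from arroyo.backends.kafka import KafkaPayload
from arroyo.processing.strategies import (
    CommitOffsets,
    ProcessingStrategy,
    RunTask,
    RunTaskInThreads,
)
from arroyo.types import Commit, Message


def process_result(message: Message[KafkaPayload]) -> None:
    ...  # decode and handle a single uptime result


def create_worker(commit: Commit, parallel: bool) -> ProcessingStrategy[KafkaPayload]:
    next_step = CommitOffsets(commit)
    if parallel:
        return RunTaskInThreads(
            processing_function=process_result,
            concurrency=4,
            max_pending_futures=32,
            next_step=next_step,
        )
    return RunTask(process_result, next_step)
```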

wedamija (author) replied:
We have special logic here where we want to process a batch of results before processing the next batch. This allows us to make sure that related data is processed serially.

Member replied:
This is because we need to partition the batches in such a way that uptime results for the same monitor are processed in order.

RunTaskInThreads doesn't offer any way to specify ordering.
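
Another common way to get per-key ordering without a batch barrier is to hash each result onto a fixed set of single-threaded queues, so results for the same monitor always land on the same worker. A sketch (not this PR's implementation; `handle_result` and the dict payloads are illustrative):

```python
# Sketch: consistent hashing of subscription_id onto per-worker queues
# gives strict per-key ordering while still using multiple threads.
import queue
import threading

NUM_WORKERS = 4
queues: list[queue.Queue] = [queue.Queue() for _ in range(NUM_WORKERS)]


def handle_result(result: dict) -> None:
    ...  # stand-in for the real per-result processing


def worker(q: queue.Queue) -> None:
    while True:
        result = q.get()
        handle_result(result)
        q.task_done()


def dispatch(result: dict) -> None:
    # Same subscription_id -> same queue -> results processed in order.
    queues[hash(result["subscription_id"]) % NUM_WORKERS].put(result)


for q in queues:
    threading.Thread(target=worker, args=(q,), daemon=True).start()
```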

Member replied:
We're actually doing this in 3 different places now (crons, uptime, issue occurrences). I'd really love to turn this into some kind of reusable strategy.

Member replied:
Got it. I keep forgetting about this, feels like I asked this before 😅

I think the part where you need to wait for a set of messages before processing the next one can be generalized independently of the inner strategy even.
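
That generalization could be as small as a helper that wraps any inner callable with the group-then-wait behavior. A sketch under the same assumptions as above (`run_batches_with_barrier` is a hypothetical name, not an existing utility):

```python
# Sketch: a reusable "process a whole batch, wait, then take the next"
# helper that is independent of what the inner callable does.
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor, wait
from typing import TypeVar

T = TypeVar("T")
K = TypeVar("K")


def run_batches_with_barrier(
    batches: Iterable[list[T]],
    key: Callable[[T], K],
    process_group: Callable[[list[T]], None],
    concurrency: int = 4,
) -> None:
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        for batch in batches:
            groups: dict[K, list[T]] = {}
            for item in batch:
                groups.setdefault(key(item), []).append(item)
            # Barrier: don't start the next batch until every group in
            # this one has completed.
            wait([executor.submit(process_group, g) for g in groups.values()])
```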

wedamija merged commit ef03ea7 into master Dec 18, 2024
49 checks passed
wedamija deleted the danf/uptime-parallel-consumer branch December 18, 2024 23:24
andrewshie-sentry pushed a commit that referenced this pull request Jan 2, 2025
This adds thread pool parallelization to the uptime consumer. We should potentially consider process parallelization as well, so that we can take advantage of all cores. Alternatively, it could be worth experimenting with disabling the GIL so we can avoid the complexity of figuring out processes.
github-actions bot locked and limited conversation to collaborators Jan 3, 2025