feat(uptime): Make consumer able to run in parallel #81409
Conversation
Codecov Report
Attention: Patch coverage is ✅ All tests successful. No failed tests found.
Additional details and impacted files:

```
@@            Coverage Diff             @@
##           master   #81409      +/-   ##
==========================================
+ Coverage   80.36%   80.42%   +0.05%
==========================================
  Files        7273     7284      +11
  Lines      321302   321490     +188
  Branches    20948    20948
==========================================
+ Hits       258227   258550     +323
+ Misses      62673    62538     -135
  Partials      402      402
```
This adds thread pool parallelization to the uptime consumer. We should potentially consider process parallelization as well, so that we can take advantage of all cores. This is a rough draft and isn't tested; it's based off the issue occurrence consumer and monitor consumer.
```python
        return self.create_serial_worker(commit)

    def create_serial_worker(self, commit: Commit) -> ProcessingStrategy[KafkaPayload]:
        return RunTask(
```
You're already using RunTask. Wouldn't it be easier to conditionally swap it for RunTaskInThreads instead of maintaining your own thread pool?
We have special logic here where we want to process a batch of results before processing the next batch. This allows us to make sure that related data is processed serially.
This is because we need to partition the batches in such a way that uptime results for the same monitor are processed in order.
RunTaskInThreads doesn't offer any way to specify ordering.
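To illustrate the ordering requirement discussed above, here is a stdlib-only sketch (not the actual consumer code; `process_batch_grouped` and its signature are hypothetical): results are grouped by monitor key, each group runs serially on a single worker, and the whole batch is awaited before the caller moves on to the next one.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def process_batch_grouped(batch, process_one, max_workers=4):
    """Process a batch in parallel while keeping items that share a
    key in order.

    `batch` is a list of (key, payload) pairs. Payloads with the same
    key are processed serially by one worker; different keys may run
    concurrently. Returns a dict mapping each key to its results.
    """
    groups = defaultdict(list)
    for key, payload in batch:
        groups[key].append(payload)

    def run_group(payloads):
        # Serial loop preserves per-key ordering.
        return [process_one(p) for p in payloads]

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {key: pool.submit(run_group, payloads)
                   for key, payloads in groups.items()}
        # Block on the full batch so the next batch only starts once
        # everything here has completed.
        return {key: future.result() for key, future in futures.items()}
```

This is the property a plain RunTaskInThreads can't express: parallelism across keys, strict ordering within a key, and a barrier between batches.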
We're actually doing this in 3 different places now (crons, uptime, issue occurrences). I really would love to turn this into some kind of reusable strategy.
Got it. I keep forgetting about this, feels like I asked this before 😅
I think the part where you need to wait for a set of messages before processing the next one can be generalized independently of the inner strategy even.
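A rough sketch of what that generalization might look like, independent of the inner strategy (the class name and API here are hypothetical, not arroyo's): accumulate messages into a batch, fan the batch out to a thread pool, and block until every future completes before accepting the next batch.

```python
from concurrent.futures import ThreadPoolExecutor, wait


class BatchedParallelProcessor:
    """Illustrative sketch: batch messages, process each batch on a
    thread pool, and wait for the whole batch before taking more.
    """

    def __init__(self, process_fn, batch_size=100, max_workers=4):
        self.process_fn = process_fn
        self.batch_size = batch_size
        self.pool = ThreadPoolExecutor(max_workers=max_workers)
        self.batch = []

    def submit(self, message):
        self.batch.append(message)
        if len(self.batch) >= self.batch_size:
            self.flush()

    def flush(self):
        futures = [self.pool.submit(self.process_fn, m) for m in self.batch]
        wait(futures)  # barrier: the batch completes before we continue
        for f in futures:
            f.result()  # surface any worker exceptions
        self.batch = []

    def close(self):
        self.flush()
        self.pool.shutdown()
```

The "wait for a set of messages" barrier lives entirely in `flush`, so any per-message `process_fn` (including a key-aware one) could be plugged in underneath.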
This adds thread pool parallelization to the uptime consumer. We should potentially consider process parallelization as well, so that we can take advantage of all cores. Alternatively, it could be worth experimenting with disabling the GIL so we can avoid the complexity of figuring out processes.