
feat(uptime): Make consumer able to run in parallel #81409

Merged
wedamija merged 5 commits into master from danf/uptime-parallel-consumer on Dec 18, 2024

Conversation

wedamija (Member) commented Nov 27, 2024

This adds thread pool parallelization to the uptime consumer. We should potentially consider process parallelization as well, so that we can take advantage of all cores. Alternatively, it could be worth experimenting with disabling the GIL so we can avoid the complexity of figuring out processes.
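
For context, the shape of the change is roughly the following. This is a minimal sketch, not the PR's actual code; `process_group`, `handle_result`, and the dict-shaped results are illustrative assumptions:

```python
# Sketch: group a batch of results by subscription so related items stay
# ordered, fan the groups out to a thread pool, and wait for the whole
# batch to finish before starting the next one.
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, wait


def handle_result(result: dict) -> None:
    ...  # stand-in for the real per-result processing


def process_group(group: list[dict]) -> None:
    # Each group is processed serially inside a single task, preserving
    # per-subscription ordering.
    for result in group:
        handle_result(result)


def process_batch(results: list[dict], executor: ThreadPoolExecutor) -> None:
    groups: dict[str, list[dict]] = defaultdict(list)
    for result in results:
        groups[result["subscription_id"]].append(result)

    futures = [executor.submit(process_group, group) for group in groups.values()]
    # Barrier: the next batch may contain more results for the same
    # subscriptions, so don't start it until this one is fully done.
    wait(futures)
```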

github-actions bot added the Scope: Backend label Nov 27, 2024
codecov bot commented Nov 27, 2024

Codecov Report

Attention: Patch coverage is 91.66667% with 6 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| .../remote_subscriptions/consumers/result_consumer.py | 90.90% | 6 Missing ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #81409      +/-   ##
==========================================
+ Coverage   80.36%   80.42%   +0.05%     
==========================================
  Files        7273     7284      +11     
  Lines      321302   321490     +188     
  Branches    20948    20948              
==========================================
+ Hits       258227   258550     +323     
+ Misses      62673    62538     -135     
  Partials      402      402              

This adds thread pool parallelization to the uptime consumer. We should potentially consider process parallelization as well, so that we can take advantage of all cores.

This is a rough draft and isn't tested; it's based on the issue occurrence consumer and monitor consumer.
wedamija marked this pull request as ready for review December 14, 2024 00:49
wedamija requested a review from a team as a code owner December 14, 2024 00:49
```python
        return self.create_serial_worker(commit)

    def create_serial_worker(self, commit: Commit) -> ProcessingStrategy[KafkaPayload]:
        return RunTask(
```
Member commented:
You're already using RunTask. Wouldn't it be easier to conditionally swap it for RunTaskInThreads instead of maintaining your own thread pool?
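
For reference, the suggested swap would look something like this. This is a hedged sketch: it assumes arroyo's `RunTaskInThreads` takes a processing function, a concurrency level, a bound on pending futures, and the next step, and `process_result` is a stand-in for the consumer's handler:

```python
# Sketch of conditionally swapping RunTask for RunTaskInThreads (parameter
# names for RunTaskInThreads are assumed from arroyo's API).
from arroyo.backends.kafka import KafkaPayload
from arroyo.processing.strategies import (
    CommitOffsets,
    ProcessingStrategy,
    RunTask,
    RunTaskInThreads,
)
from arroyo.types import Commit, Message


def process_result(message: Message[KafkaPayload]) -> None:
    ...  # decode and handle a single uptime result


def create_worker(commit: Commit, parallel: bool) -> ProcessingStrategy[KafkaPayload]:
    next_step = CommitOffsets(commit)
    if parallel:
        return RunTaskInThreads(
            processing_function=process_result,
            concurrency=4,
            max_pending_futures=32,
            next_step=next_step,
        )
    return RunTask(process_result, next_step)
```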

wedamija (author) replied:
We have special logic here where we want to process a batch of results before processing the next batch. This allows us to make sure that related data is processed serially.

Member replied:
This is because we need to partition the batches in such a way that uptime results for the same monitor are processed in order.

RunTaskInThreads doesn't offer any way to specify ordering.
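
Another common way to get per-key ordering without a batch barrier is to hash each result onto a fixed set of single-threaded queues, so results for the same monitor always land on the same worker. A sketch (not this PR's implementation; `handle_result` and the dict payloads are illustrative):

```python
# Sketch: consistent hashing of subscription_id onto per-worker queues
# gives strict per-key ordering while still using multiple threads.
import queue
import threading

NUM_WORKERS = 4
queues: list[queue.Queue] = [queue.Queue() for _ in range(NUM_WORKERS)]


def handle_result(result: dict) -> None:
    ...  # stand-in for the real per-result processing


def worker(q: queue.Queue) -> None:
    while True:
        result = q.get()
        handle_result(result)
        q.task_done()


def dispatch(result: dict) -> None:
    # Same subscription_id -> same queue -> results processed in order.
    queues[hash(result["subscription_id"]) % NUM_WORKERS].put(result)


for q in queues:
    threading.Thread(target=worker, args=(q,), daemon=True).start()
```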

Member replied:
We're actually doing this in 3 different places now (crons, uptime, issue occurrences). I'd really love to turn this into some kind of reusable strategy.

Member replied:
Got it. I keep forgetting about this, feels like I asked this before 😅

I think the part where you need to wait for a set of messages before processing the next one can be generalized independently of the inner strategy even.
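
That generalization could be as small as a helper that wraps any inner callable with the group-then-wait behavior. A sketch under the same assumptions as above (`run_batches_with_barrier` is a hypothetical name, not an existing utility):

```python
# Sketch: a reusable "process a whole batch, wait, then take the next"
# helper that is independent of what the inner callable does.
from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor, wait
from typing import TypeVar

T = TypeVar("T")
K = TypeVar("K")


def run_batches_with_barrier(
    batches: Iterable[list[T]],
    key: Callable[[T], K],
    process_group: Callable[[list[T]], None],
    concurrency: int = 4,
) -> None:
    with ThreadPoolExecutor(max_workers=concurrency) as executor:
        for batch in batches:
            groups: dict[K, list[T]] = {}
            for item in batch:
                groups.setdefault(key(item), []).append(item)
            # Barrier: don't start the next batch until every group in
            # this one has completed.
            wait([executor.submit(process_group, g) for g in groups.values()])
```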

wedamija merged commit ef03ea7 into master Dec 18, 2024
49 checks passed
wedamija deleted the danf/uptime-parallel-consumer branch December 18, 2024 23:24
andrewshie-sentry pushed a commit that referenced this pull request Jan 2, 2025
This adds thread pool parallelization to the uptime consumer. We should potentially consider process parallelization as well, so that we can take advantage of all cores. Alternatively, it could be worth experimenting with disabling the GIL so we can avoid the complexity of figuring out processes.
github-actions bot locked and limited conversation to collaborators Jan 3, 2025