Ratelimit set_presence updates #18000
base: develop
I tested this PR today on our production server (v1.120.2, 340 active users). The patch appears to fix #16843 for us. The test report below shows how CPU usage patterns changed when the patch was enabled or disabled during our peak usage times.
Test timeline
2024-12-05T11:20:27+0100 systemctl restart nginx.service
2024-12-05T12:15:44+0100 systemctl restart netdata.service
2024-12-05T12:19:04+0100 systemctl restart synapse-phys.target # 1 restart v1.120.2
2024-12-05T12:22:01+0100 mv env env_orig; mv env_patched env
2024-12-05T12:26:56+0100 vim /etc/opt/synapse/phys.ethz.ch.homeserver.yaml
2024-12-05T12:30:11+0100 mv env env_patched; mv env_orig env
2024-12-05T12:30:18+0100 systemctl restart synapse-phys.target # 2 restart v1.120.2 patched
2024-12-05T12:30:51+0100 vim /etc/opt/synapse/phys.ethz.ch.homeserver.yaml # set per_second: 0.02
2024-12-05T12:45:03+0100 systemctl restart synapse-phys.target # 3 restart v1.120.2
2024-12-05T12:55:56+0100 systemctl restart synapse-phys.target # 4 restart v1.120.2
2024-12-05T12:57:11+0100 mv env env_orig; mv env_patched env
2024-12-05T12:57:17+0100 systemctl restart synapse-phys.target # 5 restart v1.120.2 patched (0.02)
2024-12-05T13:05:00+0100 cd /etc/opt/synapse/
2024-12-05T13:05:06+0100 vim phys.ethz.ch.homeserver.yaml # set per_second: 0.002
2024-12-05T13:05:34+0100 systemctl restart synapse-phys.target # 6 restart v1.120.2 patched (0.002)
2024-12-05T13:52:17+0100 mv env env_patched; mv env_orig env
2024-12-05T13:52:20+0100 systemctl restart synapse-phys.target # 7 restart v1.120.2
2024-12-05T13:58:36+0100 vim /etc/opt/synapse/phys.ethz.ch.homeserver.yaml # set per_second: 0.01?
2024-12-05T13:59:22+0100 mv env env_orig; mv env_patched env
2024-12-05T13:59:25+0100 systemctl restart synapse-phys.target # 8 restart v1.120.2 patched (0.01?)
2024-12-05T14:05:43+0100 vim /etc/opt/synapse/phys.ethz.ch.homeserver.yaml # set ?
2024-12-05T14:17:19+0100 systemctl restart synapse-phys.target # 9 restart v1.120.2 patched (?)
2024-12-05T14:18:32+0100 vim env/lib/python3.11/site-packages/synapse/rest/client/sync.py # logger.info -> logger.debug (disabling the "User set_presence ratelimit exceeded; ignoring it." log spam, synapse log level was always set to info during these tests)
2024-12-05T14:19:09+0100 vim /etc/opt/synapse/phys.ethz.ch.homeserver.yaml # set per_second: 0.02?
2024-12-05T14:20:27+0100 systemctl restart synapse-phys.target # 10 restart v1.120.2 patched (0.02?)
2024-12-05T15:12:21+0100 mv env env_patched; mv env_orig env
2024-12-05T15:12:27+0100 systemctl restart synapse-phys.target # 11 restart v1.120.2
2024-12-05T15:13:41+0100 vim /etc/opt/synapse/phys.ethz.ch.homeserver.yaml # unset (use defaults: per_second: 0.1)
2024-12-05T15:18:38+0100 mv env env_orig; mv env_patched env
2024-12-05T15:18:41+0100 systemctl restart synapse-phys.target # 12 restart v1.120.2 patched (0.1)
Numbered screenshots of CPU usage patterns
After 2.5 work days the issue (#16843) has not reappeared. The day with the red line was the day of testing #18000 (comment). After the red line, the PR is live with v1.120.2 using the defaults.
The implementation looks good to me! The only thing missing is unit tests (though thank you for the proof-in-production graphs!).
At minimum, could you write two tests:
- Send a presence update, check that it went through, immediately send another and check that it was ignored.
- Send a presence update, check that it went through, advancing time a sufficient amount, send another presence update and check that it also worked.
Presence-related tests go in `PresenceUpdateTestCase`. Here is an example of advancing time in a test. Time won't advance otherwise.
I'd recommend switching the presence state (`offline` -> `online`, etc.) so you can check that a change occurred.
Let me know or shout in #synapse-dev:matrix.org if you need further guidance!
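The two tests requested above could take roughly the following shape. This is a self-contained sketch, not Synapse's test harness: `FakeClock` and `ToyPresenceHandler` are hypothetical stand-ins for the test reactor's clock and the presence handler, and the limiter inside is a simplified leaky bucket. Real tests would live in `PresenceUpdateTestCase` and advance the test reactor instead.

```python
import unittest


class FakeClock:
    """Manually advanced clock (hypothetical stand-in for the test reactor)."""

    def __init__(self) -> None:
        self.now = 0.0

    def advance(self, seconds: float) -> None:
        self.now += seconds


class ToyPresenceHandler:
    """Hypothetical handler: applies a presence update only if the user's
    rate limit allows it, and silently ignores the update otherwise."""

    def __init__(self, clock: FakeClock, per_second: float, burst_count: int) -> None:
        self.clock = clock
        self.per_second = per_second
        self.burst_count = burst_count
        self.state: dict[str, str] = {}
        self._bucket: dict[str, tuple[float, float]] = {}

    def set_presence(self, user_id: str, presence: str) -> None:
        count, since = self._bucket.get(user_id, (0.0, self.clock.now))
        # Drain the bucket at `per_second` actions per second.
        count = max(0.0, count - (self.clock.now - since) * self.per_second)
        if count + 1 > self.burst_count:
            # Over the limit: drop the update, no error for the caller.
            self._bucket[user_id] = (count, self.clock.now)
            return
        self._bucket[user_id] = (count + 1, self.clock.now)
        self.state[user_id] = presence


class ToyPresenceRatelimitTest(unittest.TestCase):
    def setUp(self) -> None:
        self.clock = FakeClock()
        # Same values as the PR's defaults: per_second 0.1, burst_count 1.
        self.handler = ToyPresenceHandler(self.clock, per_second=0.1, burst_count=1)

    def test_immediate_second_update_is_ignored(self) -> None:
        self.handler.set_presence("@alice:test", "online")
        self.assertEqual(self.handler.state["@alice:test"], "online")
        # No time has passed, so this update should be dropped.
        self.handler.set_presence("@alice:test", "offline")
        self.assertEqual(self.handler.state["@alice:test"], "online")

    def test_update_after_waiting_is_applied(self) -> None:
        self.handler.set_presence("@alice:test", "online")
        self.assertEqual(self.handler.state["@alice:test"], "online")
        # Time only moves when the test advances it explicitly.
        self.clock.advance(11)
        self.handler.set_presence("@alice:test", "offline")
        self.assertEqual(self.handler.state["@alice:test"], "offline")
```

Switching between `online` and `offline` in each step is what makes an ignored update distinguishable from an applied one. The sketch can be run with `python -m unittest` if saved as a module.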
The `rc_set_presence.per_user` sets ratelimiting how often a specific users' presence
updates are evaluated. Ratelimited presence updates are ignored.
`per_user` defaults to `per_second: 0.1`, `burst_count: 1`.
Slight clarification:
- The `rc_set_presence.per_user` sets ratelimiting how often a specific users' presence
- updates are evaluated. Ratelimited presence updates are ignored.
- `per_user` defaults to `per_second: 0.1`, `burst_count: 1`.
+ The `rc_set_presence.per_user` option sets rate limits on how often a specific
+ users' presence updates are evaluated. Ratelimited presence updates are
+ ignored, and no error is returned to the client.
+ `per_user` defaults to `per_second: 0.1`, `burst_count: 1`.
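As a rough illustration of what these settings mean in practice: assuming the usual leaky-bucket semantics, `burst_count: 1` means a user's presence update is evaluated at most once every `1 / per_second` seconds, so `per_second: 0.1` allows one evaluated update per 10 seconds. The sketch below applies that rule to a hypothetical client that sends a presence update every 30 seconds for an hour. `min_interval_seconds` and `evaluated_updates` are illustrative helpers, not Synapse code, and the 30-second cadence is an assumption.

```python
def min_interval_seconds(per_second: float) -> float:
    """Minimum spacing between evaluated updates when burst_count is 1."""
    return 1.0 / per_second


def evaluated_updates(request_times: list[float], per_second: float) -> int:
    """Count how many updates would be evaluated (burst_count fixed at 1).

    An update is evaluated only if at least 1 / per_second seconds have
    passed since the last evaluated update; all others are ignored.
    """
    interval = min_interval_seconds(per_second)
    evaluated = 0
    last = None
    for t in request_times:
        if last is None or t - last >= interval:
            evaluated += 1
            last = t
    return evaluated


# Hypothetical client: one presence update every 30 seconds for one hour.
requests = [float(t) for t in range(0, 3600, 30)]

for rate in (0.1, 0.02, 0.002):
    print(
        f"per_second={rate}: min interval {min_interval_seconds(rate):.0f}s, "
        f"{evaluated_updates(requests, rate)} of {len(requests)} updates evaluated"
    )
```

Lower `per_second` values trade presence freshness for fewer evaluations, which is the knob the test report above was varying.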
per_second: 0.1
burst_count: 1
Typically the example configuration should be something other than the default.
- per_second: 0.1
- burst_count: 1
+ per_second: 0.05
+ burst_count: 0.5
This adds per-user rate limiting for presence updates.
This is my first contribution to synapse. The default settings probably need guidance from developers and should be tested on other server configurations. I will post a test report in the comments.