feat(http-ratelimiting): add a bucket for global rate limits #2159
base: main
Conversation
Let's ignore the failing http tests for now.
Looks great overall, thank you! I've got concerns about the current implementation of the global bucket loop being a bottleneck, and I think that even with that bottleneck fixed, global ratelimit headers can still be encountered. I've added some scenarios where I think this implementation can hit global ratelimits, and I've proposed a solution that prevents that possibility.
```rust
use self::bucket::{Bucket, BucketQueueTask};
pub use self::global_bucket::GlobalBucket;
```
Should this be public? I don't see it being able to be returned or used by this ratelimiter implementation. If this is something we expect people to use for building other ratelimiters, then we should move it out of the `in_memory` module to the root.
```rust
let global_ticket_tx = if let Ok(ticket_tx) = self.wait_for_global().await {
    ticket_tx
} else {
    continue;
};
```
It seems like this shouldn't happen. The global bucket lives at least as long as this task since each queue has a strong reference to it.

But let's say a panic happens in the task driving the global bucket, such as an arithmetic panic (e.g. divide by zero) or a panic in some underlying dependency, such as an undocumented panic when calling `tokio::time::sleep`. When that happens, `wait_for_bucket` will only ever return an error, since there's no receiving end to send a `TicketSender` back. We then `continue` the loop, and the cycle starts again. All requests to the ratelimiter from outside parties will start to receive a mysterious error that the ratelimiter dropped the request, since they never get the sending channel back (by calling `queue_tx.available()` here) and the sending half of the channel gets dropped.
This is not great, so we have two options: explicitly drop the queue to "stop" the ratelimiter and return an appropriate error that fails all requests, or allow all requests to pass through without a global ratelimiter. The second is the less surprising failure mode, so we should probably default to that, but make the behavior customizable as a method on the ratelimiter.
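For illustration, a rough sketch of the two behaviors; the function shape, the `fail_closed` flag, and the error type are all hypothetical, not part of this PR:

```rust
// Hypothetical sketch only; `fail_closed`, `RatelimiterStopped`, and the
// function shape are illustrative, not twilight's API.
use tokio::sync::oneshot;

struct RatelimiterStopped;

async fn global_ticket(
    wait_for_global: impl std::future::Future<Output = Result<oneshot::Sender<()>, ()>>,
    fail_closed: bool,
) -> Result<Option<oneshot::Sender<()>>, RatelimiterStopped> {
    match wait_for_global.await {
        Ok(ticket_tx) => Ok(Some(ticket_tx)),
        // Fail closed: surface an explicit error so callers know the
        // ratelimiter has stopped instead of seeing requests silently dropped.
        Err(()) if fail_closed => Err(RatelimiterStopped),
        // Fail open (proposed default): proceed without a global ticket.
        Err(()) => Ok(None),
    }
}
```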
```rust
async fn run_global_queue_task(bucket: Arc<InnerGlobalBucket>, period: u64, requests: u32) {
    let mut time = Instant::now();

    while let Some(queue_tx) = bucket.queue.pop().await {
        wait_if_needed(bucket.as_ref(), &mut time, period, requests).await;

        let ticket_headers = if let Some(ticket_headers) = queue_tx.available() {
            ticket_headers
        } else {
            continue;
        };

        if let Ok(Some(RatelimitHeaders::Global(headers))) = ticket_headers.await {
            tracing::debug!(seconds = headers.retry_after(), "globally ratelimited");

            bucket.is_locked.store(true, Ordering::Release);
            tokio::time::sleep(Duration::from_secs(headers.retry_after())).await;
            bucket.is_locked.store(false, Ordering::Release);
        }
    }
}
```
Let's take a look at the flow of the logic here:
- the global queue task waits for an item in the queue, which happens on every ratelimiter request
- the global ratelimiter waits if needed
- the global queue notifies the other half that they can send the request
  a. if the other half is still around, the global queue receives a receiving half to receive headers from
  b. otherwise, the global queue knows the other half doesn't exist and loops over to the next request
- the global queue waits for the headers from the other half; if the headers are of a global ratelimit then it sleeps, causing the entire system to stall, as intended
- it loops over
In terms of preventing more than a configured number of requests in a given period, this system of preemptive sleeping works great. The problem is that it is synchronous: only one request is processed at a time. Practically speaking, this means the global bucket won't limit to, say, 50 requests per second, but to a much smaller number, because the global bucket is a bottleneck between each request. Requests to Discord currently have an average latency of 102 ms according to the status page, so functionally the global bucket will bottleneck users to fewer than 10 requests per second, if not much fewer depending on the routes in use.
We need to change the implementation here to solve this. My first instinct is that hitting a global ratelimit header with this system is unlikely with the preemptive ratelimiter in place, so what we could do instead is spawn a task for each set of headers while waiting. With a mutex in place of the `is_locked` boolean, we could lock up the ratelimiter for the duration. That works.
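A rough sketch of that idea, with the header type stubbed out and all names assumed rather than taken from this PR:

```rust
// Sketch of "spawn a task per response, lock a mutex during a global limit";
// `GlobalHeaders` is a stand-in for the real ratelimit header type.
use std::{sync::Arc, time::Duration};
use tokio::sync::{oneshot, Mutex};

struct GlobalHeaders {
    retry_after: u64,
}

fn handle_response(lock: Arc<Mutex<()>>, headers_rx: oneshot::Receiver<Option<GlobalHeaders>>) {
    tokio::spawn(async move {
        // Waiting on the response no longer blocks the queue loop.
        if let Ok(Some(headers)) = headers_rx.await {
            // The queue loop would lock this same mutex before granting a
            // ticket, so holding it here stalls the system for the retry window.
            let _guard = lock.lock().await;
            tokio::time::sleep(Duration::from_secs(headers.retry_after)).await;
        }
    });
}
```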
But this has its own flaw: if a user makes, say, 100 requests to the ratelimiter with a global bucket of 50 requests per second, then across just over one second they'll receive all 100 request allotments. If the requests don't all take the same time (very likely), or they get stalled, or something else happens, then users could fire off more than 50 requests in a given second and hit the global ratelimit. It's also impossible to determine when Discord will count a request against a given second. We can see how this could fail under this circumstance:
- User requests 100 ratelimiter tickets for requests, current configuration is 50 requests/second
- 50 tickets are granted in the first second
- 40 are processed by Discord in that second
- 50 tickets are granted in the second second
- those 50, plus the 10 from the previous second, are processed by Discord
- that was 60 in a second -- oops, global ratelimit hit
The solution I see is to simultaneously limit the number of requests allotted in a given second and limit the number of in-flight requests to the configured number of requests. We can see why this prevents the previous scenario:
- User requests 100 ratelimiter tickets for requests; the current configuration is 50 requests/second
- First second
  - 50 tickets are granted and their requests are sent to Discord
  - 40 are processed by Discord in that second
  - 30 requests get responses and are processed by the global queue
- Second second
  - because 20 requests are still in flight and their global ratelimit status is indeterminate, only 30 tickets are granted
  - the responses of the 10 requests that were processed by Discord but didn't return responses last second are received
  - the 10 requests that weren't processed by Discord last second are processed and return responses
  - 20 of the requests from this second get responses
  - 10 do not, so there are 10 requests still in flight
- Third second
  - because 10 requests are in flight, there are 40 tickets allotted for the second
  - the cycle continues
  - it's mathematically and logically impossible to hit a global ratelimit
To accomplish this, my suggestion is that we use a combination of tracking the time (like now), spawning tasks to process responses, and a Tokio semaphore whose permits are the configured number of requests. When one second rolls over into the next, the first processed request of that second can add permits for the number of responses that were received during the previous second. If you want tips on how to implement this, let me know and I can provide them; it's a bit complicated.
Let me know if I'm thinking about this wrong, but this implementation of a dual timer and semaphore seems like the right way of doing this.
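To make the shape of that concrete, a heavily simplified sketch; the names are made up, and the per-second bookkeeping that decides how many permits to re-add is left out:

```rust
// Illustrative only: permits cap in-flight requests, are consumed permanently
// when granted, and are re-added at each period rollover based on how many
// responses completed (that accounting is elided here).
use std::sync::Arc;
use tokio::sync::Semaphore;

struct GlobalLimiter {
    permits: Arc<Semaphore>,
}

impl GlobalLimiter {
    fn new(requests_per_period: usize) -> Self {
        Self {
            permits: Arc::new(Semaphore::new(requests_per_period)),
        }
    }

    /// Grant a ticket only while fewer than the configured number of requests
    /// are outstanding for the current period.
    async fn ticket(&self) {
        let permit = self
            .permits
            .clone()
            .acquire_owned()
            .await
            .expect("semaphore is never closed");
        // Consume the permit instead of returning it on drop; permits only
        // come back via `replenish` at the period rollover.
        permit.forget();
    }

    /// Called when a period rolls over: re-add permits for the requests whose
    /// responses were processed during the previous period.
    fn replenish(&self, completed_last_period: usize) {
        self.permits.add_permits(completed_last_period);
    }
}
```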
The current time tracker is inefficient: `wait_if_needed` will almost always sleep, yielding one request per `period / requests` and therefore evenly spacing requests out. We could either use a leaky bucket algorithm (we can take a dependency on a crate implementing it) or mirror the implementation of `twilight_gateway::CommandRatelimiter`. The ratelimiter in the gateway was quite tricky, so I'd recommend adding lots of tests for this module.
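For reference, a minimal GCRA-style sketch of what a leaky bucket could look like here, written against plain Tokio rather than any particular crate; all names are illustrative:

```rust
// Allows bursts of up to `requests` immediately, then spaces further requests
// by `period / requests`, instead of sleeping between every pair of requests.
use std::time::Duration;
use tokio::time::{sleep, Instant};

struct LeakyBucket {
    /// Theoretical arrival time of the next request (GCRA's "TAT").
    tat: Instant,
    /// Time cost of a single request.
    interval: Duration,
    /// Burst tolerance: how far ahead of real time `tat` may run.
    burst: Duration,
}

impl LeakyBucket {
    fn new(period: Duration, requests: u32) -> Self {
        // Assumes `requests >= 1`.
        let interval = period / requests;
        Self {
            tat: Instant::now(),
            interval,
            burst: interval * (requests - 1),
        }
    }

    async fn acquire(&mut self) {
        let now = Instant::now();
        let tat = self.tat.max(now);
        // Only wait once the burst allowance is exhausted.
        let wait = tat.duration_since(now).saturating_sub(self.burst);
        if !wait.is_zero() {
            sleep(wait).await;
        }
        self.tat = tat + self.interval;
    }
}
```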
Are you sure that Discord resets every second, as opposed to evenly allowing more? In that case this way would allow requests to complete earlier.
> Are you sure that Discord resets every second, as opposed to evenly allowing more? In that case this way would allow requests to complete earlier.
I misremembered, thanks for the catch. I don't believe the docs explicitly state this, but it is a GCRA implementation, so it's actually a leaky bucket that will replenish over time.
```rust
    ticket_tx
} else {
    continue;
};
```
let-else once we upgrade to rust 1.65 🙏
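For reference, this is the let-else form the snippet above would take on Rust 1.65+, using the same identifiers as the surrounding diff (shown as a fragment, not standalone code):

```rust
let Ok(global_ticket_tx) = self.wait_for_global().await else {
    continue;
};
```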
```diff
 let mut rx = self.rx.lock().await;

-timeout(timeout_duration, rx.recv()).await.ok().flatten()
+rx.recv().await
```
I am somewhat certain that this will never return `None` (as at least one sender is tied to each `BucketQueue`), which is otherwise necessary for the bucket to be finished and freed (see `BucketQueueTask::run`).
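A small standalone illustration of the point (the channel type and names are stand-ins, not the actual types): with a sender always alive, plain `recv` never yields `None`, so the idle timeout is what lets the loop end and the bucket be dropped.

```rust
use std::time::Duration;
use tokio::{sync::mpsc, time::timeout};

async fn run_bucket(mut rx: mpsc::UnboundedReceiver<u32>, idle: Duration) {
    // `.ok().flatten()` folds "idle timeout elapsed" and "channel closed" into
    // `None`, allowing the task to finish even while a sender is still held.
    while let Some(ticket) = timeout(idle, rx.recv()).await.ok().flatten() {
        // ... hand out the ticket ...
        let _ = ticket;
    }
}
```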
Force-pushed from cd8ad2c to 0ec96cd.
This closes #650.
I opted to keep the rate limit trait as is for now, even though `is_globally_locked` is unused.

Additionally, the global rate limiter waits for each request to return before accepting the next request. Since 50/s isn't that many, I wanted to keep it simple. If you disagree, I'll change that.
Lastly, Discord only documents the hardcoded 50 requests per second. Nirn-proxy, however, seems to have figured out that big bots have different global rate limits. We discussed this on Discord and decided not to support that, since it isn't officially documented.

EDIT: It was requested that we add the option to specify custom ratelimit values manually instead.