Is your feature request related to a problem? Please describe.
In the SaaS environment we use a rate-limiter mechanism, which can cause serious problems for us. Connectors try to get a token (to be able to poll from Operate), but this can lead to 429 Too Many Requests responses because of the rate limiter. Once this happens we can get into an infinite loop where every connector runtime keeps trying to fetch a token and we keep getting 429 responses. This can occur because rate limiting is applied globally per region and not per cluster. See: https://github.com/camunda-cloud/team-sre/issues/545 We have observed this on DEV, but the issue can occur in any environment.
Describe the solution you'd like
Add a backoff strategy for failed requests: increase the interval between token requests after each failed request, to avoid bombarding the /oauth/token endpoint.
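A minimal sketch of what such a backoff could look like, written in Python only for illustration (it is not the connector runtime's code). It assumes exponential backoff with full jitter, which is one common choice and not something this issue prescribes. The names `backoff_delay`, `fetch_token_with_backoff` and `request_token` are hypothetical; `request_token` stands in for whatever call actually hits /oauth/token and is assumed to return a status code and an optional token.

```python
import random
import time
from typing import Callable, Optional, Tuple


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 60.0) -> float:
    """Upper bound (seconds) of the wait after failed attempt `attempt` (0-based)."""
    return min(cap, base * (2 ** attempt))


def fetch_token_with_backoff(
    request_token: Callable[[], Tuple[int, Optional[str]]],  # hypothetical token call
    sleep: Callable[[float], None] = time.sleep,
    max_attempts: int = 6,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> str:
    """Request a token, waiting longer after each failed request (e.g. a 429)."""
    for attempt in range(max_attempts):
        status, token = request_token()
        if status == 200 and token is not None:
            return token
        if attempt == max_attempts - 1:
            break  # no point in sleeping after the final attempt
        # Full jitter: wait a random fraction of the exponential bound so that
        # many runtimes sharing one rate limit do not retry at the same moment.
        sleep(rng() * backoff_delay(attempt, base, cap))
    raise RuntimeError(f"could not obtain a token after {max_attempts} attempts")
```

The jitter matters here because the rate limit is shared per region: if every runtime waited exactly the same interval, they would all retry together and could hit the limit again.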
Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.
Additional context
Without this solution we can easily get into an infinite loop trying to get new tokens and always hitting the rate limit in SaaS.
We have also observed this multiple times, even without using connectors. To mitigate it we had to scale down all job worker deployments in all our clusters, which resulted in production downtime.