Retry api calls on 'service unavailable' #266

fwiesel · 2021-10-29T10:11:29Z

The keystoneauth1 adapter used as the basis for cinder, glance,
and neutron api calls already support to retry on a 503 status call,
if the corresponding parameter is passed.
Currently, only connection failures are retried, but if the service
is behind a load-balancer, that is rather unlikely and instead a
Service Unavailable error would be raised.

Change-Id: I82cf1d6eecad1262841c49e10d30c1ec5ba26f80

The keystoneauth1 adapter used as the basis for cinder, glance, and neutron api calls already support to retry on a 503 status call, if the corresponding parameter is passed. Currently, only connection failures are retried, but if the service is behind a load-balancer, that is rather unlikely and instead a Service Unavailable error would be raised. Change-Id: I82cf1d6eecad1262841c49e10d30c1ec5ba26f80

joker-at-work

Our admin/internal APIs don't go via loadbalancer, so it's more likely that connection errors occur.

Why did you opt for taking the same setting for different retries? We wouldn't be able to disable one, if it becomes problematic.

By default, this only retries 503, which is probably fine for requests changing anything in Cinder, but for GET-requests, we might also want to retry on 500 - e.g. the DB restarting and thus requests failing with "internal server error". What do you think?

fwiesel · 2021-10-29T12:39:16Z

Our admin/internal APIs don't go via loadbalancer, so it's more likely that connection errors occur.

The change is general and not specific to our situation. But agreed, then it doesn't help us much.

Why did you opt for taking the same setting for different retries?

The option is called http_retries and the says Number of times xyzclient should retry on any failed http call..
As it stands, the code doesn't do that, it only retries on failed connections.
Separate values make more sense to me if we need the granularity to actually tune them do different values.

We wouldn't be able to disable one, if it becomes problematic.

We still can disable retries, just not separately. I would say that is good enough for fixing something which requires a fix.

By default, this only retries 503, which is probably fine for requests changing anything in Cinder, but for GET-requests, we might also want to retry on 500 - e.g. the DB restarting and thus requests failing with "internal server error". What do you think?

The retries are enabled for all requests including POST,PUT,etc. I rather would not like to retry that on the lowest level except for 503, as the APIs are not guaranteed to be idempotent.
I think, a 500 error is a "real" server error, which should be fixed on the application/db side.

E.g. fix the retry logic in the application-db api (oslo.db) to handle the restart better (more likely in a short-time frame) or fixing it on the db side that the API is zero-downtime (quite a bit of effort).

joker-at-work

Sounds good. Thank you for the explanations.

fwiesel requested a review from joker-at-work October 29, 2021 10:11

joker-at-work reviewed Oct 29, 2021

View reviewed changes

joker-at-work approved these changes Oct 29, 2021

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Retry api calls on 'service unavailable' #266

Retry api calls on 'service unavailable' #266

fwiesel commented Oct 29, 2021

joker-at-work left a comment •

edited

Loading

fwiesel commented Oct 29, 2021

joker-at-work left a comment

Retry api calls on 'service unavailable' #266

Are you sure you want to change the base?

Retry api calls on 'service unavailable' #266

Conversation

fwiesel commented Oct 29, 2021

joker-at-work left a comment • edited Loading

Choose a reason for hiding this comment

fwiesel commented Oct 29, 2021

joker-at-work left a comment

Choose a reason for hiding this comment

joker-at-work left a comment •

edited

Loading