Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Retry api calls on 'service unavailable' #266

Open
wants to merge 1 commit into
base: stable/rocky-m3
Choose a base branch
from

Conversation

fwiesel
Copy link
Member

@fwiesel fwiesel commented Oct 29, 2021

The keystoneauth1 adapter used as the basis for cinder, glance,
and neutron api calls already support to retry on a 503 status call,
if the corresponding parameter is passed.
Currently, only connection failures are retried, but if the service
is behind a load-balancer, that is rather unlikely and instead a
Service Unavailable error would be raised.

Change-Id: I82cf1d6eecad1262841c49e10d30c1ec5ba26f80

The keystoneauth1 adapter used as the basis for cinder, glance,
and neutron api calls already support to retry on a 503 status call,
if the corresponding parameter is passed.
Currently, only connection failures are retried, but if the service
is behind a load-balancer, that is rather unlikely and instead a
Service Unavailable error would be raised.

Change-Id: I82cf1d6eecad1262841c49e10d30c1ec5ba26f80
Copy link

@joker-at-work joker-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Our admin/internal APIs don't go via loadbalancer, so it's more likely that connection errors occur.

Why did you opt for taking the same setting for different retries? We wouldn't be able to disable one, if it becomes problematic.

By default, this only retries 503, which is probably fine for requests changing anything in Cinder, but for GET-requests, we might also want to retry on 500 - e.g. the DB restarting and thus requests failing with "internal server error". What do you think?

@fwiesel
Copy link
Member Author

fwiesel commented Oct 29, 2021

Our admin/internal APIs don't go via loadbalancer, so it's more likely that connection errors occur.

The change is general and not specific to our situation. But agreed, then it doesn't help us much.

Why did you opt for taking the same setting for different retries?

The option is called http_retries and the says Number of times xyzclient should retry on any failed http call..
As it stands, the code doesn't do that, it only retries on failed connections.
Separate values make more sense to me if we need the granularity to actually tune them do different values.

We wouldn't be able to disable one, if it becomes problematic.

We still can disable retries, just not separately. I would say that is good enough for fixing something which requires a fix.

By default, this only retries 503, which is probably fine for requests changing anything in Cinder, but for GET-requests, we might also want to retry on 500 - e.g. the DB restarting and thus requests failing with "internal server error". What do you think?

The retries are enabled for all requests including POST,PUT,etc. I rather would not like to retry that on the lowest level except for 503, as the APIs are not guaranteed to be idempotent.
I think, a 500 error is a "real" server error, which should be fixed on the application/db side.

E.g. fix the retry logic in the application-db api (oslo.db) to handle the restart better (more likely in a short-time frame) or fixing it on the db side that the API is zero-downtime (quite a bit of effort).

Copy link

@joker-at-work joker-at-work left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds good. Thank you for the explanations.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants