Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Blocking behavior in AWSV4SignerAsyncAuth #410

Open
dacevedo12 opened this issue Jun 14, 2023 · 8 comments
Open

[BUG] Blocking behavior in AWSV4SignerAsyncAuth #410

dacevedo12 opened this issue Jun 14, 2023 · 8 comments
Labels
bug Something isn't working

Comments

@dacevedo12
Copy link

dacevedo12 commented Jun 14, 2023

What is the bug?

Yesterday's AWS us-east-1 outage revealed that botocore talks to STS using requests, and retries them using a backoff with time.sleep, which are both blocking functions and will cause performance degradation as they hold the event-loop hostage.

In this line: sig_v4_auth.add_auth(aws_request) https://github.com/opensearch-project/opensearch-py/blob/main/opensearchpy/helpers/asyncsigner.py#L54

How can one reproduce the bug?

Try calling a search operation using AWSV4SignerAsyncAuth while simulating poor network conditions to trigger the retry with backoff mechanism in botocore

What is the expected behavior?

It should not block the event-loop

What is your host/environment?

Not relevant

Do you have any screenshots?

image

Do you have any additional context?

Three possible solution proposals:

@dacevedo12 dacevedo12 added bug Something isn't working untriaged Need triage labels Jun 14, 2023
@wbeckler wbeckler removed the untriaged Need triage label Jun 15, 2023
@dblock
Copy link
Member

dblock commented Jun 20, 2023

Ouch. Regardless of the fix we do, do you think you can write a test that reproduces this kind of behavior?

I would go with aiobotocore to delegate this kind of problems, I don't see downsides.

@dacevedo12
Copy link
Author

Ouch. Regardless of the fix we do, do you think you can write a test that reproduces this kind of behavior?

I would go with aiobotocore to delegate this kind of problems, I don't see downsides.

Sure, I can take a look this weekend

@wbeckler wbeckler added the good first issue Good for newcomers label Sep 19, 2023
@dblock dblock removed the good first issue Good for newcomers label Oct 12, 2023
@dblock
Copy link
Member

dblock commented Nov 10, 2023

@dacevedo12 Any interest of picking this up again?

@dblock
Copy link
Member

dblock commented Nov 19, 2023

In 2.4.0 we've added support for Urllib3 Sigv4 (it's now the default). Does that implementation have the same problem @dacevedo12?

@dacevedo12
Copy link
Author

One could say this is mitigated by #470 as it defaults to using frozen credentials, which won't be refreshed by boto under the hood.

However, if it goes through the other branch with credentials that (for some unknown reason) don't have a get_frozen_credentials method, it will refresh them when needed, and that process is potentially blocking.

Refreshing credentials is usually a very quick network call to AWS STS so in most cases it's not concerning, but in scenarios like the one AWS experienced when I originally reported this issue, there will be retries with exponential backoff (using time.sleep) and that for sure will be blocking.

@thehesiod
Copy link

hello, maintainer of aiobotocore here, yes this should support the async version of AioSession.get_credentials. So in this case swapping to .get_credentials and checking to see if the method is async or not

@dblock
Copy link
Member

dblock commented Oct 16, 2024

Thanks for jumping in @thehesiod! Appreciate if someone on this thread could PR something here.

@thehesiod
Copy link

made a draft here: #834

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants