Download directly from S3 for faster query times #65

Open
jankaifer opened this issue Oct 15, 2024 · 0 comments

jankaifer commented Oct 15, 2024

Opportunity

Whenever you run a query with Athena, it saves the result to an S3 bucket. You can download this .csv file from S3 like any other file.

Another observation is that the Athena API (`GetQueryResults`) returns at most 1000 rows per page. This has a significant performance impact when downloading 100k+ rows: it takes 100+ requests, even if the result is only a few MB.

In our case, we query Athena from a different region (on a different continent), so the latency of those 100+ requests alone adds up to multiple seconds.

Downloading the result file from S3 takes a single request, which is much faster. There are almost no downsides.

Result

After I implemented fetching results directly from S3, we observed a significant speed-up in our query times. For queries that return ~100k rows, the time went from 38 seconds to just 18 seconds, more than a 2x improvement. The effect is even larger for queries that return more rows (in some cases we saw a 4x speed-up).

Request

It would be nice if some form of S3 fetching were implemented upstream. I have opened a PR with my implementation, but it is not in a mergeable state right now. I will not have time to clean it up into a proper PR, but I wanted to share my code anyway in case it helps someone, or someone finds the time to properly integrate this functionality into the athenadriver API.
