Opportunity
Whenever you run a query, Athena saves the result as a .csv file in an S3 bucket. You can download this file like any normal S3 object.
Another observation is that the Athena API returns at most 1000 rows per page. This has a significant performance impact if you try to download 100k+ rows: it takes 100+ requests, even if the result is just a few MB.
In our case, we are querying Athena from a different region (and a different continent), so the latency alone on those 100+ requests adds up to multiple seconds.
Downloading from S3 is a single request, which is faster. There are almost no downsides.
Result
After I implemented fetching directly from S3, we observed a significant speed-up in our query times. For queries that return ~100k rows, the time went from 38 seconds to just 18 seconds, which is more than a 2x improvement. The gain is even larger for queries that return more rows (in some places it was a 4x speed-up).
Request
It would be nice if some form of S3 fetching were implemented upstream. I have opened a PR with my implementation, but it's not in a mergeable state right now. I will not have time to clean it up and create a proper PR, but I wanted to share my code anyway in case it helps someone, or in case someone finds the time to properly integrate this functionality into the athenadriver API.