Consider using the downloads available from releases #31

Closed
tayloramurphy opened this issue Apr 26, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@tayloramurphy

I ran the tap yesterday and it took about 8 hours to get all of the data downloaded from 1950 to present.

On https://github.com/f1db/f1db/releases/tag/v2024.5.0 they offer multiple data formats for the full data set. It would likely be faster to fetch those and then do a batch load or even row by row read of the data.
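A minimal sketch of the batch-load idea, assuming one of the release files has already been downloaded and extracted as a CSV. The table name, columns, and sample rows are hypothetical placeholders rather than the real F1DB schema, and SQLite stands in for whatever target database is actually used:

```python
import csv
import io
import sqlite3
from itertools import islice


def batch_load(conn, table, csv_file, batch_size=1000):
    """Load rows from a CSV file object into `table` in batches; return rows loaded."""
    reader = csv.reader(csv_file)
    header = next(reader)  # first line is assumed to hold the column names
    columns = ", ".join(f'"{name}"' for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE IF NOT EXISTS "{table}" ({columns})')
    insert_sql = f'INSERT INTO "{table}" ({columns}) VALUES ({placeholders})'
    total = 0
    while True:
        # Pull at most `batch_size` rows so memory use stays bounded.
        batch = list(islice(reader, batch_size))
        if not batch:
            break
        conn.executemany(insert_sql, batch)
        total += len(batch)
    conn.commit()
    return total


# Hypothetical sample standing in for one extracted CSV file from a release.
sample = io.StringIO("id,name\n1,alpha\n2,beta\n3,gamma\n")
conn = sqlite3.connect(":memory:")
loaded = batch_load(conn, "example", sample, batch_size=2)
```

Reading a local file in batches avoids one HTTP request per page of results, which is presumably where most of the sync time goes.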

I also think it's missing some of the data, but I don't have a good sense of what exactly. I loaded the pg dump from the releases into Postgres, ran the Meltano extract to Snowflake, and got different numbers for quite a few of the tables.

Either way though, thanks for the tap @ReubenFrankel 😄

@tayloramurphy tayloramurphy changed the title Consider using the downloads available form releases Consider using the downloads available from releases Apr 26, 2024
@ReubenFrankel
Owner

ReubenFrankel commented Apr 26, 2024

An 8-hour full sync is pretty bad, so I'll look into this as a viable alternative source to the Ergast API (which is due to be shut down at the end of 2024 anyway).

Also just to clarify, you mean F1DB might be missing some data, or the tap?

I may also have a look and see if there are any easy improvements that can be made to reduce the number of API calls, which I imagine make up most of the sync time. I definitely ran full syncs during initial development, but it's possible I made a change that causes the number of requests to exponentially grow the further back the start date is set.

@ReubenFrankel ReubenFrankel added the enhancement New feature or request label Apr 26, 2024
@tayloramurphy
Author

Oh wow, just saw that it's going to be shut down at the end of the season. Looks like it's the website and API - hopefully someone from the community will pick it up...

I'm just comparing rows between the postgres dump available on the releases and what's spit out of the tap - so I think some rows are possibly dropped during the fetching from the API.
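A comparison like this can be sketched as a per-table row-count diff. This is only an illustration: the table names and counts below are made up, and in practice each dictionary would be filled from a `SELECT COUNT(*)` per table against the respective database.

```python
def diff_row_counts(expected, actual):
    """Return {table: (expected_count, actual_count)} for tables whose counts differ.

    A table present on only one side is reported with None for the missing count.
    """
    mismatches = {}
    for table in sorted(set(expected) | set(actual)):
        exp = expected.get(table)
        act = actual.get(table)
        if exp != act:
            mismatches[table] = (exp, act)
    return mismatches


# Hypothetical counts, not real figures from either data set.
dump_counts = {"races": 100, "drivers": 50, "circuits": 20}
tap_counts = {"races": 100, "drivers": 48, "results": 10}

report = diff_row_counts(dump_counts, tap_counts)
```

Matching counts do not prove the rows are identical, so this is a first-pass check only.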

@ReubenFrankel
Owner

@tayloramurphy Sorry, I might be misunderstanding... 😅 Are you implying that F1DB releases are associated with the Ergast API? Or are you referring to the Ergast database images? My understanding is that they are two distinct sources, so I'd expect a difference in the data for sure. If there's data provided by the API that the tap is missing (which could be verified by comparing a dump of the Ergast database image to the tap sync result), that's a separate issue.

@tayloramurphy
Author

@ReubenFrankel ...I thought they were the same. 🤦 I think you can ignore me haha. But since the Ergast API is ending maybe switching to F1DB is the answer... sorry for the confusion!

@ReubenFrankel
Owner

@tayloramurphy I think there is still a valid case for keeping this open for the current performance issues you highlighted. Happy for you to reopen with a new title, or I'll follow up with a new issue at some point. 👍

> But since the Ergast API is ending maybe switching to F1DB is the answer

Yep - I think this could address both problems. I'll keep an eye on F1DB and other alternatives too - thanks for the heads-up!

@ReubenFrankel
Owner

#83 updates the tap to use the Ergast-compatible Jolpica API. They don't use F1DB data and won't be doing so: jolpica/jolpica-f1#100 (comment)
