Consider using the downloads available from releases #31
Comments
An 8-hour full sync is pretty bad, so I'll look into this as a viable alternative source to the Ergast API (which is due to be shut down at the end of 2024 anyway). Also, just to clarify: do you mean F1DB might be missing some data, or the tap? I may also have a look and see if there are any easy improvements that can be made to reduce the number of API calls, which I imagine make up most of the sync time. I definitely ran full syncs during initial development, but it's possible I made a change that causes the number of requests to grow exponentially the further back the start date is set.
Oh wow, just saw that it's going to be shut down at the end of the season. Looks like it's both the website and the API - hopefully someone from the community will pick it up... I'm just comparing rows between the Postgres dump available on the releases and what's output by the tap - so I think some rows are possibly dropped while fetching from the API.
@tayloramurphy Sorry, I might be misunderstanding... 😅 Are you implying that F1DB releases are associated with the Ergast API? Or are you referring to the Ergast database images? My understanding is that they are two distinct sources, so I'd expect a difference in the data for sure. If there's data provided by the API that the tap is missing (which could be verified by comparing a dump of the Ergast database image to the tap sync result), that's a separate issue.
@ReubenFrankel ...I thought they were the same. 🤦 I think you can ignore me haha. But since the Ergast API is ending maybe switching to F1DB is the answer... sorry for the confusion!
@tayloramurphy I think there is still a valid case for keeping this open for the current performance issues you highlighted. Happy for you to reopen with a new title, or I'll follow up with a new issue at some point. 👍
Yep - I think this could address both problems. I'll keep an eye on F1DB and other alternatives too - thanks for the heads-up!
#83 updates to the Ergast-compatible Jolpica API. They don't/will not be using F1DB data: jolpica/jolpica-f1#100 (comment)
I ran the tap yesterday and it took about 8 hours to get all of the data downloaded from 1950 to present.
On https://github.com/f1db/f1db/releases/tag/v2024.5.0 they offer multiple data formats for the full data set. It would likely be faster to fetch those and then do a batch load or even row by row read of the data.
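To illustrate the batch-load idea, here is a minimal sketch, assuming the downloaded release is a zip archive containing one CSV file per table. The archive layout, the member name `drivers.csv`, and the helper `iter_csv_batches` are all hypothetical and are not taken from the F1DB release notes; the demo builds a tiny in-memory archive so it runs without any download.

```python
import csv
import io
import zipfile
from itertools import islice
from typing import BinaryIO, Dict, Iterator, List, Union


def iter_csv_batches(
    archive: Union[str, BinaryIO],
    member: str,
    batch_size: int = 1000,
) -> Iterator[List[Dict[str, str]]]:
    """Yield lists of rows (as dicts) from one CSV member of a zip archive.

    Hypothetical helper: the one-CSV-per-table layout is an assumption.
    """
    with zipfile.ZipFile(archive) as zf:
        with zf.open(member) as raw:
            text = io.TextIOWrapper(raw, encoding="utf-8", newline="")
            reader = csv.DictReader(text)
            while True:
                # Pull at most batch_size rows without reading the whole file.
                batch = list(islice(reader, batch_size))
                if not batch:
                    return
                yield batch


# Demo with made-up data in an in-memory archive (not real F1DB content).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("drivers.csv", "id,name\n1,A\n2,B\n3,C\n")
buffer.seek(0)

demo_batches = list(iter_csv_batches(buffer, "drivers.csv", batch_size=2))
for batch in demo_batches:
    print(len(batch), "rows")
```

Each yielded batch could then be handed to a bulk insert or emitted as records, which avoids one network request per page of results.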
I also think it's missing some of the data, but I don't have a good sense of what exactly. I loaded the pg dump from the releases into Postgres, ran the Meltano extract to Snowflake, and got different row counts for quite a few of the tables.
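A comparison like this is easier to repeat if the per-table row counts from each side are collected into dictionaries and diffed. The sketch below assumes the counts have already been gathered (for example with `SELECT COUNT(*)` per table); the function `diff_row_counts`, the table names, and the numbers are made up for illustration.

```python
from typing import Dict, List, Optional, Tuple

Mismatch = Tuple[str, Optional[int], Optional[int]]


def diff_row_counts(
    reference: Dict[str, int],
    candidate: Dict[str, int],
) -> List[Mismatch]:
    """Return (table, reference_count, candidate_count) for each mismatch.

    A count of None means the table is absent from that side.
    """
    mismatches: List[Mismatch] = []
    for table in sorted(set(reference) | set(candidate)):
        ref = reference.get(table)
        cand = candidate.get(table)
        if ref != cand:
            mismatches.append((table, ref, cand))
    return mismatches


# Made-up counts: "dump" stands for the Postgres dump, "tap" for the tap output.
dump_counts = {"race": 100, "driver": 50, "circuit": 20}
tap_counts = {"race": 100, "driver": 48}

row_count_mismatches = diff_row_counts(dump_counts, tap_counts)
for table, ref, cand in row_count_mismatches:
    print(f"{table}: dump={ref} tap={cand}")
```

A diff only points at missing rows if both sides come from the same upstream source; counts from two unrelated datasets would be expected to differ.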
Either way though, thanks for the tap @ReubenFrankel 😄