Where is the full dataset? #1

Open
moore269 opened this issue Sep 17, 2024 · 7 comments
Comments

@moore269

moore269 commented Sep 17, 2024

I see there are 178 examples in this file
https://github.com/xlang-ai/Spider2/blob/main/spider2/examples/spider2.jsonl

However, the paper says there are 600 examples. Where are the rest? Also, would it be possible to include the correctly labeled SQL as another field in the JSONL?

Lastly, is there a place where I can easily download all the referenced tables?
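For anyone who wants to reproduce the count above, here is a minimal sketch. It assumes a local copy of the JSONL file with one JSON object per line; the helper name `count_jsonl_records` and the command-line usage are illustrative and not part of the Spider2 repository.

```python
import json
import sys
from pathlib import Path


def count_jsonl_records(path):
    """Count the non-empty lines in a JSON Lines file.

    Each line is parsed with json.loads, so a malformed record raises
    an error instead of being silently counted.
    """
    count = 0
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines, e.g. a trailing newline
            json.loads(line)  # validate that the line is well-formed JSON
            count += 1
    return count


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python count_jsonl.py path/to/spider2.jsonl
    print(count_jsonl_records(sys.argv[1]))
```

Running it against a local clone's copy of `spider2.jsonl` should print the number of released examples at that commit.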

@lfy79001
Collaborator

Hi,

We are still working on the full set of 600 examples and have released only part of the data so far. The rest is expected to take about another week.

Some of the tables are hosted in the cloud, so you don’t need to download them; please refer to the BigQuery Guideline. The remaining tables need to be downloaded, and you can access them via this link.

@sethsiddharth

Hello @lfy79001!
It's been about 3 weeks since the last update on the full dataset release. Any news on the progress?

Thank you for your work on this project and for keeping the community informed.

@lfy79001
Collaborator

lfy79001 commented Oct 9, 2024

Thank you for your interest in Spider 2.0. We have been busy with paper writing and data validation. In about 10 days, we will release the paper and all the data.

@sethsiddharth

Thank you for the update. Looking forward to the release!

@fi5421

fi5421 commented Nov 29, 2024

Hello,
Has the complete dataset of NLP questions been released? I am interested in the Spider 2.0-snow dataset, which should have 547 questions; however, spider2-snow.jsonl has only 260 questions.

@lfy79001
Collaborator

Thank you for your interest in our work.
Spider 2.0, Spider 2.0-lite, and Spider 2.0-snow have now been released with a 50% dev split of the dataset. However, we currently do not plan to release the full set of questions, in order to ensure the fairness of the competition.

At this stage, we encourage you to follow the guidelines provided in this document: Submission Guidelines. We will assist with the evaluation of your method based on these instructions.

We may consider releasing all questions in a few months. If there are any changes to the dataset release plan, I will inform you promptly.

Best regards

@fi5421

fi5421 commented Nov 30, 2024

Thanks for the update!

4 participants