Real dataset and Reflexion #15

Open
Qiuhai-Zeng opened this issue Dec 2, 2024 · 3 comments

Comments

@Qiuhai-Zeng

Hi,

Thanks for creating this benchmark dataset. It will be very helpful for building autonomous scientific discovery systems with LLMs.

I noticed the paper mentions 264 tasks in the 'real' set, but this repository has 283 'queries' in the metadata JSON files, and answer_key_real.csv has 239 rows. What in this repo corresponds to a 'task' as defined in the paper? Could you give a hint on how to find those 264 tasks and their answers?

Also, would it be possible to share the prompt you used to run the Reflexion (oracle) method? Thanks!

@AryanPrakhar
Collaborator

Hi @Qiuhai-Zeng,

Thank you for your kind words about DiscoveryBench and for diving into the details!

To clarify:

  • The 264 tasks in the 'real' set correspond to the unique "qid" entries in the discoverybench/real metadata.
  • Of these, 239 are in the test set (with answers in answer_key_real.csv), and the remaining 25 are in the train set.

I’ll follow up with details on the Reflexion method prompts soon. Let me know if you have more questions in the meantime!

Best,
Aryan
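For readers reconciling these numbers locally, here is a minimal sketch that counts raw query entries versus unique tasks per split. It rests on assumptions not confirmed in this thread: that each split lives in a folder such as discoverybench/real/test or discoverybench/real/train, that metadata files match metadata_*.json, that each file holds a "queries" list (possibly nested) of dicts with a "qid" key, and that qids may only be unique within a single metadata file, so a task is keyed by (file, qid). The helper names are hypothetical.

```python
import json
from pathlib import Path


def iter_query_entries(queries):
    """Yield query dicts, flattening nested lists (assumed metadata layout)."""
    for item in queries:
        if isinstance(item, list):
            yield from iter_query_entries(item)
        elif isinstance(item, dict):
            yield item


def task_keys(meta, source):
    """Return one (source, qid) key per query entry in a loaded metadata dict.

    Keying by source file is an assumption: it guards against qids being
    numbered per metadata file rather than globally.
    """
    return [(source, q["qid"]) for q in iter_query_entries(meta.get("queries", []))]


def count_tasks(split_dir):
    """Return (raw query entries, unique tasks) for one split directory."""
    root = Path(split_dir)
    if not root.is_dir():
        return 0, 0  # missing split folder: nothing to count
    keys = []
    # Hypothetical file pattern; adjust to the repo's actual metadata names.
    for meta_path in sorted(root.rglob("metadata_*.json")):
        with open(meta_path, encoding="utf-8") as f:
            meta = json.load(f)
        keys.extend(task_keys(meta, str(meta_path.relative_to(root))))
    return len(keys), len(set(keys))


if __name__ == "__main__":
    # Hypothetical split paths; compare with the 239 test / 25 train split stated above.
    for split in ("discoverybench/real/test", "discoverybench/real/train"):
        raw, unique = count_tasks(split)
        print(f"{split}: {raw} query entries, {unique} unique (file, qid) tasks")
```

Deduplicating on keys rather than counting list entries is what separates the raw query count from the task count; if the real qids turn out to be globally unique, keying on qid alone would give the same result.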

@Qiuhai-Zeng
Author

Hi Aryan,

Thank you for the quick response and detailed explanation! On a related note, could you share the gold workflows you’ve prepared for the real datasets? Appreciate it!

Best,
Qiuhai

@Qiuhai-Zeng
Author

Never mind, I found the 'workflow_tags' column in the answer key. I believe these are the gold workflows.
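To inspect that column programmatically, here is a minimal sketch. It assumes each workflow_tags cell is a comma-separated string of tags and that answer_key_real.csv is read from the working directory; neither is confirmed in this thread, and the helper names are hypothetical.

```python
import csv
import os
from collections import Counter


def load_answer_key(path):
    """Read the answer key CSV into a list of row dicts keyed by header name."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def workflow_tag_counts(rows, column="workflow_tags", sep=","):
    """Count how often each workflow tag appears across rows.

    Assumes each cell holds a `sep`-separated string of tags; empty cells are skipped.
    """
    counts = Counter()
    for row in rows:
        for tag in (row.get(column) or "").split(sep):
            tag = tag.strip()
            if tag:
                counts[tag] += 1
    return counts


if __name__ == "__main__":
    path = "answer_key_real.csv"  # location relative to the repo root is an assumption
    if os.path.exists(path):
        rows = load_answer_key(path)
        print(len(rows), "rows")  # compare with the 239 rows mentioned in this thread
        print(workflow_tag_counts(rows).most_common(10))
```

If the separator turns out to be something other than a comma, only the `sep` argument needs to change.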
