Real dataset and Reflexion #15

Qiuhai-Zeng · 2024-12-02T01:06:59Z

Hi,

Thanks for creating this benchmark dataset. It will be very helpful for building autonomous scientific discovery systems with LLMs.

I noticed the paper mentions 264 tasks in the 'real' set, but this repository has 283 'queries' in the metadata JSON files, and answer_key_real.csv has 239 rows. What in this repo corresponds to a "task" as defined in the paper? Could you give a hint on how to find those 264 tasks and their answers?

Also, could you share the prompt used to run the Reflexion (oracle) method? Thanks!

AryanPrakhar · 2024-12-02T17:39:12Z

Hi @Qiuhai-Zeng,

Thank you for your kind words about DiscoveryBench and for diving into the details!

To clarify:

The 264 tasks in the 'real' set correspond to unique "qid" entries in discoverybench/real metadata.
Of these, 239 are in the test set (with answers in answer_key_real.csv), and the remaining 25 are in the train set.
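For anyone trying to reproduce the 264 figure locally, here is a minimal sketch of counting raw query entries versus unique tasks. It assumes, without confirmation from the maintainers, that metadata files match `metadata*.json`, that each query stores its id under a `"qid"` key somewhere in the JSON, and that a qid only identifies a task within its own dataset folder (so the folder path is part of the key). The helper names `iter_qids` and `count_tasks` are hypothetical, not part of the repository.

```python
import json
from pathlib import Path
from typing import Any, Iterator, Set, Tuple


def iter_qids(node: Any) -> Iterator[Any]:
    """Yield every value stored under a "qid" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "qid":
                yield value
            else:
                yield from iter_qids(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_qids(item)


def count_tasks(real_dir: str) -> Tuple[int, int]:
    """Return (query entries, unique tasks) found under a metadata directory."""
    root = Path(real_dir)
    total = 0
    unique: Set[Tuple[str, str]] = set()
    # Assumption: metadata files are named metadata*.json; adjust if needed.
    for path in sorted(root.rglob("metadata*.json")):
        with path.open(encoding="utf-8") as f:
            data = json.load(f)
        # Assumption: a qid is unique only within its dataset folder, so the
        # folder path (relative to root) is included in the task key.
        folder = path.parent.relative_to(root).as_posix()
        for qid in iter_qids(data):
            total += 1
            unique.add((folder, str(qid)))
    return total, len(unique)
```

Calling `count_tasks("discoverybench/real")` on a checkout would show whether the (folder, qid) assumption reconciles the 283 query entries with the 264 tasks; if it does not, the key likely needs a different scope.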

I’ll follow up with details on the Reflexion method prompts soon. Let me know if you have more questions in the meantime!

Best,
Aryan

Qiuhai-Zeng · 2024-12-07T19:35:40Z

Hi Aryan,

Thank you for the quick response and detailed explanation! On a related note, could you share the gold workflows you’ve prepared for the real datasets? Appreciate it!

Best,
Qiuhai

Qiuhai-Zeng · 2024-12-10T18:29:36Z

Never mind, I found the 'workflow_tags' in the answer_key. I believe this is the gold workflow.
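A minimal sketch for pulling those gold workflows out of the answer key, using only the standard `csv` module. The only column name taken from this thread is `workflow_tags`; the function name `load_workflow_tags` is hypothetical, and the path to answer_key_real.csv should be adjusted to wherever it lives in your checkout.

```python
import csv
from typing import List


def load_workflow_tags(csv_path: str, column: str = "workflow_tags") -> List[str]:
    """Return the workflow_tags value of every row in an answer-key CSV."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        # fieldnames reads the header row; fail loudly if the column is absent.
        if reader.fieldnames is None or column not in reader.fieldnames:
            raise KeyError(f"column {column!r} not found in {csv_path}")
        # One row per test task, so len() of the result is the row count.
        return [row[column] for row in reader]
```

Opening the file with `newline=""` lets the `csv` module handle quoted fields that contain commas or line breaks, which free-text workflow descriptions may include.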
