Hi,
Thanks for creating this benchmarking dataset. It will be very helpful for building autonomous scientific discovery with LLMs.
I noticed the paper mentions 264 tasks in the 'real' set, but this repository has 283 'queries' in the metadata JSON files, and answer_key_real.csv has 239 rows. I'm wondering what in this repo corresponds to a 'task' as defined in the paper. Could you give a hint on how to find those 264 tasks and their answers?
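For reference, here is roughly how I arrived at those counts. The file locations and the `queries` field name are my assumptions about the repo layout; please adjust if the actual schema differs:

```python
import glob
import json

import pandas as pd

# Count 'queries' across all metadata JSON files.
# NOTE: the glob pattern and the "queries" key are assumptions
# about the repo layout, not confirmed names.
total_queries = 0
for path in glob.glob("metadata/*.json"):
    with open(path) as f:
        meta = json.load(f)
    total_queries += len(meta.get("queries", []))

print(f"queries in metadata files: {total_queries}")  # I get 283

# Count rows in the real-set answer key.
answers = pd.read_csv("answer_key_real.csv")
print(f"rows in answer_key_real.csv: {len(answers)}")  # I get 239
```

Neither number matches the 264 tasks reported in the paper, which is what prompted the question.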
Also, would it be possible to share the prompt used to run the Reflexion (oracle) method? Thanks!
Thank you for the quick response and detailed explanation! On a related note, could you share the gold workflows you’ve prepared for the real datasets? Appreciate it!