
[Evaluation] Add DiscoveryBench Benchmark #4

Open
suranah opened this issue Oct 18, 2024 · 0 comments
Labels
enhancement New feature or request

suranah (Member) commented Oct 18, 2024

What problem or use case are you trying to solve?

Add DiscoveryBench to the OpenHands evaluation suite. DiscoveryBench contains 264 tasks collected across 6 diverse domains, such as biology, economics, and sociology. It incorporates discovery workflows from published papers to approximate the real-world challenges researchers face.

https://github.com/allenai/discoverybench/
https://x.com/mbodhisattwa/status/1811524569410531333

Do you have thoughts on the technical implementation?

The implementation will consist of:

  1. An inference script that has the agent solve a DiscoveryBench task (given its goal and datasets)
  2. A faceted evaluation script to rigorously evaluate the answers
  3. Documentation for OpenHands users
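To make the "faceted" idea in item 2 concrete, here is a minimal Python sketch of scoring an answer facet by facet. It is an illustration under stated assumptions, not the planned implementation or DiscoveryBench's own evaluator: the `Hypothesis` type, the facet names (context, variables, relationship), the normalized exact-match and set-F1 comparisons, and the multiplicative combination are all hypothetical choices made here for clarity. A real evaluator would likely need semantic matching rather than string equality.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    """A discovery hypothesis split into facets (hypothetical facet names)."""
    context: str
    variables: set[str] = field(default_factory=set)
    relationship: str = ""


def _norm(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences don't count."""
    return " ".join(text.lower().split())


def variable_f1(pred: set[str], gold: set[str]) -> float:
    """Set-overlap F1 between predicted and gold variable names."""
    if not pred and not gold:
        return 1.0
    overlap = len(pred & gold)
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(gold)
    return 2 * precision * recall / (precision + recall)


def faceted_score(pred: Hypothesis, gold: Hypothesis) -> dict[str, float]:
    """Score each facet separately, then combine them into one number."""
    # Context and relationship: normalized exact match (a stand-in for semantic matching).
    context = float(_norm(pred.context) == _norm(gold.context))
    relationship = float(_norm(pred.relationship) == _norm(gold.relationship))
    # Variables: partial credit via set F1 over normalized names.
    variables = variable_f1(
        {_norm(v) for v in pred.variables},
        {_norm(v) for v in gold.variables},
    )
    # Assumption: multiplicative combination, so a miss on any facet pulls the total down.
    overall = context * variables * relationship
    return {
        "context": context,
        "variables": variables,
        "relationship": relationship,
        "overall": overall,
    }
```

For example, a prediction with the right context and relationship that names `income` and `age` when the gold variables are `income` and `education` gets a variable F1 of 0.5, and therefore an overall score of 0.5. Keeping the per-facet scores in the result, rather than only the total, shows *where* an agent's answer went wrong.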

Additional context

We are working on a PR for this and will seek OpenHands contributors' input to finalize it.

suranah added the enhancement (New feature or request) label on Oct 18, 2024