🛰️ DiscoveryBench Integration
This issue tracks the integration of DiscoveryBench, a benchmark designed to evaluate multi-step scientific discovery tasks, into OpenHands. The integration will assess how well OpenHands handles complex, data-driven workflows and problem-solving.
📋 Tasks
1. Set up DiscoveryBench
Clone the DiscoveryBench repository and install necessary dependencies.
Prepare the dataset for evaluation and ensure it’s ready for integration.
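As a rough illustration of the dataset preparation step, the sketch below walks a local checkout and builds one record per task. The directory layout (one folder per dataset containing `metadata_*.json` files) and the names `collect_tasks` and `instance_id` are assumptions made for this example, not a confirmed description of the DiscoveryBench repository.

```python
import json
from pathlib import Path


def collect_tasks(dataset_root):
    """Return one record per metadata JSON file found under dataset_root.

    Assumes (hypothetically) that each task is described by a file named
    metadata_<n>.json inside a per-dataset folder.
    """
    tasks = []
    for meta_path in sorted(Path(dataset_root).rglob("metadata_*.json")):
        with meta_path.open(encoding="utf-8") as f:
            metadata = json.load(f)
        tasks.append(
            {
                # Hypothetical identifier: "<dataset folder>/<file stem>"
                "instance_id": f"{meta_path.parent.name}/{meta_path.stem}",
                "metadata_path": str(meta_path),
                "metadata": metadata,
            }
        )
    return tasks
```

Calling `collect_tasks("path/to/checkout")` would then give a flat list that later steps can iterate over, whatever the real layout turns out to be.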
2. Initialize Runtime
Set up the runtime environment for running experiments.
Ensure the system is properly initialized to execute DiscoveryBench tasks in OpenHands.
3. Run Evaluation and Extract Responses
Execute each benchmark task and capture the agent's response.
Verify that every task has a recorded result.
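One possible way to extract a response is sketched below. It assumes the agent's messages are already available as a list of plain strings and that the prompt asks the agent to prefix its answer with a `Final Answer:` marker. Both the marker and the function name `extract_response` are hypothetical choices for this example and do not reflect an existing OpenHands API.

```python
import re

# Hypothetical marker the task prompt would ask the agent to use.
ANSWER_MARKER = re.compile(r"final answer\s*:\s*(.+)", re.IGNORECASE | re.DOTALL)


def extract_response(messages):
    """Pull the agent's answer out of a list of message strings.

    Scans from the newest message backwards and returns the text after the
    marker in the first message that contains one. If no message has the
    marker, falls back to the last non-empty message, or "" if none exists.
    """
    for text in reversed(messages):
        match = ANSWER_MARKER.search(text)
        if match:
            return match.group(1).strip()
    for text in reversed(messages):
        if text.strip():
            return text.strip()
    return ""
```

The fallback keeps a task from ending up with no recorded response when the agent ignores the requested format. An empty string then clearly signals that nothing usable was produced.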
4. Log and Manage Evaluation Outputs
Log all evaluation outputs and ensure proper storage for further analysis.
Compile results for easy access and reporting.
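The logging step could be as simple as appending one JSON object per task to a JSON Lines file and compiling a short summary from it afterwards. The sketch below assumes each record is a dictionary with a `response` field. That field name, along with the helpers `append_result` and `summarize`, is illustrative only.

```python
import json
from pathlib import Path


def append_result(output_file, record):
    """Append one task result as a single JSON line, creating folders as needed."""
    path = Path(output_file)
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def summarize(output_file):
    """Count logged tasks and how many of them have a non-empty response."""
    total = 0
    answered = 0
    with Path(output_file).open(encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            total += 1
            if record.get("response"):
                answered += 1
    return {"total": total, "answered": answered}
```

Appending line by line means a crashed run still leaves the results logged so far on disk, and the same file can be re-read later for reporting.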
5. Validate Integration
Perform a full end-to-end validation to ensure that the integration works smoothly.
Fix any issues and refine the workflow based on results from testing.