From 419bc1267d88597408e6e45a624751847b76a4f5 Mon Sep 17 00:00:00 2001
From: Boyuan Zheng
Date: Wed, 3 Jan 2024 02:20:33 -0500
Subject: [PATCH] Update index.html

---
 index.html | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/index.html b/index.html
index 4341522..ba708ac 100644
--- a/index.html
+++ b/index.html
@@ -740,11 +740,11 @@

Results

- algebraic reasoning
+ algebraic reasoning
Whole task success rate across task difficulty levels. We categorize tasks based on the number of actions to complete, i.e., Easy: 2-4, Medium: 5-7, and Hard: 8-12, with 26, 15, and 9 tasks in each group, respectively.
- your second image description
+ your second image description
Whole task success rate (%) under both offline and online evaluation. Offline0 and Offline1 refer to no tolerance for error at any step and allowing for error at one step, respectively.