diff --git a/index.html b/index.html
index d8094f7..cbe99b1 100644
--- a/index.html
+++ b/index.html
@@ -363,16 +363,34 @@
- To compare with SEEACT, we also implement methods based on text-only LLMs and BLIP2 following the two-stage strategy of MindAct.
- Firstly, we employ the ranker above to pick the top 50 elements.
- Subsequently, the action generation problem is formulated as a multi-choice question answering problem, with the candidate elements as options, including a "None" option if the target element is absent.
- During inference, elements are clustered into groups of 5 elements, with iterative refinement, until a single choice is made or all options are discarded.
+ We compare SEEACT with other models following the two-stage strategy of MindAct. We evaluate supervised fine-tuning (SFT) methods using FLAN-T5 and BLIP2-T5 and in-context learning (ICL) methods using GPT-3.5 and GPT-4. The experimental results are shown in the following table.
+We develop a new online evaluation tool using Playwright to evaluate web agents on live websites. Our tool can convert the predicted action into a browser event and execute it on the website.
- To adhere to ethical standards, our experiments are restricted to non-login tasks in compliance with user agreements, and we closely monitor agent activities during online evaluation to prevent any actions that have potentially harmful impact, like placing an order or modifying the user profile.
+ To adhere to ethical standards, our experiments are restricted to non-login tasks in compliance with user agreements, and we closely monitor agent activities during online evaluation to prevent any actions that have potentially harmful impact.