diff --git a/index.html b/index.html
index 1dd277c..6aaf324 100644
--- a/index.html
+++ b/index.html
@@ -217,13 +217,13 @@
- SEEACT is a generalist web agent based on GPT-4V.
+ SeeAct is a generalist web agent based on GPT-4V.
  Specifically, given a web-based task (e.g., “Compare iPhone 15 Pro Max with iPhone 13 Pro Max” in Apple homepage), the agent first perform Action Generation to produce an action description at each step towards completing the task (e.g., “Navigate to the iPhone category”), and then Action Grounding to identify an HTML element (e.g., “[button] iPhone”) at the current step on the webpage.
- SEEACT can successfully compete 50% of the tasks on live websites given an oracle action grounding method.
+ SeeAct can successfully compete 50% of the tasks on live websites given an oracle action grounding method.
  It also exhibits remarkable capabilities, ranging from long-range action planning, webpage content reasoning, and error correction.
@@ -298,7 +298,7 @@
- SEEACT firstly perform Action Generation by leveraging an LMM like GPT-4V to visually perceive websites and generate plans in textual forms,
+ SeeAct firstly perform Action Generation by leveraging an LMM like GPT-4V to visually perceive websites and generate plans in textual forms,
  and then Action Grounding to grounded textual plans onto the HTML elements and operations to act on the website
- We compare SEEACT with other models following the two-stage strategy of MindAct.
+ We compare SeeAct with other models following the two-stage strategy of MindAct.
  We evaluate supervised fine-tuning (SFT) methods using FLAN-T5 and BLIP2-T5 and in-context learning (ICL) methods using GPT-3.5, GPT-4. The experiment results are shown in the following table.
@@ -381,7 +381,7 @@