From 5f54caa9916c29016f1ce6f05225a9132c32581a Mon Sep 17 00:00:00 2001
From: Boyuan Zheng
Date: Fri, 29 Dec 2023 21:06:45 -0500
Subject: [PATCH] Update index.html

---
 index.html | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/index.html b/index.html
index 1dd277c..6aaf324 100644
--- a/index.html
+++ b/index.html
@@ -217,13 +217,13 @@

- SEEACT is a generalist web agent based on GPT-4V.
+ SeeAct is a generalist web agent based on GPT-4V.
  Specifically, given a web-based task (e.g., “Compare iPhone 15 Pro Max with iPhone 13 Pro Max” in Apple homepage), the agent first perform Action Generation to produce an action description at each step towards completing the task (e.g., “Navigate to the iPhone category”), and then Action Grounding to identify an HTML element (e.g., “[button] iPhone”) at the current step on the webpage.

- SEEACT can successfully compete 50% of the tasks on live websites given an oracle action grounding method.
+ SeeAct can successfully compete 50% of the tasks on live websites given an oracle action grounding method.
  It also exhibits remarkable capabilities, ranging from long-range action planning, webpage content reasoning, and error correction.

@@ -298,7 +298,7 @@

Overview

- SEEACT firstly perform Action Generation by leveraging an LMM like GPT-4V to visually perceive websites and generate plans in textual forms,
+ SeeAct firstly perform Action Generation by leveraging an LMM like GPT-4V to visually perceive websites and generate plans in textual forms,
  and then Action Grounding to grounded textual plans onto the HTML elements and operations to act on the website

@@ -371,7 +371,7 @@

Experiments and Results

- We compare SEEACT with other models following the two-stage strategy of MindAct.
+ We compare SeeAct with other models following the two-stage strategy of MindAct.
  We evaluate supervised fine-tuning (SFT) methods using FLAN-T5 and BLIP2-T5 and in-context learning (ICL) methods using GPT-3.5, GPT-4. The experiment results are shown in the following table.

@@ -381,7 +381,7 @@

Experiments and Results

    We observes the following results in the experiments:
-   • (1) SEEACT with GPT-4V is a strong generalist web agent, if oracle grounding is provided, which substantially outperforming existing methods like GPT-4 (20%) or FLAN-T5 (18%)
+   • (1) SeeAct with GPT-4V is a strong generalist web agent, if oracle grounding is provided, which substantially outperforming existing methods like GPT-4 (20%) or FLAN-T5 (18%)
  • (2) Grounding is still a major challenge. The best grounding strategy still has a 20-25% gap with oracle grounding