Commit

Add files via upload
boyuanzheng010 authored Dec 28, 2023
1 parent f91457d commit 8bf0a02
Showing 1 changed file with 19 additions and 1 deletion.
20 changes: 19 additions & 1 deletion index.html
@@ -617,9 +617,27 @@ <h1 class="title is-1 mmmu">
</video>
<p>Video recording of web agents running on live website 2.</p>
</div>


</div>
</div>
</div>
</div>

<h2 class="title is-3">Results</h2>
<div class="content has-text-justified">
<p>
We develop a new online evaluation tool using Playwright to evaluate web agents on live websites.
Our tool can convert the predicted action into a browser event and execute it on the website.
To adhere to ethical standards, our experiments are restricted to non-login tasks in compliance with user agreements, and we closely monitor agent activities during online evaluation to prevent any actions with potentially harmful effects, such as placing an order or modifying a user profile.
</p>
<img src="static/images/sr_by_action_length_difficulty.png" alt="Whole task success rate by task difficulty level" class="center">
<p>
Whole task success rate across task difficulty levels. We categorize tasks by the number of actions required to complete them, i.e., Easy: 2-4, Medium: 5-7, and Hard: 8-12, with 26, 15, and 9 tasks in each group, respectively.
</p>

</div>


</div>
</div>
</div>
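
The added Results paragraph says the evaluation tool converts a predicted action into a browser event and executes it with Playwright. The commit does not show how, so the following is only a minimal sketch of what such a dispatch step might look like, not the authors' implementation. The action schema (`operation`, `selector`, `value`) and the function name `execute_action` are hypothetical; the `click`, `fill`, and `select_option` calls are standard methods on a Playwright `Page`.

```python
from typing import Any, Dict


def execute_action(page: Any, action: Dict[str, str]) -> None:
    """Dispatch one predicted action to a Playwright-style page object.

    `action` uses a hypothetical schema (an assumption, not from the source):
      operation: "CLICK", "TYPE", or "SELECT"
      selector:  a selector string understood by Playwright
      value:     text to type or option label to pick (TYPE and SELECT only)
    """
    operation = action["operation"].upper()
    selector = action["selector"]

    if operation == "CLICK":
        page.click(selector)
    elif operation == "TYPE":
        # fill() replaces the current content of the input with the value.
        page.fill(selector, action["value"])
    elif operation == "SELECT":
        # Choose the <option> whose visible label matches the value.
        page.select_option(selector, label=action["value"])
    else:
        # Refuse anything outside the known set rather than guessing.
        raise ValueError(f"Unsupported operation: {operation}")


# Example wiring with the real library (requires `pip install playwright`
# and `playwright install`); left as comments so the sketch imports cleanly:
#
# from playwright.sync_api import sync_playwright
#
# with sync_playwright() as p:
#     browser = p.chromium.launch()
#     page = browser.new_page()
#     page.goto("https://example.com")
#     execute_action(page, {"operation": "CLICK", "selector": "a"})
#     browser.close()
```

Rejecting unknown operations with an error, instead of silently ignoring them, fits the paragraph's emphasis on closely monitoring what the agent does on a live site; the manual monitoring described there is not modeled in this sketch.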