fix date picker issue
LawyZheng committed Jan 9, 2025
1 parent d4ffcdb commit 2a55fe9
Showing 9 changed files with 117 additions and 47 deletions.
13 changes: 13 additions & 0 deletions skyvern/forge/agent.py
@@ -846,6 +846,19 @@ async def agent_step(
action=action,
action_result=results,
)
if results[-1].skip_remaining_actions:
LOG.warning(
"Going to stop executing the remaining actions",
task_id=task.task_id,
step_id=step.step_id,
step_order=step.order,
step_retry=step.retry_index,
action_idx=action_idx,
action=action,
action_result=results,
)
break

elif results and isinstance(action, DecisiveAction):
LOG.warning(
"DecisiveAction failed, but not stopping execution and not retrying the step",
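The new branch in this hunk treats `skip_remaining_actions` on the last result as an early-exit signal for the action loop. A minimal sketch of that control flow, assuming a simplified `ActionResult` dataclass and hypothetical `run_actions` / `fake_handler` helpers (none of these are Skyvern's actual code):

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ActionResult:
    """Simplified stand-in for the real ActionResult model (assumption)."""

    success: bool
    skip_remaining_actions: Optional[bool] = None


def run_actions(actions: List[str], handler: Callable[[str], List[ActionResult]]) -> List[str]:
    """Run actions in order; stop once the last result asks to skip the rest."""
    executed: List[str] = []
    for action in actions:
        results = handler(action)
        executed.append(action)
        if results and results[-1].skip_remaining_actions:
            # Mirrors the `break` added in this hunk.
            break
    return executed


def fake_handler(action: str) -> List[ActionResult]:
    # Hypothetical handler: only the "pick_date" action requests an early stop.
    flag = True if action == "pick_date" else None
    return [ActionResult(success=True, skip_remaining_actions=flag)]
```

With this sketch, `run_actions(["click", "pick_date", "submit"], fake_handler)` never reaches `"submit"`, because the date action sets the flag.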
10 changes: 6 additions & 4 deletions skyvern/forge/prompts/skyvern/custom-select.j2
@@ -1,19 +1,21 @@
You are performing a {{ "multi-level selection" if select_history else "selection" }} action on an HTML page. Assist the user in selecting the most appropriate option (or typing some values to search if necessary) to advance toward their goal, considering the context, user details, and the DOM elements provided in the list.
You are performing a {{ "multi-level selection" if select_history else "selection" }} action on an HTML page. Assist the user in selecting the most appropriate option (or typing some values to search if necessary) to advance toward their goal, considering the context, user details, and the HTML elements provided in the list.

You can identify the matching element based on the following guidelines:
1. Select the most suitable element based on the user goal, user details, and the context.
2. If none of the options perfectly match, and there is no search box for input, but there is a fallback option such as "Others" or "None of the above" in the DOM elements, you can consider it a match.
2. If none of the options perfectly match, and there is no search box for input, but there is a fallback option such as "Others" or "None of the above" in the HTML elements, you can consider it a match.
3. If a field is required, do not leave it blank.
4. If a field is required, do not select a placeholder value, such as "Please select", "-", or "Select...".
5. Exclude loading indicators like "loading more results" as valid options.{% if select_history %}
6. The selection history displays the previously selected values for the multi-level selection. Continue to complete the entire selection process.{% endif %}
6. The selection history displays the previously selected values for the multi-level selection. Continue to complete the entire selection process.{% if is_date_related %}
7. A date picker might be triggered; your goal is to set the correct start date and end date.{% endif %}{% endif %}

MAKE SURE YOU OUTPUT VALID JSON. No text before or after JSON, no trailing commas, no comments (//), no unnecessary quotes, etc.
Each interactable element is tagged with an ID.

Reply in JSON format with the following keys:
{
"reasoning": str, // The reasoning behind the action. Be specific, referencing the value and the element id in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
"page_info": str, // Think step by step. Describe the page information you parsed from the HTML elements. Your action should be based on the current page information.
"reasoning": str, // The reasoning behind the current single action. Be specific, referencing the value and the element id in your reasoning. Mention why you chose the element id. Keep the reasoning short and to the point.
"confidence_float": float, // The confidence of the action. Pick a number between 0.0 and 1.0. 0.0 means no confidence, 1.0 means full confidence
"id": str, // The id of the element to take action on. The id has to be one from the elements list
"action_type": str, // It's a string enum: "CLICK", "INPUT_TEXT". "CLICK" is an option you'd like to click to choose. "INPUT_TEXT" is an element you'd like to input text into for searching, but it only should be used when there's no valid option to click.
1 change: 1 addition & 0 deletions skyvern/forge/prompts/skyvern/opened-dropdown-confirm.j2
@@ -3,6 +3,7 @@ There is a screenshot from a part of a web HTML page. Help me confirm if it is a
An open dropdown menu can be defined as:
- At least one option is visible in the screenshot.
- A calendar view could be considered an open dropdown menu. But DO NOT consider a calendar icon as the dropdown menu.
- If the screenshot contains multiple input fields for selecting quantities, consider it an open dropdown menu.
- Do not consider it an open dropdown menu if the only visible option displays a message like "No results" or "No match".
- Do not consider it an open dropdown menu if the only visible element displays a placeholder like "Please select", "-", or "Select...".

@@ -9,6 +9,7 @@ Reply in the following JSON format:
"is_required": bool, // True if this is a required field, otherwise false.
"is_search_bar": bool, // True if the element to take the action is a search bar, otherwise false.
"is_location_input": bool, // True if the element is asking the user to input where they live, otherwise false. For example, it is asking for location, or address, or other similar information. Output False if it only requires ZIP code or postal code.
"is_date_related": bool, // True if the field is related to date input or date selection, otherwise false.
}

Existing reasoning context:
1 change: 1 addition & 0 deletions skyvern/webeye/actions/actions.py
@@ -76,6 +76,7 @@ class InputOrSelectContext(BaseModel):
is_required: bool | None = None
is_search_bar: bool | None = None # don't trigger custom-selection logic when it's a search bar
is_location_input: bool | None = None # address input usually requires auto completion
is_date_related: bool | None = None # date picker mini agent requires some special logic

def __repr__(self) -> str:
return f"InputOrSelectContext(field={self.field}, is_required={self.is_required}, is_search_bar={self.is_search_bar}, is_location_input={self.is_location_input})"
121 changes: 79 additions & 42 deletions skyvern/webeye/actions/handler.py
@@ -142,25 +142,43 @@ async def wrapper(element_dict: dict) -> bool:
return wrapper


CheckExistIDFunc = Callable[[str], bool]
CheckFilterOutElementIDFunc = Callable[[str], Awaitable[bool]]


def check_id_in_dict_factory(id_dict: dict[str, Any]) -> CheckExistIDFunc:
def helper(element_id: str) -> bool:
if id_dict.get(element_id, ""):
def check_disappeared_element_id_in_incremental_factory(
incremental_scraped: IncrementalScrapePage,
) -> CheckFilterOutElementIDFunc:
current_element_to_dict = copy.deepcopy(incremental_scraped.id_to_css_dict)

async def helper(element_id: str) -> bool:
if not current_element_to_dict.get(element_id, ""):
return False

try:
skyvern_element = await SkyvernElement.create_from_incremental(
incre_page=incremental_scraped, element_id=element_id
)
except Exception:
LOG.info(
"Failed to create skyvern element, going to drop the element from incremental tree",
exc_info=True,
element_id=element_id,
)
return True
return False

skyvern_frame = incremental_scraped.skyvern_frame
return not await skyvern_frame.get_element_visible(await skyvern_element.get_element_handler())

return helper


def remove_exist_elements(element_tree: list[dict], check_exist: CheckExistIDFunc) -> list[dict]:
async def filter_out_elements(element_tree: list[dict], check_filter: CheckFilterOutElementIDFunc) -> list[dict]:
new_element_tree = []
for element in element_tree:
children_elements = element.get("children", [])
if len(children_elements) > 0:
children_elements = remove_exist_elements(element_tree=children_elements, check_exist=check_exist)
if check_exist(element.get("id", "")):
children_elements = await filter_out_elements(element_tree=children_elements, check_filter=check_filter)
if await check_filter(element.get("id", "")):
new_element_tree.extend(children_elements)
else:
element["children"] = children_elements
@@ -169,14 +187,14 @@ def remove_exist_elements(element_tree: list[dict], check_exist: CheckExistIDFun


def clean_and_remove_element_tree_factory(
task: Task, step: Step, check_exist_funcs: list[CheckExistIDFunc]
task: Task, step: Step, check_filter_funcs: list[CheckFilterOutElementIDFunc]
) -> CleanupElementTreeFunc:
async def helper_func(frame: Page | Frame, url: str, element_tree: list[dict]) -> list[dict]:
element_tree = await app.AGENT_FUNCTION.cleanup_element_tree_factory(task=task, step=step)(
frame, url, element_tree
)
for check_exist in check_exist_funcs:
element_tree = remove_exist_elements(element_tree=element_tree, check_exist=check_exist)
for check_filter in check_filter_funcs:
element_tree = await filter_out_elements(element_tree=element_tree, check_filter=check_filter)

return element_tree
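The renamed `filter_out_elements` now awaits an async predicate per element: when the predicate returns true, the element is dropped and its surviving children are hoisted into the parent list. A minimal, self-contained sketch of that recursion follows. The lines hidden by the collapsed part of the hunk (appending the kept element and returning the list) are assumptions, and `make_check` is a hypothetical stand-in for the real visibility check:

```python
import asyncio
from typing import Awaitable, Callable, Dict, List, Set

CheckFilterOutElementIDFunc = Callable[[str], Awaitable[bool]]


async def filter_out_elements(
    element_tree: List[Dict], check_filter: CheckFilterOutElementIDFunc
) -> List[Dict]:
    new_element_tree: List[Dict] = []
    for element in element_tree:
        children = element.get("children", [])
        if children:
            # Filter the subtree first so hoisted children are already clean.
            children = await filter_out_elements(children, check_filter)
        if await check_filter(element.get("id", "")):
            # Drop this element but keep its surviving children.
            new_element_tree.extend(children)
        else:
            # Assumption: the kept element is appended with its filtered children.
            element["children"] = children
            new_element_tree.append(element)
    return new_element_tree


def make_check(hidden_ids: Set[str]) -> CheckFilterOutElementIDFunc:
    # Hypothetical predicate: treat a fixed set of ids as "should be filtered out".
    async def helper(element_id: str) -> bool:
        return element_id in hidden_ids

    return helper


tree = [{"id": "a", "children": [{"id": "b", "children": []}, {"id": "c", "children": []}]}]
filtered = asyncio.run(filter_out_elements(tree, make_check({"a", "c"})))
```

Here `a` and `c` are filtered out, so only `b` remains at the top level. Making the predicate awaitable is what allows the real check to query the live page for element visibility.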

@@ -557,27 +575,26 @@ async def handle_input_text_action(
)
return [ActionFailure(InteractWithDisabledElement(skyvern_element.get_id()))]

select_action = SelectOptionAction(
reasoning=action.reasoning,
element_id=skyvern_element.get_id(),
option=SelectOption(label=text),
)
if skyvern_element.get_selectable():
LOG.info(
"Input element is selectable, doing select actions",
task_id=task.task_id,
step_id=step.step_id,
element_id=skyvern_element.get_id(),
action=action,
)
return await handle_select_option_action(select_action, page, scraped_page, task, step)

incremental_element: list[dict] = []
auto_complete_hacky_flag: bool = False
# check if it's selectable
if skyvern_element.get_tag_name() == InteractiveElement.INPUT and not await skyvern_element.is_raw_input():
select_action = SelectOptionAction(
reasoning=action.reasoning,
element_id=skyvern_element.get_id(),
option=SelectOption(label=text),
)

await skyvern_element.scroll_into_view()
if skyvern_element.get_selectable():
LOG.info(
"Input element is selectable, doing select actions",
task_id=task.task_id,
step_id=step.step_id,
element_id=skyvern_element.get_id(),
action=action,
)
return await handle_select_option_action(select_action, page, scraped_page, task, step)

# press arrowdown to watch if there's any options popping up
await incremental_scraped.start_listen_dom_increment()
try:
@@ -605,7 +622,7 @@ async def handle_input_text_action(
await asyncio.sleep(5)

incremental_element = await incremental_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(task=task, step=step, check_exist_funcs=[dom.check_id_in_dom]),
clean_and_remove_element_tree_factory(task=task, step=step, check_filter_funcs=[dom.check_id_in_dom]),
)
if len(incremental_element) == 0:
LOG.info(
@@ -755,6 +772,7 @@ async def handle_input_text_action(
step=step,
task=task,
):
auto_complete_hacky_flag = False
return [result]

await incremental_scraped.start_listen_dom_increment()
@@ -763,7 +781,7 @@ async def handle_input_text_action(
await skyvern_element.input_sequentially(text=text)
finally:
incremental_element = await incremental_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(task=task, step=step, check_exist_funcs=[dom.check_id_in_dom]),
clean_and_remove_element_tree_factory(task=task, step=step, check_filter_funcs=[dom.check_id_in_dom]),
)
if len(incremental_element) > 0:
auto_complete_hacky_flag = True
@@ -1052,7 +1070,7 @@ async def handle_select_option_action(
is_open = True

incremental_element = await incremental_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(task=task, step=step, check_exist_funcs=[dom.check_id_in_dom]),
clean_and_remove_element_tree_factory(task=task, step=step, check_filter_funcs=[dom.check_id_in_dom]),
)

if len(incremental_element) == 0 and skyvern_element.get_tag_name() == InteractiveElement.INPUT:
@@ -1067,7 +1085,7 @@ async def handle_select_option_action(
# wait 5s for options to load
await asyncio.sleep(5)
incremental_element = await incremental_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(task=task, step=step, check_exist_funcs=[dom.check_id_in_dom]),
clean_and_remove_element_tree_factory(task=task, step=step, check_filter_funcs=[dom.check_id_in_dom]),
)

if len(incremental_element) == 0:
@@ -1539,7 +1557,7 @@ async def choose_auto_completion_dropdown(
# wait for new elements to load
await asyncio.sleep(5)
incremental_element = await incremental_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(task=task, step=step, check_exist_funcs=[dom.check_id_in_dom]),
clean_and_remove_element_tree_factory(task=task, step=step, check_filter_funcs=[dom.check_id_in_dom]),
)

# check if elements in preserve list are still on the page
@@ -1856,15 +1874,15 @@ async def sequentially_select_from_dropdown(
values: list[str | None] = []
select_history: list[CustomSingleSelectResult] = []

check_exist_funcs: list[CheckExistIDFunc] = [dom.check_id_in_dom]
check_filter_funcs: list[CheckFilterOutElementIDFunc] = [dom.check_id_in_dom]
for i in range(MAX_SELECT_DEPTH):
single_select_result = await select_from_dropdown(
context=input_or_select_context,
page=page,
skyvern_element=skyvern_element,
skyvern_frame=skyvern_frame,
incremental_scraped=incremental_scraped,
check_exist_funcs=check_exist_funcs,
check_filter_funcs=check_filter_funcs,
step=step,
task=task,
dropdown_menu_element=dropdown_menu_element,
@@ -1877,6 +1895,18 @@ async def sequentially_select_from_dropdown(
# wait 1s until DOM finished updating
await asyncio.sleep(1)

# HACK: once the agent has taken two mini actions on a date picker, stop executing the remaining actions.
# This is a workaround for some date picker issues.
if input_or_select_context.is_date_related and i >= 1 and single_select_result.action_result:
LOG.warning(
"It's a date picker, going to skip remaining actions",
depth=i,
task_id=task.task_id,
step_id=step.step_id,
)
single_select_result.action_result.skip_remaining_actions = True
break

if await single_select_result.is_done():
return single_select_result.action_result, values[-1] if len(values) > 0 else None
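The date picker workaround above caps the number of mini selections at two for date-related fields and marks the result so the caller stops the rest of the step. A simplified sketch of that loop, assuming a stand-in `ActionResult`, an illustrative `MAX_SELECT_DEPTH` value, and a hypothetical `pick` callback in place of `select_from_dropdown` (the real function also handles completion checks and incremental scraping, which are omitted here):

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MAX_SELECT_DEPTH = 3  # assumed value, for illustration only


@dataclass
class ActionResult:
    """Simplified stand-in for the real ActionResult model (assumption)."""

    success: bool
    skip_remaining_actions: Optional[bool] = None


def select_sequentially(
    is_date_related: Optional[bool], pick: Callable[[int], Optional[ActionResult]]
) -> Tuple[Optional[ActionResult], int]:
    """Call pick(depth) up to MAX_SELECT_DEPTH times; return (last result, number of picks)."""
    result: Optional[ActionResult] = None
    picks = 0
    for i in range(MAX_SELECT_DEPTH):
        result = pick(i)
        picks += 1
        # After the second pick (i >= 1) on a date field, flag the result and stop.
        if is_date_related and i >= 1 and result:
            result.skip_remaining_actions = True
            break
    return result, picks
```

For a date-related field the sketch stops after two picks with the flag set; for any other field it runs to the full depth and leaves the flag as `None`.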

@@ -1898,14 +1928,15 @@ async def sequentially_select_from_dropdown(
# wait for 3s to load new options
await asyncio.sleep(3)

current_element_to_dict = copy.deepcopy(incremental_scraped.id_to_css_dict)
check_exist_funcs.append(check_id_in_dict_factory(current_element_to_dict))
check_filter_funcs.append(
check_disappeared_element_id_in_incremental_factory(incremental_scraped=incremental_scraped)
)

secondary_increment_element = await incremental_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(
task=task,
step=step,
check_exist_funcs=check_exist_funcs,
check_filter_funcs=check_filter_funcs,
)
)
if len(secondary_increment_element) == 0:
@@ -1940,7 +1971,7 @@ async def select_from_dropdown(
skyvern_element: SkyvernElement,
skyvern_frame: SkyvernFrame,
incremental_scraped: IncrementalScrapePage,
check_exist_funcs: list[CheckExistIDFunc],
check_filter_funcs: list[CheckFilterOutElementIDFunc],
step: Step,
task: Task,
dropdown_menu_element: SkyvernElement | None = None,
@@ -1991,14 +2022,15 @@ async def select_from_dropdown(
)

trimmed_element_tree = await incremental_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(task=task, step=step, check_exist_funcs=check_exist_funcs),
clean_and_remove_element_tree_factory(task=task, step=step, check_filter_funcs=check_filter_funcs),
)

html = incremental_scraped.build_html_tree(element_tree=trimmed_element_tree)

skyvern_context = ensure_context()
prompt = prompt_engine.load_prompt(
"custom-select",
is_date_related=context.is_date_related,
field_information=context.field,
required_field=context.is_required,
target_value="" if force_select else target_value,
@@ -2014,7 +2046,12 @@ async def select_from_dropdown(
step_id=step.step_id,
task_id=task.task_id,
)
json_response = await app.SECONDARY_LLM_API_HANDLER(prompt=prompt, step=step)
if context.is_date_related:
# HACK: in testing, the secondary LLM performed poorly on date pickers,
# so use the main LLM for date-related fields
json_response = await app.LLM_API_HANDLER(prompt=prompt, step=step)
else:
json_response = await app.SECONDARY_LLM_API_HANDLER(prompt=prompt, step=step)
value: str | None = json_response.get("value", None)
single_select_result.value = value
select_reason: str | None = json_response.get("reasoning", None)
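The handler choice in this hunk is a plain flag-based dispatch between two async callables. A minimal sketch, assuming two hypothetical coroutine handlers (`primary_llm`, `secondary_llm`) that stand in for `app.LLM_API_HANDLER` and `app.SECONDARY_LLM_API_HANDLER` and simply echo which one was called:

```python
import asyncio
from typing import Optional


async def primary_llm(prompt: str) -> dict:
    # Hypothetical stand-in for the main LLM handler.
    return {"handler": "primary", "prompt": prompt}


async def secondary_llm(prompt: str) -> dict:
    # Hypothetical stand-in for the secondary LLM handler.
    return {"handler": "secondary", "prompt": prompt}


async def ask_llm(prompt: str, is_date_related: Optional[bool]) -> dict:
    """Route date-related prompts to the main handler, everything else to the secondary one."""
    handler = primary_llm if is_date_related else secondary_llm
    return await handler(prompt)


date_response = asyncio.run(ask_llm("pick a start date", True))
other_response = asyncio.run(ask_llm("pick a country", None))
```

Because `is_date_related` is `bool | None` in `InputOrSelectContext`, a missing value (`None`) falls through to the secondary handler, as it does in the diff.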
@@ -2121,7 +2158,7 @@ async def select_from_dropdown_by_value(
) -> ActionResult:
timeout = settings.BROWSER_ACTION_TIMEOUT_MS
await incremental_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(task=task, step=step, check_exist_funcs=[dom.check_id_in_dom]),
clean_and_remove_element_tree_factory(task=task, step=step, check_filter_funcs=[dom.check_id_in_dom]),
)

element_locator = await incremental_scraped.select_one_element_by_value(value=value)
@@ -2155,7 +2192,7 @@ async def select_from_dropdown_by_value(

async def continue_callback(incre_scraped: IncrementalScrapePage) -> bool:
await incre_scraped.get_incremental_element_tree(
clean_and_remove_element_tree_factory(task=task, step=step, check_exist_funcs=[dom.check_id_in_dom]),
clean_and_remove_element_tree_factory(task=task, step=step, check_filter_funcs=[dom.check_id_in_dom]),
)

element_locator = await incre_scraped.select_one_element_by_value(value=value)
3 changes: 3 additions & 0 deletions skyvern/webeye/actions/responses.py
@@ -17,6 +17,7 @@ class ActionResult(BaseModel):
# None is used for old data so that we can differentiate between old and new data which only has boolean
interacted_with_sibling: bool | None = None
interacted_with_parent: bool | None = None
skip_remaining_actions: bool | None = None

def __str__(self) -> str:
results = [f"ActionResult(success={self.success}"]
@@ -35,6 +36,8 @@ def __str__(self) -> str:
results.append(f"interacted_with_sibling={self.interacted_with_sibling}")
if self.interacted_with_parent is not None:
results.append(f"interacted_with_parent={self.interacted_with_parent}")
if self.skip_remaining_actions is not None:
results.append(f"skip_remaining_actions={self.skip_remaining_actions}")

return ", ".join(results) + ")"
