In this file, we will briefly introduce the usage of each sub-folder:
- document warehouse:
documents/
- task format of examples:
examples/
- real accounts:
settings/
To support retrieval augmented generation framework, we crawl and pre-process HTML from official documentation of various applications. For detailed information, see Documents.
Here we introduce the format of task examples. Take the BigQuery task bigquery/3363a913-d3e9-42c2-9d76-9cd9e9bafec7
as an example, the .json
configuration file is:
{
"id": "3363a913-d3e9-42c2-9d76-9cd9e9bafec7",
"snapshot": "bigquery",
"instruction": "I want to know how many austin bike stations are active? Save the query results into '/home/user/Downloads/answer.csv'.",
"source": [
"https://cloud.google.com/bigquery/docs/quickstarts/query-public-dataset-console"
],
"action_number": 9,
"config": [
{
"type": "bigquery_init",
"parameters": {
"config_file": "evaluation_examples/settings/google/gcp_config.json",
"project_name": "bigquery-project",
"actions": [
{
"type": "empty"
}
]
}
},
{
"type": "google_chrome_browser",
"parameters": {
"debugging_port": 1337,
"listening_port": 9222,
"urls": [
"https://www.bing.com/"
]
}
},
{
"type": "bigquery_login",
"parameters": {
"settings_file": "evaluation_examples/settings/google/settings.json",
"config_file": "evaluation_examples/settings/google/gcp_config.json",
"project_name": "bigquery-project"
}
}
],
"related_apps": [
"bigquery",
"chromium"
],
"tags": [
"cli+gui",
"account",
"data_warehousing",
"abstract"
],
"evaluator": {
"func": "compare_csv",
"result": {
"type": "vm_file",
"path": "/home/user/Downloads/answer.csv",
"dest": "pred.csv"
},
"expected": {
"type": "local_file",
"path": "evaluation_examples/examples/bigquery/3363a913-d3e9-42c2-9d76-9cd9e9bafec7/answer.csv",
"dest": "gold_output.csv"
},
"options": {}
},
"counterpart": "a954c45e-fb59-4231-a052-a7f74dc16bf1"
}
-
id
: the globally unique uuid of the current example -
snapshot
: it indicates the focused professional application -
instruction
: the task instruction, user intent, or task goal -
source
: the list of URLs based on which we construct the current example -
action_number
: under the folder of each task example, we also provide a special fileverbose_instruction.txt
, which indicates how to complete the current task step-by-step. This field is exactly the number of actions based onverbose_instruction.txt
-
related_apps
: the list of professional applications used in the current example. The complete set includes$21$ applications:{'airflow', 'dagster', 'snowflake', 'duckdb', 'bigquery', 'jupyter', 'dbt', 'mysql', 'servicenow', 'terminal', 'metabase', 'airbyte', 'docker', 'hasura_cloud', 'sqlite3', 'vscode', 'chromium', 'postgresql', 'superset', 'dbt_cloud', 'excel'}
-
tags
: different categories to label the current example-
verbose
/abstract
: whether theinstruction
field offers a detailed step-by-step solution -
cli
/gui
/cli+gui
: whether a command line interface (cli) or a graphical user interface (gui) is necessary to finish the current task -
account
: whether a real account is needed to finish the current task - data pipeline chosen from
$7$ categories:['data_ingestion_and_integration', 'data_warehousing', 'data_orchestration', 'data_analysis_and_visualization', 'traditional_data_processing', 'it_service_management', 'data_transformation']
-
-
counterpart
: if the current example uses an abstract instruction, this filed indicates the uuid of its counterpart (that is, the same task with a verbose instruction or step-by-step tutorial), or vice versa -
config
: a list of functions to initialize or reset the environment in the virtual machine. Each function is represented by a JSON dict, where thetype
field indicates the function name and theparameters
field indicates the parameters of the function. For example, in this example, it sequentially invokes:-
bigquery_init_setup
function: clear the cloud workspace of GCPbigquery-project
-
google_chrome_browser_setup
function: launch the Google Chrome application -
bigquery_login_setup
function: utilize playwright to automatically sign in the provided Google account insettings/google/settings.json
-
-
evaluator
: this filed defines how to evaluate the final outcome. Concretely,-
func
defines the metric function used to compare the predicted and golden results. In this example, functioncompare_csv
is utilized -
result
defines how to get the predicted result from the final states of the VM. In this example, we invoke the functionget_vm_file
to copy the file from virtual machine path/home/user/Downloads/answer.csv
to localhost filepred.csv
-
expected
defines how to get the golden result. In this example, we invoke the functionget_local_file
to obtain the oracle filegold_output.csv
from localhost pathevaluation_examples/examples/bigquery/3363a913-d3e9-42c2-9d76-9cd9e9bafec7/answer.csv
-
options
gives the keyword arguments for the metric function. In this case, there is no keyword arguments. In other words, we use the default keyword arguments for metriccompare_csv
-
For task examples that require real accounts (e.g., the tags
field containing value account
, or the related_apps
field containing any value of ['snowflake', 'bigquery', 'servicenow', 'dbt_cloud', 'hasura_cloud']
), we need to sign up and obtain credentials before testing them. The account information and credentials need to be filled into corresponding JSON templates under folder evaluation_examples/settings/
. Please refer to Account Guideline for details on how to register and fill in necessary information.