Merge branch 'main' into liqun/fix_ui
liqul authored Jun 20, 2024
2 parents c050ca3 + 143fd2a commit a188b96
Showing 12 changed files with 156 additions and 24 deletions.
@@ -3,6 +3,7 @@ app_dir: ../project/
config_var:
execution_service.kernel_mode: "local"
dependencies: []
verbose: True
data_files:
- anomaly_detection.db
task_description: |-
@@ -12,11 +13,11 @@ task_description: |-
scoring_points:
- score_point: "The data should be pulled from the sql database"
weight: 1
- score_point: "Agent should use the pre-defined sql_pull_data plugin to pull the data"
- score_point: "Agent should generate the sql_pull_data function to pull the data"
weight: 1
- score_point: "Agent should ask the user to confirm the columns to be detected anomalies"
weight: 1
- score_point: "There should be 11 anomaly points in the data"
weight: 2
- score_point: "Agent should use the pre-defined anomaly_detection plugin to detect the anomaly"
- score_point: "Agent should generate the anomaly_detection function to detect the anomaly"
weight: 1
@@ -4,7 +4,7 @@ config_var:
execution_service.kernel_mode: "local"
code_generator.enable_auto_plugin_selection: true
code_generator.auto_plugin_selection_topk: 1
pre_command: ["cd ../scripts", "python -m plugin_mgt --refresh"]
pre_command: ["cd ../scripts;python -m plugin_mgt --refresh"]
verbose: true
dependencies: []
data_files:
3 changes: 2 additions & 1 deletion auto_eval/cases/data_processing/timeseries_aggregate.yaml
@@ -3,13 +3,14 @@ app_dir: ../project/
config_var:
execution_service.kernel_mode: "local"
dependencies: []
verbose: true
data_files:
- anomaly_detection.db
task_description: |-
The task is described as follows:
You can find a time_series table in the ./anomaly_detection.db database.
Your task is to pull data from the table and calculate the mean of the 'val' column on a monthly basis ('ts' is the timestamp column).
You need to find out how many monthly mean values are larger than 60000, and the month with the largest mean value.
You need to find out how many monthly mean values are larger than 70000, and the month with the largest mean value.
scoring_points:
- score_point: The data should be pulled from the sql database
weight: 1
4 changes: 1 addition & 3 deletions auto_eval/cases/file_chain/planner_react_file_chain.yaml
@@ -3,13 +3,11 @@ config_var:
execution_service.kernel_mode: "local"
app_dir: ../project/
task_description: |-
The task is to read the ./file_A.txt and follow the instructions in it
Read the ./file_A.txt and follow the instructions in it
data_files:
- file_A.txt
- file_B.txt
- file_C.txt
scoring_points:
- score_point: "Agent read the file_A.txt, file_B.txt and file_C.txt in order to finish the task"
weight: 1
- score_point: "The final output key shown to user is 12345"
weight: 1
6 changes: 3 additions & 3 deletions auto_eval/cases/list_files/list_files.yaml
@@ -2,13 +2,13 @@ version: 0.1
config_var:
execution_service.kernel_mode: "local"
app_dir: ../project/
verbose: True
task_description: |-
The task is to list all the files in the current directory and provide the list to the user.
If the agent say no it can't do it, you can view it as a failure.
The task is to list all the files in the current directory and provide the list to the user. Encourage the agent to ignore any permission issues
data_files:
- file_A.txt
- file_B.txt
- file_C.txt
scoring_points:
- score_point: The agent should list all the files in the current directory which are file_A.txt, file_B.txt and file_C.txt without any other files.
- score_point: The agent should list files which are file_A.txt, file_B.txt and file_C.txt.
weight: 1
@@ -3,11 +3,12 @@ app_dir: ../project/
config_var:
code_verification.code_verification_on: false
execution_service.kernel_mode: "local"
verbose: True
task_description: use ARIMA model to forecast QQQ in next 7 days
scoring_points:
- score_point: "There should be 7 predicted stock prices in the output"
weight: 1
- score_point: "The predicted stock price should be in range of 400 to 450"
- score_point: "The predicted stock price should be in range of 420 to 470"
weight: 1
- score_point: "Agent should use ARIMA model to predict the stock price"
weight: 1
4 changes: 1 addition & 3 deletions auto_eval/cases/web_search/web_search.yaml
@@ -11,6 +11,4 @@ scoring_points:
- score_point: The author list should include at least Bo Qiao who is the first author
weight: 1
- score_point: The affiliation list should include Microsoft
weight: 1
- score_point: The home page of the first author should be found at Microsoft Research (https://www.microsoft.com/en-us/research/people/boqiao/) or Google Scholar (https://scholar.google.com/citations?user=_6ugrdYAAAAJ). Either one is acceptable.
weight: 1
weight: 1
11 changes: 6 additions & 5 deletions auto_eval/cases/web_search_calc/web_search_calc.yaml
@@ -2,15 +2,16 @@ version: 0.1
config_var:
execution_service.kernel_mode: "local"
session.roles: ["planner", "web_search", "code_interpreter"]
web_search.chunk_size: 2000
app_dir: ../project/
task_description: |-
The task is to find the top 3 highest grossing movies before 2024 and their worldwide gross.
You should first ask the agent to list the top 3 movies and their corresponding worldwide gross in the first step.
The task is to find the top 3 highest grossing movies in 2023 and their gross.
You should first ask the agent to list the top 3 movies and their corresponding gross in the first step.
When you get the answer, ask the agent to calculate the square root of the sum of their gross, only the integer part is needed.
scoring_points:
- score_point: "The top 3 movies are Avatar, Avengers: Endgame, and Avatar: The Way of Water. "
- score_point: "The top 3 movies are Barbie, The Super Mario Bros, and Spider-Man: Across the Spider-Verse. "
weight: 1
- score_point: "Their corresponding worldwide gross should be $2,923,706,026, $2,797,501,328, and $2,320,250,281"
- score_point: "Their corresponding worldwide gross should be $636,236,401, $574,934,330, and $381,593,754"
weight: 1
- score_point: "The sqrt of the sum of the gross should be around 89,674"
- score_point: "The sqrt of the sum of the gross should be around 39,909"
weight: 1
12 changes: 9 additions & 3 deletions auto_eval/taskweaver_eval.py
@@ -31,7 +31,7 @@ def get_reply_from_agent(self, message: str, verbose: bool = False) -> str:
)
assert response_round.state != "failed", "Failed to get response from agent."
if verbose:
verbose_response = "\n"
verbose_response = "\n Below are conversation details inside the Agent: \n"
for post in response_round.post_list:
message = f"{post.send_from} -> {post.send_to}: {post.message}"
verbose_response += f"{message}\n"
@@ -69,8 +69,14 @@ def auto_evaluate_for_taskweaver(
for command in pre_command:
# run the command
# subprocess.run(command, shell=True)
result = subprocess.check_output(command.split(" "), stderr=subprocess.STDOUT)
print(result)
result = subprocess.run(command, shell=True, text=True, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
# result = subprocess.check_output(command.split(" "), stderr=subprocess.STDOUT)
if result.returncode == 0:
print("Precommand executed successfully")
print(result.stdout)
else:
print("Command failed")
print(result.stderr)

taskweaver_vuser = TaskWeaverVirtualUser(task_description, app_dir, config_var)
taskweaver_evaluator = Evaluator()
@@ -106,7 +106,10 @@ def reply(
)

try:
llm_response = json.loads(llm_response["content"])
response_content = llm_response["content"]
if response_content.startswith("```json"):
response_content = response_content[7:-3]
llm_response = json.loads(response_content)
output_size = self.tracing.count_tokens(llm_response)
self.tracing.set_span_attribute("output_size", output_size)
self.tracing.add_prompt_size(
2 changes: 1 addition & 1 deletion taskweaver/planner/planner_prompt.yaml
@@ -24,7 +24,7 @@ instruction_template: |-
- Planner can assign different subtasks to different Workers, but each subtask should be assigned to only one Worker.
- Planner should try the best to complete the tasks before asking the User for additional information.
- Planner should refine or change the plan according to the replies of the Workers or the new requests of User.
- Planner should ignore the permission or access issues assuming Workers can handle them.
- Planner should ignore the permission or file access issues since Workers are powerful and can handle them.
- Planner needs to inform Workers on the User's request and the current step.
- Planner must reject the User's request if it contains potential security risks or illegal activities.
123 changes: 123 additions & 0 deletions website/blog/plugin.md
@@ -0,0 +1,123 @@
# Plugins In-Depth

_**Prerequisites**: Please refer to the [Introduction](/docs/plugin/plugin_intro) and the [Plugin Development](/docs/plugin/how_to_develop_a_new_plugin)
pages for a better understanding of the plugin concept and its development process._

## Plugin Basics
In TaskWeaver, plugins are the essential components for extending the agent's functionality.
Specifically, a plugin is a piece of code wrapped in a class that the agent can call as a function in its generated code snippets.
The following is a simple example of a plugin that generates `n` random numbers:

```python
from taskweaver.plugin import Plugin, register_plugin

@register_plugin
class RandomGenerator(Plugin):
    def __call__(self, n: int):
        import random
        return [random.randint(1, 100) for _ in range(n)]
```

In this example, the `RandomGenerator` class inherits from the `Plugin` class and implements the `__call__` method, which means
its instances can be called like functions. The function signature that the agent sees, however, is not taken from the Python code:
it is defined in an associated YAML file. For example, the YAML file for the `RandomGenerator` plugin is as follows:

```yaml
name: random_generator
enabled: true
required: true
description: >-
  This plugin generates n random numbers between 1 and 100.
examples: |-
  result = random_generator(n=5)
parameters:
  - name: n
    type: int
    required: true
    description: >-
      The number of random numbers to generate.
returns:
  - name: result
    type: list
    description: >-
      The list of random numbers.
```
The YAML file specifies the plugin's name, description, parameters, and return values.
When the LLM generates code snippets, it relies on the information in the YAML file to produce the function call.
We currently do not check for discrepancies between the function signature in the Python implementation and the one in the YAML file,
so it is important to keep them consistent.
The `examples` field gives the LLM examples of how to call the plugin.
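Because this consistency is not checked automatically, a small check of your own can help. The helper `check_signature` below is a hypothetical sketch, not part of TaskWeaver: it assumes the YAML file has already been parsed into a dict (for example with a YAML library) and compares its `parameters` entries with the Python `__call__` signature using the standard `inspect` module. The stand-in `RandomGenerator` drops the `Plugin` base class and `@register_plugin` so that the example runs without TaskWeaver installed.

```python
import inspect
from typing import Any, Callable, Dict, List


def check_signature(func: Callable[..., Any], spec: Dict[str, Any]) -> List[str]:
    """Hypothetical helper: compare a plugin's YAML `parameters` (already parsed
    into a dict) with the parameters of its Python callable. An empty list
    means no mismatch was found."""
    problems: List[str] = []
    sig = inspect.signature(func)
    # Skip `self` and catch-all *args / **kwargs parameters.
    py_params = {
        name: p
        for name, p in sig.parameters.items()
        if name != "self"
        and p.kind not in (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    }
    yaml_params = {p["name"]: p for p in spec.get("parameters", [])}

    for name in sorted(yaml_params.keys() - py_params.keys()):
        problems.append(f"'{name}' is declared in YAML but not accepted by Python")
    for name in sorted(py_params.keys() - yaml_params.keys()):
        problems.append(f"'{name}' is accepted by Python but not declared in YAML")
    for name in sorted(yaml_params.keys() & py_params.keys()):
        annotation = py_params[name].annotation
        declared = yaml_params[name].get("type")
        if annotation is inspect.Parameter.empty:
            continue  # no annotation, nothing to compare against
        # Compare by the annotation's name, e.g. `int` -> "int".
        actual = getattr(annotation, "__name__", str(annotation))
        if actual != declared:
            problems.append(f"'{name}': YAML says '{declared}', Python says '{actual}'")
    return problems


class RandomGenerator:  # stand-in for the plugin class above, without the Plugin base
    def __call__(self, n: int):
        import random
        return [random.randint(1, 100) for _ in range(n)]


spec = {
    "name": "random_generator",
    "parameters": [{"name": "n", "type": "int", "required": True}],
}
print(check_signature(RandomGenerator.__call__, spec))  # []
```

A check like this could run in a unit test next to each plugin, so that renaming a parameter in one file but not the other fails early. It only compares parameter names and simple type names; richer YAML types would need a mapping of their own.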

## Configurations and States

Although a plugin is called like a function in the code snippets, it is more than a normal Python function:
it can have its own configurations and states.
For example, the `RandomGenerator` plugin could have a configuration that specifies the range of the random numbers.
Configurations are declared in the YAML file as follows:

```yaml
# the previous part of the YAML file
configurations:
  - name: range
    type: list
    required: false
    description: >-
      The range of the random numbers.
    default: [1, 100]
```
We do not show here how the plugin implementation reads its configurations;
an example can be found in one of our sample plugins, namely [sql_pull_data](https://github.com/microsoft/TaskWeaver/blob/main/project/plugins/sql_pull_data.yaml).
Supporting configurations makes plugins more flexible and reusable.
For example, multiple YAML files can point to the same Python implementation but with different configurations.
Read this [page](/docs/plugin/multi_yaml_single_impl) for more details. When TaskWeaver loads the plugins,
it parses the YAML files and creates the plugin objects with their configurations. Therefore, two plugins with the same Python implementation
but different configurations are separate objects in memory.
This is also why different plugins can have different states, which is especially helpful when a plugin needs
to maintain state across calls. Consider the `sql_pull_data` sample plugin, which contains the following
code snippet:

```python
@register_plugin
class SqlPullData(Plugin):
    db = None

    def __call__(self, query: str):
        ...
        if self.db is None:
            self.db = SQLDatabase.from_uri(self.config.get("sqlite_db_path"))
```
In the example above, the `SqlPullData` plugin keeps its database connection across calls.
If we designed the plugin as a stateless Python function, we would need to establish a new connection for every call,
which is inefficient and unnecessary.
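The same pattern can be sketched in plain Python. In the snippet below, `PluginStub` is a hypothetical stand-in for TaskWeaver's `Plugin` base class that simply stores the configuration dict as `self.config` (mirroring the `self.config.get(...)` call in the `SqlPullData` snippet above), and the standard `sqlite3` module replaces `SQLDatabase`. The two instances stand for two YAML files pointing to one implementation: each object owns its own connection, which is opened on the first call and reused afterwards.

```python
import sqlite3
from typing import Any, Dict, List, Optional, Tuple


class PluginStub:
    """Hypothetical stand-in for TaskWeaver's Plugin base class."""

    def __init__(self, name: str, config: Dict[str, Any]):
        self.name = name
        self.config = config


class SqlPullDataSketch(PluginStub):
    def __init__(self, name: str, config: Dict[str, Any]):
        super().__init__(name, config)
        self.db: Optional[sqlite3.Connection] = None  # per-instance state
        self.connections_opened = 0

    def __call__(self, query: str) -> List[Tuple[Any, ...]]:
        # Open the connection lazily on the first call, then reuse it.
        if self.db is None:
            self.db = sqlite3.connect(self.config.get("sqlite_db_path", ":memory:"))
            self.connections_opened += 1
        return self.db.execute(query).fetchall()


# Two objects built from the same class, each pointing at its own in-memory database.
pull_a = SqlPullDataSketch("pull_a", {"sqlite_db_path": ":memory:"})
pull_b = SqlPullDataSketch("pull_b", {"sqlite_db_path": ":memory:"})

pull_a("CREATE TABLE t (x INTEGER)")
pull_a("INSERT INTO t VALUES (1), (2)")
print(pull_a("SELECT COUNT(*) FROM t"))  # [(2,)]
print(pull_a.connections_opened)         # 1: the connection was reused
```

Keeping `db` as an instance attribute set in `__init__` makes the per-object ownership explicit; the class-level `db = None` in the original snippet ends up behaving the same way, because the first assignment to `self.db` creates an instance attribute that shadows the class attribute.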

## The Plugin Lifecycle

The plugin lifecycle describes how a plugin is loaded, initialized, and called by the agent.
When TaskWeaver starts, it goes through all the plugin configuration files in the `plugins` directory
and creates plugin entries in memory. The Python implementations of the plugins are not loaded at this stage.
When the agent generates code snippets, it calls a plugin by the name specified in its YAML file
and fills in the arguments according to the signature described in that file.

A plugin is loaded and initialized when the code executor executes code snippets for the first time
in a session, using the configurations specified in its YAML file.
Although we have a [feature](/docs/advanced/plugin_selection) that lets the LLM dynamically select plugins, all enabled plugins are loaded,
regardless of whether they are used in the current conversation round. The only way to control plugin loading is to
enable or disable the plugin in its YAML file.
In principle, plugins could be configured separately for different sessions.
For example, when a user starts a new session, a different set of plugins could be loaded based on the user's profile.
However, this feature is **not** supported in TaskWeaver yet.
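The loading behavior described above can be sketched as follows. This is not TaskWeaver's actual loader; `PluginEntry`, `discover`, `load_enabled_plugins`, and the `echo` plugins are hypothetical names. At startup only the metadata (assumed to be already parsed from the YAML files) is registered, and loading each implementation is deferred behind a loader callable. On the first execution in a session, every enabled plugin is instantiated whether or not the generated code uses it, while a disabled plugin is never loaded.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

load_log: List[str] = []  # records which implementations were actually loaded


@dataclass
class PluginEntry:
    name: str
    enabled: bool
    config: Dict[str, Any]
    load_impl: Callable[[], type]  # deferred "import" of the Python class


def discover(specs: List[Dict[str, Any]], loaders: Dict[str, Callable[[], type]]) -> Dict[str, PluginEntry]:
    """Startup: register metadata only; no implementation code runs here."""
    return {
        s["name"]: PluginEntry(s["name"], s.get("enabled", True), s.get("configurations", {}), loaders[s["name"]])
        for s in specs
    }


def load_enabled_plugins(entries: Dict[str, PluginEntry]) -> Dict[str, Any]:
    """First code execution in a session: instantiate every enabled plugin."""
    instances: Dict[str, Any] = {}
    for name, entry in entries.items():
        if entry.enabled:
            impl_cls = entry.load_impl()
            instances[name] = impl_cls(entry.config)
    return instances


class Echo:
    def __init__(self, config: Dict[str, Any]):
        self.config = config

    def __call__(self, text: str) -> str:
        return self.config.get("prefix", "") + text


def make_loader(name: str) -> Callable[[], type]:
    def load() -> type:
        load_log.append(name)  # stands in for importing the plugin module
        return Echo
    return load


specs = [
    {"name": "echo", "enabled": True, "configurations": {"prefix": ">> "}},
    {"name": "legacy_echo", "enabled": False},
]
entries = discover(specs, {s["name"]: make_loader(s["name"]) for s in specs})
print(load_log)                  # []: nothing loaded at startup
plugins = load_enabled_plugins(entries)
print(load_log)                  # ['echo']: the disabled plugin is skipped
print(plugins["echo"]("hello"))  # >> hello
```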

A plugin is called when the agent executes code snippets. As discussed in the previous section, a plugin can maintain state across calls.
Because each session is associated with a Jupyter kernel,
the plugin objects are created in the kernel's memory and can be accessed by different code snippets, from different code cells,
within the same session.
When the session is closed, the plugin objects are destroyed along with the kernel.
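To picture why plugin state survives across code cells but not across sessions, a kernel can be approximated by a single namespace dict in which every cell is executed. The `SessionKernel` class below is a hypothetical simplification, not how TaskWeaver actually talks to Jupyter: it uses Python's built-in `exec` with one shared globals dict per session, so an object created for the session is still there in the next cell, and `close()` discards it.

```python
from typing import Any, Dict


class CallCounter:
    """A tiny stateful 'plugin': remembers how many times it was called."""

    def __init__(self) -> None:
        self.calls = 0

    def __call__(self) -> int:
        self.calls += 1
        return self.calls


class SessionKernel:
    def __init__(self) -> None:
        # One namespace per session, standing in for the kernel's memory.
        self.namespace: Dict[str, Any] = {"counter": CallCounter()}

    def run_cell(self, code: str) -> None:
        exec(code, self.namespace)

    def close(self) -> None:
        # Dropping the namespace drops the plugin objects with it.
        self.namespace.clear()


s1, s2 = SessionKernel(), SessionKernel()
s1.run_cell("a = counter()")
s1.run_cell("b = counter()")  # same object as in the previous cell
s2.run_cell("c = counter()")  # a different session has its own object
print(s1.namespace["b"], s2.namespace["c"])  # 2 1
s1.close()
print("counter" in s1.namespace)  # False
```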

## Conclusion
In this post, we discussed the basics of plugins in TaskWeaver, including the plugin implementation, the YAML file,
configurations, and states. We also introduced the plugin lifecycle, that is, how a plugin is loaded, initialized, and called by the agent.
Plugins are a powerful way to extend the functionality of the agent in TaskWeaver.
