Merge pull request #95 from liminghao1630/main
Release API-Bank training data
huybery authored Nov 17, 2023
2 parents fe86237 + f30ccf2 commit 11f39fe
Showing 60 changed files with 2,286 additions and 98 deletions.
24 changes: 13 additions & 11 deletions api-bank/README.md
Original file line number Diff line number Diff line change
@@ -1,25 +1,26 @@
# API-Bank: A Benchmark for Tool-Augmented LLMs
Minghao Li, Feifan Song, Bowen Yu, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li
# API-Bank: A Comprehensive Benchmark for Tool-Augmented LLMs

Minghao Li, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, Yongbin Li

arXiv: [[Abstract]](https://arxiv.org/abs/2304.08244)/[[PDF]](https://arxiv.org/pdf/2304.08244.pdf)
<!-- PDF: [API-Bank-arxiv-version.pdf](API-Bank-arxiv-version.pdf)
-->


## News
- **The Lynx model is released on [HuggingFace Hub](https://huggingface.co/liminghao1630/Lynx-7b).**
- **API-Bank is accepted by EMNLP 2023.**
- **The code and data of API-Bank have been released.**

## Abstract

Recent research has shown that Large Language Models (LLMs) can utilize external tools to improve their contextual processing abilities, moving away from the pure language modeling paradigm and paving the way for Artificial General Intelligence. Despite this, there has been a lack of systematic evaluation to demonstrate the efficacy of LLMs using tools to respond to human instructions. This paper presents API-Bank, the first benchmark tailored for Tool-Augmented LLMs. API-Bank includes 53 commonly used API tools, a complete Tool-Augmented LLM workflow, and 264 annotated dialogues that encompass a total of 568 API calls. These resources have been designed to thoroughly evaluate LLMs' ability to plan step-by-step API calls, retrieve relevant APIs, and correctly execute API calls to meet human needs. The experimental results show that GPT-3.5 emerges the ability to use the tools relative to GPT3, while GPT-4 has stronger planning performance. Nevertheless, there remains considerable scope for further improvement when compared to human performance. Additionally, detailed error analysis and case studies demonstrate the feasibility of Tool-Augmented LLMs for daily use, as well as the primary challenges that future research needs to address.
Recent research has demonstrated that Large Language Models (LLMs) can enhance their capabilities by utilizing external tools. However, three pivotal questions remain unanswered: (1) How effective are current LLMs in utilizing tools? (2) How can we enhance LLMs' ability to utilize tools? (3) What obstacles need to be overcome to leverage tools? To address these questions, we introduce API-Bank, a groundbreaking benchmark, specifically designed for tool-augmented LLMs. For the first question, we develop a runnable evaluation system consisting of 73 API tools. We annotate 314 tool-use dialogues with 753 API calls to assess the existing LLMs' capabilities in planning, retrieving, and calling APIs. For the second question, we construct a comprehensive training set containing 1,888 tool-use dialogues from 2,138 APIs spanning 1,000 distinct domains. Using this dataset, we train Lynx, a tool-augmented LLM initialized from Alpaca. Experimental results demonstrate that GPT-3.5 exhibits improved tool utilization compared to GPT-3, while GPT-4 excels in planning. However, there is still significant potential for further improvement. Moreover, Lynx surpasses Alpaca's tool utilization performance by more than 26 pts and approaches the effectiveness of GPT-3.5. Through error analysis, we highlight the key challenges for future research in this field to answer the third question.

## Tool-Augmented LLMs Paradigm
## Multi-Agent Dataset Synthesis

![Paradigm](https://cdn.jsdelivr.net/gh/liminghao1630/auxiliary_use/figures/flowchart.png)
![multiagent](./figures/multi-agent.png)

## System Design
## Evaluation Tasks

![System](https://cdn.jsdelivr.net/gh/liminghao1630/auxiliary_use/figures/system.png)
![ability](./figures/three_ability.png)

## Demo
As far as we know, there is a conflict between the dependencies of the `googletrans` package and the dependencies of the `gradio` package, which may cause the demo not to run properly. There is no good solution, you can uninstall `googletrans` first when using the demo.
Expand Down Expand Up @@ -50,8 +51,9 @@ JsDelivr: https://cdn.jsdelivr.net/gh/liminghao1630/auxiliary_use/gpt-3.5-demo.g

## Evaluation

The conversation data of level-1 and level-2 are stored in the `lv1-lv2-samples` directory, please follow the code in `evaluator.py` to design the evaluation script.
The evaluation of level-3 needs to be done manually, you can use `simulator.py` or `demo.py` for testing.
The datasets are released on [HuggingFace Hub](https://huggingface.co/datasets/liminghao1630/API-Bank).
The conversation data of level-1 and level-2 are stored in the `lv1-lv2-samples` directory or `test-data`, please follow the code in `evaluator.py`/`evaluator_by_json.py` to design the evaluation script.
The evaluation of level-3 requires `lv3_evaluator.py`.



Expand Down
10 changes: 10 additions & 0 deletions api-bank/api_call_extraction.py
Original file line number Diff line number Diff line change
@@ -1,5 +1,8 @@
import re

def fn(**kwargs):
    return kwargs

def get_api_call(model_output):
    api_call_pattern = r"\[(\w+)\((.*)\)\]"
    api_call_pattern = re.compile(api_call_pattern)
Expand All @@ -16,6 +19,13 @@ def parse_api_call(text):
api_name = match.group(1)
params = match.group(2)

# params = params.replace('\'[', '[')
# params = params.replace(']\'', ']')
# params = params.replace('\'{', '{')
# params = params.replace('}\'', '}')

# param_dict = eval('fn(' + params + ')')

param_pattern = r"(\w+)\s*=\s*['\"](.+?)['\"]|(\w+)\s*=\s*(\[.*\])|(\w+)\s*=\s*(\w+)"
param_dict = {}
for m in re.finditer(param_pattern, params):
Expand Down
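The two regular expressions in this hunk can be exercised on their own. The sketch below copies `api_call_pattern` and `param_pattern` verbatim from the diff; the function name `parse_api_call_sketch` and the loop body that fills `param_dict` are assumptions for illustration, since the body of the real loop is collapsed in this view.

```python
import re

# Patterns copied from the diff above.
API_CALL_PATTERN = re.compile(r"\[(\w+)\((.*)\)\]")
PARAM_PATTERN = re.compile(
    r"(\w+)\s*=\s*['\"](.+?)['\"]|(\w+)\s*=\s*(\[.*\])|(\w+)\s*=\s*(\w+)"
)


def parse_api_call_sketch(text):
    """Hypothetical helper: return (api_name, param_dict), or None if no call is found."""
    match = API_CALL_PATTERN.search(text)
    if match is None:
        return None
    api_name, params = match.group(1), match.group(2)

    param_dict = {}
    for m in PARAM_PATTERN.finditer(params):
        # Assumed loop body: exactly one of the three alternatives matches,
        # so pick whichever (name, value) group pair is populated.
        if m.group(1):    # quoted value: key='value' or key="value"
            param_dict[m.group(1)] = m.group(2)
        elif m.group(3):  # bracketed list: key=[...]
            param_dict[m.group(3)] = m.group(4)
        elif m.group(5):  # bare word: key=value
            param_dict[m.group(5)] = m.group(6)
    return api_name, param_dict


print(parse_api_call_sketch("[AddAlarm(token='abc', time='2023-11-17 08:00:00')]"))
print(parse_api_call_sketch("[QueryStock(stock_code=600519)]"))
```

With this approach every captured value is a string, including unquoted numbers and bracketed lists. The quoted alternative stops at the first quote character of either kind, so a value containing an apostrophe inside double quotes would be cut short. That may be one reason the `eval`-based alternative appears here, although it is left commented out.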
2 changes: 1 addition & 1 deletion api-bank/apis/add_agenda.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@


class AddAgenda(API):
description = "The API for adding a schedule item includes parameters for token, content, time, and location."
description = "The API for adding a agenda item includes content, time and location."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'content': {'type': 'str', 'description': 'The content of the agenda.'},
Expand Down
2 changes: 1 addition & 1 deletion api-bank/apis/add_alarm.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@


class AddAlarm(API):
description = "The API for setting an alarm includes a parameter for the time."
description = "The API for setting an alarm includes a parameter for the alarm time."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'time': {'type': 'str', 'description': 'The time for alarm. Format: %Y-%m-%d %H:%M:%S'}
Expand Down
8 changes: 2 additions & 6 deletions api-bank/apis/add_meeting.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,8 @@


class AddMeeting(API):
description = "This API allows users to make a reservation for a meeting and store the meeting information in the database." \
"Function:" \
"Allow users to make a reservation for a meeting." \
"Exception Handling:" \
"1. If the reservation is successful, return a success message." \
"2. If the reservation fails, return a corresponding error message."

description = "This API allows users to make a reservation for a meeting and store the meeting information (e.g., topic, time, location, attendees) in the database."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'meeting_topic': {'type': 'str', 'description': 'The title of the meeting, no more than 50 characters.'},
Expand Down
7 changes: 2 additions & 5 deletions api-bank/apis/add_reminder.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,8 @@


class AddReminder(API):
description = "Add a reminder API that takes three parameters - 'token','content' and 'time'. " \
"The 'token' parameter refers to the user's token " \
"and the 'content' parameter refers to the description of the reminder " \
"and the 'time' parameter specifies the time at which the reminder " \
"should be triggered."

description = "The API for adding a reminder item includes content and time."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'content': {'type': 'str', 'description': 'The content of the conference.'},
Expand Down
8 changes: 2 additions & 6 deletions api-bank/apis/delete_meeting.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,8 @@


class DeleteMeeting(API):
description = "This API allows users to delete a reservation for a meeting and remove the meeting information in the database." \
"Function:" \
"Delete user's reservation for a meeting." \
"Exception Handling:" \
"1. If the deletion is successful, return a success message." \
"2. If the deletion fails, return a corresponding error message."

description = "This API allows users to delete a reservation for a meeting and remove the meeting information in the database."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'meeting_topic': {'type': 'str', 'description': 'The title of the meeting, no more than 50 characters.'},
Expand Down
6 changes: 2 additions & 4 deletions api-bank/apis/delete_reminder.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,10 +5,8 @@


class DeleteReminder(API):
description = "Delete a reminder API that takes three parameters - 'token','content' and 'time'. " \
"The 'token' parameter refers to the user's token " \
"and the 'content' parameter refers to the description of the reminder " \
"and the 'time' parameter specifies the time at which the reminder should be triggered."

description = "The API for deleting a reminder item includes content and time."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'content': {'type': 'str', 'description': 'The content of the conference.'},
Expand Down
3 changes: 2 additions & 1 deletion api-bank/apis/delete_scene.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
from apis.api import API

class DeleteScene(API):
description = 'This API deletes a scene.'

description = 'This API deletes a scene by its name.'
input_parameters = {
"name": {'type': 'str', 'description': 'The name of the scene.'},
}
Expand Down
3 changes: 1 addition & 2 deletions api-bank/apis/dictionary.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import requests

class Dictionary(API):
description = 'This API searches for a given keyword.'
description = 'This API searches the dictionary for a given keyword.'
input_parameters = {
"keyword": {'type': 'str', 'description': 'The keyword to search.'},
}
Expand Down Expand Up @@ -85,4 +85,3 @@ def check_api_call_correctness(self, response, groundtruth) -> bool:
if response['exception'] != groundtruth['exception']:
return False
return True

3 changes: 2 additions & 1 deletion api-bank/apis/emergency_knowledge.py
Original file line number Diff line number Diff line change
@@ -1,7 +1,8 @@
from apis.api import API

class EmergencyKnowledge(API):
description = 'This API searches for a given symptom.'

description = 'This API searches for a given symptom for emergency knowledge.'
input_parameters = {
"symptom": {'type': 'str', 'description': 'The symptom to search.'},
}
Expand Down
2 changes: 1 addition & 1 deletion api-bank/apis/get_user_token.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import os
class GetUserToken(API):

description = 'Get the user token.'
description = 'Get the user token by username and password.'
input_parameters = {
'username': {'type': 'str', 'description': 'The username of the user.'},
'password': {'type': 'str', 'description': 'The password of the user.'},
Expand Down
2 changes: 1 addition & 1 deletion api-bank/apis/modify_agenda.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@


class ModifyAgenda(API):
description = "The API for modifying a schedule item includes parameters for token, content, time, and location."
description = "The API for modifying a schedule item includes parameters for content, time, and location."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'content': {'type': 'str', 'description': 'The content of the agenda.'},
Expand Down
8 changes: 2 additions & 6 deletions api-bank/apis/modify_meeting.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,8 @@


class ModifyMeeting(API):
description = "This API allows users to modify a reservation for a meeting" \
"Function:" \
"Delete user's reservation for a meeting." \
"Exception Handling:" \
"1. If the modification is successful, return a success message." \
"2. If the modification fails, return a corresponding error message."

description = "This API allows users to modify a reservation for a meeting"
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'meeting_topic': {'type': 'str', 'description': 'The title of the meeting, no more than 50 characters.'},
Expand Down
2 changes: 1 addition & 1 deletion api-bank/apis/modify_password.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@

class ModifyPassword(API):

description = 'Modify the password of an account.'
description = 'The API for modifying the password of the account.'
input_parameters = {
'token': {'type': 'str', 'description': 'The token of the user.'},
'old_password': {'type': 'str', 'description': 'The old password of the user.'},
Expand Down
3 changes: 2 additions & 1 deletion api-bank/apis/modify_registration.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,6 +2,7 @@
import datetime

class ModifyRegistration(API):

description = 'This API modifies the registration of a patient given appointment ID.'
input_parameters = {
"appointment_id": {'type': 'str', 'description': 'The ID of appointment.'},
Expand Down Expand Up @@ -129,7 +130,7 @@ def check_api_call_correctness(self, response, groundtruth) -> bool:
Returns:
- correctness (bool): the correctness of the API call.
"""
response_appointment_id = response['input']['appointment_id']
response_appointment_id = str(response['input']['appointment_id'])
groundtruth_appointment_id = groundtruth['input']['appointment_id']
response_new_appointment_date = response['input']['new_appointment_date']
groundtruth_new_appointment_date = groundtruth['input']['new_appointment_date']
Expand Down
7 changes: 2 additions & 5 deletions api-bank/apis/modify_reminder.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,8 @@


class ModifyReminder(API):
description = "Modify a reminder API that takes three parameters - 'token','content' and 'time'. " \
"The 'token' parameter refers to the user's token " \
"and the 'content' parameter refers to the description of the reminder " \
"and the 'time' parameter specifies the time at which the reminder " \
"should be triggered."

description = "The API for deleting a reminder item includes content and time."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'content': {'type': 'str', 'description': 'The content of the conference.'},
Expand Down
6 changes: 1 addition & 5 deletions api-bank/apis/open_bank_account.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,11 +5,7 @@


class OpenBankAccount(API):
description = "This is an API for opening a bank account with three required parameters:" \
" account (string), password (string), and name (string). The API creates a new " \
"account with the specified account identifier, password, and account holder's name. " \
"If an account with the same identifier already exists, the API will return an error message. " \
"If the account is successfully created, the API will return a success message."
description = "This is an API for opening a bank account for a user, given the account, password and name."
input_parameters = {
'account': {'type': 'str', 'description': 'The account for the user.'},
'password': {'type': 'str', 'description': 'The password.'},
Expand Down
2 changes: 1 addition & 1 deletion api-bank/apis/query_history_today.py
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@
import datetime

class QueryHistoryToday(API):
description = 'This API queries the history of a given user today.'
description = 'This API queries the history of the given date.'
input_parameters = {
'date': {'type': 'str', 'description': 'The date of the history. Format: %m-%d'},
}
Expand Down
8 changes: 2 additions & 6 deletions api-bank/apis/query_meeting.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,12 +5,8 @@


class QueryMeeting(API):
description = "This API allows users to query a reservation for a meeting." \
"Function:" \
"Query infomation for a meeting." \
"Exception Handling:" \
"1. If the Query is successful, return a meeting infomation with json." \
"2. If the Query fails, return a error message."

description = "This API allows users to query the information of a meeting."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'meeting_topic': {'type': 'str', 'description': 'The title of the meeting, no more than 50 characters.'},
Expand Down
3 changes: 2 additions & 1 deletion api-bank/apis/query_registration.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
import datetime

class QueryRegistration(API):
description = 'This API queries the registration of a patient given patient ID.'

description = 'This API queries the registration of a patient, given patient ID.'
input_parameters = {
"patient_name": {'type': 'str', 'description': 'The name of patient.'},
"date": {'type': 'str', 'description': 'The date of appointment. Format be like %Y-%m-%d'},
Expand Down
4 changes: 2 additions & 2 deletions api-bank/apis/query_reminder.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,8 +5,8 @@


class QueryReminder(API):
description = "Query a reminder API that takes three parameters - 'token','content' and 'time'. " \
"The API used to help user query a reminder. If the reminder exists, the API will return the reminder information. "

description = "The API for querying a reminder item includes content and time."
input_parameters = {
'token': {'type': 'str', 'description': "User's token."},
'content': {'type': 'str', 'description': 'The content of the reminder.'},
Expand Down
3 changes: 2 additions & 1 deletion api-bank/apis/query_stock.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
import datetime

class QueryStock(API):
description = 'This API queries the stock price of a given stock.'

description = 'This API queries the stock price of a given stock code and date.'
input_parameters = {
"stock_code": {'type': 'str', 'description': 'The stock code of the given stock.'},
"date": {'type': 'str', 'description': 'The date of the stock price. Format: %Y-%m-%d'}
Expand Down
17 changes: 9 additions & 8 deletions api-bank/apis/record_health_data.py
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,8 @@
import datetime

class RecordHealthData(API):
description = 'This API records the health history of a patient given user ID, time and health data.'

description = 'This API records the health data of a user.'
input_parameters = {
"user_id": {'type': 'str', 'description': 'The ID of user.'},
"time": {'type': 'str', 'description': 'The time of health data. Format: %Y-%m-%d %H:%M:%S'},
Expand Down Expand Up @@ -126,7 +127,7 @@ def check_api_call_correctness(self, response, groundtruth) -> bool:
Returns:
- correctness (bool): the correctness of the API call.
"""
response_user_id = response['input']['user_id']
response_user_id = str(response['input']['user_id'])
groundtruth_user_id = groundtruth['input']['user_id']
response_time = response['input']['time']
groundtruth_time = groundtruth['input']['time']
Expand All @@ -137,18 +138,18 @@ def check_api_call_correctness(self, response, groundtruth) -> bool:
groundtruth_user_id = groundtruth_user_id.upper().strip()
response_time = self.format_check(response_time)
groundtruth_time = self.format_check(groundtruth_time)
response_health_data = [{"name":str(i["name"]),"value":str(i["value"])} for i in response_health_data]
groundtruth_health_data = [{"name":str(i["name"]),"value":str(i["value"])} for i in groundtruth_health_data]
response_health_data.sort(key=lambda x: str(x))
groundtruth_health_data.sort(key=lambda x: str(x))
# response_health_data = [{"name":str(i["name"]),"value":str(i["value"])} for i in response_health_data]
# groundtruth_health_data = [{"name":str(i["name"]),"value":str(i["value"])} for i in groundtruth_health_data]
# response_health_data.sort(key=lambda x: str(x))
# groundtruth_health_data.sort(key=lambda x: str(x))


if response_user_id != groundtruth_user_id:
return False
if response_time != groundtruth_time:
return False
if response_health_data != groundtruth_health_data:
return False
# if response_health_data != groundtruth_health_data:
# return False
if response['output'] != groundtruth['output']:
return False
if response['exception'] != groundtruth['exception']:
Expand Down
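The `str(...)` cast added to `response_user_id` matters because the ID is later normalized with `.upper().strip()`, which only exists on strings. A minimal sketch of that failure mode follows, assuming a model response whose `user_id` was parsed as an integer. The helper names `normalize_id` and `ids_match` are hypothetical and not part of the repository.

```python
def normalize_id(value):
    """Hypothetical helper: cast to str first, then normalize case and whitespace."""
    return str(value).upper().strip()


def ids_match(response, groundtruth):
    """Compare user IDs the way the patched check appears to: as normalized strings."""
    return normalize_id(response["input"]["user_id"]) == normalize_id(
        groundtruth["input"]["user_id"]
    )


# Assumed inputs: the response carries an int, the ground truth a str.
response = {"input": {"user_id": 12345}}
groundtruth = {"input": {"user_id": "12345"}}

try:
    # Without the cast, an int ID has no .upper() method.
    response["input"]["user_id"].upper()
except AttributeError as exc:
    print("uncast ID fails:", exc)

print(ids_match(response, groundtruth))
```

Casting only at comparison time leaves the stored response untouched, so the evaluator stays tolerant of type differences without changing what the API itself received.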
2 changes: 1 addition & 1 deletion api-bank/apis/register_user.py
Original file line number Diff line number Diff line change
Expand Up @@ -5,7 +5,7 @@

class RegisterUser(API):

description = 'Register a user.'
description = 'The API for registering a account, given the username, password and email.'
input_parameters = {
'username': {'type': 'str', 'description': 'The username of the user.'},
'password': {'type': 'str', 'description': 'The password of the user.'},
Expand Down