Soft reboot Agents & Tools #29396
Conversation
@LysandreJik this is ready for review! 🤗

The rationale for rebuilding the Python code interpreter from scratch was that we wanted the Agent to only chain tools, and that's it. Anything else might have ended up a security risk, so we wanted this to be very limited by design. I'll take a look at the PR, thanks!
Thanks for your PR! I'm not answering your questions (yet), but here's an initial review to get it in a better state. If I understand correctly, this is the gist of your changes:
- You migrated the agent to a `CodeAgent` and kept usage mostly unchanged but refactored (introduction of `extract_action`, move of `clean_code_for_run` to module level). You kept our Python interpreter for that task, but added binary operations.

  These changes sound good to me, the resulting code is quite clean. Let's indeed see how the `run`/`step` methods could eventually be abstracted a bit more together, but it's not a blocker for now.

- You added a `ReactAgent` that follows the ReAct framework.

  Here as well the code is clear, I've left a few comments. Something which becomes apparent when reading the `ReactAgent` class is how verbose the console is going to be given the logs present. I've left a comment regarding how to reduce the amount of logs being sent to the console, but additionally to that, I wonder if we can, instead of outputting everything to the console, keep some kind of record of everything happening to have a short summary afterwards.
Something like:
```python
task_output, output_summary = agent.run(task, output_summary=True)
print(output_summary)
# ReactAgent performed task "Task: draw rivers and lakes"
# Split the work in the following tasks:
# - Task 1
#   - Tool chosen: X
#   - Failures: [failure 0, failure 1]
#   - Successfully returned: Y
# - Task 2
#   - Tool chosen: Z
# [...]
```
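One way this suggestion could be realized is by accumulating a structured record per step while the agent runs, then rendering it as text at the end. The sketch below is purely illustrative: the `StepRecord`/`AgentLog` names and the exact summary layout are assumptions, not part of the actual `transformers` API.

```python
# Hypothetical sketch of a per-step record that an agent could fill in as it
# runs, so that run(..., output_summary=True) can return a readable recap.
class StepRecord:
    def __init__(self, task, tool):
        self.task = task          # sub-task the agent decided to tackle
        self.tool = tool          # tool chosen for this step
        self.failures = []        # failed attempts, if any
        self.result = None        # final result of the step


class AgentLog:
    def __init__(self, main_task):
        self.main_task = main_task
        self.steps = []

    def add_step(self, task, tool):
        record = StepRecord(task, tool)
        self.steps.append(record)
        return record

    def summary(self):
        # Render the record in the shape suggested in the review comment.
        lines = [
            f'ReactAgent performed task "{self.main_task}"',
            "Split the work in the following tasks:",
        ]
        for i, step in enumerate(self.steps, start=1):
            lines.append(f"- Task {i}")
            lines.append(f"  - Tool chosen: {step.tool}")
            if step.failures:
                lines.append(f"  - Failures: {step.failures}")
            lines.append(f"  - Successfully returned: {step.result}")
        return "\n".join(lines)
```

The agent would call `add_step` at each iteration and set `result`/`failures` as the step resolves; `run` could then return `log.summary()` alongside the task output.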
IMO there are a few things we could do to build upon this rework:
- We need clear documentation regarding the difference between `CodeAgent` and `ReactAgent`, and can take this opportunity to explain simply what the ReAct framework is. As we continue building on this project, having nice documentation providing an overview of the different kinds of agents we're putting in the repo would be nice.
- We have a suite to evaluate agents on small problems: evaluate_agent.py. We used this to test out new model releases and to quickly see the effect of our changes. Did you try to re-use it with your changes here? I'd be quite curious to see the resulting scores.
```python
# See the License for the specific language governing permissions and
# limitations under the License.
import math
import numexpr
```
This needs to be defined as a soft dependency with an explicit error-out in case of missing dep
If absolutely necessary you can add the dependency to the `agents` extra here: https://github.com/huggingface/transformers/blob/main/setup.py#L411-L413
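The usual shape of such a soft dependency is to check availability without importing at module load time, and only raise a helpful `ImportError` when the tool is actually used. A minimal sketch (the function names here are illustrative, not the helpers `transformers` actually ships):

```python
import importlib.util


def is_numexpr_available():
    # Check for the package without importing it, so module import
    # never fails just because the optional dependency is missing.
    return importlib.util.find_spec("numexpr") is not None


def evaluate_with_numexpr(expression):
    # Error out explicitly, with install instructions, only when the
    # feature that needs the dependency is actually invoked.
    if not is_numexpr_available():
        raise ImportError(
            "This tool requires `numexpr`, which was not found in your environment. "
            "You can install it with `pip install numexpr`."
        )
    import numexpr

    return numexpr.evaluate(expression)
```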
src/transformers/tools/base.py
Outdated
```python
for attr, expected_type in required_attributes.items():
    if attr not in dct or not isinstance(dct[attr], expected_type):
        raise TypeError(f"{attr} must exist and be of type {expected_type.__name__}")
```
This seems like too hard a check; I'm getting a failure in Python 3.8 (which is still supported) and 3.10.
Also this seems to hard error out with remotely-defined tools; it's fine to log an info mentioning it and to expect weird behavior, but we shouldn't error out locally for an ill-defined remote file.
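A lenient variant of that check could collect the problems and log them instead of raising, with an opt-in strict mode for locally defined tools. This is a sketch of the suggested behavior, not the fix that was merged; the function name and `strict` flag are assumptions.

```python
import logging

logger = logging.getLogger(__name__)


def validate_tool_attributes(dct, required_attributes, strict=False):
    # Collect issues instead of raising on the first one, so a remotely
    # defined tool with odd attributes still loads (with a logged notice).
    issues = []
    for attr, expected_type in required_attributes.items():
        if attr not in dct:
            issues.append(f"missing attribute `{attr}`")
        elif not isinstance(dct[attr], expected_type):
            issues.append(f"`{attr}` should be of type {expected_type.__name__}")
    if issues and strict:
        # Local, strictly validated tools can still hard-error.
        raise TypeError("; ".join(issues))
    for issue in issues:
        logger.info("Tool definition issue (may cause unexpected behavior): %s", issue)
    return issues
```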
src/transformers/tools/agents.py
Outdated
```python
)
self._toolbox["final_answer"] = FinalAnswerTool()
```
Rather than manually adding the tool here, I'd be in favor of the toolbox having a small abstraction layer allowing tool addition/removal/replacement. The `toolbox` can be passed as a parameter, and here we're doing a blind replace of a tool that might exist otherwise. Having a finer abstraction layer would ensure we don't override tools that we don't want to override (and would allow users to dynamically build their toolboxes rather than touching a private attribute to do so).
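A minimal version of such an abstraction layer could look like the sketch below: `add_tool` refuses to silently overwrite, `update_tool` requires the tool to already exist, so replacing a tool is always an explicit choice. The class and method names are illustrative assumptions, not the API that was merged.

```python
class Toolbox:
    # Sketch of a toolbox wrapper: adding, replacing, and removing tools are
    # distinct, explicit operations, so nothing gets blindly overwritten.
    def __init__(self, tools=None):
        self._tools = dict(tools or {})

    def add_tool(self, name, tool):
        if name in self._tools:
            raise KeyError(
                f"Tool `{name}` already exists; use update_tool to replace it."
            )
        self._tools[name] = tool

    def update_tool(self, name, tool):
        if name not in self._tools:
            raise KeyError(f"Tool `{name}` not found; use add_tool to add it.")
        self._tools[name] = tool

    def remove_tool(self, name):
        self._tools.pop(name)

    def __getitem__(self, name):
        return self._tools[name]

    def __contains__(self, name):
        return name in self._tools
```

With this, users can build their toolbox dynamically (`toolbox.add_tool("final_answer", FinalAnswerTool())`) instead of mutating a private `_toolbox` dict.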
src/transformers/tools/agents.py
Outdated
```python
self.tool_description_template = tool_description_template if tool_description_template else self.default_tool_description_template
self.additional_args = additional_args
self.max_iterations = max_iterations
self.log = lambda x: print(to_text(x))
```
Nice to have a `.log` method; I'd be in favor of reusing the `logging` mechanism to output logs of varying levels of information.

For example, I would put the prompt being printed at an `info` level (maybe even `debug`, given its length), while the tool selection and command to be run would be at a `warning` (as directly necessary for the end user to validate that the agent isn't causing mayhem on their machine).

You already have the logger instantiated at the top of the file so you can proceed with `logger.warning`/`logger.info`/`logger.debug`/etc. Feel free to also add a prefix to the messages by defining a formatter if you want to differentiate from other library logs.
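To make the level split concrete, here is a sketch using the stdlib `logging` module (`transformers.utils.logging` exposes a similar get-logger interface); the `log_step` helper and the `[agent]` prefix are illustrative assumptions:

```python
import logging

# One logger for agent output, with a prefix to distinguish it from other
# library logs. Default level INFO: debug-level detail is hidden unless asked.
logger = logging.getLogger("agents")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[agent] %(levelname)s - %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)


def log_step(prompt, tool_name, command):
    # Full prompt: very long, so debug-level (hidden by default).
    logger.debug("Full prompt:\n%s", prompt)
    # Rationale/bookkeeping: info-level.
    logger.info("Agent rationale recorded.")
    # Tool selection and the command to run: warning-level, since the end
    # user should always see what the agent is about to execute.
    logger.warning("Calling tool `%s` with: %s", tool_name, command)
```

Users then control verbosity with `logger.setLevel(logging.DEBUG)` rather than the agent deciding what to print.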
@LysandreJik thank you, we will use the transformers `logging` mechanism as default.

We would need to log these:
- agent rationale (can be done with `logger.info`)
- bugs (`logger.debug`)
- tool call
- tool output

So tool calls and outputs should be logged differently: maybe we can enrich the default transformers `logging` to do this?

In the longer term we're thinking of building a Gradio UI that would start on agent `run()` launch and let you explore all steps taken by your agent. So this could be a drop-in replacement for the default `logger`.
```python
result = result[: -len(stop_seq)]
return result
```

```python
self.memory = [self.system_prompt]
```
If memory is accessible by the user (public attribute), let's define and create it at instance initialization. If not, let's make it private.
I think it's good if it's public!
It's already created at instance initialization in `Agent`: the idea here was that `run` should reset the agent's memory, since the philosophy is "1 agent = 1 task", and `run()` should solve the task at hand.
But we could change `run()` to not reset the memory; it depends on the philosophy we want to have here: should an agent be defined just for (1 fixed task, 1 memory stream, 1 fixed toolbox), or could all of these be modified at any time at the risk of less clarity?
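One middle ground between the two philosophies is a `reset` flag on `run()`, defaulting to the "1 agent = 1 task" behavior. This is a hypothetical sketch of that option, not the current API:

```python
class Agent:
    # Sketch: memory is created at init; run() resets it by default, but a
    # caller can opt into a persistent memory stream with reset=False.
    def __init__(self, system_prompt="You are an agent."):
        self.system_prompt = system_prompt
        self.memory = [self.system_prompt]

    def run(self, task, reset=True):
        if reset:
            # "1 agent = 1 task": each run starts from a clean slate.
            self.memory = [self.system_prompt]
        self.memory.append(f"Task: {task}")
        return self.memory
```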
```python
self.task = task
task_message = f"Task: {self.task}"
```
What if `self.task` already begins with `Task:`? 😁
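A small guard avoids the double-prefix case (`"Task: Task: ..."`); this is a sketch of one possible fix, with an assumed helper name:

```python
def format_task_message(task):
    # Only add the "Task:" marker if the user's input doesn't already have it.
    task = task.strip()
    if task.lower().startswith("task:"):
        return task
    return f"Task: {task}"
```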
src/transformers/tools/agents.py
Outdated
```python
prompt = self.system_prompt + f"\nTask: {task}"
llm_output = self.llm_callable(prompt, stop=["Task:"])
self.log("====Executing with this prompt====")
self.log(prompt)
```
You were saying that having clear visibility of the `prompt` would be very useful for these classes; what about setting it as an attribute (`self.prompt`) in order to have it easily obtainable for end-users?
Great idea! Done here
Co-authored-by: Lysandre Debut <[email protected]>
```python
@staticmethod
def from_langchain(langchain_tool):
```
@Jofthomas what do you think of moving the method definition inside the class like this? (I copied the existing `from_gradio` method.)
seems fine to me :)
@LysandreJik after my 3 commits this afternoon, all the discussions we had should be resolved on the code side: there's only the doc to do in a later PR!
Force-pushed from c12691a to 0aba8c1
This is starting to look great! Given how the code you're touching here doesn't evolve at all, I wonder if we shouldn't keep this branch independent from huggingface/main and merge smaller PRs into it now that we've got a great base.
PR closed since it's merged in.
What does this PR do?
Following up on the discovery that Open Source models can be a competitive engine for agent setups (cf this blog post), this PR starts from our previous efforts on Agents to push further.
The driving idea of this change is to leave the user more freedom and clarity in creating agents: so we aim for small building blocks rather than monolithic classes, and will strive for simplicity.
We will enable sharing our Agents & Tools on the Hub, thus enabling efficient improvement by the community.
Main changes:
- `HfAgent`, `OpenaiAgent`, ... depended on the underlying LLM used as an engine. Although this ensured that the LLMs would be properly initialized, it introduced a black-box aspect that we remove here: we decide to resort to a simple `llm_callable` that the user has the responsibility, and thus the freedom, to initialize as they please.
- We keep a `Tool` class and an `Agent` class, but propose two agents that the user can run:
  - `CodeAgent` acts in one shot: it generates all the necessary code to solve the task, then executes it all at once.
  - `ReactAgent` acts step by step: each step consists of one thought, then one function execution.

This is built with @Jofthomas and @cyril-k.
@LysandreJik @pacman100 the PR is still WIP: you're welcome to give your inputs! 🤗
For instance we still have the following questions:
- How can we constrain `llm_callable` to raise informative errors if improperly initialized (for instance raise an error if it returns another type than a string), helping the user correct it, while still leaving them enough freedom to choose this LLM as they want?
- (The `push_to_hub` method for tools seems broken.)
- How much of `CodeAgent` and `ReactAgent`'s `run` methods can be shared, to put as much of it as we can in the base `Agent` class?
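On the last question, one common way to share the `run` logic is a template method: the base class owns the loop and termination check, while each agent only implements `step`. This sketch is an illustration of that pattern, not the design that was merged; all names here are assumptions.

```python
class BaseAgent:
    # The base class owns the run loop; subclasses only define what one
    # step does and when the agent is finished.
    def __init__(self, max_iterations=5):
        self.max_iterations = max_iterations
        self.memory = []
        self.final_answer = None

    def step(self, task):
        raise NotImplementedError

    def is_done(self):
        return self.final_answer is not None

    def run(self, task):
        self.final_answer = None
        for _ in range(self.max_iterations):
            self.memory.append(self.step(task))
            if self.is_done():
                break
        return self.final_answer


class OneShotAgent(BaseAgent):
    # CodeAgent-style behavior: produce the whole solution in a single step,
    # so the shared loop exits after one iteration.
    def step(self, task):
        self.final_answer = f"solved: {task}"
        return self.final_answer
```

A ReactAgent-style subclass would instead implement `step` as one thought plus one tool call, setting `final_answer` only when the `final_answer` tool is invoked.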