
Soft reboot Agents & Tools #29396

Closed · 68 commits

Conversation

@aymeric-roucher (Contributor):

What does this PR do?

Following up on the discovery that Open Source models can be a competitive engine for agent setups (cf this blog post), this PR starts from our previous efforts on Agents to push further.

The driving idea of this change is to leave the user more freedom and clarity in creating agents: so we aim for small building blocks rather than monolithic classes, and will strive for simplicity.

We will enable sharing our Agents & Tools on the Hub, thus enabling efficient improvement by the community.

Main changes:

  • Previous Agent classes HfAgent, OpenaiAgent, ... depended on the underlying LLM used as an engine. Although this ensured that the LLMs would be properly initialized, it introduced a black-box aspect that we remove here: we switch to a simple llm_callable that the user has the responsibility - thus the freedom - to initialize as they please.
  • We keep basing the implementation on a Tool class and an Agent class, but propose two agents that the user can run:
    • CodeAgent acts in one shot: generates all the necessary code to solve the task, then executes it all at once.
    • ReactAgent acts step by step: each step consists of one thought, then one function execution.
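
The two behaviors can be illustrated with a self-contained toy (this is not the PR's implementation; `fake_llm`, `code_agent_run`, and `react_agent_run` are made up purely for illustration):

```python
# Toy illustration of the two agent modes described above: a one-shot "code"
# agent and a stepwise "react" agent, both driven by a user-supplied
# llm_callable that returns plain strings.

def fake_llm(prompt, stop=None):
    # Stand-in for any LLM engine the user wires up themselves.
    if "step" in prompt:
        return "thought: add the numbers\naction: add(2, 3)"
    return "result = 2 + 3"

def code_agent_run(llm_callable, task):
    """One shot: generate all the necessary code, then execute it at once."""
    code = llm_callable(f"Write code for: {task}")
    scope = {}
    exec(code, scope)  # the actual PR keeps a restricted python interpreter instead
    return scope["result"]

def react_agent_run(llm_callable, task, tools, max_iterations=3):
    """Step by step: one thought, then one tool call per iteration."""
    for _ in range(max_iterations):
        output = llm_callable(f"step: {task}")
        # Parse "action: name(args)" naively, for illustration only.
        action = output.split("action:")[1].strip()
        name, raw_args = action.split("(")
        args = [int(a) for a in raw_args.rstrip(")").split(",")]
        return tools[name](*args)

print(code_agent_run(fake_llm, "add 2 and 3"))  # 5
print(react_agent_run(fake_llm, "add 2 and 3", {"add": lambda a, b: a + b}))  # 5
```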

This is built with @Jofthomas and @cyril-k.

@LysandreJik @pacman100 the PR is still WIP: you're welcome to give your inputs! 🤗

For instance we still have the following questions:

  • how do we constrain the llm_callable to raise informative errors if it is improperly initialized (for instance, raise an error if it returns anything other than a string), helping the user correct it, while still leaving them enough freedom to choose this LLM as they want?
  • which tests should we add: use remote tools? (this might take more work, since the push_to_hub method for tools seems broken)
  • should we already add support for tools running online (useful for instance for text-to-img tools), or should that be for a later PR?
  • how can we unite more of the CodeAgent and ReactAgent's run methods, to put as much of it as we can in the base Agent class?
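
On the first question, one possible direction (a sketch, not part of the PR; `validate_llm_callable` is a hypothetical helper) is to wrap the user-supplied callable so that misconfiguration fails fast with an actionable message:

```python
def validate_llm_callable(llm_callable):
    """Wrap a user-supplied llm_callable so that misconfiguration surfaces
    as an informative TypeError instead of a confusing downstream failure."""
    if not callable(llm_callable):
        raise TypeError(
            f"llm_callable should be a callable taking a prompt string, got "
            f"{type(llm_callable).__name__}."
        )

    def checked(prompt, **kwargs):
        output = llm_callable(prompt, **kwargs)
        if not isinstance(output, str):
            raise TypeError(
                f"llm_callable returned {type(output).__name__}, expected str. "
                "Make sure your engine decodes its output to text."
            )
        return output

    return checked

good = validate_llm_callable(lambda prompt, **kw: "hello")
print(good("hi"))  # hello

bad = validate_llm_callable(lambda prompt, **kw: 42)
try:
    bad("hi")
except TypeError as e:
    print("caught:", e)
```

This keeps the callable fully user-defined while still guaranteeing the agent sees strings.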

@aymeric-roucher (Contributor, Author):

@LysandreJik this is ready for review! 🤗

@LysandreJik (Member):

The rationale for rebuilding the python code interpreter from scratch was that we wanted the Agent to only chain tools, and that's it. Anything else might have ended up a security risk so we wanted this to be very limited by design.

I'll take a look at the PR, thanks!

@LysandreJik (Member) left a comment:

Thanks for your PR! I'm not answering your questions (yet), but here's an initial review to get it in a better state. If I understand correctly, this is the gist of your changes:


You migrated the agent to a CodeAgent and kept usage mostly unchanged but refactored it (introduction of extract_action, move of clean_code_for_run to module level). You kept our python interpreter for that task, but added binary operations.

These changes sound good to me, the resulting code is quite clean. Let's indeed see how the run/step methods could eventually be abstracted a bit more together, but it's not a blocker for now.


You added a ReactAgent that follows the ReAct framework.

Here as well the code is clear; I've left a few comments. Something which becomes apparent when reading the ReactAgent class is how verbose the console is going to be, given the logs present. I've left a comment on reducing the amount of logging in the console, but beyond that, I wonder if we can, instead of outputting everything to the console, keep some kind of record of everything happening, to have a short summary afterwards.

Something like:

task_output, output_summary = agent.run(task, output_summary=True)

print(output_summary)
# ReactAgent performed task "Task: draw rivers and lakes"
# Split the work in the following tasks:
# - Task 1
#     - Tool chosen: X
#     - Failures: [failure 0, failure 1]
#     - Successfully returned: Y
# - Task 2
#    -  Tool chosen Z
# [...]
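
The record idea could be backed by a simple per-step log; a sketch (the `StepRecord` and `summarize` names are made up here, not part of the PR):

```python
from dataclasses import dataclass, field

@dataclass
class StepRecord:
    """One entry per agent step, collected instead of printed immediately."""
    task: str
    tool: str
    failures: list = field(default_factory=list)
    result: object = None

def summarize(task, steps):
    """Render the collected records into a short human-readable summary."""
    lines = [
        f'ReactAgent performed task "{task}"',
        "Split the work in the following tasks:",
    ]
    for i, step in enumerate(steps, 1):
        lines.append(f"- Task {i}")
        lines.append(f"    - Tool chosen: {step.tool}")
        if step.failures:
            lines.append(f"    - Failures: {step.failures}")
        lines.append(f"    - Successfully returned: {step.result}")
    return "\n".join(lines)

steps = [StepRecord("draw rivers", "image_generator", ["timeout"], "image.png")]
print(summarize("draw rivers and lakes", steps))
```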

IMO there are a few things we could do to build upon this rework:

  • We need clear documentation regarding the difference between CodeAgent and ReactAgent, and can take this opportunity to explain simply what the ReAct framework is. As we continue building on this project, having nice documentation providing an overview of the different kinds of agents we're putting in the repo would be great.
  • We have a suite to evaluate agents on small problems: evaluate_agent.py. We used this to test out new model releases and to quickly see the effect of our changes. Did you try and re-use it with your changes here? I'd be quite curious to see the resulting scores.

# See the License for the specific language governing permissions and
# limitations under the License.
import math
import numexpr
@LysandreJik (Member):

This needs to be defined as a soft dependency, with an explicit error in case of a missing dependency.

@LysandreJik (Member):

If absolutely necessary you can add the dependency to the agents extra here https://github.com/huggingface/transformers/blob/main/setup.py#L411-L413

Comment on lines 97 to 99
for attr, expected_type in required_attributes.items():
    if attr not in dct or not isinstance(dct[attr], expected_type):
        raise TypeError(f"{attr} must exist and be of type {expected_type.__name__}")
@LysandreJik (Member):

This seems like too hard a check; I'm getting a failure in Python 3.8 (which is still supported) and in 3.10.

@LysandreJik (Member):

Also, this seems to hard-error with remotely-defined tools; it's fine to log an info message mentioning it and to expect weird behavior, but we shouldn't error out locally for an ill-defined remote file.
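
A sketch of a more lenient check along these lines (`validate_tool_attributes` is a hypothetical helper: local tools still hard-error, remote ones only log):

```python
import logging

logger = logging.getLogger(__name__)

def validate_tool_attributes(dct, required_attributes, remote=False):
    """Check tool attributes; raise for local tools, but only log for remote
    ones, since an ill-defined remote file cannot be fixed locally."""
    problems = [
        attr
        for attr, expected_type in required_attributes.items()
        if attr not in dct or not isinstance(dct.get(attr), expected_type)
    ]
    for attr in problems:
        message = f"{attr} must exist and be of type {required_attributes[attr].__name__}"
        if remote:
            logger.info("Remote tool issue (expect weird behavior): %s", message)
        else:
            raise TypeError(message)
    return not problems

print(validate_tool_attributes({"name": "search"}, {"name": str}))        # True
print(validate_tool_attributes({"name": 1}, {"name": str}, remote=True))  # False
```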

)
self._toolbox["final_answer"] = FinalAnswerTool()
@LysandreJik (Member):

Rather than manually adding the tool here, I'd be in favor of the toolbox having a small abstraction layer allowing tool addition/removal/replacement. The toolbox can be passed as a parameter and here we're doing a blind replace of a tool that might exist otherwise. Having a finer abstraction layer would ensure we don't override tools that we don't want to override (and would allow users to dynamically build their toolboxes rather than touching a private attribute to do so).
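
Such an abstraction layer could look roughly like this (a sketch; the `Toolbox` class and its method names are made up here):

```python
class Toolbox:
    """Small abstraction over the agent's tools, so additions and
    replacements are explicit instead of blind dict writes."""

    def __init__(self, tools=None):
        self._tools = dict(tools or {})

    def add_tool(self, name, tool):
        if name in self._tools:
            raise KeyError(f"Tool {name} already exists; use update_tool to replace it.")
        self._tools[name] = tool

    def update_tool(self, name, tool):
        if name not in self._tools:
            raise KeyError(f"Tool {name} does not exist; use add_tool instead.")
        self._tools[name] = tool

    def remove_tool(self, name):
        del self._tools[name]

    def __contains__(self, name):
        return name in self._tools

    def __getitem__(self, name):
        return self._tools[name]

toolbox = Toolbox()
toolbox.add_tool("final_answer", lambda x: x)
print("final_answer" in toolbox)  # True
```

With this, a user-provided toolbox that already defines final_answer would make add_tool fail loudly instead of being silently overwritten.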

self.tool_description_template = tool_description_template if tool_description_template else self.default_tool_description_template
self.additional_args = additional_args
self.max_iterations = max_iterations
self.log = lambda x: print(to_text(x))
@LysandreJik (Member):

Nice to have a .log method; I'd be in favor of reusing the logging mechanism to output logs of varying levels of information.

For example I would put the prompt being printed at an info level (maybe even debug, given its length) while the tool selection and command to be run at a warning (as directly necessary for the end user to validate that the agent isn't causing mayhem on their machine).

You already have the logger instantiated at the top of the file so you can proceed with logger.warning/logger.info/logger.debug/etc.

Feel free to also add a prefix to the messages by defining a formatter if you want to differentiate from other library logs.
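
A minimal sketch of that setup with the standard logging module (the logger name and prefix are illustrative):

```python
import logging

# A dedicated logger with a prefix, to differentiate from other library logs.
logger = logging.getLogger("transformers.agents")
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter("[agent] %(levelname)s - %(message)s"))
logger.addHandler(handler)
logger.setLevel(logging.INFO)

logger.debug("Full prompt: ...")  # hidden at the default INFO level
logger.info("Agent rationale: I should search the web.")
logger.warning("About to run tool: search('rivers')")  # always visible to the end user
```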

@aymeric-roucher (Contributor, Author):

@LysandreJik thank you, we will use transformers' logging mechanism as the default.
We would need to log these:
- agent rationale (can be done with logger.info)
- bugs (logger.debug)
- tool call
- tool output
So tool calls and outputs should be logged differently: maybe we can enrich the default transformers logging to do this?
In the longer term we are thinking of building a Gradio UI that would start on agent run() launch and let you explore all steps taken by your agent. So this could be a drop-in replacement for the default logger.

result = result[: -len(stop_seq)]
return result

self.memory = [self.system_prompt]
@LysandreJik (Member):

If memory is accessible by the user (public attribute), let's define and create it at instance initialization. If not, let's put it to private.

@aymeric-roucher (Contributor, Author):

I think it's good if it's public!

@aymeric-roucher (Contributor, Author):

It's already created at instance initialization in Agent: the idea here was that run should reset the agent's memory, since the philosophy is "1 agent = 1 task", and run() should solve the task at hand.

@aymeric-roucher (Contributor, Author):

But we could change run() to not reset the memory, depends on the philosophy we want to have here: should an agent be defined just for (1 fixed task, 1 memory stream, 1 fixed toolbox), or could all of these be modified at any time at the risk of less clarity?
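
The trade-off discussed here can be made concrete with a toy sketch (illustrative, not the PR's code): a flag controls whether run() starts from a clean slate.

```python
class Agent:
    def __init__(self, system_prompt, reset_memory_on_run=True):
        self.system_prompt = system_prompt
        self.reset_memory_on_run = reset_memory_on_run
        self.memory = [system_prompt]  # public, created at instance initialization

    def run(self, task):
        if self.reset_memory_on_run:
            # "1 agent = 1 task": each run starts from a clean slate.
            self.memory = [self.system_prompt]
        self.memory.append(f"Task: {task}")
        return self.memory

agent = Agent("You are a helpful agent.")
agent.run("first task")
print(len(agent.run("second task")))  # 2: memory was reset between runs
```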

Comment on lines +434 to +435
self.task = task
task_message = f"Task: {self.task}"
@LysandreJik (Member):

What if self.task already begins with Task: ? 😁
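
A defensive guard for that edge case could look like this (a sketch; `format_task_message` is a made-up helper):

```python
def format_task_message(task):
    # Avoid "Task: Task: ..." if the caller already included the prefix.
    task = task.strip()
    if task.startswith("Task:"):
        return task
    return f"Task: {task}"

print(format_task_message("draw rivers"))        # Task: draw rivers
print(format_task_message("Task: draw rivers"))  # Task: draw rivers
```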

Comment on lines 345 to 348
prompt = self.system_prompt + f"\nTask: {task}"
llm_output = self.llm_callable(prompt, stop=["Task:"])
self.log("====Executing with this prompt====")
self.log(prompt)
@LysandreJik (Member):

You were saying that having clear visibility of the prompt would be very useful for these classes; what about setting it as an attribute (self.prompt) in order to have it easily obtainable for end-users?
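
A minimal sketch of that suggestion (an illustrative class, not the PR's actual CodeAgent):

```python
class CodeAgent:
    def __init__(self, system_prompt, llm_callable):
        self.system_prompt = system_prompt
        self.llm_callable = llm_callable
        self.prompt = None  # last full prompt, kept for end-user inspection

    def run(self, task):
        self.prompt = self.system_prompt + f"\nTask: {task}"
        return self.llm_callable(self.prompt, stop=["Task:"])

agent = CodeAgent("You write code.", lambda p, stop=None: "result = 1")
agent.run("compute 1")
print(agent.prompt)  # You write code.
                     # Task: compute 1
```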

@aymeric-roucher (Contributor, Author):

Great idea! Done here

src/transformers/tools/agents.py (outdated; resolved)


@staticmethod
def from_langchain(langchain_tool):
@aymeric-roucher (Contributor, Author):

@Jofthomas what do you think of moving the method definition inside the class like this? (I copied the existing from_gradio method)


seems fine to me :)

@aymeric-roucher (Contributor, Author):

@LysandreJik after my 3 commits this afternoon, all the discussions we had should be resolved on the code side: only the documentation is left, for a later PR!

@aymeric-roucher aymeric-roucher force-pushed the main branch 2 times, most recently from c12691a to 0aba8c1 Compare March 22, 2024 13:58
@LysandreJik (Member) left a comment:

This is starting to look great! Given how the code you're touching here doesn't evolve at all, I wonder if we shouldn't keep this branch independent from huggingface/main and merge smaller PRs into it now that we've got a great base.

@aymeric-roucher (Contributor, Author):

PR closed since it's merged into the agents branch with #29931.
