Can I do SFT with dataset that includes tool usage with TorchTune? #1921

albertbou92 · 2024-10-30T00:08:02Z

albertbou92
Oct 30, 2024

Hello!

I am working on supervised fine-tuning (SFT) for Llama models using a chat dataset that includes tool calling in the OpenAI format.

I'm unsure if this specific setup is directly supported by TorchTune. I see that fine-tuning on a chat dataset without tool calling works well, and I also noticed that there is a role (e.g., ipython) intended for tool calling.

My questions are:

Is SFT on a chat dataset with tool calling supported? I assume it is.
Do I need to define a transform to adapt my data, which is already in the OpenAI SFT format for tool calling?
Are there any dataset examples or code references that could guide me with this setup?

Thanks a lot for any help!

Answered by RdoubleA

Oct 30, 2024

Yes, tool-calling is supported in SFT as long as the model tokenizer you are using supports it. A tool call would be Message(role="assistant", ipython=True) and the return from the tool call would be Message(role="ipython")

You will just need to ensure that your dataset gets translated to Messages correctly. You may need to make a custom message transform, using the torchtune.data.OpenAIToMessages as a starting point. We might need to update that class to ensure tool calls and tool returns are converted correctly, so please let us know if you have any trouble with this.

I've been meaning to add a dataset example with tool calls and tool returns but haven't gotten a chance to. What dataset…

View full answer

RdoubleA · 2024-10-30T01:07:04Z

RdoubleA
Oct 30, 2024
Collaborator

Yes, tool-calling is supported in SFT as long as the model tokenizer you are using supports it. A tool call would be Message(role="assistant", ipython=True) and the return from the tool call would be Message(role="ipython")

You will just need to ensure that your dataset gets translated to Messages correctly. You may need to make a custom message transform, using the torchtune.data.OpenAIToMessages as a starting point. We might need to update that class to ensure tool calls and tool returns are converted correctly, so please let us know if you have any trouble with this.

I've been meaning to add a dataset example with tool calls and tool returns but haven't gotten a chance to. What datasets have you been looking at in particular (if they're on HF)?

cc @joecummings whos been thinking about task-based dataset builders, maybe tool calling could be one?

5 replies

albertbou92 Oct 30, 2024
Author

Hi! Thank you very much for the quick reply. I will definitely give it a shot at adapting the torchtune.data.OpenAIToMessages transform for my use case. I am using a local custom dataset generated with https://github.com/Future-House/aviary, a library that allows to define language environments with tools to interact with language agents. But the dataset follows the same structure required by OpenAI to do SFT.

Just a couple of follow up questions:
1- How can I know if a tokenizer supports tool calling? Does for example the tokenizer of Llama-3.2-1B support it?
2- Is there an existing way to provide the tool definitions to the Agent? Or should I embed them into a system Message inside my transformer?

RdoubleA Oct 31, 2024
Collaborator

The Llama models should support it, and Qwen2.5 after the PR is merged. I think Gemma needs to be updated to support this, and I'm not sure if Phi3 does. It might be good to just have a table for all our models and indicate which type of data is supported (image, tool, etc) (cc @ebsmothers or @joecummings)
Great question. For now, you will need to add to the system message. But I think this experience can be better so users can "register" available tools.

We haven't thoroughly tried out e2e tool fine-tuning and integrate with tool environments like the one you linked. Do you know which ones are popular and commonly used? If you do end up working on this on a fork or a separate repo using torchtune, would love to check it out and see which components might be general enough to add to the library to improve the tool fine-tuning experience. Please let us know if there's any pain points in the library that block you or frustrate you. Also happy to provide any guidance, you can find me on the discord server as RdoubleA if you want to reach out.

albertbou92 Oct 31, 2024
Author

Got it! thanks once again for the quick reply.

I was able to put together a Transform based on torchtune.data.OpenAIToMessages as suggested, create a dataset with trajectories obtained from the repo I linked and train LLama-3.2-Instruct on it!

For now the only issue I have found is this check.
https://github.com/pytorch/torchtune/blob/main/torchtune/datasets/_sft.py#L123

I can adapt my data to the required format (as I did in my first test). However, I would prefer to be able to relax a bit this constraint and tell the SFTDataset not to enforce it. For example, in case I want to have 2 consecutive messages with the same role . I could do a PR to optionally avoid that check, while keeping it as default. If that makes sense.

If I come up with something general enough for e2e tool fine tuning I am happy to do another PR!

RdoubleA Oct 31, 2024
Collaborator

Hm yeah the message validation was created quite a while ago and the expected structure of a "conversation" has changed since then so it may be worth a refresh. Although, if you have two consecutive messages with the same role, why not just combine them into one message? Some things like role headers and end of message tokens might get added in between the messages (this may or may not be a bad thing depending on what the model expects).

albertbou92 Nov 6, 2024
Author

ok, I will do that for now. Thanks a lot for all the help.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Can I do SFT with dataset that includes tool usage with TorchTune? #1921

{{title}}

Replies: 1 comment 5 replies

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{editor}}'s edit

{{editor}}'s edit

{{title}}

{{title}}

Select a reply

Can I do SFT with dataset that includes tool usage with TorchTune? #1921

albertbou92 Oct 30, 2024

Replies: 1 comment · 5 replies

RdoubleA Oct 30, 2024 Collaborator

albertbou92 Oct 30, 2024 Author

RdoubleA Oct 31, 2024 Collaborator

albertbou92 Oct 31, 2024 Author

RdoubleA Oct 31, 2024 Collaborator

albertbou92 Nov 6, 2024 Author

albertbou92
Oct 30, 2024

Replies: 1 comment 5 replies

RdoubleA
Oct 30, 2024
Collaborator

albertbou92 Oct 30, 2024
Author

RdoubleA Oct 31, 2024
Collaborator

albertbou92 Oct 31, 2024
Author

RdoubleA Oct 31, 2024
Collaborator

albertbou92 Nov 6, 2024
Author