From 85817d98fb60977c97e3014196a462b732d2ed1a Mon Sep 17 00:00:00 2001
From: Steven Liu <59462357+stevhliu@users.noreply.github.com>
Date: Thu, 8 Aug 2024 13:43:14 -0700
Subject: [PATCH] [docs] Translation guide (#32547)

clarify
---
 docs/source/en/tasks/translation.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/source/en/tasks/translation.md b/docs/source/en/tasks/translation.md
index e933fda461b1ae..bcbad0ba052c36 100644
--- a/docs/source/en/tasks/translation.md
+++ b/docs/source/en/tasks/translation.md
@@ -90,7 +90,7 @@ The next step is to load a T5 tokenizer to process the English-French language p
 The preprocessing function you want to create needs to:
 
 1. Prefix the input with a prompt so T5 knows this is a translation task. Some models capable of multiple NLP tasks require prompting for specific tasks.
-2. Tokenize the input (English) and target (French) separately because you can't tokenize French text with a tokenizer pretrained on an English vocabulary.
+2. Set the target language (French) in the `text_target` parameter to ensure the tokenizer processes the target text correctly. If you don't set `text_target`, the tokenizer processes the target text as English.
 3. Truncate sequences to be no longer than the maximum length set by the `max_length` parameter.
 
 ```py