
Bug in whisper finetuning tutorial? "Multiple languages detected when trying to predict..." #28814

Closed
3 of 4 tasks
SethvdAxe opened this issue Feb 1, 2024 · 22 comments · Fixed by #29938

Comments

@SethvdAxe

System Info

Transformers version: 4.38.0.dev0
Python version: Python3.10 venv (local)
Platform: macOS Ventura 13.5

Who can help?

@sanchit-gandhi

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

Thank you for the amazing whisper finetuning tutorial at: https://huggingface.co/blog/fine-tune-whisper

When I download the .ipynb notebook and run it locally, it runs fine.

However, when I change a single line (the last line) from:

trainer.train()

to:

eval_results = trainer.evaluate()

I get the following error:

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription.

Full error log:

{
	"name": "ValueError",
	"message": "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 eval_results = trainer.evaluate()
      2 print(eval_results)

File ~some_path/venv/lib/python3.10/site-packages/transformers/trainer_seq2seq.py:166, in Seq2SeqTrainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix, **gen_kwargs)
    164 self.gather_function = self.accelerator.gather
    165 self._gen_kwargs = gen_kwargs
--> 166 return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)

File ~some_path/venv/lib/python3.10/site-packages/transformers/trainer.py:3136, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3133 start_time = time.time()
   3135 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3136 output = eval_loop(
   3137     eval_dataloader,
   3138     description=\"Evaluation\",
   3139     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3140     # self.args.prediction_loss_only
   3141     prediction_loss_only=True if self.compute_metrics is None else None,
   3142     ignore_keys=ignore_keys,
   3143     metric_key_prefix=metric_key_prefix,
   3144 )
   3146 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3147 if f\"{metric_key_prefix}_jit_compilation_time\" in output.metrics:

File ~some_path/venv/lib/python3.10/site-packages/transformers/trainer.py:3325, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   3322         batch_size = observed_batch_size
   3324 # Prediction step
-> 3325 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
   3326 main_input_name = getattr(self.model, \"main_input_name\", \"input_ids\")
   3327 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None

File ~some_path/venv/lib/python3.10/site-packages/transformers/trainer_seq2seq.py:296, in Seq2SeqTrainer.prediction_step(self, model, inputs, prediction_loss_only, ignore_keys, **gen_kwargs)
    288 if (
    289     \"labels\" in generation_inputs
    290     and \"decoder_input_ids\" in generation_inputs
    291     and generation_inputs[\"labels\"].shape == generation_inputs[\"decoder_input_ids\"].shape
    292 ):
    293     generation_inputs = {
    294         k: v for k, v in inputs.items() if k not in (\"decoder_input_ids\", \"decoder_attention_mask\")
    295     }
--> 296 generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
    298 # Temporary hack to ensure the generation config is not initialized for each iteration of the evaluation loop
    299 # TODO: remove this hack when the legacy code that initializes generation_config from a model config is
    300 # removed in https://github.com/huggingface/transformers/blob/98d88b23f54e5a23e741833f1e973fdf600cc2c5/src/transformers/generation/utils.py#L1183
    301 if self.model.generation_config._from_model_config:

File ~some_path/venv/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:533, in WhisperGenerationMixin.generate(self, input_features, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, return_timestamps, task, language, is_multilingual, prompt_ids, prompt_condition_type, condition_on_prev_tokens, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, num_segment_frames, attention_mask, time_precision, return_token_timestamps, return_segments, return_dict_in_generate, **kwargs)
    527 self._set_prompt_condition_type(
    528     generation_config=generation_config,
    529     prompt_condition_type=prompt_condition_type,
    530 )
    532 # pass self.config for backward compatibility
--> 533 init_tokens = self._retrieve_init_tokens(
    534     input_features,
    535     generation_config=generation_config,
    536     config=self.config,
    537     num_segment_frames=num_segment_frames,
    538     kwargs=kwargs,
    539 )
    540 # TODO(Sanchit) - passing `decoder_input_ids` is deprecated. One should use `prompt_ids` instead
    541 # This function should be be removed in v4.39
    542 self._check_decoder_input_ids(
    543     prompt_ids=prompt_ids, init_tokens=init_tokens, is_shortform=is_shortform, kwargs=kwargs
    544 )

File ~some_path/venv/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:1166, in WhisperGenerationMixin._retrieve_init_tokens(self, input_features, generation_config, config, num_segment_frames, kwargs)
   1158 lang_ids = self.detect_language(
   1159     input_features=input_features,
   1160     encoder_outputs=kwargs.get(\"encoder_outputs\", None),
   1161     generation_config=generation_config,
   1162     num_segment_frames=num_segment_frames,
   1163 )
   1165 if torch.unique(lang_ids).shape[0] > 1:
-> 1166     raise ValueError(
   1167         \"Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language.\"
   1168     )
   1170 lang_id = lang_ids[0].item()
   1172 # append or replace lang_id to init_tokens

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language."
}

Is this expected behaviour? Thank you kindly in advance.

Expected behavior

A normal evaluation run that measures the model's performance on the target language before training starts.

@SethvdAxe
Author

SethvdAxe commented Feb 1, 2024

OK, I can confirm that this bug does not appear on 4.37.2.
Perhaps it's related to #28687?

@ArthurZucker
Collaborator

cc @patrickvonplaten as well

@chicodespons

I have the same error too. I verified my dataset; it contains only one language.

@patrickvonplaten
Contributor

patrickvonplaten commented Feb 9, 2024

Sorry for being a bit late here. Yes, this error is expected: we recently changed the default behavior so that, when you don't specify which language to evaluate, the model detects the language itself.

If you train your model on Hindi as shown in the notebook, can you make sure to pass:

- eval_results = trainer.evaluate()
+ eval_results = trainer.evaluate(language="hi")

so that the model doesn't try to detect the language it has to transcribe?
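To illustrate the new default described above, here is a minimal pure-Python sketch of the per-batch check that raises the error, modelled on the `torch.unique(lang_ids).shape[0] > 1` test visible in the traceback. It is a simplification, not the library's code: `resolve_language` is a hypothetical helper, and the list of per-sample language codes stands in for what `detect_language` returns. It suggests why a genuinely monolingual dataset can still trigger the error: a single clip detected as a different language is enough to fail the whole batch.

```python
from typing import Optional, Sequence


def resolve_language(forced: Optional[str], detected: Sequence[str]) -> str:
    """Pick the single target language for a batch (hypothetical simplification).

    A forced language skips detection entirely; otherwise every sample in the
    batch must have been detected as the same language.
    """
    if forced is not None:
        # e.g. trainer.evaluate(language="hi") short-circuits detection
        return forced
    if not detected:
        raise ValueError("empty batch")
    if len(set(detected)) > 1:
        raise ValueError(
            "Multiple languages detected when trying to predict the most likely "
            "target language for transcription."
        )
    return detected[0]


# Hypothetical detections: one clip in an all-Hindi batch is detected as Urdu
batch = ["hi", "hi", "ur", "hi"]
try:
    resolve_language(None, batch)
except ValueError as err:
    print("error:", err)

print(resolve_language("hi", batch))  # prints "hi": forcing the language avoids the check
```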

@patrickvonplaten
Contributor

@sanchit-gandhi we should probably also make sure to install accelerate in the notebook (newer versions of Transformers require accelerate for training). I'd also say we should pin the transformers version in the blog, no? It's currently installed from the "main" branch of Transformers.

@rishabhjain16

I am getting a similar error during training. Any help is appreciated.


@patrickvonplaten
Contributor

patrickvonplaten commented Feb 12, 2024

Hey @rishabhjain16,

Ah yes, indeed: the training loop runs the evaluation loop internally and unfortunately doesn't let the user pass any generation keyword arguments such as "language". You can, however, easily fix this by replacing the following cell in the notebook:

[screenshot of the notebook cell]

with:

from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.generation_config.language = "hi"  # define your language of choice here

and the training should work!
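A minimal sketch of why setting `model.generation_config.language` helps where `trainer.evaluate(language=...)` cannot: the evaluation that runs inside training calls `generate()` without extra keyword arguments, so the language has to come from the stored config. `ToyGenerationConfig`, `ToyModel` and `training_eval_step` are hypothetical stand-ins, and the fallback order shown (explicit argument, then config, then detection) is an assumption of this simplified model, not a copy of the library's logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGenerationConfig:
    # Hypothetical stand-in for model.generation_config
    language: Optional[str] = None


class ToyModel:
    def __init__(self) -> None:
        self.generation_config = ToyGenerationConfig()

    def generate(self, language: Optional[str] = None) -> str:
        # An explicit keyword wins; otherwise fall back to the stored config
        if language is None:
            language = self.generation_config.language
        if language is None:
            return "<detect per sample>"
        return f"<transcribe as {language}>"


def training_eval_step(model: ToyModel) -> str:
    # Mimics the evaluation inside trainer.train(): no generation kwargs are forwarded
    return model.generate()


model = ToyModel()
print(training_eval_step(model))  # <detect per sample>
model.generation_config.language = "hi"
print(training_eval_step(model))  # <transcribe as hi>
```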

@rishabhjain16

Thank you @patrickvonplaten for getting back to me so quickly. I will give it a try.

@s0620013

Hi everyone,
I have a problem with my program.

I added this code:

from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.generation_config.language = "ja"  # define your language of choice here

Then an error occurred.
Can you help me?

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer.train()

17 frames
/usr/local/lib/python3.10/dist-packages/datasets/utils/_dill.py in save(self, obj, save_persistent_id)
     39 import spacy # type: ignore
     40
---> 41 if issubclass(obj_type, spacy.Language):
     42 pklregister(obj_type)(_save_spacyLanguage)
     43 if "tiktoken" in sys.modules:

AttributeError: module 'spacy' has no attribute 'Language'


@ArthurZucker
Collaborator

Hey! The error seems to point to an issue with `datasets`; I would recommend upgrading it. Without a proper reproducer, there is nothing we can do for you 🤗

@SethvdAxe
Author

Thank you all for taking the time to respond. I was busy with a different project for a few weeks and am now back at it.

I am not sure whether this is related, but I have a bug that I also had a few weeks ago. Back then it was solved by:

forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(language="Dutch", task="transcribe")
# and ensuring you are on transformers 4.37.2 resolves this. Setting the forced decoder prompt ids currently does not work on the dev branch.

This solution no longer works, and I'm pulling my hair out trying to figure out what I'm missing now that I didn't miss back then.

Evaluating with the trainer gives a WER of 20% on Dutch Common Voice, while the inference pipeline gives a WER of 2.5% on exactly the same data. The problem persists even when I first define the inference pipeline, pass pipeline.tokenizer, pipeline.feature_extractor and pipeline.model as arguments to the Trainer, and then immediately call trainer.evaluate().
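When comparing numbers like these, it can help to recompute WER on a handful of predictions from both paths and look at where they diverge. WER is the word-level edit distance (substitutions + deletions + insertions) divided by the number of reference words. The function below is a stand-alone stdlib sketch of that definition, not the implementation behind any particular library; it applies no text normalization, so casing and punctuation differences count as errors. The example sentences are made up.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref: List[str] = reference.split()
    hyp: List[str] = hypothesis.split()
    # dp[i][j] = word-level edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution or match
            )
    return dp[len(ref)][len(hyp)] / len(ref)


print(word_error_rate("de kat zit op de mat", "de kat zit op de mat"))  # 0.0
# 1 substitution (kat -> kkat) + 1 deletion (de) over 6 reference words
print(word_error_rate("de kat zit op de mat", "de kkat zit op mat"))
```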

See also: https://discuss.huggingface.co/t/whisper-finetuning-dutch-weird-double-characters/71338/2
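For context on the `forced_decoder_ids` workaround above: `get_decoder_prompt_ids` returns a list of `(position, token_id)` pairs that pin the language, task and (optionally) no-timestamps tokens right after `<|startoftranscript|>`. The sketch below reproduces only that shape with a made-up vocabulary; `toy_vocab`, its token ids and `toy_decoder_prompt_ids` are hypothetical, and it takes a language code ("nl") rather than a name like "Dutch", which the real tokenizer also accepts.

```python
from typing import Dict, List, Tuple

# Hypothetical vocabulary; real Whisper token ids are different
toy_vocab: Dict[str, int] = {
    "<|startoftranscript|>": 0,
    "<|nl|>": 1,
    "<|en|>": 2,
    "<|transcribe|>": 3,
    "<|translate|>": 4,
    "<|notimestamps|>": 5,
}


def toy_decoder_prompt_ids(
    language: str, task: str, no_timestamps: bool = True
) -> List[Tuple[int, int]]:
    """Build (position, token_id) pairs for the tokens after <|startoftranscript|>."""
    prefix = [f"<|{language}|>", f"<|{task}|>"]
    if no_timestamps:
        prefix.append("<|notimestamps|>")
    # Position 0 holds <|startoftranscript|>, so forced positions start at 1
    return [(pos + 1, toy_vocab[tok]) for pos, tok in enumerate(prefix)]


print(toy_decoder_prompt_ids("nl", "transcribe"))  # [(1, 1), (2, 3), (3, 5)]
```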

@ArthurZucker
Collaborator

There have been a lot of updates to make the API much better for the user. The model card available here mentions the generate_kwargs, which should help you.

I am going to close this issue, as both @patrickvonplaten's comments and mine should have addressed your inquiries.

@AQEEL-SHAFY

AQEEL-SHAFY commented Mar 18, 2024

Due to a bug fix in #28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English.This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass language='en'.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()

8 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/whisper/generation_whisper.py in _retrieve_init_tokens(self, input_features, generation_config, config, num_segment_frames, kwargs)
   1164 
   1165             if torch.unique(lang_ids).shape[0] > 1:
-> 1166                 raise ValueError(
   1167                     "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language."
   1168                 )

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language.

I'm getting this error. Can anyone please help me?

@SethvdAxe
Author

This was basically my question too. I couldn't figure out how to pass these now-required language arguments to the trainer rather than to evaluate(). What I ended up doing was this:

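from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained(....)
your_language = "nl"  # target language code, e.g. 'en' or 'nl'

def custom_generate(self, *args, **kwargs):
    # Force the target language on every generate() call, including the ones the Trainer makes internally
    kwargs["language"] = your_language

    return WhisperForConditionalGeneration.generate(self, *args, **kwargs)

# Bind custom_generate to this model instance so it shadows the class's generate()
model.generate = custom_generate.__get__(model, WhisperForConditionalGeneration)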

I am pretty sure a better solution will come along soon, but this works!
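The `__get__` call above relies on Python's descriptor protocol: `function.__get__(instance, owner)` returns a method bound to that instance, and assigning it as an instance attribute shadows the class's `generate` for that one object only. The sketch below shows the same pattern on a hypothetical `ToyWhisper` class (no transformers needed), so the effect can be checked in isolation.

```python
class ToyWhisper:
    """Hypothetical stand-in for WhisperForConditionalGeneration."""

    def generate(self, *args, **kwargs):
        # Echo the keyword arguments so the effect of the override is visible
        return kwargs


def custom_generate(self, *args, **kwargs):
    kwargs["language"] = "nl"  # force the target language on every call
    # Call the class's original method explicitly to avoid recursing into the override
    return ToyWhisper.generate(self, *args, **kwargs)


model = ToyWhisper()
other = ToyWhisper()
# Bind the function to this instance only; `other` keeps the class's generate()
model.generate = custom_generate.__get__(model, ToyWhisper)

print(model.generate(max_new_tokens=5))  # {'max_new_tokens': 5, 'language': 'nl'}
print(other.generate(max_new_tokens=5))  # {'max_new_tokens': 5}
```

`types.MethodType(custom_generate, model)` is an equivalent spelling. Setting `model.generation_config.language`, as suggested earlier in the thread, avoids patching altogether.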

@sanchit-gandhi
Contributor

Fixed in #29938 and huggingface/blog#1944

@mrmuminov

I fixed this by installing transformers==4.37.2.

@sanchit-gandhi
Copy link
Contributor

Ideally, you should update to the latest version of transformers:

pip install --upgrade transformers

You should also use the latest version of the fine-tuning tutorial, which installs the latest versions of all the relevant libraries.

@asierhv

asierhv commented Apr 11, 2024

Quoting @mrmuminov: "I fix this by installing transformers==4.37.2"

Thanks! This worked for me.

@Komalsai234

Hi everyone, I have a problem in my code. I am trying to fine-tune the Whisper model on Sanskrit, which Whisper was not trained on. I took the tokenizer from the existing Hugging Face repo Bidwill/Sanskrit-Asr-Whisper-small.

During training I am getting this error. Please help!
[screenshot of the error]

@asierhv

asierhv commented May 6, 2024


Which version of transformers are you using? I tried hardcoding the 'language' flag to a single language and nothing worked. You should try downgrading first and see if that fixes it; that's what worked for me.

@ArthurZucker
Collaborator

For both of you, we would need a reproducer, along with the version of transformers you are using.

cc @kamilakesbi and @ylacombe

@yaojingguo

After running pip install --upgrade transformers to update transformers to 4.41.1, my problem is solved.
