Bug in whisper finetuning tutorial? "Multiple languages detected when trying to predict..." #28814

SethvdAxe · 2024-02-01T09:44:45Z

System Info

Transformers version: 4.38.0.dev0
Python version: Python3.10 venv (local)
Platform: macOS Ventura 13.5

Who can help?

@sanchit-gandhi

Information

The official example scripts
My own modified scripts

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction

Thank you for the amazing whisper finetuning tutorial at: https://huggingface.co/blog/fine-tune-whisper

When I download the ipynb and run it locally it runs fine.

However, when I change a single line (the last line) from:

trainer.train()

to:

eval_results = trainer.evaluate()

I get the following error:

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription.

Full error log:

{
	"name": "ValueError",
	"message": "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language.",
	"stack": "---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[21], line 1
----> 1 eval_results = trainer.evaluate()
      2 print(eval_results)

File ~some_path/venv/lib/python3.10/site-packages/transformers/trainer_seq2seq.py:166, in Seq2SeqTrainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix, **gen_kwargs)
    164 self.gather_function = self.accelerator.gather
    165 self._gen_kwargs = gen_kwargs
--> 166 return super().evaluate(eval_dataset, ignore_keys=ignore_keys, metric_key_prefix=metric_key_prefix)

File ~some_path/venv/lib/python3.10/site-packages/transformers/trainer.py:3136, in Trainer.evaluate(self, eval_dataset, ignore_keys, metric_key_prefix)
   3133 start_time = time.time()
   3135 eval_loop = self.prediction_loop if self.args.use_legacy_prediction_loop else self.evaluation_loop
-> 3136 output = eval_loop(
   3137     eval_dataloader,
   3138     description=\"Evaluation\",
   3139     # No point gathering the predictions if there are no metrics, otherwise we defer to
   3140     # self.args.prediction_loss_only
   3141     prediction_loss_only=True if self.compute_metrics is None else None,
   3142     ignore_keys=ignore_keys,
   3143     metric_key_prefix=metric_key_prefix,
   3144 )
   3146 total_batch_size = self.args.eval_batch_size * self.args.world_size
   3147 if f\"{metric_key_prefix}_jit_compilation_time\" in output.metrics:

File ~some_path/venv/lib/python3.10/site-packages/transformers/trainer.py:3325, in Trainer.evaluation_loop(self, dataloader, description, prediction_loss_only, ignore_keys, metric_key_prefix)
   3322         batch_size = observed_batch_size
   3324 # Prediction step
-> 3325 loss, logits, labels = self.prediction_step(model, inputs, prediction_loss_only, ignore_keys=ignore_keys)
   3326 main_input_name = getattr(self.model, \"main_input_name\", \"input_ids\")
   3327 inputs_decode = self._prepare_input(inputs[main_input_name]) if args.include_inputs_for_metrics else None

File ~some_path/venv/lib/python3.10/site-packages/transformers/trainer_seq2seq.py:296, in Seq2SeqTrainer.prediction_step(self, model, inputs, prediction_loss_only, ignore_keys, **gen_kwargs)
    288 if (
    289     \"labels\" in generation_inputs
    290     and \"decoder_input_ids\" in generation_inputs
    291     and generation_inputs[\"labels\"].shape == generation_inputs[\"decoder_input_ids\"].shape
    292 ):
    293     generation_inputs = {
    294         k: v for k, v in inputs.items() if k not in (\"decoder_input_ids\", \"decoder_attention_mask\")
    295     }
--> 296 generated_tokens = self.model.generate(**generation_inputs, **gen_kwargs)
    298 # Temporary hack to ensure the generation config is not initialized for each iteration of the evaluation loop
    299 # TODO: remove this hack when the legacy code that initializes generation_config from a model config is
    300 # removed in https://github.com/huggingface/transformers/blob/98d88b23f54e5a23e741833f1e973fdf600cc2c5/src/transformers/generation/utils.py#L1183
    301 if self.model.generation_config._from_model_config:

File ~some_path/venv/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:533, in WhisperGenerationMixin.generate(self, input_features, generation_config, logits_processor, stopping_criteria, prefix_allowed_tokens_fn, synced_gpus, return_timestamps, task, language, is_multilingual, prompt_ids, prompt_condition_type, condition_on_prev_tokens, temperature, compression_ratio_threshold, logprob_threshold, no_speech_threshold, num_segment_frames, attention_mask, time_precision, return_token_timestamps, return_segments, return_dict_in_generate, **kwargs)
    527 self._set_prompt_condition_type(
    528     generation_config=generation_config,
    529     prompt_condition_type=prompt_condition_type,
    530 )
    532 # pass self.config for backward compatibility
--> 533 init_tokens = self._retrieve_init_tokens(
    534     input_features,
    535     generation_config=generation_config,
    536     config=self.config,
    537     num_segment_frames=num_segment_frames,
    538     kwargs=kwargs,
    539 )
    540 # TODO(Sanchit) - passing `decoder_input_ids` is deprecated. One should use `prompt_ids` instead
    541 # This function should be be removed in v4.39
    542 self._check_decoder_input_ids(
    543     prompt_ids=prompt_ids, init_tokens=init_tokens, is_shortform=is_shortform, kwargs=kwargs
    544 )

File ~some_path/venv/lib/python3.10/site-packages/transformers/models/whisper/generation_whisper.py:1166, in WhisperGenerationMixin._retrieve_init_tokens(self, input_features, generation_config, config, num_segment_frames, kwargs)
   1158 lang_ids = self.detect_language(
   1159     input_features=input_features,
   1160     encoder_outputs=kwargs.get(\"encoder_outputs\", None),
   1161     generation_config=generation_config,
   1162     num_segment_frames=num_segment_frames,
   1163 )
   1165 if torch.unique(lang_ids).shape[0] > 1:
-> 1166     raise ValueError(
   1167         \"Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language.\"
   1168     )
   1170 lang_id = lang_ids[0].item()
   1172 # append or replace lang_id to init_tokens

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language."
}
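The last frame of the traceback boils down to a uniqueness check on the per-sample language predictions. A minimal sketch of that check in plain Python follows; `pick_batch_language` and the integer ids are hypothetical stand-ins for illustration, not the library code (which operates on a tensor via `torch.unique`). It shows why one misdetected clip is enough to fail the whole batch, even when the dataset itself is monolingual.

```python
from collections.abc import Sequence


def pick_batch_language(detected_lang_ids: Sequence[int]) -> int:
    """Return the single language id shared by a batch, or raise.

    Simplified stand-in for the check shown in the traceback
    (`torch.unique(lang_ids).shape[0] > 1`); the ids are made up.
    """
    unique_ids = set(detected_lang_ids)
    if len(unique_ids) > 1:
        raise ValueError(
            "Multiple languages detected; pass language='...' to force one."
        )
    return detected_lang_ids[0]


# All clips detected as the same (hypothetical) language id: fine.
print(pick_batch_language([7, 7, 7]))

# One clip detected differently: the whole batch is rejected.
try:
    pick_batch_language([7, 7, 9])
except ValueError as err:
    print(f"ValueError: {err}")
```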

Is this expected behaviour? Thank you kindly in advance.

Expected behavior

A normal evaluation run to evaluate the performance of the model on the language before starting to train it.

SethvdAxe · 2024-02-01T12:33:55Z

Ok, can confirm that on 4.37.2 this bug does not appear.
Something to do with #28687 I guess?

ArthurZucker · 2024-02-01T13:22:06Z

cc @patrickvonplaten as well

chicodespons · 2024-02-02T20:14:51Z

I too have the same error. I verified my dataset; it contains only one language.

patrickvonplaten · 2024-02-09T15:25:53Z

Sorry for being a bit late here. Yes, this error is expected: we recently changed the default behavior to language detection when the language to be evaluated is not specified.

If you train your model on Hindi as shown in the notebook, can you make sure to pass:

- eval_results = trainer.evaluate()
+ eval_results = trainer.evaluate(language="hi")

so that the model doesn't try to detect the language it has to transcribe?
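The traceback above shows `Seq2SeqTrainer.evaluate` storing its extra keyword arguments (`self._gen_kwargs = gen_kwargs`) and `prediction_step` passing them on to `model.generate`. A toy sketch of that forwarding, assuming nothing beyond those two lines; `ToyModel` and `ToyTrainer` are hypothetical stand-ins, not the real classes:

```python
class ToyModel:
    """Hypothetical stand-in for a Whisper model; reports what generate() was told."""

    def generate(self, input_features, language=None):
        if language is None:
            return "detect language from audio"
        return f"transcribe as {language}"


class ToyTrainer:
    """Mirrors only the store-and-forward handling of gen_kwargs."""

    def __init__(self, model):
        self.model = model
        self._gen_kwargs = {}

    def evaluate(self, **gen_kwargs):
        self._gen_kwargs = gen_kwargs  # as in Seq2SeqTrainer.evaluate
        return self.prediction_step({"input_features": [0.0]})

    def prediction_step(self, inputs):
        # Extra keyword arguments reach generate() unchanged.
        return self.model.generate(**inputs, **self._gen_kwargs)


trainer = ToyTrainer(ToyModel())
print(trainer.evaluate())               # detect language from audio
print(trainer.evaluate(language="hi"))  # transcribe as hi
```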

patrickvonplaten · 2024-02-09T15:26:35Z

@sanchit-gandhi we should probably also make sure to install accelerate in the notebook (newer versions of Transformers require accelerate for training), and I'd say we should also pin transformers in the blog, no? It's currently set to "main" of Transformers.

rishabhjain16 · 2024-02-12T12:47:38Z

I am getting a similar error during training. Any help is appreciated.

patrickvonplaten · 2024-02-12T18:14:40Z

Hey @rishabhjain16,

Ah yes, indeed: the training loop runs the evaluation loop internally and sadly doesn't let the user pass any generation keyword arguments such as "language". You can however fix this easily by replacing the following cell in the notebook:

with:

from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.generation_config.language = "hi"  # define your language of choice here

and the training should work!
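Why a stored default helps when `train()` offers no way to pass `language`: a sketch of the lookup order, assuming an explicit argument takes precedence over the value stored on the generation config and that detection only runs when neither is set. `ToyGenerationConfig` and `resolve_language` are hypothetical names used for illustration only.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ToyGenerationConfig:
    """Hypothetical stand-in for model.generation_config."""

    language: Optional[str] = None


def resolve_language(
    config: ToyGenerationConfig, language: Optional[str] = None
) -> Optional[str]:
    """Explicit argument first, then the stored default, else None (auto-detect)."""
    return language if language is not None else config.language


config = ToyGenerationConfig()
print(resolve_language(config))        # None: detection would run

config.language = "hi"                 # what the suggested cell sets
print(resolve_language(config))        # hi: no argument needed during train()
print(resolve_language(config, "nl"))  # nl: an explicit argument still wins
```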

rishabhjain16 · 2024-02-13T09:24:35Z

Thank you @patrickvonplaten for getting back to me so quickly. I will give it a try.

s0620013 · 2024-02-20T07:09:57Z

Hi everyone,
I have a problem with my program.

I added this code:

`from transformers import WhisperForConditionalGeneration

model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.generation_config.language = "ja" # define your language of choice here`

Then errors occurred.
Could you help me?

`---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
in <cell line: 1>()
----> 1 trainer.train()

17 frames
/usr/local/lib/python3.10/dist-packages/datasets/utils/_dill.py in save(self, obj, save_persistent_id)
39 import spacy # type: ignore
40
---> 41 if issubclass(obj_type, spacy.Language):
42 pklregister(obj_type)(_save_spacyLanguage)
43 if "tiktoken" in sys.modules:

AttributeError: module 'spacy' has no attribute 'Language'`

ArthurZucker · 2024-02-20T09:21:33Z

Hey! The error seems to point to an issue in the datasets library, so I would recommend upgrading it. Without a proper reproducer there is nothing we can do for you 🤗

SethvdAxe · 2024-02-29T15:13:06Z

Thank you all kindly for taking the time to respond. I was busy for a few weeks with a different project. Now I'm back at it.

I am not sure whether this is related, but I am hitting a bug that I also had a few weeks ago. Back then it was solved by:

forced_decoder_ids = processor.tokenizer.get_decoder_prompt_ids(language="Dutch", task="transcribe")
# and ensuring you are on transformers 4.37.2 resolves this. Setting the forced decoder prompt ids currently does not work on the dev branch.

Now this solution does not work anymore, and I'm pulling my hair out trying to figure out what I am missing now that I did not miss back then.

Evaluating with the Trainer gives a WER of 20% on Dutch Common Voice, while the inference pipeline gives a WER of 2.5% on exactly the same data. The problem persists even when I first define the inference pipeline, then use pipeline.tokenizer, pipeline.feature_extractor and pipeline.model as arguments for the Trainer, and then immediately call trainer.evaluate().

See also: https://discuss.huggingface.co/t/whisper-finetuning-dutch-weird-double-characters/71338/2
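For readers comparing the two figures: WER is the word-level edit distance between reference and hypothesis, divided by the number of reference words. A minimal pure-Python sketch (the function name and the Dutch sample sentences are illustrative; this is not the implementation used by the tutorial's metric) also shows that the score is sensitive to casing when no text normalization is applied. Whether that plays any role in the gap reported above is not established here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the processed reference prefix and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


print(word_error_rate("de kat zit hier", "de kat zat hier"))  # 0.25
print(word_error_rate("Hallo wereld", "hallo wereld"))        # 0.5
```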

ArthurZucker · 2024-03-04T07:37:38Z

There have been a lot of updates to make the API a lot better for the user. The model card available here mentions the generate_kwargs, which should help you.

I am going to close this issue, as both @patrickvonplaten's and my comments should have addressed your inquiries.

AQEEL-SHAFY · 2024-03-18T20:48:52Z

Due to a bug fix in #28687 transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass language='en'.

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-34-3435b262f1ae> in <cell line: 1>()
----> 1 trainer.train()

8 frames
/usr/local/lib/python3.10/dist-packages/transformers/models/whisper/generation_whisper.py in _retrieve_init_tokens(self, input_features, generation_config, config, num_segment_frames, kwargs)
   1164 
   1165             if torch.unique(lang_ids).shape[0] > 1:
-> 1166                 raise ValueError(
   1167                     "Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language."
   1168                 )

ValueError: Multiple languages detected when trying to predict the most likely target language for transcription. It is currently not supported to transcribe to different languages in a single batch. Please make sure to either force a single language by passing `language='...'` or make sure all input audio is of the same language.

I'm getting this error. Can anyone please help me?

SethvdAxe · 2024-03-19T07:18:21Z

This was basically my question too. I did not understand how to pass these now-required language arguments when calling train rather than evaluate. What I ended up doing was this:

model = WhisperForConditionalGeneration.from_pretrained(....)

def custom_generate(self, *args, **kwargs):
    kwargs["language"] = your_language # 'en', 'nl'

    return WhisperForConditionalGeneration.generate(self, *args, **kwargs)

model.generate = custom_generate.__get__(model, WhisperForConditionalGeneration)

I am pretty sure a better solution will come along soon, but this works!
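The `custom_generate.__get__(model, ...)` line is what turns the plain function into a method bound to that one model instance. A small sketch of the same pattern on a toy class (`ToyWhisper` is a hypothetical stand-in, not the real model), with a `functools.partial` variant that behaves similarly but lets a caller-supplied `language` override the pre-filled one:

```python
import functools


class ToyWhisper:
    """Hypothetical stand-in for WhisperForConditionalGeneration."""

    def generate(self, input_features, **kwargs):
        return kwargs.get("language", "auto-detect")


def custom_generate(self, *args, **kwargs):
    kwargs["language"] = "nl"  # forced language; pick your own
    return ToyWhisper.generate(self, *args, **kwargs)


model = ToyWhisper()
# __get__ binds the function to this instance, so `self` is filled in on call.
model.generate = custom_generate.__get__(model, ToyWhisper)
print(model.generate([0.0]))         # nl

# Variant: pre-fill the keyword argument on the already-bound method.
other = ToyWhisper()
other.generate = functools.partial(other.generate, language="nl")
print(other.generate([0.0]))         # nl

# Only the patched instances change; the class itself is untouched.
print(ToyWhisper().generate([0.0]))  # auto-detect
```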

sanchit-gandhi · 2024-03-28T15:30:27Z

Fixed in #29938 and huggingface/blog#1944

mrmuminov · 2024-04-02T12:56:19Z

I fixed this by installing transformers==4.37.2

sanchit-gandhi · 2024-04-02T14:45:27Z

Ideally, you should update to the latest version of transformers:

pip install --upgrade transformers

While also using the latest version of the fine-tuning tutorial (which also installs the latest version of all the relevant libraries).

asierhv · 2024-04-11T11:51:05Z

I fixed this by installing transformers==4.37.2

Thanks! This worked for me.

Komalsai234 · 2024-05-06T02:03:11Z

Hi everyone, I have a problem in my code. I am trying to fine-tune the Whisper model on Sanskrit, a language Whisper was not trained on. I took the tokenizer from the existing Hugging Face repo Bidwill/Sanskrit-Asr-Whisper-small.

While training, I am getting this error. Please help.

asierhv · 2024-05-06T08:41:57Z

Which version of transformers are you working on? I tried hardcoding that 'language' flag to a single language and nothing worked. Try downgrading first and see whether that fixes it; that is what worked for me.

ArthurZucker · 2024-05-23T07:14:13Z

For both of you, a reproducer would be needed, along with the version of transformers that you are using

cc @kamilakesbi and @ylacombe
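A small sketch for collecting the version information requested above, using only the standard library. The list of packages is an assumption about what is relevant to the notebook; adjust it to your environment.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Package names assumed relevant to the fine-tuning notebook.
for name in ("transformers", "datasets", "accelerate", "torch"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Pasting this output together with a minimal script that triggers the error makes the report reproducible.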

yaojingguo · 2024-05-30T14:50:06Z

After running pip install --upgrade transformers to update transformers to 4.41.1, my problem is solved.

ArthurZucker closed this as completed Mar 4, 2024

sanchit-gandhi mentioned this issue Mar 28, 2024

[examples] update whisper fine-tuning #29938

Merged
