Bug in whisper finetuning tutorial? "Multiple languages detected when trying to predict..." #28814
Comments
Ok, can confirm that on 4.37.2 this bug does not appear.
cc @patrickvonplaten as well
I too have the same error. I verified my dataset; it contains only one language.
@sanchit-gandhi we should probably also make sure to install
Hey @rishabhjain16, ah yes, indeed the training loop runs the evaluation loop inside it and sadly doesn't let the user pass any generation keyword params such as with:
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
model.generation_config.language = "hi"  # define your language of choice here
and the training should work!
Thank you @patrickvonplaten for getting back to me so quickly. I will give it a try.
Hi everyone, I added this code:
from transformers import WhisperForConditionalGeneration
model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
Then errors occurred:
---------------------------------------------------------------------------
17 frames
AttributeError: module 'spacy' has no attribute 'Language'
Hey! The error seems to point to a
Thank you all kindly for your efforts in responding. I was busy for a few weeks with a different project and am now back at it. I am not sure whether this is related, but I have a bug that I also had a few weeks ago. Back then it was solved by:
Now this solution does not work anymore, and I'm pulling my hair out over what I am missing now that I did not miss back then. Evaluating with the Trainer gives a WER of 20% on Dutch Common Voice, while the inference pipeline gives a WER of 2.5% on exactly the same data. The problem persists even when I first define the inference pipeline, then pass pipeline.tokenizer, pipeline.feature_extractor and pipeline.model as arguments to the Trainer, and then immediately call trainer.evaluate(). See also: https://discuss.huggingface.co/t/whisper-finetuning-dutch-weird-double-characters/71338/2
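For anyone comparing the two numbers above, it can help to score both sets of transcripts with the same function. The following is a minimal pure-Python sketch of word error rate (word-level edit distance divided by the number of reference words). The function name `word_error_rate` and the Dutch example sentences are illustrative assumptions, not taken from the tutorial, and this sketch applies no text normalization at all.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative helper (hypothetical name); no casing or punctuation
    normalization is applied, so "Hallo" and "hallo" count as different words.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the words of ref seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Made-up sentences: one substituted word and one dropped word out of six.
print(word_error_rate("de kat zit op de mat", "de kat zat op mat"))
```

Running the Trainer predictions and the pipeline predictions through one shared scoring function like this makes it easier to tell whether the gap comes from the transcripts themselves or from how each path computes the metric.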
There have been a lot of updates to make the API a lot better for the user. The model card available here mentions the
I am going to close this issue, as both @patrickvonplaten's and my comments should have addressed your inquiries.
Due to a bug fix in #28687, transcription using a multilingual Whisper will default to language detection followed by transcription instead of translation to English. This might be a breaking change for your use case. If you want to instead always translate your audio to English, make sure to pass
I'm getting this error... Please, can anyone help me?
This was basically my question too. I did not understand how to pass these now-required language arguments to the trainer rather than to evaluate. What I ended up doing was this:
I am pretty sure a better solution will come along soon, but this works!
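One possible way to pin these arguments before the Trainer ever calls generate is to set them on the model's generation config, as suggested earlier in this thread. This is only a sketch and not necessarily what the commenter above did. The helper name `pin_generation_language` is hypothetical, setting `task` alongside `language` is an assumption on my part, and the stand-in object below only mimics a model's `generation_config` attribute so the snippet runs without downloading a checkpoint.

```python
from types import SimpleNamespace


def pin_generation_language(model, language, task="transcribe"):
    """Store language/task on the generation config so that generation
    calls made inside the training loop pick them up as defaults.

    Hypothetical helper; `task` is an assumed companion setting.
    """
    model.generation_config.language = language
    model.generation_config.task = task
    return model


# Stand-in object used purely for illustration.
dummy_model = SimpleNamespace(generation_config=SimpleNamespace())
pin_generation_language(dummy_model, "nl")
print(dummy_model.generation_config.language, dummy_model.generation_config.task)

# With the real library, the same call would be applied to the loaded model
# before it is handed to the trainer (assumed usage, not run here):
#   model = WhisperForConditionalGeneration.from_pretrained("openai/whisper-small")
#   pin_generation_language(model, "nl")
```

Keeping this in a small function makes it easy to apply the same settings to the model used for training and to the one used for a standalone evaluation run.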
Fixed in #29938 and huggingface/blog#1944
I fixed this by installing transformers==4.37.2
Ideally, you should update to the latest version of transformers:
While also using the latest version of the fine-tuning tutorial (which also installs the latest version of all the relevant libraries).
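Since the behaviour discussed in this thread depends on the installed release (4.37.2 versus 4.38.0.dev0), a quick check of what is actually installed can save some confusion. Below is a minimal standard-library sketch. The helper names `installed_version` and `release_tuple` are made up for illustration, and the parsing is deliberately simplistic: it keeps only the leading numeric components of the version string.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def release_tuple(version: str) -> tuple:
    """Keep the leading numeric parts only, e.g. '4.38.0.dev0' -> (4, 38, 0)."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


version = installed_version("transformers")
if version is None:
    print("transformers is not installed in this environment")
else:
    print(f"transformers {version}, release components {release_tuple(version)}")
```

Including this output in a bug report also covers the request for the transformers version that comes up later in the thread.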
Thanks! This worked for me.
For both of you, a reproducer would be needed, along with the version of transformers that you are using. cc @kamilakesbi and @ylacombe
After running
System Info
Transformers version: 4.38.0.dev0
Python version: Python 3.10 venv (local)
Platform: macOS Ventura 13.5
Who can help?
@sanchit-gandhi
Information
Tasks
examples folder (such as GLUE/SQuAD, ...)
Reproduction
Thank you for the amazing whisper finetuning tutorial at: https://huggingface.co/blog/fine-tune-whisper
When I download the ipynb and run it locally it runs fine.
However, when I change a single line (the last line) from:
to:
I get the following error:
Full error log:
Is this expected behaviour? Thank you kindly in advance.
Expected behavior
A normal evaluation run to evaluate the performance of the model on the language before starting to train it.