How to set the target language for examples in README? #130
The code examples in the README do not make it obvious how to set the language of the audio to transcribe. The default settings produce garbled English text if the audio language is different.
Comments
It seems that this model only outputs English subtitles.
@CheshireCC If that is the case, would it be a distilled version of Whisper? "Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web."
Maybe a distilled version requires re-training the model, just like fine-tuning one.
Indeed - as @CheshireCC has mentioned, you can train your own multilingual distil-whisper checkpoint according to the training readme. This has been done successfully in a number of languages, such as French and German. Also cc @eustlb, who has done some extensive experimentation on French distillation.
Hey @clstaudt @CheshireCC, indeed distil-large-v3 has been trained to do English-only transcription. More details about the motivations here.
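For reference, a minimal sketch of how the target language is normally pinned with the transformers ASR pipeline. The model name and audio path below are placeholders; the language argument only has an effect on multilingual checkpoints, which distil-large-v3 is not:

```python
import torch
from transformers import pipeline

# Any multilingual Whisper-style checkpoint works here; distil-large-v3
# itself is English-only, so the language argument will not help with it.
pipe = pipeline(
    "automatic-speech-recognition",
    model="openai/whisper-large-v3",  # placeholder multilingual checkpoint
    torch_dtype=torch.float16,
    device="cuda:0",
)

# The target language and task are forwarded to model.generate();
# "audio.mp3" is a placeholder path.
result = pipe(
    "audio.mp3",
    generate_kwargs={"language": "french", "task": "transcribe"},
)
print(result["text"])
```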
Thanks for clarifying @eustlb. I'm about to give a presentation praising the potential of distillation, with distil-whisper as the prime example. While the speedup is impressive, I think it's important to add that it covers just one language while the teacher model was multilingual. What do you think the speedup and size reduction would be for a multilingual distil-whisper?
Thanks for promoting distil-whisper, @clstaudt! Actually, you can find this info here in the README and here on the model card, but thanks for mentioning it; it may not be clear enough. Concerning a multilingual distilled Whisper, it is a very difficult question to answer without proper experimentation, and I prefer not to give false insights. There are a lot of factors to take into account (e.g., number of languages, dataset sizes, etc.). Yet, I would say that were you to have large enough datasets for a few languages and manage to get good results with a 4-layer decoder, the size reduction would be 48% (an exact value, compared to 51% for a 2-layer decoder) and the speed-up should be around 5.5x (a rough estimate, to be taken with a big pinch of salt, compared to 6.3x for a 2-layer decoder).
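Those size-reduction figures can be sanity-checked with a back-of-the-envelope parameter count from the published whisper-large-v3 dimensions (d_model 1280, 32 encoder and 32 decoder layers). The sketch below ignores the conv stem, layer norms, biases, and positional embeddings, so the totals are approximate:

```python
# Rough parameter count for a full Whisper encoder plus a truncated decoder.
D = 1280        # model width (whisper-large-v3)
VOCAB = 51_866  # approximate vocabulary size

enc_layer = 12 * D * D  # self-attention (4d^2) + feed-forward (8d^2)
dec_layer = 16 * D * D  # self-attn (4d^2) + cross-attn (4d^2) + FFN (8d^2)
embed = VOCAB * D       # token embedding, tied with the LM head

def params(n_dec_layers: int) -> int:
    """Approximate total: 32 encoder layers + truncated decoder + embedding."""
    return 32 * enc_layer + n_dec_layers * dec_layer + embed

teacher = params(32)  # ~1.53B, close to whisper-large-v3's real size
for n in (2, 4):
    reduction = 1 - params(n) / teacher
    print(f"{n}-layer decoder: ~{reduction:.0%} smaller")  # ~51% and ~48%
```

The decoder dominates the count (16d² per layer versus 12d² for an encoder layer), which is why truncating it from 32 layers to 2 or 4 cuts the model roughly in half.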