2x slowdown compared to whisper-turbo #257
Hi @obenjiro,
There is no multilingual version of Distil Whisper unfortunately, so regular Whisper would be the best way to go.
There is now a multilingual "whisper turbo"! Stay tuned!
Thanks for your hard work on the project! :)
Can I help with bringing some of these improvements to Ratchet? Maybe there's some low-hanging fruit I could pick up (I have some experience with Rust, Wasm, and JS/TS).
Yeah, saw that too 🚀 (PS: the quality of the Whisper-Turbo model is more than good.)
The most interesting one (and most likely the one with the biggest impact, if my memory serves correctly) was caching for static models. Whisper consists of 2 models: an encoder and a decoder. At this time, Ratchet JIT-compiles each of them on every run. For the encoder model, where everything is completely static (only the input data changes; the structure/dims remain the same), this is wasteful. We could cache the compiled result and reuse it. How we model this, and do it in a more general way, is a bit of a challenge!
I created a prototype using Whisper-Turbo, which performed well and processed files quickly. I was using an 8-bit quantized medium model (specifically this one: https://rmbl.us/whisper-turbo/medium-q8g16.bin). However, since Whisper-Turbo is no longer supported, I had to switch to Ratchet.
In Ratchet, I used the FL33TW00D-HF/whisper-medium model with an 8-bit quantized medium bin (https://rmbl.us/FL33TW00D-HF/whisper-medium:medium_q8.bin). Unfortunately, this model was about 2x-3x slower than Whisper-Turbo. It's possible that the slowdown is due to changes in the runtime environment rather than the model itself.
Here's a test with a 45-second audio file (same model size, whisper-medium 8-bit):
Whisper-Turbo: 20sec
Ratchet: 62 sec
I've been experimenting with DistilWhisperLargeV3 and I'm seeing some impressive results: it can process a 45-second audio file in just 13 seconds. However, it seems to be limited to English inputs only, so it doesn't work for non-English languages :/
Could you help me out by checking whether a multilingual version of the DistilWhisperLargeV3 model is available on Hugging Face (maybe we could use https://huggingface.co/distil-whisper/distil-large-v3), or maybe we could look into Whisper Medium and figure out the cause of the slower processing time?