2x slowdown compared to whisper-turbo #257

Open
obenjiro opened this issue Sep 26, 2024 · 4 comments

@obenjiro (Contributor) commented Sep 26, 2024

I created a prototype using Whisper-Turbo, which performed well and processed files quickly. I was using an 8-bit quantized medium model (specifically this one: https://rmbl.us/whisper-turbo/medium-q8g16.bin). However, since Whisper-Turbo is no longer supported, I had to switch to Ratchet.

In Ratchet, I used the FL33TW00D-HF/whisper-medium model with an 8-bit quantized medium bin (https://rmbl.us/FL33TW00D-HF/whisper-medium:medium_q8.bin). Unfortunately, this model was about 2x-3x slower than Whisper-Turbo. It's possible that the slowdown is due to changes in the runtime environment rather than the model itself.

Here's a test with a 45-second audio file (same model size, whisper-medium 8-bit):
Whisper-Turbo: 20 sec
Ratchet: 62 sec

I've been experimenting with DistilWhisperLargeV3 and I'm seeing some impressive results: it can process a 45-second audio file in just 13 seconds. However, it seems to be limited to English-language inputs only, so it doesn't work for non-English languages :/

Could you help me out by checking whether there's a multilingual version of the DistilWhisperLargeV3 model available on Hugging Face (maybe we could use https://huggingface.co/distil-whisper/distil-large-v3)? Or maybe we could look into Whisper Medium and figure out the cause of the slower processing time?

@FL33TW00D (Collaborator) commented

Hi @obenjiro,
Thanks for your continued support for these projects.

whisper-turbo had a lot of whisper-specific optimizations, so it's unsurprising that Ratchet is 3x slower.
That said, none of those ideas are lost, and I'd love to ship them in Ratchet.

There is no multilingual version of Distil-Whisper, unfortunately, so regular Whisper would be the best way to go.

@FL33TW00D (Collaborator) commented

There is now multilingual "whisper turbo"! Stay tuned!

@obenjiro (Contributor, Author) commented Oct 2, 2024

Thanks for your hard work on the project! :)

> whisper-turbo had a lot of whisper-specific optimizations,

Can I help with bringing some of them to Ratchet? Maybe there's some low-hanging fruit I could tackle (I have some experience with Rust, Wasm, and JS/TS).

> There is now multilingual "whisper turbo"! Stay tuned!

Yeah, saw that too 🚀 (P.S. the quality of the Whisper-Turbo model is more than good)

@FL33TW00D (Collaborator) commented

> Can I help with bringing some of them to Ratchet? Maybe there's some low-hanging fruit I could tackle

The most interesting one (and most likely the one with the biggest impact, if my memory serves correctly) was the caching of compiled executables for static models.

Whisper consists of two models, an encoder and a decoder. At this time, Ratchet JIT compiles each of the models like the following:
`resolve() -> allocate_storage() -> compile_gpu() -> executable.dispatch()`

For the encoder model, where everything is completely static (only the input data changes; structure/dims remain the same), this is wasteful. We could cache the executable and just swap in new input data. This would be much faster, and it's what whisper-turbo did.
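A minimal sketch of that caching idea (all names here, `GraphKey`, `Executable`, `ExecutableCache`, are hypothetical stand-ins, not Ratchet's actual API):

```rust
use std::collections::HashMap;

/// Hypothetical key for a compiled graph: if the model structure and
/// input dims match, the same executable can be reused.
#[derive(Hash, PartialEq, Eq, Clone)]
struct GraphKey {
    model_id: String,
    input_dims: Vec<usize>,
}

/// Stand-in for the GPU executable produced by `compile_gpu()`.
struct Executable;

impl Executable {
    /// Run the precompiled pipeline on fresh input data.
    fn dispatch(&self, input: &[f32]) -> Vec<f32> {
        // In a real runtime: write `input` into the already-allocated
        // GPU buffers and submit the cached command buffer.
        input.to_vec()
    }
}

/// Cache of compiled executables for static models like the encoder.
#[derive(Default)]
struct ExecutableCache {
    compiled: HashMap<GraphKey, Executable>,
}

impl ExecutableCache {
    /// Cache hit: skip resolve/allocate/compile and only swap input data.
    /// Cache miss: pay the JIT cost once, then store the executable.
    fn run(&mut self, key: GraphKey, input: &[f32]) -> Vec<f32> {
        let exe = self.compiled.entry(key).or_insert_with(|| {
            // resolve() -> allocate_storage() -> compile_gpu()
            Executable
        });
        exe.dispatch(input)
    }
}

fn main() {
    let mut cache = ExecutableCache::default();
    let key = GraphKey {
        model_id: "encoder".into(),
        input_dims: vec![1, 80, 3000], // e.g. a log-mel spectrogram shape
    };
    // First call pays the JIT compile cost; later calls only dispatch.
    let _first = cache.run(key.clone(), &[0.0; 4]);
    let _second = cache.run(key, &[1.0; 4]);
}
```

The decoder is harder to cache this way, since its shapes change as the generated sequence grows, which is presumably part of the challenge mentioned below.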

How we model this and do it in a more general way is a bit of a challenge!
