
Questions: using v3 turbo + quantization + faster-whisper #9

Open

thiswillbeyourgithub opened this issue Oct 2, 2024 · 2 comments

thiswillbeyourgithub commented Oct 2, 2024

Hi @abb128 ,

I'm running a faster-whisper-server backend on a computer and am blown away by the rapid advancement in the field.

Notably, even on my cheap low-end hardware I'm able to transcribe text far faster than I can speak it, using the large v3 turbo model with int8 quantization. Switching to faster-whisper, to the v3 turbo model, and to quantization each resulted in a subjective leap in speed.
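
For concreteness, here is a minimal sketch of the kind of setup I mean, assuming the faster-whisper Python package; the model identifier and audio file name are just placeholders and may depend on your faster-whisper version:

```python
from faster_whisper import WhisperModel

# Load the turbo model with int8 quantization; "auto" picks the GPU if one is available.
model = WhisperModel("large-v3-turbo", device="auto", compute_type="int8")

# transcribe() returns a lazy generator of segments plus metadata about the audio.
segments, info = model.transcribe("recording.wav", beam_size=5)
print(f"Detected language: {info.language} (p={info.language_probability:.2f})")
for segment in segments:
    print(f"[{segment.start:.2f}s -> {segment.end:.2f}s] {segment.text}")
```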

Naturally, some questions come to mind regarding FUTO's voice input on Android:

  1. Does FUTO plan on redoing the training with the latest v3 turbo models? They are more compact, so we can imagine having near-real-time transcription with non-large models on Android, right? If this isn't planned, why not, and what is missing? Is there anything the community can do to help?
  2. Looking at the notebook, it actually seems that this work was based on the original Whisper implementation by OpenAI. Is there a reason not to use faster-whisper from SYSTRAN? It's a much faster implementation. I made a FUTO shout-out there, by the way.
  3. In the same vein, I don't see any quantization applied in the notebook; would that be a big speed enhancement too? By making the model smaller and faster, we could also considerably reduce the model loading time.

Thanks a lot for everything you've been doing.

abb128 (Collaborator) commented Jan 6, 2025

Hi, thank you for the comments. I haven't tried evaluating the v3 turbo models yet, but I plan to do so. As I understand it, the encoder is still large-sized, which may present difficulties running on mobile and memory-constrained devices, but I'd be interested to hear about your experience. In the app we use whisper.cpp and quantized models, so there's not much reason to switch to faster-whisper over it; in fact, the benchmarks in their README indicate it uses 2x more memory in the case of small models on CPU.

thiswillbeyourgithub (Author) commented:

> Hi, thank you for the comments. I haven't tried evaluating the v3 turbo models yet, but I plan to do so.

Great to hear!

> As I understand it, the encoder is still large-sized, which may present difficulties running on mobile and memory-constrained devices

I have not seen this mentioned anywhere. I thought it was a true drop-in replacement. I could totally be wrong, though.

> but I'd be interested to hear about your experience.

I have no experience using the v3 models on Android or with acft fine-tunes. I only have experience with your vanilla models and faster-whisper v3 on my computer.

I can only report on using it with my years-old consumer hardware (GTX 1080): the large v3 model with faster-whisper and int8 quantization has extremely low latency, maybe 5 to 10 times better than direct OpenAI API calls! I have never noticed any difference in quality between the OpenAI API and my own version; it is just much faster and only takes about 900 MB of VRAM, IIRC.
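
For what it's worth, here is a rough sketch of how I time a transcription locally; the audio path and model name are placeholders, and VRAM usage is simply read off nvidia-smi while it runs:

```python
import time
from faster_whisper import WhisperModel

# int8 quantization on an older CUDA GPU such as a GTX 1080.
model = WhisperModel("large-v3", device="cuda", compute_type="int8")

start = time.perf_counter()
segments, _ = model.transcribe("recording.wav")
# The generator is lazy, so joining the text is what actually runs the decoder.
text = " ".join(segment.text.strip() for segment in segments)
print(f"Transcribed in {time.perf_counter() - start:.2f}s: {text[:80]}...")
```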

> In the app we use whisper.cpp and quantized models, so there's not much reason to switch to faster-whisper over it; in fact, the benchmarks in their README indicate it uses 2x more memory in the case of small models on CPU.

I didn't know about this. Still, a lot of people with today's phones and their large amounts of RAM might not be bothered by using twice as much memory and would prefer it over waiting. In any case, when I see what can be done with 900 MB on my GPU, I'm optimistic about what could be done on a phone. Well, you have proved it already, actually, but I mean we could maybe get a Pareto improvement.
