Thanks! Looks nice and seems very easy to set up (I haven't tried yet though). Possible extensions that came to mind:
Some other possible features, e.g. confidence color-coding, diarization, word-level timestamps, prompts, would of course also require (UI) work on the Kõnele side. Btw, it seems possible to run Whisper also directly on the phone (I haven't tried yet though), see e.g. https://github.com/alex-vt/WhisperInput
-
Thank you for the words of encouragement! I was not aware of WhisperInput, thanks! I just tried it out and it is excellent. On my phone (Pixel 5a), WhisperInput doing the recognition on the device is generally quite a bit slower than sending the audio over my fairly fast network connection to my fairly fast self-hosted server running whisper.cpp through my websocket server. I expect this would not be true on an iPhone, where whisper.cpp is more optimized. It would also depend on the length of the speech being recognized, with longer input favoring the faster server over the phone. Still, it is good to have an offline option. All of your suggestions for additions sound great, and after playing with this for a while I may work to improve the server and maybe try to add some features to Kõnele. Meanwhile I am very happy! Thank you for your work on Kõnele, on which my addition relies.
-
Hi, I also wrote an interface for Konele, including both POST and websocket methods. However, I am using the Additionally, since English is not my first language, I set a rule that when I specify English as the target language, it enters Whisper's translation mode. Real-time voice translation from any language to English is really cool! The code I'm using is here: https://github.com/heimoshuiyu/whisper-fastapi (it's very short, feel free to take a look).
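The "English means translate" rule described above could look roughly like the sketch below. This is not taken from the linked repository: the function name `choose_task` and the way the language tag arrives from the client are assumptions made for illustration. The only thing relied on is that Whisper distinguishes a `transcribe` task from a `translate` task (translation into English).

```python
def choose_task(requested_lang: str) -> str:
    """Pick the Whisper task from the language tag sent by the client.

    Hypothetical helper: an English tag ("en", "en-US", "en_GB", ...)
    selects translation into English; anything else selects plain
    transcription in the spoken language.
    """
    # Normalise "en_US" / "EN-us" style tags and keep the primary subtag.
    primary = requested_lang.strip().lower().replace("_", "-").split("-")[0]
    return "translate" if primary == "en" else "transcribe"
```

The returned string would then be passed as the task option of whichever Whisper backend the server uses.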
-
For several years I have been self-hosting a Kaldi-based server that I use with my Android phone for voice recognition, with the excellent Konele app on the client side. Recently I became excited by the outstanding performance of ggerganov's port of the Whisper voice recognition system (https://github.com/ggerganov/whisper.cpp#whispercpp). I implemented a very quick websocket server around that code for Konele to use, in the spirit of the Kaldi gstreamer server (https://github.com/alumae/kaldi-gstreamer-server).
It is very crude but works very well because Whisper is fantastic, and the ggerganov implementation is very efficient! Here is the repo: https://github.com/rpdrewes/whisper-websocket-server
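For readers unfamiliar with this style of server, here is a minimal sketch of the per-connection message handling, not the code in the repo above. It assumes the conventions described in the kaldi-gstreamer-server README as I understand them: the client sends binary audio chunks followed by the string `EOS`, and the server answers with JSON containing `status` and a `result` with `hypotheses` and `final`. The class name `RecognitionSession` and the `transcribe` callback (which stands in for a call into whisper.cpp) are hypothetical, and the websocket transport itself is left out.

```python
import json
from typing import Callable, Optional, Union

Message = Union[str, bytes, bytearray]


def is_eos(message: Message) -> bool:
    """True if the message is the end-of-stream marker (text or bytes)."""
    return message == "EOS" or message == b"EOS"


class RecognitionSession:
    """Buffers audio for one connection and replies once the stream ends."""

    def __init__(self, transcribe: Callable[[bytes], str]):
        # `transcribe` is a placeholder for the real recognizer call.
        self._transcribe = transcribe
        self._audio = bytearray()

    def on_message(self, message: Message) -> Optional[str]:
        """Handle one incoming message; return a JSON reply or None."""
        if is_eos(message):
            text = self._transcribe(bytes(self._audio))
            self._audio.clear()
            # Reply shape assumed from the kaldi-gstreamer-server README.
            return json.dumps({
                "status": 0,
                "result": {
                    "hypotheses": [{"transcript": text}],
                    "final": True,
                },
            })
        if isinstance(message, (bytes, bytearray)):
            # Audio chunk: keep accumulating until EOS arrives.
            self._audio.extend(message)
        return None
```

A websocket handler would call `on_message` for every frame it receives and send back any non-`None` return value. Recognizing only after `EOS` keeps the sketch simple, at the cost of giving no partial results while the user is still speaking.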
I'd appreciate comments and suggestions.