
[whisper] Add OpenAI API compatibility #17921

Open

dalgwen wants to merge 2 commits into main
Conversation

@dalgwen (Contributor) commented Dec 17, 2024

Use an external OpenAI-compatible API as another option besides the locally embedded Whisper.

Whisper is excellent as an embedded service, but it can also be exposed remotely, by services such as OpenAI obviously, but also by some other open-source projects wrapping it (backed by whisper.cpp or faster-whisper), such as faster-whisper-server.

So, this PR adds the possibility of using a remote Whisper by calling an OpenAI-compatible API that exposes it. I briefly thought about writing a new add-on, but deemed it overkill. It is another way of using Whisper, so I hope it's appropriate to do it here?
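For reference, a minimal sketch of what such a call looks like against the standard OpenAI transcription endpoint (/v1/audio/transcriptions). The host/port, the model name, and the API key are placeholders, not this add-on's actual implementation; a local OpenAI-compatible server typically ignores the key:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

public class WhisperApiExample {
    public static void main(String[] args) throws IOException, InterruptedException {
        String boundary = "----whisper-example-" + System.currentTimeMillis();
        String crlf = "\r\n";
        byte[] wav = Files.readAllBytes(Path.of("sample.wav")); // 16 kHz mono PCM16 WAV

        ByteArrayOutputStream body = new ByteArrayOutputStream();
        // "file" part: the audio payload
        body.write(("--" + boundary + crlf
                + "Content-Disposition: form-data; name=\"file\"; filename=\"sample.wav\"" + crlf
                + "Content-Type: audio/wav" + crlf + crlf).getBytes(StandardCharsets.UTF_8));
        body.write(wav);
        // "model" part ("whisper-1" is OpenAI's name; local servers may accept any model id)
        body.write((crlf + "--" + boundary + crlf
                + "Content-Disposition: form-data; name=\"model\"" + crlf + crlf
                + "whisper-1" + crlf).getBytes(StandardCharsets.UTF_8));
        // optional "language" part, matching the new configuration parameter
        body.write(("--" + boundary + crlf
                + "Content-Disposition: form-data; name=\"language\"" + crlf + crlf
                + "en" + crlf).getBytes(StandardCharsets.UTF_8));
        body.write(("--" + boundary + "--" + crlf).getBytes(StandardCharsets.UTF_8));

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("http://localhost:8000/v1/audio/transcriptions")) // assumed local server
                .header("Content-Type", "multipart/form-data; boundary=" + boundary)
                .header("Authorization", "Bearer sk-placeholder") // only needed for the real OpenAI API
                .POST(HttpRequest.BodyPublishers.ofByteArray(body.toByteArray()))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.body()); // JSON, e.g. {"text":"..."}
    }
}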

Some refactors I hope @GiviMAD can take a look at:

  • whisper-related code (initialization) is delayed to the last moment, to keep API mode and local mode as separate as possible. So, is it OK to initialize WhisperState inside the new recognizeLocal method, effectively doing it in each iteration of the main while loop? Or should I keep it as a global parameter, initialized once, outside the while loop? (A sketch of the two options follows this list.)
  • float32 array conversion is delayed to the last moment (short16 is used for remote Whisper, so I kept it as the main format, as I didn't want to do short16 -> float32 -> short16)
  • inside the newly extracted recognizeLocal method, I use an exception and a null return value as the signal to break the outer loop, instead of the break keyword. The branching conditions and output should be the same nonetheless.
  • I added language as a configuration parameter, so as not to depend on the system locale if the user prefers.
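To make the first question concrete, here is a minimal, self-contained sketch of the two placements. Recognizer and State are hypothetical stand-ins for the real whisper-jni types, for illustration only, not the library's actual API:

public class InitPlacementSketch {
    // Stand-ins for the real whisper-jni types; hypothetical, for illustration only.
    interface State extends AutoCloseable {
        @Override
        void close(); // narrowed: no checked exception
    }

    interface Recognizer {
        State newState();                          // expensive native allocation
        String run(State state, short[] samples);  // one transcription pass
    }

    // Option A (as in this PR): the state lives entirely inside recognizeLocal,
    // so API mode never touches any whisper initialization code, at the cost of
    // one allocation per iteration of the main while loop.
    static String recognizeLocal(Recognizer recognizer, short[] samples) {
        try (State state = recognizer.newState()) {
            return recognizer.run(state, samples);
        }
    }

    // Option B would hoist `State state = recognizer.newState()` above this
    // loop and reuse it, trading the clean API/local separation for a single
    // allocation over the lifetime of the service.
    static void mainLoop(Recognizer recognizer, Iterable<short[]> chunks) {
        for (short[] chunk : chunks) {
            System.out.println(recognizeLocal(recognizer, chunk));
        }
    }
}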

Thanks!

@dalgwen dalgwen requested a review from GiviMAD as a code owner December 17, 2024 21:54
@dalgwen dalgwen added the enhancement (An enhancement or new feature for an existing add-on) label Dec 17, 2024
@GiviMAD (Member) commented Dec 17, 2024

Hello @dalgwen, this is great. I was about to look into this myself because I bought an Nvidia Orin Nano development kit and I want to host Whisper there using an API that mimics OpenAI's, so this also solves my problem.

I'm going to run a quick test tomorrow, but I just reviewed the code and everything looks good to me.

Also, I'm about to upgrade the whisper-jni library to use the latest whisper.cpp version, but I think I will wait to create a PR until this one gets merged.

@GiviMAD (Member) commented Dec 21, 2024

Hello again @dalgwen, sorry for the delay.

I was able to test the changes today and everything works correctly.

I was even able to use the new API mode to run the transcription against a whisper.cpp server instance running on another machine, as I mentioned earlier. The strange thing is that I had to run the server with the convert functionality, so I think it is still expecting raw float PCM data, but with that enabled it works like a charm:

./build/bin/server --convert --host 0.0.0.0 -m models/ggml-large-v3-turbo-q5_0.bin

The large turbo model runs really well on the Nvidia Orin Nano, and I saw the other day that they have lowered the price to $250, so it seems like a great option for powering this kind of service (it's a pity it only has 8 GB of RAM).

About the PR, I think it would be good to note in the readme that the grammar section only applies to local mode (I just realized it's now a subsection of local mode, sorry), and you also need to regenerate the translation files, but apart from that I think it's OK. Thank you so much for it!

@lsiepel (Contributor) left a comment


Some style improvements and the properties file needs to be regenerated

@@ -245,34 +286,20 @@ public Set<AudioFormat> getSupportedFormats() {
@Override
public STTServiceHandle recognize(STTListener sttListener, AudioStream audioStream, Locale locale, Set<String> set)
throws STTException {

Contributor

Suggested change

Contributor Author

Thank you, done

}

private String recognizeAPI(int audioSamplesOffset, short[] audioStream, String language) throws STTException {

Contributor

Suggested change

Contributor Author

Done

@dalgwen (Contributor Author) commented Dec 24, 2024

> Some style improvements and the properties file needs to be regenerated

Thanks, will do ASAP.

> The strange thing is that I had to run the server with the convert functionality, so I think it is still expecting raw float PCM data

Disclaimer: I'm not very knowledgeable about audio formats. I just took an example somewhere on the internet and thought that the int16 format was the base for the whisper service. So I'm of course OK with using float32 if it's simpler.

So, I flipped the code again to use float32 as the base, and ran a test: float32 also works with faster-whisper-server (and faster-whisper), but only barely. The transcription is bad, really bad. Not totally wrong, but unusable nonetheless. I must have done something wrong (for reference, the kind of sample conversion involved is sketched after the list below). The commit is here; if you see something, please let me know.
Depending on your answer, I see two possibilities:

  • some servers / implementations must (or prefer to) use a specific format to get good results, so we add a parameter to choose this output format? (or reuse the one for the "Create WAV Record" option)
  • I did something wrong, and we can use float32 once it is fixed
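For context, this is the kind of short16 -> float32 conversion under discussion: a minimal sketch, assuming 16-bit signed little-endian PCM normalized to [-1.0, 1.0], not the PR's actual code:

public final class PcmConversion {
    private PcmConversion() {
    }

    /** Convert signed 16-bit PCM samples to float32 in [-1.0, 1.0]. */
    public static float[] shortToFloat(short[] pcm16) {
        float[] out = new float[pcm16.length];
        for (int i = 0; i < pcm16.length; i++) {
            // Dividing by 32768.0f maps Short.MIN_VALUE to exactly -1.0f;
            // positive samples top out just below 1.0f.
            out[i] = pcm16[i] / 32768.0f;
        }
        return out;
    }
}

Note that if a float buffer like this is then wrapped in a WAV header that still declares 16-bit samples, the decoder will misread every frame, which would produce exactly the "not totally wrong, but unusable" transcriptions described above.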

> The large turbo model runs really well on the Nvidia Orin Nano

Very interesting device! Regarding your comment, do you also use whisper.cpp on this device, or another backend?
I haven't run any comparative test, but besides the fact that I had issues with my old CPU under whisper.cpp that I didn't have with faster-whisper, I have read online that faster-whisper is much faster than whisper.cpp. Do you have a reason to use whisper.cpp (quality, etc.)?

@dalgwen (Contributor Author) commented Dec 24, 2024

I ran another test, this time with whisper-server (included in the official whisper release; is this the one you tested?).
Even with fp32 output, I have to use the --convert option.
The results in English were good, but utter nonsense in French (despite using a universal model, of course).

@GiviMAD (Member) left a comment


Hey @dalgwen, I realized what the problem was this morning after digging a little into the whisper.cpp server source code: it seems we were sending a corrupted WAV header, as the sample-size field in the header was incorrect. So the server was not expecting raw audio as I initially assumed. After fixing that, it works correctly without enabling the convert option.
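To illustrate which field is involved: a minimal sketch of the canonical 44-byte RIFF/WAVE header for plain PCM, following the standard layout. This is illustrative, not the add-on's actual code:

import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

public final class WavHeader {
    /**
     * Build a canonical 44-byte RIFF/WAVE header for plain PCM data.
     * If bitsPerSample (and the derived blockAlign/byteRate) disagree with
     * the actual sample payload, decoders misparse every frame -- the kind
     * of corruption described above.
     */
    public static byte[] pcmHeader(int sampleRate, short channels, short bitsPerSample, int dataLength) {
        short blockAlign = (short) (channels * bitsPerSample / 8);
        int byteRate = sampleRate * blockAlign;
        ByteBuffer buf = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        buf.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(36 + dataLength);              // file size minus the first 8 bytes
        buf.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        buf.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(16);                           // fmt chunk size for plain PCM
        buf.putShort((short) 1);                  // audio format 1 = PCM
        buf.putShort(channels);
        buf.putInt(sampleRate);
        buf.putInt(byteRate);
        buf.putShort(blockAlign);
        buf.putShort(bitsPerSample);              // the field that was wrong here
        buf.put("data".getBytes(StandardCharsets.US_ASCII));
        buf.putInt(dataLength);
        return buf.array();
    }
}

For the 16 kHz mono short16 stream used here, that would be pcmHeader(16000, (short) 1, (short) 16, samples.length * 2).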

@GiviMAD (Member) commented Dec 28, 2024

> Disclaimer: I'm not very knowledgeable about audio formats. I just took an example somewhere on the internet and thought that the int16 format was the base for the whisper service. So I'm of course OK with using float32 if it's simpler.

Me neither; I assumed the float32 sample format from looking at the code types, but after reading it carefully you are correct: it's expecting s16 WAVE files.

> Very interesting device! Regarding your comment, do you also use whisper.cpp on this device, or another backend?
> I haven't run any comparative test, but besides the fact that I had issues with my old CPU under whisper.cpp that I didn't have with faster-whisper, I have read online that faster-whisper is much faster than whisper.cpp. Do you have a reason to use whisper.cpp (quality, etc.)?

I'm using whisper.cpp but I haven't tested any alternatives. I assume that since I'm running the inference on the GPU the performance difference must be insignificant, but I'll let you know if I end up trying anything else.

> I ran another test, this time with whisper-server (included in the official whisper release; is this the one you tested?).

I'm using the one available in the whisper.cpp project under examples/server; I built it from the v1.7.3 tag.

Apply PR comments

Signed-off-by: Gwendal Roulleau <[email protected]>
@dalgwen (Contributor Author) commented Dec 30, 2024

Regarding the regeneration of the translation properties file, I don't know why, but the automatic regeneration using Maven didn't work (even after deleting the entire config.xml file).
I had to resort to manual editing.
So, should I also modify the Italian translation, or is that taken care of by Crowdin?

Apart from that, I applied your remarks. Thanks again lsiepel and GiviMAD.

@GiviMAD (Member) left a comment


LGTM

Labels: enhancement (An enhancement or new feature for an existing add-on)
Projects: none yet
3 participants