speak to text , and Text to Text for language trans "extensions" #203

gedw99 · 2024-10-29T02:24:34Z

Whisper can easily be wrapped in Go, so the system could then do speech-to-text.
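To make the "wrapping" idea concrete, here is a minimal sketch of one way it could look: shelling out to a speech-to-text command-line binary from Go and capturing its standard output. This is not code from the demo repository. The `Transcriber` type, `BuildArgs`, and the field names are hypothetical, and it assumes the tool accepts the audio path as its last argument and prints the transcript to stdout.

```go
package main

import (
	"bytes"
	"fmt"
	"os/exec"
	"strings"
)

// Transcriber shells out to a speech-to-text command-line tool.
// Both fields are assumptions for illustration; adjust them to your install.
type Transcriber struct {
	Bin  string   // path to the STT executable (hypothetical)
	Args []string // fixed arguments placed before the audio path
}

// BuildArgs returns a fresh slice holding the fixed arguments followed by
// the audio path, without modifying the base slice.
func BuildArgs(base []string, audioPath string) []string {
	args := make([]string, 0, len(base)+1)
	args = append(args, base...)
	return append(args, audioPath)
}

// Transcribe runs the tool on one audio file and returns its trimmed stdout.
// It assumes the tool writes the transcript to stdout and exits non-zero on failure.
func (t Transcriber) Transcribe(audioPath string) (string, error) {
	cmd := exec.Command(t.Bin, BuildArgs(t.Args, audioPath)...)
	var stdout, stderr bytes.Buffer
	cmd.Stdout = &stdout
	cmd.Stderr = &stderr
	if err := cmd.Run(); err != nil {
		return "", fmt.Errorf("run %s: %w (stderr: %s)", t.Bin, err, strings.TrimSpace(stderr.String()))
	}
	return strings.TrimSpace(stdout.String()), nil
}
```

With a whisper.cpp-style CLI this might be configured as `Transcriber{Bin: "./whisper-cli", Args: []string{"-m", "models/ggml-base.en.bin", "-f"}}`. The binary name, model path, and flags there are assumptions, so check them against your own build.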

Working demo here:

https://github.com/gedw99/galene-stt (NOT integrated with broadcast-box yet).

This makefile works everywhere, and "dep-test" will run an audio-to-text transcription...

Text-to-text (for language translation) might also be useful as another "extension".

Just raising this to see whether there is support for integrating it.

ChaseCares · 2024-10-29T15:34:33Z

Hello! This is an interesting idea. I can't speak to whether adding extensions would be feasible, or whether that is something Sean would like to add. However, I do have reservations about adding speech recognition. I use and rely on STT, and I have first-hand experience of how inaccurate the transcription can be. I don't believe it would be accurate enough to be useful; in my experience, the transcriptions often require heavy editing to match what was spoken.

With that said, if it were going to be implemented, I think it would be a good idea to add warnings that the text may be inaccurate or misleading. This is important because a user who relies exclusively on the text to understand what is going on would have no way to verify its accuracy.

I am optimistic about the technology; it has gotten significantly better recently. I would be very interested to hear other people's opinions about it.
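As a rough illustration of what such a warning could look like in practice, the sketch below prefixes every rendered transcript with a fixed notice and marks segments that fall under a confidence threshold. Everything here is hypothetical: the `Segment` type, the notice wording, and the assumption that the STT engine reports a per-segment confidence at all (not every engine does).

```go
package main

import "strings"

// AccuracyNotice is shown above every transcript. The wording is only a suggestion.
const AccuracyNotice = "Automated transcript: may be inaccurate or misleading."

// Segment is one piece of transcribed speech.
// Confidence (0 to 1) is an assumption; not every STT engine exposes one.
type Segment struct {
	Text       string
	Confidence float64
}

// Label marks a segment whose confidence is below the threshold.
func Label(s Segment, threshold float64) string {
	if s.Confidence < threshold {
		return "[low confidence] " + s.Text
	}
	return s.Text
}

// Render returns the notice followed by one labeled segment per line.
func Render(segs []Segment, threshold float64) string {
	var b strings.Builder
	b.WriteString(AccuracyNotice)
	for _, s := range segs {
		b.WriteString("\n")
		b.WriteString(Label(s, threshold))
	}
	return b.String()
}
```

Keeping the notice in the rendering step, rather than leaving it to each client, means a viewer who only sees the text still sees the caveat.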

Thanks for the suggestion!

gedw99 · 2024-12-06T00:22:42Z

Agreed that accuracy warnings are required. I have found it to work really well; the proof is in the trying.

Once you use local STT it’s highly compelling, because the latency is gone and it feels much more natural.

There is a PR that adds PocketBase auth to Broadcast Box, and I could add STT server-side via Benthos plugins. That’s one way to do it without clogging up the main binary. It would be a WASM plugin then.

This also means it can be added client-side as a WASM plugin too. Operating systems have their own STT, but I believe it’s not available to WebRTC browsers?

This architectural approach is one that I have found useful, because it’s much easier to extend a system without package-dependency bloat. Benthos is very light if you use the version with no extras, and you can then use WASM or stdio to call the plugins.
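To sketch what the stdio variant could look like, here is a minimal plugin-side loop, assuming a simple newline-delimited JSON protocol: the host writes one request per line to the plugin's stdin and reads one response per line from its stdout. This is not the Benthos plugin API. The `Request` and `Response` shapes and the `Handler` signature are hypothetical, chosen only to show how a plugin can live outside the main binary.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"io"
)

// Request and Response are a hypothetical wire format, one JSON object per line.
type Request struct {
	ID   int    `json:"id"`
	Text string `json:"text"`
}

type Response struct {
	ID    int    `json:"id"`
	Text  string `json:"text,omitempty"`
	Error string `json:"error,omitempty"`
}

// Handler does the plugin's actual work, e.g. translating one piece of text.
type Handler func(text string) (string, error)

// HandleLine decodes one request line, runs the handler, and encodes the reply.
// A handler failure is reported inside the response; only malformed input is an error.
func HandleLine(line []byte, h Handler) ([]byte, error) {
	var req Request
	if err := json.Unmarshal(line, &req); err != nil {
		return nil, fmt.Errorf("decode request: %w", err)
	}
	resp := Response{ID: req.ID}
	out, err := h(req.Text)
	if err != nil {
		resp.Error = err.Error()
	} else {
		resp.Text = out
	}
	return json.Marshal(resp)
}

// Serve reads requests from r until EOF and writes one response line per request to w.
// bufio.Scanner caps line length at 64 KiB by default, which this sketch accepts.
func Serve(r io.Reader, w io.Writer, h Handler) error {
	sc := bufio.NewScanner(r)
	for sc.Scan() {
		out, err := HandleLine(sc.Bytes(), h)
		if err != nil {
			return err
		}
		if _, err := w.Write(append(out, '\n')); err != nil {
			return err
		}
	}
	return sc.Err()
}
```

A plugin binary would call `Serve(os.Stdin, os.Stdout, handler)` from its `main`, and the host only needs to start the process and exchange lines with it, so none of the plugin's dependencies end up in the host binary.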

@Sean-Der
@neilschark

Any thoughts on this idea?

gedw99 changed the title ~~speak to text , and Text to Text for language trans~~ speak to text , and Text to Text for language trans "extensions" Oct 29, 2024
