speak to text , and Text to Text for language trans "extensions" #203
Comments
Hello! This is an interesting idea. I can't speak to whether adding extensions would be feasible, or whether that is something Sean would like to add. However, I do have reservations about adding speech recognition. I use and rely on STT and have firsthand experience of how inaccurate the transcription can be. I don't believe it would be accurate enough to be useful; in my experience the transcriptions often require heavy editing to match what was spoken. With that said, if it were going to be implemented, I think it would be a good idea to add warnings that the text may be inaccurate or misleading. This is important because a user who relies exclusively on the text to understand what is going on would have no way to verify its accuracy. I am optimistic about the technology, and it has gotten significantly better recently. I would be very interested to hear other people's opinions about it. Thanks for the suggestion!
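To make the warning idea concrete, here is a minimal sketch in Go of how captions could carry a standing disclaimer plus a marker on low-confidence segments. Everything in it is hypothetical: the `Segment` type, the `Caption` function, the disclaimer wording, and the threshold are made up for illustration, and it assumes the STT engine reports a per-segment confidence score between 0 and 1, which not every engine does. Nothing here is taken from Broadcast Box.

```go
package main

import (
	"fmt"
	"strings"
)

// Segment is a hypothetical transcript chunk. The Confidence field assumes
// the STT engine exposes a score in [0, 1]; many engines do not.
type Segment struct {
	Text       string
	Confidence float64
}

// disclaimer is illustrative wording, shown once above every caption block.
const disclaimer = "Automatic transcription: text may be inaccurate."

// Caption renders segments one per line under the disclaimer and prefixes
// any segment whose confidence is below threshold with "[uncertain] ".
func Caption(segs []Segment, threshold float64) string {
	var b strings.Builder
	b.WriteString(disclaimer)
	for _, s := range segs {
		b.WriteString("\n")
		if s.Confidence < threshold {
			b.WriteString("[uncertain] ")
		}
		b.WriteString(s.Text)
	}
	return b.String()
}

func main() {
	segs := []Segment{
		{Text: "welcome to the stream", Confidence: 0.93},
		{Text: "we are testing whisper", Confidence: 0.41},
	}
	fmt.Println(Caption(segs, 0.6))
}
```

A viewer who relies only on the text would at least see which lines the engine itself was unsure about, although a high score is no guarantee that a line is correct.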
Agreed that accuracy warnings are required. I have found it to work really well; the proof is in the trying. Once you use local STT it's highly compelling because the latency is gone, so it feels much more natural. There is a PR that adds PocketBase auth to Broadcast Box, and I could add STT server side via Benthos plugins. That's one way to do it without clogging up the main binary, since it would then be a WASM plugin. This also means it can be added client side as a WASM plugin too. The operating systems have their own STT, but I believe it's not available to WebRTC browsers? I have found this architectural approach useful because it makes it much easier to extend a system without package dependency bloat. Benthos is very light if you use the version with no extras and then use WASM or stdio to call the plugins. Any thoughts on this idea?
Whisper can be wrapped with Go easily, and then the system can do speech to text.
Working demo here:
https://github.com/gedw99/galene-stt (this is NOT integrated with broadcast-box yet).
This makefile works everywhere, and "dep-test" will run and do an audio to text...
Text to text might also be useful as another "extension".
Just raising this to see whether there is support for integration or not.