I am using the library in a noisy environment, with a fairly predictable time window in which the STT should capture audio (one specific person asks questions in sequence). Sometimes, when other people nearby are speaking at the moment I start recording, the mic stays open for too long. This increases the model's processing time and also pollutes the audio from which I extract the relevant person's question.
Replies: 1 comment 4 replies
-
This is mostly due to Silero VAD detecting "speech" where there is only background noise. The first thing to try is to reduce the Silero VAD sensitivity and to set silero_deactivity_detection to True. Depending on the noise level, you may need a more sophisticated approach. What gives more stable results, but is also more complicated to implement, is to check the realtime transcription for changes: if no additional text comes in for a while, we consider the speech finished and end the recording. You can find an example implementation here.
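To make the "no new text for a while" idea concrete, here is a minimal sketch of just the stall-detection logic, independent of the library. `StallDetector`, `RECORDER_KWARGS`, and the timeout value are hypothetical names and numbers for illustration. The keys in the settings dict follow the parameters mentioned above, but the values are guesses to tune, and the assumption that a lower `silero_sensitivity` means less sensitive should be checked against the library's documentation.

```python
import time
from typing import Callable, Optional

# Hypothetical starting values, not recommendations from the library.
RECORDER_KWARGS = {
    "silero_sensitivity": 0.2,            # assumed: lower value = less sensitive
    "silero_deactivity_detection": True,  # as suggested in the reply above
    "enable_realtime_transcription": True,
}


class StallDetector:
    """Reports when the realtime text has not changed for `timeout_s` seconds."""

    def __init__(self, timeout_s: float = 1.5,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self._clock = clock                       # injectable so it can be tested
        self._last_text = ""
        self._last_change: Optional[float] = None

    def update(self, text: str) -> None:
        """Feed the latest realtime transcription; only real changes count."""
        text = text.strip()
        if text != self._last_text:
            self._last_text = text
            self._last_change = self._clock()

    def is_stalled(self) -> bool:
        """True once some text has arrived and then stopped changing."""
        if self._last_change is None:
            return False  # nothing transcribed yet, leave it to the VAD
        return self._clock() - self._last_change >= self.timeout_s

    def reset(self) -> None:
        """Call before each new question."""
        self._last_text = ""
        self._last_change = None
```

One possible wiring, again as an assumption rather than a confirmed recipe: pass `detector.update` as the realtime transcription update callback, poll `detector.is_stalled()` from a separate thread while `text()` is blocking, and end the recording once it returns True. Which method is safe to call at that point is exactly the open question in the replies that follow.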
If you call abort() the text() method will return an empty string ("").
But I'm not sure if stop() works 100% correctly here. The stop() method was intended to stop a recording that was previously started with the start() method, i.e. for manual recording. Here we use it for a recording that was started by VAD, using the text() method.
The abort() method was intended to quickly end such a VAD-initiated recording without causing a final transcription. So it's not really the right approach here, but tbh the stop() method isn't either, because it will initiate the transcription but leave the text() method in an undefined state. I need to think about that, maybe calling abort() after stop() method hel…