-
Hello, I have a project where I stream video from my phone using an IP camera app and load the stream on my computer. I want a program that monitors this stream: when I say certain keywords, it should record my next sentence using STT and take a photo of the current camera view, then pass these two into a local LLM as a prompt. Right now the picture-taking part is easy using OpenCV, but I can't get the STT to run as planned. I tried Vosk, SpeechRecognition, and Porcupine, but none of the three seems able to monitor the FFmpeg stream (or perhaps I'm doing it wrong, since I don't have any experience with voice processing). Can RealtimeSTT achieve this? Or do I have some misunderstanding of how the audio is processed? Is there a better way to monitor a real-time audio stream? On a side note, any other suggestions regarding my project idea are welcome!
-
You'll need to convert the audio stream from FFmpeg into 16 kHz PCM WAV and then use the feed_audio method. Depending on the actual MP3 format of the chunks, the conversion can be rather straightforward (plain MP3) or quite complicated (if the MP3 chunks depend on each other). If it's the easy case, the conversion can be done with pydub:

```python
import io
from pydub import AudioSegment

segment = AudioSegment.from_file(io.BytesIO(chunk), format="mp3")
```

Or you can use an ffmpeg CLI command to convert. The feed_audio method requires 16 kHz mono PCM chunks of 1024 samples fed in real time (the chunks have to come in with correct timing).
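To bridge the gap between that snippet and feed_audio, here is a minimal sketch of the rest of the pydub route. The helper name, the 1024-sample slicing, and the assumption that each MP3 chunk decodes on its own are mine, not from the original answer; `recorder` is the AudioToTextRecorder created as in the demo below.

```python
import io
from pydub import AudioSegment

SAMPLES_PER_CHUNK = 1024                 # chunk size feed_audio expects
BYTES_PER_CHUNK = SAMPLES_PER_CHUNK * 2  # 16-bit samples -> 2 bytes each

def feed_mp3_chunk(recorder, mp3_chunk: bytes):
    """Decode one self-contained MP3 chunk and feed it as 16 kHz mono PCM (sketch)."""
    segment = AudioSegment.from_file(io.BytesIO(mp3_chunk), format="mp3")
    # Convert to what feed_audio expects: 16 kHz, mono, 16-bit signed samples
    segment = segment.set_frame_rate(16000).set_channels(1).set_sample_width(2)
    pcm = segment.raw_data  # raw little-endian 16-bit PCM bytes
    # Hand the PCM over in 1024-sample pieces
    for i in range(0, len(pcm), BYTES_PER_CHUNK):
        recorder.feed_audio(pcm[i:i + BYTES_PER_CHUNK])
```

If the MP3 chunks arrive from FFmpeg in real time, feeding each one as it comes in keeps the timing roughly correct.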
Demo code:

```python
if __name__ == "__main__":
    import threading
    import pyaudio
    from RealtimeSTT import AudioToTextRecorder

    # Audio stream configuration constants
    CHUNK = 1024              # Number of audio samples per buffer
    FORMAT = pyaudio.paInt16  # Sample format (16-bit integer)
    CHANNELS = 1              # Mono audio
    RATE = 16000              # Sampling rate in Hz (expected by the recorder)

    # Initialize the audio-to-text recorder without using the microphone directly.
    # Since we are feeding audio data manually, set use_microphone to False.
    recorder = AudioToTextRecorder(
        use_microphone=False,  # Disable built-in microphone usage
        spinner=False          # Disable spinner animation in the console
    )

    # Event to signal when to stop the threads
    stop_event = threading.Event()

    def feed_audio_thread():
        """Thread function to read audio data and feed it to the recorder."""
        p = pyaudio.PyAudio()

        # Open an input audio stream with the specified configuration
        stream = p.open(
            format=FORMAT,
            channels=CHANNELS,
            rate=RATE,
            input=True,
            frames_per_buffer=CHUNK
        )

        try:
            print("Speak now")
            while not stop_event.is_set():
                # Read audio data from the stream (in the expected format)
                data = stream.read(CHUNK)
                # Feed the audio data to the recorder
                recorder.feed_audio(data)
        except Exception as e:
            print(f"feed_audio_thread encountered an error: {e}")
        finally:
            # Clean up the audio stream
            stream.stop_stream()
            stream.close()
            p.terminate()
            print("Audio stream closed.")

    def recorder_transcription_thread():
        """Thread function to handle transcription and process the text."""

        def process_text(full_sentence):
            """Callback function to process the transcribed text."""
            print("Transcribed text:", full_sentence)
            # Check for the stop command in the transcribed text
            if "stop recording" in full_sentence.lower():
                print("Stop command detected. Stopping threads...")
                stop_event.set()
                recorder.abort()

        try:
            while not stop_event.is_set():
                # Get transcribed text and process it using the callback
                recorder.text(process_text)
        except Exception as e:
            print(f"transcription_thread encountered an error: {e}")
        finally:
            print("Transcription thread exiting.")

    # Create and start the audio feeding thread
    audio_thread = threading.Thread(target=feed_audio_thread)
    audio_thread.daemon = False  # Ensure the thread doesn't exit prematurely
    audio_thread.start()

    # Create and start the transcription thread
    transcription_thread = threading.Thread(target=recorder_transcription_thread)
    transcription_thread.daemon = False  # Ensure the thread doesn't exit prematurely
    transcription_thread.start()

    # Wait for both threads to finish
    audio_thread.join()
    transcription_thread.join()
    print("Recording and transcription have stopped.")

    recorder.shutdown()
```
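For the ffmpeg CLI route, one way to wire the phone's stream into the same feed_audio loop is to let ffmpeg decode and resample the audio and read raw PCM from its stdout, replacing the pyaudio part of the demo above. This is only a sketch under assumptions: the stream URL is a placeholder for your IP camera address, and the function name is mine.

```python
import subprocess

SAMPLES_PER_CHUNK = 1024                   # matches the demo's CHUNK size
BYTES_PER_CHUNK = SAMPLES_PER_CHUNK * 2    # 16-bit mono -> 2 bytes per sample
STREAM_URL = "http://PHONE_IP:8080/video"  # placeholder, use your IP camera URL

def feed_ffmpeg_thread(recorder, stop_event):
    """Read 16 kHz mono s16le PCM from ffmpeg's stdout and feed it to the recorder."""
    cmd = [
        "ffmpeg",
        "-i", STREAM_URL,   # input: the phone's IP camera stream
        "-vn",              # ignore the video track, audio only
        "-ac", "1",         # downmix to mono
        "-ar", "16000",     # resample to 16 kHz
        "-f", "s16le",      # raw signed 16-bit little-endian PCM
        "-loglevel", "quiet",
        "pipe:1",           # write the PCM to stdout
    ]
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE)
    try:
        while not stop_event.is_set():
            data = proc.stdout.read(BYTES_PER_CHUNK)
            if not data:
                break  # stream ended or ffmpeg exited
            recorder.feed_audio(data)
    finally:
        proc.terminate()
```

Because ffmpeg is decoding a live stream, the PCM arrives at roughly real-time speed, which satisfies the timing note above; the keyword check and the OpenCV snapshot for your project could then go into the process_text callback.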