
[Feature request] Stream audio from input device in real-time #3

Open
nnyj opened this issue Jul 9, 2023 · 3 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

nnyj commented Jul 9, 2023

I combined this amazing tool with the real-time implementation from https://github.com/facebookresearch/denoiser. Hopefully this is useful to someone.

Proof-of-concept:
https://github.com/nnyj/python-audio-separator-live

beveradb (Collaborator) commented Aug 5, 2023

Woah, this is super cool, thanks for sharing!
What is/was your use case, out of curiosity?

I see you had to pull the audio_separator code into your own project and make a bunch of changes so it would make sense for a live stream. That's understandable, but also somewhat unfortunate, as it means any further improvements to this project won't be easy to pull in.

@nnyj - how would you feel if I refactored / reintegrated your code into this project, to essentially just add a live mode to audio-separator? I'd of course then add you as a maintainer of this project too so you could push your own updates / continued improvements to it.
Totally fair if you'd prefer to keep your work in your own repo / a separate project, but just thought I'd ask 😄

nnyj (Author) commented Aug 6, 2023

Hey, thanks for your interest. Feel free to re-integrate it into this project; it is open source after all! 😄 The POC code is admittedly messy, though.

The idea to implement inferencing in real-time came more out of curiosity, since GPUs have become fast enough to split the stems at several multiples of real-time. In a quick test using the UVR GUI, I was able to achieve 5.48x real-time during conversion with an ensemble model (MDX-NET Inst Main + Inst 3 + Kim Vocal 2).

The use cases are of course endless. I'm a huge fan of instrumental music and enjoy listening to my library without having to convert everything first. One could also, say, stream music from online services such as YouTube/Spotify without needing to have the actual audio files.

A quick look at other projects shows that there have been similar interests/requests:

Source separation has come a long way, and I found the MDX-Net models strike a good balance between inference time and audio quality. But I think there is an inherent buffer of at least 1-2 seconds required for the models to do their magic, so it may never achieve "full" real-time. For my use case, that is perfectly fine though.
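The 1-2 second buffer nnyj describes maps naturally onto a rolling-window scheme: keep the last N seconds of audio as model context, run the separator over the whole window each step, and emit only the newest hop of its output. A minimal sketch in Python (NumPy only; `separate_fn` is a hypothetical stand-in for the actual model call, and the buffer/hop sizes are illustrative, not taken from either repo):

```python
import numpy as np

def stream_separate(frames, separate_fn, sr=44100, buffer_s=1.5, hop_s=0.5):
    """Feed incoming audio frames through a rolling context buffer.

    The separator always sees `buffer_s` seconds of audio (its temporal
    context), but only the newest `hop_s` seconds of its output are kept,
    so steady-state latency is roughly one hop plus inference time.
    """
    buf_len = int(sr * buffer_s)
    hop_len = int(sr * hop_s)
    buffer = np.zeros(buf_len, dtype=np.float32)   # zero-padded warm-up context
    pending = np.empty(0, dtype=np.float32)        # samples not yet processed
    out = []
    for frame in frames:
        pending = np.concatenate([pending, np.asarray(frame, dtype=np.float32)])
        while len(pending) >= hop_len:
            hop, pending = pending[:hop_len], pending[hop_len:]
            buffer = np.concatenate([buffer[hop_len:], hop])  # slide the window
            separated = separate_fn(buffer)        # model sees the full context
            out.append(separated[-hop_len:])       # emit only the newest hop
    return np.concatenate(out) if out else np.empty(0, dtype=np.float32)
```

Shrinking the hop lowers latency but multiplies the number of inference calls per second, which is why the buffer floor nnyj mentions is hard to get under.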

@beveradb beveradb added the enhancement New feature or request label Sep 22, 2023
@SuperKogito

Hello @nnyj

You have provided a nice summary here. I am also facing the same constraint of a minimum of ~1.5 seconds needed for e.g. the spectrogram computation, so that the model has enough temporal/contextual information to do the separation.

Does anyone here have an idea of how to overcome this? It should be possible, since there are audio plugins that run in "real-time".
Here, the author even claims a latency as low as 46.4 ms:
james34602/SpleeterRT#8
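For what it's worth, 46.4 ms is almost exactly one 2048-sample hop at 44.1 kHz, which suggests (my reading, not confirmed by the SpleeterRT author) that the plugin emits roughly one STFT frame of output per step rather than a multi-second chunk:

```python
SR = 44100   # sample rate in Hz
HOP = 2048   # samples emitted per step (assumed: one STFT frame)

# Latency contributed by the hop alone, excluding inference time.
latency_ms = HOP / SR * 1000
print(f"{latency_ms:.1f} ms")  # 46.4 ms
```

The model would still need a longer analysis window for context; only the output hop, not the context window, sets this part of the latency.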

@beveradb beveradb added the help wanted Extra attention is needed label Dec 21, 2023