
[Feature request] Stream audio from input device in real-time #3

Open
nnyj opened this issue Jul 9, 2023 · 3 comments
Labels
enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

nnyj commented Jul 9, 2023

I combined this amazing tool with the real-time implementation from https://github.com/facebookresearch/denoiser. Hopefully this is useful to someone.

Proof-of-concept:
https://github.com/nnyj/python-audio-separator-live

beveradb (Collaborator) commented Aug 5, 2023

Woah, this is super cool, thanks for sharing!
What is/was your use case, out of curiosity?

I see you had to pull the audio_separator code into your own project and make a bunch of changes so it would make sense for a live stream. That's understandable, but also somewhat unfortunate, as it means any further improvements to this project won't be easy to pull in.

@nnyj - how would you feel if I refactored / reintegrated your code into this project, to essentially just add a live mode to audio-separator? I'd of course then add you as a maintainer of this project too so you could push your own updates / continued improvements to it.
Totally fair if you'd prefer to keep your work in your own repo / a separate project, but just thought I'd ask 😄

nnyj (Author) commented Aug 6, 2023

Hey, thanks for your interest. Feel free to re-integrate it into this project; it is open source after all! 😄 The POC code is admittedly messy, though.

The idea to implement inferencing in real-time came more out of curiosity, since GPUs have become fast enough to split the stems at several multiples of real-time. In a quick test using the UVR GUI, I was able to achieve 5.48x real-time during conversion with an ensemble model (MDX-NET Inst Main + Inst 3 + Kim Vocal 2).

The use cases are of course endless. I'm a huge fan of instrumental music and enjoy listening to my library without having to convert everything first. One could also, say, stream music from online services such as YouTube/Spotify without needing to have the actual audio files.

A quick look at other projects shows that there have been similar interests/requests:

Source separation has come a long way, and I found the MDX-Net models strike a good balance between inference time and audio quality. But I think there is an inherent buffer of at least 1-2 seconds required for the models to do their magic, so it may never achieve "full" real-time. For my use case, that is perfectly fine though.
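The 1-2 second buffer nnyj describes maps naturally onto a rolling-window scheme: keep the last N seconds of audio as model context, run the separator over the whole window each step, and emit only the newest hop of its output. A minimal sketch in Python (NumPy only; `separate_fn` is a hypothetical stand-in for the actual model call, and the buffer/hop sizes are illustrative, not taken from either repo):

```python
import numpy as np

def stream_separate(frames, separate_fn, sr=44100, buffer_s=1.5, hop_s=0.5):
    """Feed incoming audio frames through a rolling context buffer.

    The separator always sees `buffer_s` seconds of audio (its temporal
    context), but only the newest `hop_s` seconds of its output are kept,
    so steady-state latency is roughly one hop plus inference time.
    """
    buf_len = int(sr * buffer_s)
    hop_len = int(sr * hop_s)
    buffer = np.zeros(buf_len, dtype=np.float32)   # zero-padded warm-up context
    pending = np.empty(0, dtype=np.float32)        # samples not yet processed
    out = []
    for frame in frames:
        pending = np.concatenate([pending, np.asarray(frame, dtype=np.float32)])
        while len(pending) >= hop_len:
            hop, pending = pending[:hop_len], pending[hop_len:]
            buffer = np.concatenate([buffer[hop_len:], hop])  # slide the window
            separated = separate_fn(buffer)        # model sees the full context
            out.append(separated[-hop_len:])       # emit only the newest hop
    return np.concatenate(out) if out else np.empty(0, dtype=np.float32)
```

Shrinking the hop lowers latency but multiplies the number of inference calls per second, which is why the buffer floor nnyj mentions is hard to get under.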

@beveradb beveradb added the enhancement New feature or request label Sep 22, 2023
@SuperKogito

Hello @nnyj

You have provided a nice summary here. I am also facing the same constraint of a minimum of ~1.5 seconds needed for e.g. the spectrogram computation, so that the model has enough temporal/contextual information to do the separation.

Does anyone here have an idea of how to overcome this? It should be possible, since there are audio plugins that run in "real-time".
Here, the author even claims a latency as low as 46.4 ms:
james34602/SpleeterRT#8
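For what it's worth, 46.4 ms is almost exactly one 2048-sample hop at 44.1 kHz, which suggests (my reading, not confirmed by the SpleeterRT author) that the plugin emits roughly one STFT frame of output per step rather than a multi-second chunk:

```python
SR = 44100   # sample rate in Hz
HOP = 2048   # samples emitted per step (assumed: one STFT frame)

# Latency contributed by the hop alone, excluding inference time.
latency_ms = HOP / SR * 1000
print(f"{latency_ms:.1f} ms")  # 46.4 ms
```

The model would still need a longer analysis window for context; only the output hop, not the context window, sets this part of the latency.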

@beveradb beveradb added the help wanted Extra attention is needed label Dec 21, 2023