[Feature request] Stream audio from input device in real-time #3
Woah, this is super cool, thanks for sharing! I see you had to pull the audio_separator code into your own project and make a bunch of changes so it made sense for a live stream. That's understandable, but also a bit unfortunate, since it means any further improvements to this project won't be easy to pull in. @nnyj - how would you feel if I refactored / reintegrated your code into this project, to essentially just add a live mode?
Hey, thanks for your interest. Feel free to re-integrate it into this project, it is open source after all! 😄 Although the POC code is admittedly messy. The idea to implement inference in real-time came more out of curiosity, since GPUs have become fast enough to separate the stems many times faster than real-time. In a quick test using the UVR GUI, I achieved 5.48x real-time during conversion with an ensemble model (MDX-NET Inst Main + Inst 3 + Kim Vocal 2). The use cases are of course endless: I'm a huge fan of instrumental music and enjoy listening to my library this way without having to convert everything. You could also, say, stream music from online services such as YouTube/Spotify without needing to have the actual audio files. A quick look at other projects shows that there have been similar interests/requests:
Source separation has come a long way, and I found the MDX-Net models strike a good balance between inference time and audio quality. But I think there is an inherent buffer of at least 1-2 seconds required for the models to work their magic, so it may never achieve "full" real-time. For my use case, that is perfectly fine though.
Hello @nnyj, you have provided a nice summary here. I am also facing the same constraint of a minimum of ~1.5 seconds needed for, e.g., spectrogram computation, so that the model has enough temporal/contextual information to do the separation. Does anyone here have an idea on how to overcome this? This should be possible, since there are audio plugins that are "real-time".
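To make the latency floor concrete, here is a minimal sketch of the sliding-window buffering the comments above describe: the model only ever sees a ~1.5 s context window, while new audio arrives in smaller hops, so output necessarily lags input by roughly the context length. The constants and the `stream_windows` helper are illustrative assumptions, not code from either project.

```python
import numpy as np

SAMPLE_RATE = 44100
CONTEXT_SECONDS = 1.5   # assumed minimum context the separation model needs
HOP_SECONDS = 0.5       # assumed amount of new audio consumed per inference step

context_len = int(SAMPLE_RATE * CONTEXT_SECONDS)
hop_len = int(SAMPLE_RATE * HOP_SECONDS)

def stream_windows(samples: np.ndarray):
    """Yield overlapping context windows over an incoming stream.

    Each window holds the latest 1.5 s of audio; downstream you would
    keep only the newest hop of each separated window, so the output
    lags the input by roughly CONTEXT_SECONDS.
    """
    buffer = np.zeros(context_len, dtype=np.float32)
    for start in range(0, len(samples) - hop_len + 1, hop_len):
        chunk = samples[start:start + hop_len]
        # Shift the buffer left by one hop and append the new chunk.
        buffer = np.concatenate([buffer[hop_len:], chunk])
        yield buffer.copy()

# Simulate 3 seconds of incoming audio.
audio = np.random.randn(SAMPLE_RATE * 3).astype(np.float32)
windows = list(stream_windows(audio))
```

With a 0.5 s hop over 3 s of audio this produces six windows of 66150 samples each; shrinking the hop reduces throughput cost per second of audio but does not reduce the context-length latency, which is why the 1-2 s floor is inherent to the model rather than to the plumbing.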
I combined this amazing tool with the real-time implementation by https://github.com/facebookresearch/denoiser . Hopefully this might be useful to someone.
Proof-of-concept:
https://github.com/nnyj/python-audio-separator-live
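As a rough illustration of the producer/consumer pattern such a live pipeline tends to use (an audio callback pushing fixed-size frames into a queue, with a worker thread running the model), here is a toy sketch. The `separate` function is a stand-in for the real model call, and `FRAME` is an arbitrary assumed frame size; none of this is taken from the linked POC.

```python
import queue
import threading
import numpy as np

FRAME = 4096  # assumed samples per frame delivered by the capture callback

def separate(frame: np.ndarray) -> np.ndarray:
    # Placeholder for the real separation model; attenuating the signal
    # stands in for extracting one stem from the mix.
    return frame * 0.5

def worker(in_q: queue.Queue, out_q: queue.Queue) -> None:
    """Pop captured frames, run separation, push results downstream."""
    while True:
        frame = in_q.get()
        if frame is None:        # sentinel: capture has stopped
            out_q.put(None)
            break
        out_q.put(separate(frame))

in_q: queue.Queue = queue.Queue()
out_q: queue.Queue = queue.Queue()
t = threading.Thread(target=worker, args=(in_q, out_q))
t.start()

# Simulate the audio callback delivering three frames, then stopping.
for _ in range(3):
    in_q.put(np.ones(FRAME, dtype=np.float32))
in_q.put(None)
t.join()

# Drain the separated frames as a playback loop would.
frames = []
while True:
    f = out_q.get()
    if f is None:
        break
    frames.append(f)
```

Decoupling capture from inference like this is what keeps the capture callback glitch-free: the model can take longer than one frame on occasion, as long as it keeps up on average and the queue absorbs the jitter.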