My own easy-to-understand implementation of the paper Jansson et al., "Singing Voice Separation with Deep U-Net Convolutional Networks", using PyTorch and librosa.
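
For orientation, below is a minimal PyTorch sketch of the U-Net described in the paper: six encoder layers of 5x5 strided convolutions (batch norm, leaky ReLU with leakiness 0.2) mirrored by six deconvolution decoder layers with skip connections, 50% dropout on the first three decoder layers, and a final sigmoid that predicts a soft mask applied elementwise to the input spectrogram. This is an illustration of the architecture from the paper, not necessarily line-for-line what this repository's model code does.

```python
import torch
import torch.nn as nn

class UNet(nn.Module):
    """U-Net predicting a soft mask over a (1, 512, 128) magnitude
    spectrogram patch. Layer sizes follow Jansson et al.; the model
    in this repo may differ in details."""

    def __init__(self):
        super().__init__()
        chs = [1, 16, 32, 64, 128, 256, 512]
        # Encoder: 5x5 convs with stride 2, batch norm, leaky ReLU (0.2)
        self.encoders = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(chs[i], chs[i + 1], 5, stride=2, padding=2),
                nn.BatchNorm2d(chs[i + 1]),
                nn.LeakyReLU(0.2),
            )
            for i in range(6)
        )
        # Decoder: 5x5 transposed convs; skip concatenations double the
        # input channels everywhere except at the bottleneck
        self.decoders = nn.ModuleList()
        for i in range(6, 0, -1):
            in_ch = chs[i] if i == 6 else chs[i] * 2
            out_ch = chs[i - 1]
            layers = [nn.ConvTranspose2d(in_ch, out_ch, 5, stride=2,
                                         padding=2, output_padding=1)]
            if i > 1:
                layers += [nn.BatchNorm2d(out_ch), nn.ReLU()]
                if i > 3:  # 50% dropout on the first three decoder layers
                    layers.append(nn.Dropout(0.5))
            else:
                layers.append(nn.Sigmoid())  # final layer emits a 0..1 mask
            self.decoders.append(nn.Sequential(*layers))

    def forward(self, x):  # x: (batch, 1, 512, 128) magnitude spectrogram
        skips, h = [], x
        for enc in self.encoders:
            h = enc(h)
            skips.append(h)
        for i, dec in enumerate(self.decoders):
            if i > 0:
                h = torch.cat([h, skips[-1 - i]], dim=1)  # skip connection
            h = dec(h)
        return x * h  # apply the predicted mask to the input
```

Per the paper, such a model is trained with an L1 loss between the masked mixture and the target spectrogram.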
- Put audio files with the instrument-only track on the left channel and the mixed (with vocals) track on the right channel into the `data` directory (see the loading sketch after this list).
- Run `train.py`.
- Specify the input media in `inference.py`.
- Run `inference.py`.
- The result will be saved as `result.wav` (an end-to-end sketch follows this list).
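
For reference, here is one way a training file in the layout above could be read with librosa. The sample rate and STFT parameters (8192 Hz, 1024-point window, hop length 768) follow the paper; the helper name and whether `train.py` uses these exact values are assumptions.

```python
import librosa
import numpy as np

# Hypothetical helper illustrating the expected data layout; the STFT
# parameters are taken from the paper, not necessarily from train.py.
def load_training_pair(path, sr=8192, n_fft=1024, hop_length=768):
    # mono=False keeps both channels: [0] = left, [1] = right
    stereo, _ = librosa.load(path, sr=sr, mono=False)
    instrumental, mixture = stereo[0], stereo[1]
    # Magnitude spectrograms: the network sees the mixture and is trained
    # to mask it toward the instrument-only target
    X = np.abs(librosa.stft(mixture, n_fft=n_fft, hop_length=hop_length))
    Y = np.abs(librosa.stft(instrumental, n_fft=n_fft, hop_length=hop_length))
    return X, Y  # each (513, frames); the paper drops one bin to get 512
```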
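
And a hypothetical end-to-end pass corresponding to the `inference.py` / `result.wav` steps: it masks the mixture spectrogram patch by patch, reuses the mixture phase, and inverts back to a waveform. The patching assumes the fixed 512x128 input of the model sketch above; the actual `inference.py` may differ.

```python
import librosa
import numpy as np
import soundfile as sf
import torch

# Sketch only: `model` is an instance of the UNet sketch above.
def separate(path, model, sr=8192, n_fft=1024, hop_length=768):
    mix, _ = librosa.load(path, sr=sr, mono=True)
    spec = librosa.stft(mix, n_fft=n_fft, hop_length=hop_length)
    # Trim to whole 128-frame patches; drop the top bin (513 -> 512)
    # to match the model's fixed input size
    frames = (spec.shape[1] // 128) * 128
    spec = spec[:, :frames]
    mag = np.abs(spec[:512])
    with torch.no_grad():
        x = torch.from_numpy(mag).float().reshape(1, 1, 512, frames)
        # Mask each 128-frame patch and stitch the outputs back together
        patches = [model(x[..., i:i + 128]) for i in range(0, frames, 128)]
        masked = torch.cat(patches, dim=-1).squeeze().numpy()
    # Restore the dropped bin, reuse the mixture phase, and invert
    full = np.pad(masked, ((0, 1), (0, 0)))
    out = librosa.istft(full * np.exp(1j * np.angle(spec)),
                        hop_length=hop_length)
    sf.write('result.wav', out, sr)
```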