[Question] Ideal/oracle performance of source estimate + mix phase #83
OK, it seems to be working. Here's a piece of code, hacked together from https://github.com/sigsep/sigsep-mus-oracle/blob/master/IRM.py and unmix:

    import museval
    import numpy as np
    from scipy.signal import istft, stft

    # STFT settings shared by the forward and inverse transforms
    NPERSEG = 4096
    NOVERLAP = 1024
    eps = np.finfo(float).eps


    def atan2(y, x):
        r"""Element-wise arctangent function of y/x.

        copied from umx, replace torch with np
        (np.arctan2(y, x) should give the same result)
        """
        pi = 2 * np.arcsin(1.0)
        # avoid 0/0; build a new array so the caller's input is not modified
        x = x + ((x == 0) & (y == 0)) * 1.0
        # x == 0 with y != 0 gives +/-inf here, which is corrected below
        with np.errstate(divide="ignore"):
            out = np.arctan(y / x)
        out += ((y >= 0) & (x < 0)) * pi
        out -= ((y < 0) & (x < 0)) * pi
        out *= 1 - ((y > 0) & (x == 0)) * 1.0
        out += ((y > 0) & (x == 0)) * (pi / 2)
        out *= 1 - ((y < 0) & (x == 0)) * 1.0
        out += ((y < 0) & (x == 0)) * (-pi / 2)
        return out


    def ideal_mixphase(track, eval_dir=None):
        """
        ideal performance of magnitude from estimated source + phase of mix
        which is the default umx strategy for separation
        """
        # number of samples, used to trim the inverse STFT output
        N = track.audio.shape[0]

        X = stft(track.audio.T, nperseg=NPERSEG, noverlap=NOVERLAP)[-1].astype(np.complex64)
        (I, F, T) = X.shape

        # Compute sources spectrograms
        P = {}
        # compute model as the sum of the complex source STFTs
        model = eps

        # parallelize this
        for name, source in track.sources.items():
            # compute spectrogram of target source:
            # magnitude of STFT
            src_coef = stft(source.audio.T, nperseg=NPERSEG, noverlap=NOVERLAP)[-1].astype(np.complex64)
            P[name] = np.abs(src_coef)
            # store the original, not magnitude, in the mix
            model += src_coef

        # get mix phase/angle (identical for every source, so computed once)
        mix_phase = atan2(model.imag, model.real)

        # now performs separation
        estimates = {}
        accompaniment_source = 0
        for name, source in track.sources.items():
            source_mag = P[name]
            # use source magnitude estimate + mix phase
            Yj = np.multiply(source_mag, np.cos(mix_phase)) + 1j * np.multiply(source_mag, np.sin(mix_phase))
            # invert to time domain
            target_estimate = istft(Yj, nperseg=NPERSEG, noverlap=NOVERLAP)[1].T[:N, :].astype(np.float32)
            # set this as the source estimate
            estimates[name] = target_estimate
            # accumulate to the accompaniment if this is not vocals
            if name != 'vocals':
                accompaniment_source += target_estimate

        estimates['accompaniment'] = accompaniment_source

        bss_scores = museval.eval_mus_track(
            track,
            estimates,
            output_dir=eval_dir,
        ).scores
        return estimates, bss_scores

The maximum SDR of the "oracle mix phase" is lower than soft masking. Is that expected?
It's a very interesting idea, I like it. Could you provide numbers? How does it behave compared to the other oracles?
Open-Unmix is not the first place I've seen the source estimate magnitude + mix phase inversion. It's also used in the CDAE source separation algorithm (https://arxiv.org/abs/1703.08019), but I'm still curious why it is preferred to soft masking. I will upload my code to generate the above results (it mostly just wraps sigsep tools) in a cleanly reproducible separate repo so I can link it here. I might be doing something wrong in my code somewhere.
Here: https://github.com/sevagh/mss-oracle-experiments#oracle-performance-of-mpi-mix-phase-inversion
Apologies if there is a lot of irrelevant code (related to the NSGT), but I hope the specific part of the new "Mixed Phase Inversion" oracle makes sense and is reproducible.
Also, I suppose SDR is not necessarily the king of metrics: we can see dramatically better ISR on the mix-phase (but that could be a consequence of its reduced separation/SDR/SIR/SAR). Also, maybe mix-phase is more "robust" to worse estimates?
Hello,
I've been interested in running various oracle benchmark methods to check if different types of spectrogram (CQT, etc.) can be useful for source separation.
Initially, I was working with the IRM1/2 and IBM1/2 from https://github.com/sigsep/sigsep-mus-oracle
However, I noticed that Open-Unmix uses the strategy of "estimate of source magnitude + phase of original mix" (but it has an option to use soft masking instead). Is it valuable to create an "oracle phase-inversion" method?
So, the soft mask/IRM1 "ceiling" of performance (the known IRM1 oracle mask calculation) looks like this (using the vocals stem as an example):
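As a rough sketch of that ceiling, assuming the STFTs of the mix and of every true source have already been computed (for example with `scipy.signal.stft`), the IRM1 estimate could be written as below. The function name `irm1_oracle` and its arguments are illustrative only and are not part of sigsep-mus-oracle; the mask formula follows the structure of IRM.py with `alpha=1`.

```python
import numpy as np


def irm1_oracle(mix_stft, source_stfts, alpha=1, eps=np.finfo(float).eps):
    """Soft-mask (ideal ratio mask) oracle applied to the mixture STFT.

    mix_stft:     complex array, e.g. (channels, freq_bins, frames)
    source_stfts: dict of name -> complex array with the same shape,
                  holding the ground-truth source STFTs
    (hypothetical helper, for illustration only)
    """
    # magnitude (alpha=1) or power (alpha=2) spectrogram of each true source
    P = {name: np.abs(S) ** alpha for name, S in source_stfts.items()}
    # the model is the sum of the source spectrograms; eps avoids dividing by zero
    model = eps + sum(P.values())
    # each estimate is the mixture scaled by that source's share of the model,
    # so it keeps the mixture phase and only reweights the mixture magnitude
    return {name: (P[name] / model) * mix_stft for name in P}
```

For the vocals stem, `irm1_oracle(X, S)["vocals"]` would then be passed to the inverse STFT and scored with museval.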
Now, for the phase inversion method, we could do the following:
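A minimal sketch of the idea, under the same assumption that the mix and true-source STFTs are already available as complex arrays: take the true magnitude of each source and attach the phase of the mix. The name `mix_phase_oracle` is hypothetical and is not taken from Open-Unmix or sigsep-mus-oracle.

```python
import numpy as np


def mix_phase_oracle(mix_stft, source_stfts):
    """True source magnitude combined with the mixture phase.

    mix_stft:     complex array, e.g. (channels, freq_bins, frames)
    source_stfts: dict of name -> complex array with the same shape
    (hypothetical helper, for illustration only)
    """
    # phase of the mixture, shared by every source estimate
    mix_phase = np.angle(mix_stft)
    # |S_j| * exp(i * phase(X)): oracle magnitude, non-oracle phase
    return {
        name: np.abs(S) * np.exp(1j * mix_phase)
        for name, S in source_stfts.items()
    }
```

Unlike a soft mask, the estimates here do not have to sum back to the mix, since each one keeps the full true magnitude of its source.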
Does this make sense to do? Has anybody done this before? What could this method be called?