
[Question] Ideal/oracle performance of source estimate + mix phase #83

Open
sevagh opened this issue May 7, 2021 · 6 comments

@sevagh

sevagh commented May 7, 2021

Hello,
I've been interested in running various oracle benchmark methods to check whether different types of spectrograms (CQT, etc.) can be useful for source separation.
Initially, I was working with the IRM1/2 and IBM1/2 from https://github.com/sigsep/sigsep-mus-oracle

However, I noticed that Open-Unmix uses the strategy of "estimate of source magnitude + phase of original mix" (though it also has an option to use soft masking instead). Is it valuable to create an "oracle phase-inversion" method?

So, the soft-mask/IRM1 performance "ceiling" (the known IRM1 oracle mask calculation) looks like this (using the vocals stem as an example):

```python
mix = <load mix>                # mixed track
vocals_gt = <load vocals stem>  # ground truth

vocals_irm1 = abs(stft(vocals_gt)) / abs(stft(mix))

vocals_est = istft(vocals_irm1 * stft(mix))  # estimate after "round trip" through soft mask
```
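The ratio in the pseudocode above divides by the magnitude of the mix STFT. For comparison, here is a minimal single-channel sketch of the soft-mask round trip the way sigsep-mus-oracle's `IRM.py` computes IRM1 (`alpha=1`): each source magnitude is divided by the sum of all source magnitudes plus a small epsilon. It assumes scipy.signal's `stft`/`istft` with the default Hann window, a 4096-sample window and a 1024-sample hop; the helper `irm1_round_trip` and the synthetic tone-plus-noise signals are hypothetical, for illustration only.

```python
import numpy as np
from scipy.signal import stft, istft

NPERSEG = 4096
NOVERLAP = NPERSEG - 1024  # scipy takes the overlap between frames, so hop = 1024


def irm1_round_trip(sources, eps=np.finfo(np.float64).eps):
    """Hypothetical helper: IRM1 oracle estimates for a list of 1-D source signals.

    The mix is the sum of the sources; each mask is |S_i| / (eps + sum_k |S_k|).
    """
    mix = np.sum(sources, axis=0)
    n = mix.shape[-1]
    X = stft(mix, nperseg=NPERSEG, noverlap=NOVERLAP)[-1]
    mags = [np.abs(stft(s, nperseg=NPERSEG, noverlap=NOVERLAP)[-1]) for s in sources]
    model = eps + np.sum(mags, axis=0)  # sum of source magnitudes

    estimates = []
    for mag in mags:
        mask = mag / model  # soft mask, always in [0, 1]
        # apply the mask to the complex mix STFT and trim the istft padding
        est = istft(mask * X, nperseg=NPERSEG, noverlap=NOVERLAP)[1][:n]
        estimates.append(est)
    return estimates


# synthetic example: a 440 Hz tone ("vocals") plus white noise ("accompaniment")
rng = np.random.default_rng(0)
fs = 44100
t = np.arange(2 * fs) / fs
vocals_gt = 0.5 * np.sin(2 * np.pi * 440 * t)
accomp_gt = 0.1 * rng.standard_normal(t.size)
vocals_est, accomp_est = irm1_round_trip([vocals_gt, accomp_gt])
```

Because the IRM1 masks in each bin sum to (almost) one, the estimates add back up to the mixture: `np.allclose(vocals_est + accomp_est, vocals_gt + accomp_gt, atol=1e-6)` should hold.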

Now, for the phase inversion method, we could do the following:

```python
mix = <load mix>                # mixed track
vocals_gt = <load vocals stem>  # ground truth

mix_phase = phase(stft(mix))
vocals_gt_magnitude = abs(stft(vocals_gt))

vocals_stft = pol2cart(vocals_gt_magnitude, mix_phase)

vocals_est = istft(vocals_stft)  # estimate after "round trip" through phase inversion
```
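A runnable single-channel version of the mix-phase pseudocode, assuming scipy.signal `stft`/`istft` with the default Hann window, a 4096-sample window and a 1024-sample hop. The helpers `mix_phase_round_trip` and `ratio_mask_round_trip` and the synthetic signals are hypothetical; the second helper reproduces the |STFT(vocals)| / |STFT(mix)| ratio from the earlier snippet so the two estimates can be compared.

```python
import numpy as np
from scipy.signal import stft, istft

NPERSEG = 4096
NOVERLAP = NPERSEG - 1024  # hop = 1024


def mix_phase_round_trip(source, mix):
    """Hypothetical helper: oracle magnitude of `source` combined with the phase of `mix`."""
    n = mix.shape[-1]
    S = stft(source, nperseg=NPERSEG, noverlap=NOVERLAP)[-1]
    X = stft(mix, nperseg=NPERSEG, noverlap=NOVERLAP)[-1]
    # pol2cart: magnitude * exp(j * phase)
    Y = np.abs(S) * np.exp(1j * np.angle(X))
    return istft(Y, nperseg=NPERSEG, noverlap=NOVERLAP)[1][:n]


def ratio_mask_round_trip(source, mix, eps=1e-12):
    """Hypothetical helper: the |STFT(source)| / |STFT(mix)| mask applied to the mix."""
    n = mix.shape[-1]
    S = stft(source, nperseg=NPERSEG, noverlap=NOVERLAP)[-1]
    X = stft(mix, nperseg=NPERSEG, noverlap=NOVERLAP)[-1]
    mask = np.abs(S) / (np.abs(X) + eps)
    return istft(mask * X, nperseg=NPERSEG, noverlap=NOVERLAP)[1][:n]


# synthetic example: a 440 Hz tone ("vocals") mixed with white noise
rng = np.random.default_rng(0)
fs = 44100
t = np.arange(2 * fs) / fs
vocals_gt = 0.5 * np.sin(2 * np.pi * 440 * t)
mix = vocals_gt + 0.1 * rng.standard_normal(t.size)

vocals_mpi = mix_phase_round_trip(vocals_gt, mix)
vocals_ratio = ratio_mask_round_trip(vocals_gt, mix)
```

On these signals the two estimates match to within floating-point error (`np.allclose(vocals_mpi, vocals_ratio, atol=1e-6)`), since multiplying X by |V| / |X| only rescales its magnitude and leaves the mix phase untouched.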

Does this make sense to do? Has anybody done this before? What could this method be called?
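Written out per time-frequency bin (with $X$ the STFT of the mix, $V$ the STFT of the ground-truth vocals, $\angle X$ the mix phase, $j$ the imaginary unit, and any small $\epsilon$ in the denominator ignored), the ratio mask in the first snippet yields exactly the second snippet's estimate:

$$
\hat{V}_{\text{mask}} = \frac{|V|}{|X|}\,X = \frac{|V|}{|X|}\,|X|\,e^{j\angle X} = |V|\,e^{j\angle X} = \hat{V}_{\text{MPI}}
$$

By contrast, the IRM1 in sigsep-mus-oracle (`IRM.py` with `alpha=1`) divides each source magnitude $|S_i|$ by the sum of all source magnitudes. Assuming the stems sum to the mixture, the triangle inequality $|X| = \left|\sum_k S_k\right| \le \sum_k |S_k|$ means it keeps the mix phase but never assigns more than the true source magnitude:

$$
\hat{S}_i^{\text{IRM1}} = \frac{|S_i|}{\sum_k |S_k|}\,X,
\qquad
\bigl|\hat{S}_i^{\text{IRM1}}\bigr| = |S_i|\,\frac{|X|}{\sum_k |S_k|} \le |S_i|
$$

Under these assumptions, the two oracles share the mix phase and differ only in the magnitude they put in each bin.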

@sevagh
Author

sevagh commented May 7, 2021

OK, it seems to be working. Here's a piece of code, hacked together from https://github.com/sigsep/sigsep-mus-oracle/blob/master/IRM.py and Open-Unmix:

```python
import numpy as np
import museval
from scipy.signal import stft, istft

# Open-Unmix STFT settings: window = 4096, hop = 1024
NPERSEG = 4096
NOVERLAP = NPERSEG - 1024  # scipy.signal expects the overlap, not the hop


def ideal_mixphase(track, eval_dir=None):
    """
    Ideal performance of the (oracle) source magnitude + phase of the mix,
    which is the default umx strategy for separation
    """
    N = track.audio.shape[0]  # number of samples, used to trim the istft output

    # source magnitudes, and the mix built as the sum of the complex source STFTs
    P = {}
    model = 0

    # TODO: parallelize this
    for name, source in track.sources.items():
        # STFT of the target source, shape (channels, freqs, frames)
        src_coef = stft(source.audio.T, nperseg=NPERSEG, noverlap=NOVERLAP)[-1].astype(np.complex64)

        # oracle magnitude of the source
        P[name] = np.abs(src_coef)

        # store the complex STFT, not the magnitude, in the mix
        model += src_coef

    # mix phase/angle; np.angle replaces the atan2 helper ported from umx
    mix_phase = np.angle(model)

    # now perform separation
    estimates = {}
    accompaniment_source = 0
    for name, source_mag in P.items():
        # source magnitude + mix phase (polar to Cartesian)
        Yj = source_mag * np.exp(1j * mix_phase)

        # invert to time domain, shape (samples, channels)
        target_estimate = istft(Yj, nperseg=NPERSEG, noverlap=NOVERLAP)[1].T[:N, :].astype(np.float32)

        # set this as the source estimate
        estimates[name] = target_estimate

        # accumulate to the accompaniment if this is not vocals
        if name != 'vocals':
            accompaniment_source += target_estimate

    estimates['accompaniment'] = accompaniment_source

    bss_scores = museval.eval_mus_track(
        track,
        estimates,
        output_dir=eval_dir,
    ).scores

    return estimates, bss_scores
```
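The code above takes the mix phase from `model`, the sum of the complex source STFTs, rather than from an STFT of the mixture itself. Because the STFT is linear, the two agree whenever the stems add up exactly to the mixture. A quick check with hypothetical random stereo stems (not MUSDB18 data), assuming a 4096-sample window and a 1024-sample hop, also confirms that NumPy's `np.angle` is the element-wise `arctan2` of the imaginary and real parts:

```python
import numpy as np
from scipy.signal import stft

NPERSEG = 4096
NOVERLAP = NPERSEG - 1024  # hop = 1024

# hypothetical stereo stems, shape (channels, samples), standing in for MUSDB18 sources
rng = np.random.default_rng(0)
stems = {name: rng.standard_normal((2, 44100)) for name in ("vocals", "drums", "bass", "other")}
mixture = sum(stems.values())

# STFT of the mixture itself
X = stft(mixture, nperseg=NPERSEG, noverlap=NOVERLAP)[-1]

# sum of the complex source STFTs, as accumulated in `model`
model = sum(stft(s, nperseg=NPERSEG, noverlap=NOVERLAP)[-1] for s in stems.values())

# np.angle(z) is the element-wise arctan2(z.imag, z.real)
phase_from_mix = np.angle(X)
phase_from_model = np.arctan2(model.imag, model.real)
```

Here `np.allclose(model, X)` holds, and so does a comparison of the unit phasors `np.exp(1j * phase_from_mix)` and `np.exp(1j * phase_from_model)` (comparing phasors avoids spurious ±π wrap-around differences). With real tracks, any mismatch between the mixture and the sum of the stems would make the two phases differ slightly.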

The maximum SDR of the "oracle mix phase" is lower than that of soft masking. Is that expected?

@aliutkus
Member

aliutkus commented May 7, 2021

It's a very interesting idea, I like it.

Could you provide numbers? How does it behave compared to the other oracles?

@sevagh
Author

sevagh commented May 7, 2021

It's pretty underwhelming. Here is an evaluation of 4 tracks from the MUSDB18-HQ test set, with IRM1, IRM2, IBM1, IBM2, and the new one, "MPI" (Mixed Phase Inversion), with the Open-Unmix STFT settings (window = 4096, hop = 1024):
[Image: evaluation results for IRM1, IRM2, IBM1, IBM2, and MPI on 4 MUSDB18-HQ test tracks]
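To reproduce these STFT settings with scipy.signal's `stft`/`istft`, note that scipy's `noverlap` parameter is the overlap between frames, not the hop, so a 4096-sample window with a 1024-sample hop corresponds to `noverlap=3072`. The sketch below assumes scipy's default Hann window and uses scipy's `check_COLA`/`check_NOLA` helpers to confirm the settings allow an exact inverse STFT; the constant names are hypothetical.

```python
from scipy.signal import check_COLA, check_NOLA

WINDOW = 4096  # Open-Unmix window length in samples
HOP = 1024     # Open-Unmix hop size in samples

# scipy.signal.stft/istft take the overlap between frames, not the hop
NOVERLAP = WINDOW - HOP

# Hann window with 75% overlap: constant overlap-add, and invertible by istft
cola_ok = check_COLA("hann", WINDOW, NOVERLAP)
nola_ok = check_NOLA("hann", WINDOW, NOVERLAP)
```

By the same convention, `noverlap=1024` with a 4096-sample window would mean a 3072-sample hop.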

@sevagh
Copy link
Author

sevagh commented May 7, 2021

Open-Unmix is not the only place I've seen the "source magnitude estimate + mix phase" inversion. It's also used in the CDAE source separation algorithm (https://arxiv.org/abs/1703.08019), but I'm still curious why it is preferred over soft masking.

I will upload the code that generates the above results (it mostly just wraps sigsep tools) to a separate, cleanly reproducible repo so I can link it here. I might be doing something wrong in my code somewhere.

@sevagh
Copy link
Author

sevagh commented May 8, 2021

Here: https://github.com/sevagh/mss-oracle-experiments#oracle-performance-of-mpi-mix-phase-inversion

Apologies if there is a lot of irrelevant code (related to the NSGT), but I hope the specific part of the new "Mixed Phase Inversion" oracle makes sense and is reproducible.

@sevagh
Copy link
Author

sevagh commented May 9, 2021

Also, I suppose SDR is not necessarily the king of metrics: we can see dramatically better ISR for the mix-phase oracle (but that could be a consequence of its reduced separation, i.e. lower SDR/SIR/SAR).
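For reference, BSS Eval (the metric family that museval implements, in its BSSv4 variant) decomposes each estimate $\hat{s}_j$ into the true source image plus spatial-distortion, interference, and artifact error terms. The standard image-based definitions are summarized below (a summary of the textbook formulas, not museval's exact implementation):

$$
\begin{aligned}
\hat{s}_j &= s_j^{\mathrm{img}} + e_j^{\mathrm{spat}} + e_j^{\mathrm{interf}} + e_j^{\mathrm{artif}} \\
\mathrm{SDR}_j &= 10 \log_{10} \frac{\lVert s_j^{\mathrm{img}} \rVert^2}{\lVert e_j^{\mathrm{spat}} + e_j^{\mathrm{interf}} + e_j^{\mathrm{artif}} \rVert^2} \\
\mathrm{ISR}_j &= 10 \log_{10} \frac{\lVert s_j^{\mathrm{img}} \rVert^2}{\lVert e_j^{\mathrm{spat}} \rVert^2} \\
\mathrm{SIR}_j &= 10 \log_{10} \frac{\lVert s_j^{\mathrm{img}} + e_j^{\mathrm{spat}} \rVert^2}{\lVert e_j^{\mathrm{interf}} \rVert^2} \\
\mathrm{SAR}_j &= 10 \log_{10} \frac{\lVert s_j^{\mathrm{img}} + e_j^{\mathrm{spat}} + e_j^{\mathrm{interf}} \rVert^2}{\lVert e_j^{\mathrm{artif}} \rVert^2}
\end{aligned}
$$

Since ISR depends only on the spatial-distortion term, an estimate can score a high ISR while still carrying interference (low SIR) or artifacts (low SAR).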

Also, maybe the mix-phase approach is more "robust" to worse estimates?
