About MAS (monotonic alignment search) #2
Hi, can you help me understand a few things about CFM, if it's not a problem? Add me on Discord.
Thanks for sharing. I have one question after looking at the code: is the generation performance still good even though the posterior output is detached? I expected detaching to cause overfitting in the latent domain and keep the CFM from learning well, so I'm surprised it works. I'm currently using the prior loss and a WaveNet-based CFM, but I don't know the performance yet.

```python
diff_loss, _ = self.decoder.compute_loss(x1=z_spec.detach(), mask=y_mask, mu=mu_y, spks=spks, cond=cond)
```

(Since the TensorBoard timestamps are in KST, I'm also asking in Korean; the above is a DeepL translation.)
If my understanding is correct, z_spec is trained solely by the reconstruction loss (HiFi-GAN). Is that right? If so, detaching z_spec can be viewed as training an autoencoder without any restriction (like removing the prior in VITS or the VQ in NaturalSpeech 2), resulting in a high-variance latent space that is hard for the prior to estimate.
Yes, you are correct. I think I will try both ways, with and without the prior loss, and see how it goes.
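For anyone following along, here is a minimal sketch of the two variants being discussed. The tensor shapes and the `prior_loss` helper are hypothetical, not the repo's actual code: detaching the latent before the CFM loss blocks gradients into the posterior, while a Grad-TTS-style prior loss ties the latent to the text-encoder output.

```python
import math
import torch

def prior_loss(z, mu, mask):
    # Grad-TTS-style prior loss: negative log-likelihood of the latent z
    # under a unit-variance Gaussian centered at the text-encoder mean mu,
    # averaged over unmasked frames. (Hypothetical helper for illustration.)
    nll = 0.5 * ((z - mu) ** 2 + math.log(2 * math.pi))
    return (nll * mask).sum() / (mask.sum() * z.shape[1])

# Toy tensors: batch=2, channels=4, frames=5 (shapes are assumptions).
z_spec = torch.randn(2, 4, 5, requires_grad=True)  # posterior latent
mu_y = torch.randn(2, 4, 5, requires_grad=True)    # aligned text-encoder output
y_mask = torch.ones(2, 1, 5)

# Variant A: detach the latent, as in the snippet above. The CFM loss then
# shapes only the decoder and never constrains the posterior's latent space.
x1_detached = z_spec.detach()

# Variant B: add a prior loss so the latent stays close to mu_y, keeping its
# variance low enough for the prior/CFM to estimate.
l_prior = prior_loss(z_spec, mu_y, y_mask)
l_prior.backward()
```

After `backward()`, `z_spec.grad` is populated in variant B, whereas `x1_detached` carries no gradient at all, which is exactly the trade-off being debated above.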
https://github.com/lmnt-com/wavegrad: if WaveGrad can do it, we can do it too. CFM directly in the waveform dimension? (Am I mistaken?)
I've also implemented an E2E system using a CFM prior (a different flow-matching architecture instead of a 1D U-Net). Despite using the prior loss from Grad-TTS (a prior loss between the text-encoder output and the latent variable z_0), the alignment framework fails to converge. Has anyone managed to solve this problem without using an external aligner (MFA)?
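Since the issue title is about MAS, here is a minimal NumPy sketch of the monotonic alignment search used by Glow-TTS/Grad-TTS to avoid an external aligner (the function and variable names are mine; real implementations are usually written in Cython or CUDA for speed). It takes a matrix of log-likelihoods of each frame under each token's Gaussian and finds the best monotonic path by dynamic programming.

```python
import numpy as np

def monotonic_alignment_search(log_p):
    # log_p: [T_text, T_mel] log-likelihood of each mel frame under each
    # text token's prior Gaussian. Returns a 0/1 path matrix where each
    # mel frame is assigned to exactly one token, monotonically.
    T_text, T_mel = log_p.shape
    Q = np.full((T_text, T_mel), -np.inf)
    Q[0, 0] = log_p[0, 0]
    # Forward pass: at frame j, token i is reached either by staying on
    # token i or by advancing from token i-1 (the monotonicity constraint).
    for j in range(1, T_mel):
        for i in range(min(j + 1, T_text)):
            stay = Q[i, j - 1]
            move = Q[i - 1, j - 1] if i > 0 else -np.inf
            Q[i, j] = log_p[i, j] + max(stay, move)
    # Backtrack from the last token at the last frame.
    path = np.zeros((T_text, T_mel), dtype=np.int64)
    i = T_text - 1
    for j in range(T_mel - 1, -1, -1):
        path[i, j] = 1
        if i > 0 and (j == i or Q[i - 1, j - 1] > Q[i, j - 1]):
            i -= 1
    return path

# Toy example: 2 tokens, 4 frames; likelihoods favor token 0 early, token 1 late.
log_p = np.log(np.array([[0.9, 0.8, 0.1, 0.1],
                         [0.1, 0.2, 0.9, 0.9]]))
path = monotonic_alignment_search(log_p)
```

In the full training loop this search runs with no gradient, and the resulting hard alignment expands the text-encoder means into `mu_y` for the prior loss, so whether MAS converges depends heavily on that prior loss shaping the latent early in training.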