Models for our paper "Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy"
This release contains all models from our paper "Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy".
There are three anonymization models (pool, random, and gan), one ASR model, and one FastSpeech 2 and one HiFiGAN model for speech synthesis. All models except those for the gan anonymization and the ASR were already part of release v1.0.
The models for anonymization, TTS, and ASR are released as grouped zip archives so that they unpack into the directory structure that run_inference.py expects. If you choose a different structure, you need to change the paths in run_inference.py accordingly.
Place the unzipped folders in a models directory located directly under the repository root. The structure should look as follows:
speaker-anonymization
└─ models
   └─ anonymization
      └─ gan
      └─ pool_minmax_ecapa+xvector
      └─ random_in-scale_ecapa+xvector
   └─ asr
      └─ asr_improved_tts-phn_en.zip
   └─ tts
      └─ FastSpeech2_Multi
         └─ trained_on_ground_truth.pt
      └─ HiFiGAN_combined
         └─ best.pt
Note: Do not unzip the ASR model, but keep it as a zip archive! It will be unzipped at runtime.
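
To check the layout before running inference, a small script along the following lines may help. It is only a sketch and not part of the release: the helper name `missing_model_paths` is hypothetical, the path list is copied from the structure shown above, and it assumes the script is started from the repository root. It only checks that the paths exist and does not load any model.

```python
from pathlib import Path

# Paths relative to the repository root, copied from the structure listed above.
# The ASR entry is deliberately the .zip file, since it must stay zipped.
EXPECTED_PATHS = [
    "models/anonymization/gan",
    "models/anonymization/pool_minmax_ecapa+xvector",
    "models/anonymization/random_in-scale_ecapa+xvector",
    "models/asr/asr_improved_tts-phn_en.zip",
    "models/tts/FastSpeech2_Multi/trained_on_ground_truth.pt",
    "models/tts/HiFiGAN_combined/best.pt",
]


def missing_model_paths(root="."):
    """Return the expected paths that do not exist under `root` (hypothetical helper)."""
    root = Path(root)
    return [p for p in EXPECTED_PATHS if not (root / p).exists()]


if __name__ == "__main__":
    missing = missing_model_paths(".")
    if missing:
        print("Missing model files or folders:")
        for path in missing:
            print("  " + path)
    else:
        print("All expected model paths are present.")
```

If you changed the structure in run_inference.py, adjust `EXPECTED_PATHS` to match.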