
Models for our paper "Speaker Anonymization with Phonetic Intermediate Representations"

@SarinaMeyer SarinaMeyer released this 14 Sep 11:33

This release contains all models as described in our paper "Speaker Anonymization with Phonetic Intermediate Representations".

There are three anonymization models (pool, pool raw and random), three ASR models (phones, STT and TTS), and four FastSpeech2 TTS models (trained_on_ground_truth_phonemes, trained_on_asr_phoneme_outputs, trained_on_libri600_asr_phoneme_outputs and trained_on_libri600_ground_truth_phonemes), together with one HiFiGAN model (best). The models for anonymization, TTS and ASR are released as grouped zip folders so that they end up in the directory structure that run_inference.py expects. If you choose a different structure, you need to change the paths accordingly in run_inference.py.

Place the unzipped folders in a models directory located directly under the repository root. The structure should look as follows:

speaker-anonymization
   └─ models
        └─ anonymization
            └─ pool_minmax_ecapa+xvector
            └─ pool_raw_ecapa+xvector
            └─ random_in-scale_ecapa+xvector
        └─ asr
            └─ asr_stt_en.zip
            └─ asr_tts_en.zip
            └─ asr_tts-phn_en.zip
        └─ tts
            └─ FastSpeech2_Multi
                └─ trained_on_ground_truth_phonemes.pt
                └─ trained_on_asr_phoneme_outputs.pt
                └─ trained_on_libri600_asr_phoneme_outputs.pt
                └─ trained_on_libri600_ground_truth_phonemes.pt
            └─ HiFiGAN_combined
                └─ best.pt

Note: Do not unzip the ASR models but keep them as zip folders! They will be unzipped during runtime.
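
To check the layout before running inference, a small script along the following lines may help. This is only a sketch and not part of the release: the function name find_missing_models and the list EXPECTED_PATHS are made up for illustration, and the paths are copied from the tree above. It assumes it is run from the repository root and that the default structure is used.

```python
from pathlib import Path

# Relative paths copied from the directory tree above (assumed default layout).
EXPECTED_PATHS = [
    "anonymization/pool_minmax_ecapa+xvector",
    "anonymization/pool_raw_ecapa+xvector",
    "anonymization/random_in-scale_ecapa+xvector",
    "asr/asr_stt_en.zip",      # ASR models stay zipped
    "asr/asr_tts_en.zip",
    "asr/asr_tts-phn_en.zip",
    "tts/FastSpeech2_Multi/trained_on_ground_truth_phonemes.pt",
    "tts/FastSpeech2_Multi/trained_on_asr_phoneme_outputs.pt",
    "tts/FastSpeech2_Multi/trained_on_libri600_asr_phoneme_outputs.pt",
    "tts/FastSpeech2_Multi/trained_on_libri600_ground_truth_phonemes.pt",
    "tts/HiFiGAN_combined/best.pt",
]


def find_missing_models(models_dir):
    """Return the expected relative paths that do not exist under models_dir."""
    root = Path(models_dir)
    return [rel for rel in EXPECTED_PATHS if not (root / rel).exists()]


if __name__ == "__main__":
    # Hypothetical usage: run from the repository root, next to run_inference.py.
    missing = find_missing_models("models")
    if missing:
        print("Missing model files or folders:")
        for rel in missing:
            print("  models/" + rel)
    else:
        print("All expected model paths are present.")
```

The script only checks that each path exists; it does not verify file contents or load any model.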