
Releases: DigitalPhonetics/speaker-anonymization

Intermediate Speech Representations for LibriSpeech

14 Mar 14:11
c51e64a

This release contains the intermediate representations of linguistic content (phonetic transcriptions), prosody (pitch, energy, duration), and speaker embeddings (GST, trained jointly with the TTS) produced by the pipeline for the LibriSpeech train-clean-360, dev, and test data of the VPC 2024. Using these precomputed representations instead of computing them from scratch significantly reduces the run time of the pipeline.
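
For illustration, here is a minimal sketch of how such precomputed representations could be reused instead of recomputed, assuming they were downloaded to a local directory. The directory name, file names, and the extract_* helpers below are hypothetical placeholders, not the pipeline's actual interface.

from pathlib import Path

import torch  # the pipeline's models are PyTorch-based

# Hypothetical location of the downloaded release assets; the layout the
# pipeline actually expects may differ.
PRECOMPUTED_DIR = Path("results/original_speech/librispeech_train-clean-360")


def load_or_compute(name, compute_fn):
    """Load a precomputed representation if it exists, otherwise compute it."""
    path = PRECOMPUTED_DIR / f"{name}.pt"
    if path.exists():
        print(f"Using precomputed {name} from {path}")
        return torch.load(path, map_location="cpu")
    print(f"No precomputed {name} found, computing from scratch")
    return compute_fn()


# extract_transcriptions, extract_prosody, and extract_embeddings stand in
# for the pipeline's own extraction steps.
# transcriptions = load_or_compute("transcriptions", extract_transcriptions)
# prosody = load_or_compute("prosody", extract_prosody)
# embeddings = load_or_compute("embeddings", extract_embeddings)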

Models for our paper "Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy"

04 Jan 21:56

This release contains all models of our paper "Anonymizing Speech with Generative Adversarial Networks to Preserve Speaker Privacy".

There are three anonymization models (pool, random, and gan), one ASR model, and, for speech synthesis, one FastSpeech2 model and one HiFiGAN model. All models except those for the gan anonymization and the ASR were already part of release v1.0.
The models for anonymization, TTS, and ASR are released as grouped zip folders to ensure that they end up in the directory structure expected by run_inference.py. If you choose a different structure, you need to adapt the paths in run_inference.py accordingly.

Place the unzipped folders in a models directory located directly under the repository root, so that the structure looks as follows (a setup sketch is given after the note below):

speaker-anonymization
   └─ models
        └─ anonymization
            └─ gan
            └─ pool_minmax_ecapa+xvector
            └─ random_in-scale_ecapa+xvector
        └─ asr
            └─ asr_improved_tts-phn_en.zip
        └─ tts
            └─ FastSpeech2_Multi
                └─ trained_on_ground_truth.pt
            └─ HiFiGAN_combined
                └─ best.pt

Note: Do not unzip the ASR models; keep them as zip archives! They will be unzipped at runtime.
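
Below is a minimal setup sketch that follows these instructions: it downloads the grouped release zips and extracts them into models/, assuming each grouped zip already contains its anonymization/, asr/, or tts/ subfolder. The asset URLs (including <tag>) are placeholders; use the actual download links from this release.

import urllib.request
import zipfile
from pathlib import Path

# Target directory directly under the repository root, as described above.
MODELS_DIR = Path("speaker-anonymization/models")
MODELS_DIR.mkdir(parents=True, exist_ok=True)

# Placeholder URLs; substitute the real asset links from this release page.
ASSETS = [
    "https://github.com/DigitalPhonetics/speaker-anonymization/releases/download/<tag>/anonymization.zip",
    "https://github.com/DigitalPhonetics/speaker-anonymization/releases/download/<tag>/asr.zip",
    "https://github.com/DigitalPhonetics/speaker-anonymization/releases/download/<tag>/tts.zip",
]

for url in ASSETS:
    archive_path = MODELS_DIR / url.rsplit("/", 1)[-1]
    urllib.request.urlretrieve(url, str(archive_path))
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(MODELS_DIR)
    archive_path.unlink()  # remove the outer grouped archive after extraction

# The inner ASR model zip (e.g. asr_improved_tts-phn_en.zip) is expected to
# arrive inside the grouped asr archive and stays zipped, per the note above.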

Models for prosody cloning and GAN-generated speaker embeddings

28 Oct 14:24

This release contains all models of our latest pipeline version, which is capable of generating artificial speaker embeddings with a GAN, cloning prosody, and modifying prosody via offsets.
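
The following toy example only illustrates the idea of prosody offsets (scaling pitch, energy, and duration values); it is not the pipeline's implementation, and all values and variable names are made up.

import numpy as np

# Toy prosody values for one utterance: per-phone pitch (Hz), energy, duration (frames).
pitch = np.array([110.0, 0.0, 150.0, 140.0])   # 0.0 marks an unvoiced phone
energy = np.array([0.8, 0.2, 1.1, 0.9])
duration = np.array([6, 3, 8, 5])

# Hypothetical offsets: raise pitch by 20%, keep energy, stretch durations by 10%.
pitch_offset, energy_offset, duration_offset = 1.2, 1.0, 1.1

voiced = pitch > 0
pitch[voiced] *= pitch_offset
energy *= energy_offset
duration = np.round(duration * duration_offset).astype(int)

print(pitch, energy, duration)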

Place the unzipped folders in a models directory located directly under the repository root, so that the structure looks as follows:

speaker-anonymization
   └─ models
        └─ anonymization
            └─ gan_style-embed
                └─ settings.json
                └─ style-embed_wgan.pt
        └─ asr
            └─ asr_branchformer_tts-phn_en.zip
        └─ tts
            └─ Aligner
                └─ aligner.pt
            └─ Embedding
                └─ embedding_function.pt
            └─ FastSpeech2_Multi
                └─ prosody_cloning.pt
            └─ HiFiGAN_combined
                └─ best.pt

Note: Do not unzip the ASR models; keep them as zip archives! They will be unzipped at runtime.
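
Before running run_inference.py, one can sanity-check that the files listed in the tree above are in place, for example with a small script like this (the models path is assumed to be models/ under the repository root, as described above):

from pathlib import Path

MODELS_DIR = Path("speaker-anonymization/models")

# Files as listed in the directory tree above.
EXPECTED = [
    "anonymization/gan_style-embed/settings.json",
    "anonymization/gan_style-embed/style-embed_wgan.pt",
    "asr/asr_branchformer_tts-phn_en.zip",
    "tts/Aligner/aligner.pt",
    "tts/Embedding/embedding_function.pt",
    "tts/FastSpeech2_Multi/prosody_cloning.pt",
    "tts/HiFiGAN_combined/best.pt",
]

missing = [p for p in EXPECTED if not (MODELS_DIR / p).exists()]
if missing:
    raise FileNotFoundError("Missing model files:\n" + "\n".join(missing))
print("All expected model files are in place.")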

Models for our paper "Speaker Anonymization with Phonetic Intermediate Representations"

14 Sep 11:33

This release contains all models as described in our paper "Speaker Anonymization with Phonetic Intermediate Representations".

There are three anonymization models (pool, pool raw, and random), three ASR models (phones, STT, and TTS), and four FastSpeech2 TTS models (trained_on_ground_truth_phonemes, trained_on_asr_phoneme_outputs, trained_on_libri600_asr_phoneme_outputs, and trained_on_libri600_ground_truth_phonemes), together with one HiFiGAN model (best). The models for anonymization, TTS, and ASR are released as grouped zip folders to ensure that they end up in the directory structure expected by run_inference.py. If you choose a different structure, you need to adapt the paths in run_inference.py accordingly.

Place the unzipped folders in a models directory located directly under the repository root, so that the structure looks as follows:

speaker-anonymization
   └─ models
        └─ anonymization
            └─ pool_minmax_ecapa+xvector
            └─ pool_raw_ecapa+xvector
            └─ random_in-scale_ecapa+xvector
        └─ asr
            └─ asr_stt_en.zip
            └─ asr_tts_en.zip
            └─ asr_tts-phn_en.zip
        └─ tts
            └─ FastSpeech2_Multi
                └─ trained_on_ground_truth_phonemes.pt
                └─ trained_on_asr_phoneme_outputs.pt
                └─ trained_on_libri600_asr_phoneme_outputs.pt
                └─ trained_on_libri600_ground_truth_phonemes.pt
            └─ HiFiGAN_combined
                └─ best.pt

Note: Do not unzip the ASR models; keep them as zip archives! They will be unzipped at runtime.
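
For reference, here is a sketch of how the released checkpoints map to paths under models/, assuming the structure above; the dictionary keys are just illustrative labels, not identifiers used by run_inference.py.

from pathlib import Path

MODELS_DIR = Path("speaker-anonymization/models")

# ASR model zips (kept zipped, see the note above).
ASR_MODELS = {
    "stt": MODELS_DIR / "asr/asr_stt_en.zip",
    "tts": MODELS_DIR / "asr/asr_tts_en.zip",
    "tts-phn": MODELS_DIR / "asr/asr_tts-phn_en.zip",
}

# FastSpeech2 TTS checkpoints and the shared HiFiGAN vocoder.
TTS_MODELS = {
    "ground_truth": MODELS_DIR / "tts/FastSpeech2_Multi/trained_on_ground_truth_phonemes.pt",
    "asr_outputs": MODELS_DIR / "tts/FastSpeech2_Multi/trained_on_asr_phoneme_outputs.pt",
    "libri600_asr_outputs": MODELS_DIR / "tts/FastSpeech2_Multi/trained_on_libri600_asr_phoneme_outputs.pt",
    "libri600_ground_truth": MODELS_DIR / "tts/FastSpeech2_Multi/trained_on_libri600_ground_truth_phonemes.pt",
}
HIFIGAN = MODELS_DIR / "tts/HiFiGAN_combined/best.pt"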