Setup Environment

(Tested on a Quadro RTX 5000 with NVIDIA driver version 535.104.05 and CUDA 12.2 on Ubuntu 22.04.)

  1. Build the Docker image from the Dockerfile:

     docker build -t whispervits-svc .
  2. Enter the Docker container.

  3. Download the timbre encoder (Speaker-Encoder by @mueller91) and put best_model.pth.tar into speaker_pretrain/.

  4. Download the Whisper model whisper-large-v2. Make sure to download large-v2.pt, and put it into whisper_pretrain/.

  5. Download the hubert_soft model and put hubert-soft-0d54a1f4.pt into hubert_pretrain/.

  6. Download the pitch extractor crepe full and put full.pth into crepe/assets.

    Note: the crepe full.pth file is 84.9 MB, not 6 KB.

  7. Download the trained model lesd5_100.pretrain.pth and put it into vits_pretrain/.

  8. Make sure you have downloaded the wav_spk_1 folder from the Benchmarking-SGDD repository, then run the conversion script:

     python convert-TWH-spk1.py /path/to/wav_spk_1

The output is a folder containing all the conversions used in the evaluation. These are the same files found on this Google Drive.
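Before running the conversion, it may help to confirm that the files from steps 3 to 7 are in place. The following is a minimal sketch, not part of the repository: the function name `check_pretrained` and the 1 MB size threshold are assumptions chosen for illustration, and the paths are taken from the steps above, relative to the repository root. The size check is meant to catch cases like the 6 KB crepe file mentioned in the note.

```python
from pathlib import Path

# Expected files, taken from steps 3-7 (relative to the repository root).
EXPECTED_FILES = [
    "speaker_pretrain/best_model.pth.tar",
    "whisper_pretrain/large-v2.pt",
    "hubert_pretrain/hubert-soft-0d54a1f4.pt",
    "crepe/assets/full.pth",
    "vits_pretrain/lesd5_100.pretrain.pth",
]

# Assumption: any model file under 1 MB is treated as a broken download.
# This threshold is illustrative, not a value defined by the repository.
MIN_BYTES = 1_000_000


def check_pretrained(root="."):
    """Return a list of problems found; an empty list means every file looks fine."""
    problems = []
    for rel in EXPECTED_FILES:
        path = Path(root) / rel
        if not path.is_file():
            problems.append(f"missing: {rel}")
        elif path.stat().st_size < MIN_BYTES:
            problems.append(f"too small ({path.stat().st_size} bytes): {rel}")
    return problems


if __name__ == "__main__":
    issues = check_pretrained()
    for issue in issues:
        print(issue)
    if not issues:
        print("All pretrained files found.")
```

Saved under a hypothetical name such as `check_setup.py` in the repository root, it can be run with `python check_setup.py` before launching `convert-TWH-spk1.py`.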