FastSpeech2 implementation with Mid-attribute speaker generation

Forked from https://github.com/Wataru-Nakata/FastSpeech2-JSUT

GE2E module from https://github.com/Aria-K-Alethia/Multilingual-Speaker-Encoder-with-Domain-Adaptation/tree/main

TacoSpawn: https://arxiv.org/abs/2111.05095

Mid-attribute speaker generation: https://arxiv.org/abs/2210.09916

How to set up

Download corpora

JSUT: https://sites.google.com/site/shinnosuketakamichi/publication/jsut
JVS: https://sites.google.com/site/shinnosuketakamichi/research-topics/jvs_corpus
VCTK: http://www.udialogue.org/ja/download-ja/cstr-vctk-corpus.html

Setup environment

git submodule update --init
unzip hifigan/generator_universal.pth.tar.zip -d hifigan/

# Optionally, set up a virtual environment before running pip install
pip install -r requirements.txt

Do preprocessing

Before preprocessing, set corpus_path in each preprocess_*.yaml to the location of the corresponding corpus on your machine.
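
As a rough illustration of this step, the sketch below rewrites the corpus_path value of a configuration file using only the standard library. It assumes that corpus_path sits on its own line in the form `corpus_path: <value>` and that the new path contains no double quotes. The helper names and the sample paths are hypothetical and not part of this repository.

```python
import re
from pathlib import Path

# Matches a line such as `  corpus_path: "/old/path"` (assumed layout) and
# captures the indentation plus the key so that both are preserved.
CORPUS_PATH_LINE = re.compile(r"^([ \t]*corpus_path[ \t]*:[ \t]*).*$", re.MULTILINE)


def set_corpus_path(yaml_text: str, new_path: str) -> str:
    """Return yaml_text with the value of every corpus_path line replaced."""
    return CORPUS_PATH_LINE.sub(lambda m: f'{m.group(1)}"{new_path}"', yaml_text)


def update_file(yaml_file: str, new_path: str) -> bool:
    """Rewrite corpus_path in one file; return True if the file changed."""
    path = Path(yaml_file)
    old_text = path.read_text(encoding="utf-8")
    new_text = set_corpus_path(old_text, new_path)
    if new_text != old_text:
        path.write_text(new_text, encoding="utf-8")
    return new_text != old_text


# Hypothetical sample content, only to show the effect of the replacement.
sample = 'path:\n  corpus_path: "/old/location"\n  raw_path: "./raw_data/JVS"\n'
print(set_corpus_path(sample, "/data/corpora/jvs"))
```

Each preprocess_*.yaml most likely points at a different corpus, so update_file is meant to be called once per file with the matching path rather than applied to all files with a single value.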

This version requires a configuration that enables the GE2E loss. Use one of the configurations under config/JVS-VCTK_langemb_configs instead of config/JVS-VCTK, and substitute its path wherever the commands below show config/JVS-VCTK.

Prepare TextGrid

For JSUT

mkdir -p raw_data/JSUT/JSUT
cp path/to/JSUT/*/wav/*.wav raw_data/JSUT/JSUT
python retriever/retrieve_transcripts_jsut.py
python prepare_tg_accent_jsut.py jsut-lab/ preprocessed_data/JSUT/ JSUT --with_accent True

For JVS

The prepare_tg_accent_jvs.py script converts the time format of the .lab files.

mkdir -p raw_data/JVS
python retriever/retrieve_jvs.py
python prepare_tg_accent_jvs.py config/JVS-VCTK/

For VCTK

You have to prepare the .lab files yourself. To use prepare_tg_hts.py, prepare HTK/HTS-style .lab files in the directory structure below:

.
└ lab
  └ VCTK
      ├ p225(speaker ID)
      |  └ labels
      |     ├ p225_001(utterance ID).lab
      |     ├ p225_002.lab
      |     ⋮
      |     └ p225_366.lab
      ├ p226/
      ⋮
      └ p376/
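
Before running the scripts, it may help to confirm that the label files really follow the layout shown above. The following is a minimal sketch, assuming that every utterance ID starts with its speaker ID followed by an underscore (as in p225_001); the function names are hypothetical and not part of this repository.

```python
from collections import Counter
from pathlib import Path, PurePosixPath


def speaker_of(label_path: str):
    """Return the speaker ID if label_path ends in <speaker>/labels/<utterance>.lab, else None."""
    parts = PurePosixPath(label_path).parts
    if len(parts) < 3 or parts[-2] != "labels" or not parts[-1].endswith(".lab"):
        return None
    speaker = parts[-3]
    # Assumption: utterance IDs are prefixed with the speaker ID (e.g. p225_001).
    return speaker if parts[-1].startswith(speaker + "_") else None


def count_labels(lab_root: str, corpus: str = "VCTK") -> Counter:
    """Count correctly placed .lab files per speaker under <lab_root>/<corpus>."""
    counts = Counter()
    for path in (Path(lab_root) / corpus).rglob("*.lab"):
        speaker = speaker_of(path.as_posix())
        if speaker is not None:
            counts[speaker] += 1
    return counts


print(speaker_of("lab/VCTK/p225/labels/p225_001.lab"))
```

Calling count_labels("lab") from the directory that contains lab/ gives a per-speaker count; a speaker missing from the result has no label files in the expected place.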

mkdir -p raw_data/VCTK
python retriever/retrieve_vctk.py
python prepare_tg_hts.py config/JVS-VCTK VCTK
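
For reference, an HTK/HTS-style .lab file stores one segment per line as `<start> <end> <label>`, with times given as integers in units of 100 ns. The sketch below converts such lines to seconds. It is only an illustration of the file format: it assumes every line carries both time stamps, and it is not the parsing code used by the scripts in this repository.

```python
HTK_UNITS_PER_SECOND = 10_000_000  # HTK/HTS label times are counted in units of 100 ns


def parse_hts_line(line: str):
    """Parse '<start> <end> <label>' into (start_sec, end_sec, label)."""
    start, end, label = line.split(maxsplit=2)
    return int(start) / HTK_UNITS_PER_SECOND, int(end) / HTK_UNITS_PER_SECOND, label.strip()


def parse_hts_lab(text: str):
    """Parse every non-empty line of an HTK/HTS-style label file."""
    return [parse_hts_line(line) for line in text.splitlines() if line.strip()]


# Hypothetical two-segment label file, for illustration only.
example = "0 2500000 sil\n2500000 4000000 a\n"
for start_sec, end_sec, label in parse_hts_lab(example):
    print(f"{start_sec:.3f} {end_sec:.3f} {label}")
```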

Prepare other features (pitch, duration, energy)

python preprocess.py config/JVS-VCTK

Train

python train.py config/JVS-VCTK

Synthesize from an existing speaker

python3 synthesize.py --text "音声合成、たのちい" --speaker_id 0 --restore_step 20000 --mode single -c config/JVS-VCTK
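
To synthesize the same sentence for several speakers, the command above can be generated in a loop. The sketch below only builds and prints the commands; it uses the flags shown above and nothing else. The speaker IDs 0 to 2 are an assumption for illustration, and the helper name is hypothetical.

```python
import shlex


def build_synthesize_command(text, speaker_id, restore_step, config_dir):
    """Return the argument list for one synthesize.py call in single mode."""
    return [
        "python3", "synthesize.py",
        "--text", text,
        "--speaker_id", str(speaker_id),
        "--restore_step", str(restore_step),
        "--mode", "single",
        "-c", config_dir,
    ]


# Assumption: speaker IDs 0, 1 and 2 exist in the trained model.
commands = [
    build_synthesize_command("音声合成、たのちい", speaker_id, 20000, "config/JVS-VCTK")
    for speaker_id in range(3)
]
for command in commands:
    print(shlex.join(command))
```

Each argument list can be passed to subprocess.run(command, check=True) from the repository root to actually run the synthesis.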

Synthesize from a non-existent speaker

Under construction. In the meantime, examples_gen.py may serve as a starting point.

Memo

JVS contains some misaligned utterances, such as the one below. Remove or fix them before training.

jvs070-VOICEACTRESS100_001
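
One possible way to remove such utterances is to filter the metadata lists before training. The sketch below assumes that each metadata line starts with the utterance ID followed by a `|` separator, and that the ID is written exactly as above; both points are assumptions about the preprocessed files, and the names used here are hypothetical.

```python
# Utterances known to be misaligned (taken from the memo above).
EXCLUDED_UTTERANCES = {"jvs070-VOICEACTRESS100_001"}


def drop_excluded(metadata_lines, excluded=EXCLUDED_UTTERANCES):
    """Keep only lines whose first '|'-separated field is not an excluded utterance ID."""
    return [
        line for line in metadata_lines
        if line.split("|", 1)[0].strip() not in excluded
    ]


# Hypothetical metadata lines, for illustration only.
lines = [
    "jvs070-VOICEACTRESS100_001|jvs070|{a b c}|text one",
    "jvs070-VOICEACTRESS100_002|jvs070|{d e f}|text two",
]
for line in drop_excluded(lines):
    print(line)
```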

Repository: sarulab-speech/Mid-Attribute-Speaker-Generation
