From dfaf35acae9068c305fa95bea45f512ff15b25bc Mon Sep 17 00:00:00 2001
From: Chris Carini <6374067+ChrisCarini@users.noreply.github.com>
Date: Thu, 4 Apr 2024 18:05:33 -0700
Subject: [PATCH 1/2] fix typos

---
 README.md | 9 ++++-----
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/README.md b/README.md
index d15654d..a910354 100644
--- a/README.md
+++ b/README.md
@@ -24,11 +24,11 @@ To clone or edit an unseen voice, VoiceCraft needs only a few seconds of referen
 ## How to run TTS inference
 There are two ways:
 1. with docker. see [quickstart](#quickstart)
-2. without docker. see [envrionment setup](#environment-setup)
+2. without docker. see [environment setup](#environment-setup)
 
-When you are inside the docker image or you have installed all dependencies, Checkout [`inference_tts.ipynb`](./inference_tts.ipynb).
+When you are inside the docker image, or you have installed all dependencies, checkout [`inference_tts.ipynb`](./inference_tts.ipynb).
 
-If you want to do model development such as training/finetuning, I recommend following [envrionment setup](#environment-setup) and [training](#training).
+If you want to do model development such as training/finetuning, I recommend following [environment setup](#environment-setup) and [training](#training).
 
 ## QuickStart :star:
 To try out TTS inference with VoiceCraft, the best way is using docker. Thank [@ubergarm](https://github.com/ubergarm) and [@jayc88](https://github.com/jay-c88) for making this happen.
@@ -119,7 +119,7 @@ python phonemize_encodec_encode_hf.py \
   --batch_size 32 \
   --max_len 30000
 ```
-where encodec_model_path is avaliable [here](https://huggingface.co/pyp1/VoiceCraft). This model is trained on Gigaspeech XL, it has 56M parameters, 4 codebooks, each codebook has 2048 codes. Details are described in our [paper](https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf). If you encounter OOM during extraction, try decrease the batch_size and/or max_len.
+where encodec_model_path is available [here](https://huggingface.co/pyp1/VoiceCraft). This model is trained on Gigaspeech XL, it has 56M parameters, 4 codebooks, each codebook has 2048 codes. Details are described in our [paper](https://jasonppy.github.io/assets/pdfs/VoiceCraft.pdf). If you encounter OOM during extraction, try decrease the batch_size and/or max_len.
 The extracted codes, phonemes, and vocab.txt will be stored at `path/to/store_extracted_codes_and_phonemes/${dataset_size}/{encodec_16khz_4codebooks,phonemes,vocab.txt}`.
 
 As for manifest, please download train.txt and validation.txt from [here](https://huggingface.co/datasets/pyp1/VoiceCraft_RealEdit/tree/main), and put them under `path/to/store_extracted_codes_and_phonemes/manifest/`. Please also download vocab.txt from [here](https://huggingface.co/datasets/pyp1/VoiceCraft_RealEdit/tree/main) if you want to use our pretrained VoiceCraft model (so that the phoneme-to-token matching is the same).
@@ -160,4 +160,3 @@ We thank Feiteng for his [VALL-E reproduction](https://github.com/lifeiteng/vall
 
 ## Disclaimer
 Any organization or individual is prohibited from using any technology mentioned in this paper to generate or edit someone's speech without his/her consent, including but not limited to government leaders, political figures, and celebrities. If you do not comply with this item, you could be in violation of copyright laws.
-

From 7ff895f79b68ba7fd03358b37486598cd5cdacc7 Mon Sep 17 00:00:00 2001
From: Chris Carini <6374067+ChrisCarini@users.noreply.github.com>
Date: Sat, 1 Jun 2024 03:18:00 -0700
Subject: [PATCH 2/2] Update README.md

---
 README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 6e49afc..76553cc 100644
--- a/README.md
+++ b/README.md
@@ -18,7 +18,7 @@ see [quickstart command line](#quickstart-command-line).
 When you are inside the docker image or you have installed all dependencies, Checkout [`inference_tts.ipynb`](./inference_tts.ipynb).
 
-If you want to do model development such as training/finetuning, I recommend following [envrionment setup](#environment-setup) and [training](#training).
+If you want to do model development such as training/finetuning, I recommend following [environment setup](#environment-setup) and [training](#training).
 
 ## News
 :star: 04/22/2024: 330M/830M TTS Enhanced Models are up [here](https://huggingface.co/pyp1), load them through [`gradio_app.py`](./gradio_app.py) or [`inference_tts.ipynb`](./inference_tts.ipynb)!
 Replicate demo is up, major thanks to [@chenxwh](https://github.com/chenxwh)!
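Editor's note, outside the patches above: the README context in the first patch advises lowering `--batch_size` and/or `--max_len` when the encodec extraction hits OOM. A minimal sketch of that retry-with-backoff idea is below. It is an illustration only, not part of either patch; `extract_with_backoff` and the `run` callable are hypothetical names standing in for an invocation of `phonemize_encodec_encode_hf.py`.

```python
def extract_with_backoff(run, batch_size=32, max_len=30000):
    """Retry `run(batch_size, max_len)` with halved batch_size until it succeeds.

    `run` is a hypothetical callable wrapping the extraction script; it should
    return True on success and False on failure (e.g. CUDA out-of-memory).
    Returns the batch size that finally fit in memory.
    """
    while batch_size >= 1:
        if run(batch_size, max_len):
            return batch_size
        # OOM (or other failure): halve the batch size and try again,
        # mirroring the README's "decrease the batch_size" advice.
        batch_size //= 2
    raise MemoryError("extraction failed even at batch_size=1; try lowering max_len")
```

Under the same assumptions, one could further lower `max_len` once `batch_size` bottoms out; the README suggests adjusting either knob.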