This repository provides an intuitive editing framework for text-to-sound generation. Edits are controlled by text and aim to preserve part of the content of the original output. The framework is based on Prompt-to-Prompt and is implemented on top of Diffsound; it is tested with a pre-trained model (please check the setup section).
This project was created as part of the Artificial Intelligence Exercise, one of the student experiments in the Department of Electrical and Electronic Engineering and the Department of Information and Communication Engineering, Faculty of Engineering, The University of Tokyo.
For more information, please check the slide (in Japanese).
Note: Due to time constraints, development has not been completed and correct operation cannot be guaranteed.
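As a rough illustration of the prompt-to-prompt idea behind the editing (a toy sketch with a made-up helper, not the actual notebook API): words shared by the original and edited prompts keep their recorded cross-attention maps, so only the changed words alter the generated sound.

```python
# Toy sketch of the word-alignment step used by prompt-to-prompt style editing.
# Illustrative only; the real implementation lives in the Diffsound code and the
# text-to-sound.ipynb notebook, and its API differs from this helper.

def shared_token_map(original: str, edited: str) -> dict:
    """Map positions in the edited prompt to positions in the original prompt
    for words that appear in both, i.e. the tokens whose cross-attention maps
    can be reused to preserve the original content."""
    orig_words = original.lower().split()
    mapping = {}
    for j, word in enumerate(edited.lower().split()):
        if word in orig_words:
            mapping[j] = orig_words.index(word)  # reuse attention recorded here
    return mapping

# Only "barking" -> "howling" changes, so every other word keeps its attention map.
print(shared_token_map("a dog is barking in the park",
                       "a dog is howling in the park"))
# {0: 0, 1: 1, 2: 2, 4: 4, 5: 5, 6: 6}
```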
- Clone this repository
git clone https://github.com/decfrr/Text-to-sound-Synthesis-Prompt2Prompt.git
cd Text-to-sound-Synthesis-Prompt2Prompt
- Create an environment with Python 3.9
# For pip
pip install -r requirements.txt
# For conda (conda-forge)
conda env create -n diffsound -f requirements.yml
- Download the pre-trained models (please check the following section) and place them in
Diffsound/pre_model
- Start the notebook (an optional sanity-check sketch follows this list)
text-to-sound.ipynb
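Before opening the notebook, a quick sanity check such as the following can confirm that the environment and the checkpoints are in place. The paths below are an assumption based on the download section further down; adjust them to your actual layout.

```python
# Optional sanity check before running text-to-sound.ipynb.
# The paths assume all checkpoints were placed under Diffsound/pre_model as
# described above; adjust them if your layout differs.
import os
import torch

print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())

expected_files = [
    "Diffsound/pre_model/diffsound_audiocaps.pth",
    "Diffsound/pre_model/last.ckpt",
    "Diffsound/pre_model/ViT-B-32.pt",
]
for path in expected_files:
    status = "found  " if os.path.isfile(path) else "missing"
    print(f"{status}  {path}")
```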
diffsound_audiocaps.pth
From Text-to-sound-Synthesis readme.md
2022/08/09 We uploaded the trained Diffsound model on the AudioCaps dataset, the baseline AR model, and the codebook trained on AudioSet with a codebook size of 512. You can refer to https://pan.baidu.com/s/1R9YYxECqa6Fj1t4qbdVvPQ . The password is lsyr.
If you cannot open the Baidu disk, please try the PKU disk: https://disk.pku.edu.cn:443/link/DA2EAC5BBBF43C9CAB37E0872E50A0E4
More details will be updated soon.
Download diffsound/diffsound_audiocaps.pth
from the PKU disk.
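To verify the download, the checkpoint can be inspected on CPU, assuming it was placed under Diffsound/pre_model as in the setup section above:

```python
# Load the Diffsound checkpoint on CPU just to verify the file; no GPU needed.
# The path assumes the file was placed in Diffsound/pre_model as described above.
import torch

ckpt = torch.load("Diffsound/pre_model/diffsound_audiocaps.pth", map_location="cpu")
print(type(ckpt))
if isinstance(ckpt, dict):
    print(list(ckpt.keys())[:10])  # top-level keys (e.g. model weights, config)
```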
last.ckpt
From Text-to-sound-Synthesis readme.md
2022/08/06 We uploaded the pre-trained model to Google Drive. Please refer to https://drive.google.com/drive/folders/193It90mEBDPoyLghn4kFzkugbkF_aC8v?usp=sharing
Download Diffsound/2022-04-24T23-17-27_audioset_codebook256/checkpoints/last.ckpt
from Google Drive
ViT-B-32.pt
# Requires aria2
aria2c https://facevcstandard.blob.core.windows.net/t-shuygu/release_model/VQ-Diffusion/pretrained_model/ViT-B-32.pt --auto-file-renaming=false -o OUTPUT/pretrained_model/ViT-B-32.pt
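If aria2 is not installed, the same file can be fetched with the Python standard library instead (same URL as the command above; the output path mirrors the aria2c command):

```python
# Alternative to aria2c: download ViT-B-32.pt with the standard library.
# Same URL as above; the output path mirrors the aria2c command.
import os
import urllib.request

url = ("https://facevcstandard.blob.core.windows.net/t-shuygu/release_model/"
       "VQ-Diffusion/pretrained_model/ViT-B-32.pt")
out_path = "OUTPUT/pretrained_model/ViT-B-32.pt"

os.makedirs(os.path.dirname(out_path), exist_ok=True)
urllib.request.urlretrieve(url, out_path)
print("saved to", out_path, f"({os.path.getsize(out_path) / 1e6:.1f} MB)")
```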
This project is based on the following open-source code.