
Text-to-sound-Synthesis-Prompt2Prompt

About

[Figure: structure]

This repository contains an intuitive editing framework for text-to-sound generation. Edits are controlled by text, and the framework aims to preserve part of the content of the original output. It is based on prompt-to-prompt and is implemented on top of Diffsound. It has been tested with a pre-trained model (see the Setup section).
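Prompt-to-prompt edits by reusing the cross-attention maps computed for the original prompt while generating with the edited prompt, which is what lets the edited output keep part of the original's structure. The pure-Python sketch below illustrates the "word swap" variant on toy matrices. It is not the code used in this repository: the function names, the `inject_ratio` parameter, and the toy shapes are assumptions for illustration. In Diffsound, the queries would presumably come from the spectrogram tokens and the keys/values from the text encoder.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    # Plain matrix product: (n x k) @ (k x m) -> (n x m)
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(m: Matrix) -> Matrix:
    return [list(row) for row in zip(*m)]


def softmax_rows(m: Matrix) -> Matrix:
    # Numerically stable row-wise softmax
    out = []
    for row in m:
        mx = max(row)
        exps = [math.exp(x - mx) for x in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def cross_attention_probs(q: Matrix, k: Matrix) -> Matrix:
    # softmax(Q K^T / sqrt(d)): rows = audio-token queries, columns = text tokens
    d = len(q[0])
    scores = matmul(q, transpose(k))
    return softmax_rows([[s / math.sqrt(d) for s in row] for row in scores])


def p2p_word_swap(attn_src: Matrix, attn_tgt: Matrix, v_tgt: Matrix,
                  step: int, num_steps: int, inject_ratio: float = 0.8) -> Matrix:
    """Hypothetical word-swap edit: during the first `inject_ratio` of the
    denoising steps (step 0 = noisiest), use the source prompt's attention map
    with the edited prompt's values; afterwards use the edited prompt's own map."""
    if len(attn_src[0]) != len(attn_tgt[0]):
        raise ValueError("word swap assumes both prompts have the same token count")
    use_source_map = step < int(inject_ratio * num_steps)
    attn = attn_src if use_source_map else attn_tgt
    return matmul(attn, v_tgt)


if __name__ == "__main__":
    q = [[1.0, 0.0], [0.0, 1.0]]                  # 2 audio-token queries (toy values)
    k_src = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # 3 text tokens, original prompt
    k_tgt = [[0.0, 1.0], [1.0, 0.0], [0.5, 0.5]]  # same length, one word swapped
    v_tgt = [[1.0], [2.0], [3.0]]                 # values from the edited prompt
    a_src = cross_attention_probs(q, k_src)
    a_tgt = cross_attention_probs(q, k_tgt)
    for step in (0, 9):
        print(step, p2p_word_swap(a_src, a_tgt, v_tgt, step, num_steps=10))
```

Injecting only during the early steps is the usual trade-off in prompt-to-prompt: the noisy early steps fix the overall layout, while the later steps let the edited words change the details.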

This project was created as part of the Artificial Intelligence Exercise, one of the student experiments in the Department of Electrical and Electronic Engineering, Department of Information and Communication Engineering, Faculty of Engineering, The University of Tokyo.

For more information, please see the slides (in Japanese).

Note: Due to time constraints, development has not been completed and correct operation cannot be guaranteed.

Contributors

Slide

Slide Page

Setup

  1. Clone this repository
git clone https://github.com/decfrr/Text-to-sound-Synthesis-Prompt2Prompt.git
cd Text-to-sound-Synthesis-Prompt2Prompt
  2. Create an environment with Python 3.9
# For pip
pip install -r requirements.txt
# For conda (conda-forge)
conda env create -n diffsound -f requirements.yml
  3. Download the pre-trained models (see the following section) and place them in Diffsound/pre_model
  4. Open and run text-to-sound.ipynb
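Before opening the notebook, it can help to confirm that the Python version and the downloaded files are where the steps above expect them. The sketch below is a hypothetical helper, not part of this repository. The file locations in `EXPECTED_FILES` are assumptions based on step 3 and on the `aria2c` output path in the Pre-trained Models section; adjust them if the notebook expects a different layout.

```python
import sys
from pathlib import Path
from typing import Iterable, List, Tuple

# ASSUMED locations (hypothetical): step 3 puts the checkpoints in Diffsound/pre_model,
# and the aria2c command writes ViT-B-32.pt to OUTPUT/pretrained_model.
EXPECTED_FILES = (
    "Diffsound/pre_model/diffsound_audiocaps.pth",
    "Diffsound/pre_model/last.ckpt",
    "OUTPUT/pretrained_model/ViT-B-32.pt",
)


def check_setup(root: str = ".",
                expected: Iterable[str] = EXPECTED_FILES,
                required_python: Tuple[int, int] = (3, 9)) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    if tuple(sys.version_info[:2]) != tuple(required_python):
        problems.append(
            f"Python {required_python[0]}.{required_python[1]} expected, "
            f"found {sys.version_info.major}.{sys.version_info.minor}"
        )
    base = Path(root)
    for rel in expected:
        if not (base / rel).is_file():
            problems.append(f"missing file: {rel}")
    return problems


if __name__ == "__main__":
    issues = check_setup()
    print("setup looks complete" if not issues else "\n".join(issues))
```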

Pre-trained Models

  • diffsound_audiocaps.pth

From Text-to-sound-Synthesis readme.md

2022/08/09 We upload trained diffsound model on audiocaps dataset, and the baseline AR model, and the codebook trained on audioset with the size of 512. You can refer to https://pan.baidu.com/s/1R9YYxECqa6Fj1t4qbdVvPQ . The password is lsyr
If you cannot open the Baidu disk, please try to refer to PKU disk https://disk.pku.edu.cn:443/link/DA2EAC5BBBF43C9CAB37E0872E50A0E4
More details will be updated as soon as possible.

Download diffsound/diffsound_audiocaps.pth from the PKU disk.

  • last.ckpt

From Text-to-sound-Synthesis readme.md

2022/08/06 We upload the pre-trained model on google drive. Please refer to https://drive.google.com/drive/folders/193It90mEBDPoyLghn4kFzkugbkF_aC8v?usp=sharing

Download Diffsound/2022-04-24T23-17-27_audioset_codebook256/checkpoints/last.ckpt from the Google Drive folder.

  • ViT-B-32.pt
# Requires aria2
aria2c https://facevcstandard.blob.core.windows.net/t-shuygu/release_model/VQ-Diffusion/pretrained_model/ViT-B-32.pt --auto-file-renaming=false -o OUTPUT/pretrained_model/ViT-B-32.pt
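If aria2 is not installed, the same file can be fetched with Python's standard library. The sketch below is an assumed alternative, not a script shipped with this repository; `download`, `VIT_URL`, and `VIT_DEST` are hypothetical names, with the URL and destination copied from the `aria2c` command above. Like `--auto-file-renaming=false`, it never creates a renamed copy: if the destination already exists, it simply skips the download.

```python
import os
import urllib.request

# URL and destination copied from the aria2c command; the names are hypothetical.
VIT_URL = ("https://facevcstandard.blob.core.windows.net/t-shuygu/release_model/"
           "VQ-Diffusion/pretrained_model/ViT-B-32.pt")
VIT_DEST = "OUTPUT/pretrained_model/ViT-B-32.pt"


def download(url: str, dest: str) -> bool:
    """Download `url` to `dest`.

    Returns False without touching anything if `dest` already exists,
    and True after a fresh download.
    """
    if os.path.exists(dest):
        return False
    os.makedirs(os.path.dirname(dest) or ".", exist_ok=True)
    tmp = dest + ".part"
    urllib.request.urlretrieve(url, tmp)  # write to a temporary name first
    os.replace(tmp, dest)                 # then move into place, so a partial file is never mistaken for a complete one
    return True


if __name__ == "__main__":
    print("downloaded" if download(VIT_URL, VIT_DEST) else "already present, skipped")
```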

Reference

This project is based on the following open-source code.
