T5 Speech from Microsoft

This repository provides a simple Python script that generates speech from text with Microsoft's SpeechT5 model. It is based on the paper SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing and the original repository on GitHub.

Example code and the model card can be found on the Hugging Face model page.

How it Works

The script uses the Hugging Face Transformers library to load the model and tokenizer, then uses the model to generate speech from text. It reads prompt.txt and processes it in batches of two lines. For each batch, it writes a WAV file to disk as speech_#.wav, where # is the batch number. The script also prints its results to the console as it generates the audio files. It is deliberately simple and easy to modify to suit your needs.
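
The flow described above can be sketched roughly as follows. This is not the repository's app.py, only a minimal illustration under stated assumptions: the helper names (`batch_lines`, `output_name`, `load_speecht5`, `run`) are hypothetical, blank lines are skipped, the two lines of a batch are joined with a space, and the caller supplies a speaker embedding tensor of shape (1, 512), such as the x-vectors used in the Hugging Face model card example.

```python
from pathlib import Path


def batch_lines(lines, batch_size=2):
    """Group non-empty lines into consecutive batches of `batch_size`.

    Skipping blank lines is an assumption, not confirmed by the original script.
    """
    cleaned = [line.strip() for line in lines if line.strip()]
    return [cleaned[i:i + batch_size] for i in range(0, len(cleaned), batch_size)]


def output_name(batch_index):
    """File name for a batch, following the speech_#.wav pattern."""
    return f"speech_{batch_index}.wav"


def load_speecht5():
    """Load the processor, TTS model, and vocoder once, up front."""
    # Imported here so the pure helpers above work without the heavy dependencies.
    from transformers import (SpeechT5ForTextToSpeech, SpeechT5HifiGan,
                              SpeechT5Processor)

    processor = SpeechT5Processor.from_pretrained("microsoft/speecht5_tts")
    model = SpeechT5ForTextToSpeech.from_pretrained("microsoft/speecht5_tts")
    vocoder = SpeechT5HifiGan.from_pretrained("microsoft/speecht5_hifigan")
    return processor, model, vocoder


def run(prompt_path, speaker_embeddings):
    """Read the prompt file and write one WAV file per two-line batch."""
    import soundfile as sf

    processor, model, vocoder = load_speecht5()
    lines = Path(prompt_path).read_text(encoding="utf-8").splitlines()
    for index, batch in enumerate(batch_lines(lines)):
        text = " ".join(batch)  # assumption: batch lines are joined with a space
        print(f"[{index}] {text}")
        inputs = processor(text=text, return_tensors="pt")
        speech = model.generate_speech(
            inputs["input_ids"], speaker_embeddings, vocoder=vocoder
        )
        # SpeechT5 produces 16 kHz audio.
        sf.write(output_name(index), speech.numpy(), samplerate=16000)
```

Calling `run("prompt.txt", speaker_embeddings)` with a three-line prompt file would produce speech_0.wav (lines one and two) and speech_1.wav (line three).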

After editing prompt.txt to contain the text you want to generate speech from, you can run the script like this:

python3 app.py

You'll end up with files such as:

speech_0.wav
speech_1.wav
speech_2.wav

If you need a single MP3 file containing all the generated audio, the mp3.py script combines the files output by the model into output.mp3. You can run it like this:

python3 mp3.py
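
For readers who want to adapt the combining step, here is a minimal sketch of how it could work with pydub. It is not the repository's mp3.py. The function names are hypothetical, and it assumes the WAV files should be joined in batch-number order (so speech_10.wav comes after speech_2.wav) and that ffmpeg is available, which pydub needs for MP3 export.

```python
import re
from pathlib import Path

WAV_NAME = re.compile(r"speech_(\d+)\.wav")


def sorted_wavs(directory="."):
    """Return speech_#.wav files ordered by batch number, not alphabetically."""
    found = []
    for path in Path(directory).iterdir():
        match = WAV_NAME.fullmatch(path.name)
        if match:
            found.append((int(match.group(1)), path))
    found.sort(key=lambda item: item[0])
    return [path for _, path in found]


def combine_to_mp3(directory=".", out_name="output.mp3"):
    """Concatenate all speech_#.wav files into a single MP3 file."""
    # Imported here so sorted_wavs() can be used without pydub installed.
    from pydub import AudioSegment

    combined = AudioSegment.empty()
    for wav in sorted_wavs(directory):
        combined += AudioSegment.from_wav(str(wav))
    out_path = Path(directory) / out_name
    combined.export(str(out_path), format="mp3")
    return out_path
```

Sorting on the parsed integer rather than the file name matters once a run produces ten or more batches, because plain alphabetical order would place speech_10.wav before speech_2.wav.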

NOTE: This script will overwrite any existing `speech_#.wav` files in the directory!

To keep the script simple, files are not archived; each run overwrites them. If you want to keep the files, move them to another directory before running the script again.

1. Write your script in prompt.txt
2. Run python3 app.py
3. Optionally preview the speech_#.wav files
4. Run python3 mp3.py to combine all the speech_#.wav files into a single output.mp3 file
5. Optionally preview the output.mp3 file
6. Archive the prompt.txt, speech_#.wav and output.mp3 files into their own directory

A future version may simply create a directory for each run and archive the files there by UUID.
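
Until then, the archive step can be done by hand or with a small helper. The sketch below is one possible approach, not part of the repository: the function name `archive_run` and the `archive` directory name are hypothetical, and it assumes prompt.txt should be copied (so it stays in place for the next run) while the generated audio is moved.

```python
import shutil
import uuid
from pathlib import Path


def archive_run(work_dir=".", archive_root="archive"):
    """Store one run's files in a new directory named by a random UUID."""
    work = Path(work_dir)
    target = work / archive_root / uuid.uuid4().hex
    target.mkdir(parents=True)

    # Copy the prompt so it remains available for editing (assumption).
    prompt = work / "prompt.txt"
    if prompt.exists():
        shutil.copy2(prompt, target / prompt.name)

    # Move the generated audio so the next run starts from a clean directory.
    generated = list(work.glob("speech_*.wav"))
    combined = work / "output.mp3"
    if combined.exists():
        generated.append(combined)
    for path in generated:
        shutil.move(str(path), str(target / path.name))

    return target


if __name__ == "__main__":
    print(f"Archived run to {archive_run()}")
```

Running this after mp3.py leaves the working directory free of speech_#.wav files, so the overwrite behaviour described above no longer puts earlier output at risk.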

Requirements

You'll need to install the Python libraries:

pip install -r requirements.txt

This installs the following libraries. Use a virtual environment if you want to keep your system clean:

transformers
numpy
torch
datasets
accelerate
soundfile
pathlib
pydub
sentencepiece
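
To confirm the installation worked before running the script, a quick check like the following may help. It is only a sketch: the `missing_packages` helper is hypothetical, and it assumes each library's import name matches its package name in the list above, which holds for these packages.

```python
import importlib.util

# Import names of the libraries listed above.
REQUIRED = [
    "transformers", "numpy", "torch", "datasets", "accelerate",
    "soundfile", "pathlib", "pydub", "sentencepiece",
]


def missing_packages(names=REQUIRED):
    """Return the names that cannot be found in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

`find_spec` only locates each package without importing it, so the check stays fast even for large libraries such as torch.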

License

This code is released under the GNU GPL v3 license.

See the Free Software Foundation's page for more information.

Credits

Microsoft
Hugging Face
The Henzi Foundation
