
convert to pkg, reorganize repo #228

Merged: 36 commits, Oct 23, 2024
Commits:

- 7c4034f group files in f5_tts directory (rsxdalv, Oct 16, 2024)
- ddd2b8a add setup.py (rsxdalv, Oct 16, 2024)
- 963d066 use global imports (rsxdalv, Oct 17, 2024)
- 943337c simplify demo (rsxdalv, Oct 17, 2024)
- a7df999 add install directions for library mode (rsxdalv, Oct 17, 2024)
- 37a6309 fix old huggingface_hub version constraint (rsxdalv, Oct 17, 2024)
- 2c387aa Merge remote-tracking branch 'upstream/main' (rsxdalv, Oct 17, 2024)
- 8d47002 move finetune to package (rsxdalv, Oct 17, 2024)
- 50e9d09 change imports to f5_tts.model (rsxdalv, Oct 17, 2024)
- 28a22f2 bump version (rsxdalv, Oct 17, 2024)
- 8e813b1 Merge remote-tracking branch 'upstream/main' (rsxdalv, Oct 20, 2024)
- 06abdd6 fix bad merge (rsxdalv, Oct 20, 2024)
- bf01663 Update inference-cli.py (SWivid, Oct 21, 2024)
- 4e8d724 Merge remote-tracking branch 'upstream/main' (rsxdalv, Oct 22, 2024)
- 1b1e183 fix HF space (rsxdalv, Oct 22, 2024)
- 8f0aeca reformat (rsxdalv, Oct 22, 2024)
- 01e57a4 fix utils.py vocab.txt import (rsxdalv, Oct 22, 2024)
- f6bc097 fix format (rsxdalv, Oct 22, 2024)
- 156ab15 Merge remote-tracking branch 'upstream/main' (rsxdalv, Oct 22, 2024)
- e44db1a adapt README for f5_tts package structure (rsxdalv, Oct 22, 2024)
- 285dd08 simplify app.py (rsxdalv, Oct 22, 2024)
- 1949837 add gradio.Dockerfile and workflow (rsxdalv, Oct 22, 2024)
- f5b5c1f refactored for pyproject.toml (ajkessel, Oct 22, 2024)
- c3fe551 refactored for pyproject.toml (ajkessel, Oct 22, 2024)
- 1c627ed added in reference to packaged files (ajkessel, Oct 22, 2024)
- 24545b5 use fork for testing docker image (rsxdalv, Oct 22, 2024)
- 5af1885 added in reference to packaged files (ajkessel, Oct 22, 2024)
- cc2af83 minor tweaks (ajkessel, Oct 22, 2024)
- 330d114 fixed inference-cli.toml path (ajkessel, Oct 22, 2024)
- a7b239f fixed inference-cli.toml path (ajkessel, Oct 22, 2024)
- 4dde328 fixed inference-cli.toml path (ajkessel, Oct 22, 2024)
- ebb9d5c fixed inference-cli.toml path (ajkessel, Oct 22, 2024)
- 2565a6d refactor eval_infer_batch.py (ajkessel, Oct 22, 2024)
- 61c47ad fix typo (ajkessel, Oct 22, 2024)
- 1dae0a6 added eval_infer_batch to scripts (ajkessel, Oct 22, 2024)
- 7427e0b Merge pull request #1 from ajkessel/main (rsxdalv, Oct 22, 2024)
61 changes: 61 additions & 0 deletions .github/workflows/publish-docker-image.yaml
@@ -0,0 +1,61 @@
name: Create and publish a Docker image

# Configures this workflow to run every time a change is pushed to the `main` branch.
on:
push:
branches: ['main']

# Defines two custom environment variables for the workflow. These are used for the Container registry domain, and a name for the Docker image that this workflow builds.
env:
REGISTRY: ghcr.io
IMAGE_NAME: ${{ github.repository }}

# There is a single job in this workflow. It's configured to run on the latest available version of Ubuntu.
jobs:
build-and-push-image:
runs-on: ubuntu-latest
# Sets the permissions granted to the `GITHUB_TOKEN` for the actions in this job.
permissions:
contents: read
packages: write
#
steps:
- name: Checkout repository
uses: actions/checkout@v4
- name: Free Up GitHub Actions Ubuntu Runner Disk Space 🔧
uses: jlumbroso/free-disk-space@main
with:
# If set to "true", this might remove tools that are actually needed, but it frees about 6 GB
tool-cache: false

# All of these default to true, but feel free to set to "false" if necessary for your workflow
android: true
dotnet: true
haskell: true
large-packages: false
swap-storage: false
docker-images: false
# Uses the `docker/login-action` action to log in to the Container registry using the account and password that will publish the packages. Once published, the packages are scoped to the account defined here.
- name: Log in to the Container registry
uses: docker/login-action@65b78e6e13532edd9afa3aa52ac7964289d1a9c1
with:
registry: ${{ env.REGISTRY }}
username: ${{ github.actor }}
password: ${{ secrets.GITHUB_TOKEN }}
# This step uses [docker/metadata-action](https://github.com/docker/metadata-action#about) to extract tags and labels that will be applied to the specified image. The `id` "meta" allows the output of this step to be referenced in a subsequent step. The `images` value provides the base name for the tags and labels.
- name: Extract metadata (tags, labels) for Docker
id: meta
uses: docker/metadata-action@9ec57ed1fcdbf14dcef7dfbe97b2010124a938b7
with:
images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
# This step uses the `docker/build-push-action` action to build the image, based on your repository's `Dockerfile`. If the build succeeds, it pushes the image to GitHub Packages.
# It uses the `context` parameter to define the build's context as the set of files located in the specified path. For more information, see "[Usage](https://github.com/docker/build-push-action#usage)" in the README of the `docker/build-push-action` repository.
# It uses the `tags` and `labels` parameters to tag and label the image with the output from the "meta" step.
- name: Build and push Docker image
uses: docker/build-push-action@f2a1d5e99d037542a71f64918e516c093c6f3fc4
with:
context: .
file: ./gradio.Dockerfile
push: true
tags: ${{ steps.meta.outputs.tags }}
labels: ${{ steps.meta.outputs.labels }}
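Once this workflow has run, the image is published to GHCR under the repository name. As a rough sketch of how a consumer would use it (the exact tag names are assumptions; real tags come from the `metadata-action` step's output):

```shell
# Pull the published image from GHCR (the ":main" tag is an assumption
# based on the branch-derived tags metadata-action typically emits)
docker pull ghcr.io/swivid/f5-tts:main

# Run it, forwarding the Gradio port
docker run --rm --gpus all -p 7860:7860 ghcr.io/swivid/f5-tts:main
```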
48 changes: 42 additions & 6 deletions README.md
@@ -63,11 +63,35 @@ pre-commit run --all-files
Note: Some model components have linting exceptions for E722 to accommodate tensor notation


### As a pip package

```bash
pip install git+https://github.com/SWivid/F5-TTS.git
```

```python
import gradio as gr
from f5_tts.gradio_app import app

with gr.Blocks() as main_app:
gr.Markdown("# This is an example of using F5-TTS within a bigger Gradio app")

# ... other Gradio components

app.render()

main_app.launch()

```

## Prepare Dataset

- Example data processing scripts for Emilia and Wenetspeech4TTS, and you may tailor your own one along with a Dataset class in `model/dataset.py`.
+ Example data processing scripts for Emilia and Wenetspeech4TTS, and you may tailor your own one along with a Dataset class in `f5_tts/model/dataset.py`.

```bash
# switch to the main directory
cd f5_tts

# prepare a custom dataset to suit your needs
# download the corresponding dataset first, and fill in the path in the scripts

@@ -83,14 +107,17 @@ python scripts/prepare_wenetspeech4tts.py
Once your datasets are prepared, you can start the training process.

```bash
# switch to the main directory
cd f5_tts

# setup accelerate config, e.g. use multi-gpu ddp, fp16
# the config will be saved to: ~/.cache/huggingface/accelerate/default_config.yaml
accelerate config
accelerate launch train.py
```
Initial guidance on finetuning: [#57](https://github.com/SWivid/F5-TTS/discussions/57).

- Gradio UI finetuning with `finetune_gradio.py` see [#143](https://github.com/SWivid/F5-TTS/discussions/143).
+ Gradio UI finetuning with `f5_tts/finetune_gradio.py` see [#143](https://github.com/SWivid/F5-TTS/discussions/143).

### Wandb Logging

@@ -136,6 +163,9 @@
To change the model, use `--ckpt_file` to specify the checkpoint you want to load;
to change the vocab, use `--vocab_file` to provide your own vocab.txt file.

```bash
# switch to the main directory
cd f5_tts

python inference-cli.py \
--model "F5-TTS" \
--ref_audio "tests/ref_audio/test_en_1_ref_short.wav" \
@@ -161,27 +191,27 @@ Currently supported features:
You can launch a Gradio app (web interface) for inference (it will load the checkpoint from Huggingface; you may also use a local file in `gradio_app.py`). It currently loads the ASR model, F5-TTS, and E2 TTS all at once, and thus uses more GPU memory than `inference-cli`.

```bash
- python gradio_app.py
+ python f5_tts/gradio_app.py
```

You can specify the port/host:

```bash
- python gradio_app.py --port 7860 --host 0.0.0.0
+ python f5_tts/gradio_app.py --port 7860 --host 0.0.0.0
```

Or launch a share link:

```bash
- python gradio_app.py --share
+ python f5_tts/gradio_app.py --share
```

### Speech Editing

To test speech editing capabilities, use the following command.

```bash
- python speech_edit.py
+ python f5_tts/speech_edit.py
```

## Evaluation
Expand All @@ -199,6 +229,9 @@ python speech_edit.py
To run batch inference for evaluations, execute the following commands:

```bash
# switch to the main directory
cd f5_tts

# batch inference for evaluations
accelerate config # if not set before
bash scripts/eval_infer_batch.sh
@@ -234,6 +267,9 @@ pip install faster-whisper==0.10.1

Update the path with your batch-inferenced results, and carry out WER / SIM evaluations:
```bash
# switch to the main directory
cd f5_tts

# Evaluation for Seed-TTS test set
python scripts/eval_seedtts_testset.py

3 changes: 3 additions & 0 deletions app.py
@@ -0,0 +1,3 @@
from f5_tts.gradio_app import app

app.queue().launch()
27 changes: 27 additions & 0 deletions gradio.Dockerfile
@@ -0,0 +1,27 @@
FROM pytorch/pytorch:2.4.0-cuda12.4-cudnn9-devel

USER root

ARG DEBIAN_FRONTEND=noninteractive

LABEL github_repo="https://github.com/rsxdalv/F5-TTS"

RUN set -x \
&& apt-get update \
&& apt-get -y install wget curl man git less openssl libssl-dev unzip unar build-essential aria2 tmux vim \
&& apt-get install -y openssh-server sox libsox-fmt-all libsox-fmt-mp3 libsndfile1-dev ffmpeg \
&& rm -rf /var/lib/apt/lists/* \
&& apt-get clean

WORKDIR /workspace

RUN git clone https://github.com/rsxdalv/F5-TTS.git \
&& cd F5-TTS \
&& pip install --no-cache-dir -r requirements.txt

ENV SHELL=/bin/bash

WORKDIR /workspace/F5-TTS/f5_tts

EXPOSE 7860
CMD python gradio_app.py
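For local testing before the CI workflow publishes anything, the image can be built and run straight from this file; the `-t` tag name below is illustrative, not part of the PR:

```shell
# Build the Gradio image from the repository root using the named Dockerfile
docker build -f gradio.Dockerfile -t f5-tts-gradio .

# Run it, exposing the port declared by EXPOSE in the Dockerfile
docker run --rm --gpus all -p 7860:7860 f5-tts-gradio
```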
10 changes: 0 additions & 10 deletions model/__init__.py

This file was deleted.

52 changes: 52 additions & 0 deletions pyproject.toml
@@ -0,0 +1,52 @@
[build-system]
requires = ["setuptools >= 61.0", "setuptools-scm>=8.0"]
build-backend = "setuptools.build_meta"

[project]
name = "f5-tts"
dynamic = ["version"]
description = "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
readme = "README.md"
classifiers = [
"License :: OSI Approved :: MIT License",
"Operating System :: OS Independent",
"Programming Language :: Python :: 3",
]
dependencies = [
"accelerate>=0.33.0",
"cached_path @ git+https://github.com/rsxdalv/cached_path@main",
"click",
"datasets",
"einops>=0.8.0",
"einx>=0.3.0",
"ema_pytorch>=0.5.2",
"gradio",
"jieba",
"librosa",
"matplotlib",
"numpy<=1.26.4",
"pydub",
"pypinyin",
"safetensors",
"soundfile",
"tomli",
"torch>=2.0.0",
"torchaudio>=2.0.0",
"torchdiffeq",
"tqdm>=4.65.0",
"transformers",
"vocos",
"wandb",
"x_transformers>=1.31.14",
]

[[project.authors]]
name = "Yushen Chen and Zhikang Niu and Ziyang Ma and Keqi Deng and Chunhui Wang and Jian Zhao and Kai Yu and Xie Chen"

[project.urls]
Homepage = "https://github.com/SWivid/F5-TTS"

[project.scripts]
"finetune-cli" = "f5_tts.finetune_cli:main"
"inference-cli" = "f5_tts.inference_cli:main"
"eval_infer_batch" = "f5_tts.scripts.eval_infer_batch:main"