TTS component #19

Kostis-S-Z · 2024-11-26T17:50:04Z

What's changing

Add TTS component
Add model loader for TTS model (parler)
Add text_to_speech module to transform a string to a waveform
Add script_to_audio module that parses the full podcast script generated by the text_to_text module and assigns each speaker's text to the corresponding speaker profile, producing a single waveform of the complete multi-turn podcast discussion.
Add config file that stores parameters relevant for the generation of the podcast
Add unit and integration tests for the above modules
Add .idea/, .vscode and *.wav files to .gitignore
Update README installation instructions (bash script for Codespaces, pip for local installs)
Add minimum machine specifications for the Codespaces VM
Add Troubleshooting section in README
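
The script_to_audio flow described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: it assumes the script is plain text with one `Speaker N: text` turn per line, and `SPEAKER_PROFILES`, `parse_script`, `fake_tts` and `script_to_waveform` are hypothetical names. `fake_tts` stands in for the real Parler call and returns a list of floats instead of real audio.

```python
import re

# Hypothetical speaker profiles; the real ones would live in the podcast config.
SPEAKER_PROFILES = {
    "1": "A calm, clear female voice.",
    "2": "An energetic male voice.",
}

# Assumed turn format: "Speaker <id>: <text>" on a single line.
TURN_PATTERN = re.compile(r"^Speaker\s+(\d+)\s*:\s*(.+)$")


def parse_script(script: str) -> list[tuple[str, str]]:
    """Split a script into (speaker_id, text) turns, skipping other lines."""
    turns = []
    for line in script.splitlines():
        match = TURN_PATTERN.match(line.strip())
        if match:
            turns.append((match.group(1), match.group(2).strip()))
    return turns


def fake_tts(text: str, profile: str) -> list[float]:
    """Stand-in for the TTS call: returns one silent sample per character."""
    return [0.0] * len(text)


def script_to_waveform(script: str) -> list[float]:
    """Synthesize each turn with its speaker profile and concatenate the results."""
    waveform: list[float] = []
    for speaker_id, text in parse_script(script):
        profile = SPEAKER_PROFILES[speaker_id]
        waveform.extend(fake_tts(text, profile))
    return waveform


script = "Speaker 1: Welcome to the show.\nSpeaker 2: Glad to be here."
turns = parse_script(script)
waveform = script_to_waveform(script)
```

Swapping `fake_tts` for a real model call would leave the parsing and concatenation logic unchanged, which is the main point of keeping the two steps separate.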

From #26

Add headers and sections to the app.
Use personal fork and add setup.sh to work around parler_tts installation issues.
Load models using streamlit cache resources.
- This prevents reloading on each button click
Generate and display audio for each chunk.
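
To illustrate why caching the loader avoids a reload on each button click: Streamlit reruns the whole script on every interaction, so an uncached loader would run again each time. The sketch below uses `functools.lru_cache` from the standard library as a stand-in for Streamlit's `st.cache_resource` decorator. `load_tts_model`, `rerun_script` and the model name are placeholders, not the loader from this PR.

```python
from functools import lru_cache

load_count = 0  # Counts how often the expensive load actually runs.


@lru_cache(maxsize=None)
def load_tts_model(model_id: str) -> dict:
    """Placeholder for an expensive model load; runs once per model_id."""
    global load_count
    load_count += 1
    return {"model_id": model_id}


def rerun_script() -> dict:
    """Mimics one Streamlit rerun, which calls the loader again."""
    return load_tts_model("hypothetical/tts-model")


first = rerun_script()
second = rerun_script()  # Served from the cache, no second load.
```

In the demo, the same effect would come from decorating the loader with `@st.cache_resource` rather than `lru_cache`.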

How to test it

Inside codespaces, run:

bash .github/setup.sh

Locally, run:

pip install -e .

And then start the app with:

python -m streamlit run demo/app.py

Additional notes for reviewers

I tried parler-tts-large-v1, but for some reason the generated audio was completely distorted.

daavoo

Just one minor comment on the docstring

src/document_to_podcast/inference/text_to_speech.py

Kostis-S-Z added 7 commits November 26, 2024 16:45

Add .idea (PyCharm files) to gitignore

ae59d27

Add Audio generation section to demo

8f01d75

Add parler TTS model loader

fd84d63

[WIP] Add script to podcast parser

1c8093e

[WIP] Add text to speech code

d567ed5

[WIP] Add simple unit tests

12178c8

Add .wav files to gitignore

12a622f

Kostis-S-Z linked an issue Nov 26, 2024 that may be closed by this pull request

Audio Generation Component #6

Closed

Kostis-S-Z commented Nov 26, 2024

demo/app.py (outdated, resolved)

Fix tiny typo in docs

4084db7

Kostis-S-Z commented Nov 26, 2024

src/opennotebookllm/inference/model_loaders.py (outdated, resolved)

Kostis-S-Z commented Nov 26, 2024

src/opennotebookllm/podcast_maker/script_to_audio.py (outdated, resolved)

Update default sampling rate

d43334e

daavoo reviewed Nov 27, 2024

src/opennotebookllm/inference/text_to_speech.py (outdated, resolved)

daavoo reviewed Nov 27, 2024

tests/unit/inference/test_text_to_speech.py (outdated, resolved)

Kostis-S-Z added 7 commits November 27, 2024 12:50

Remove outdated fixture

936f433

Add podcast config fixture

3290a01

Update return type in TTS model loader

62df818

Update TTS code to use pydantic Config

89fb11e

Update tests

f91caf0

Add pydantic config for TTS component

ebb261c

Update tests

61c11cb

daavoo mentioned this pull request Nov 27, 2024

Updating default script generation prompt #20

Merged

Kostis-S-Z added 2 commits November 27, 2024 14:28

Use tmp_path in tests to autoremove generated wav files

51afcf1

Update prompt fixture

98a1f33

Kostis-S-Z marked this pull request as ready for review November 27, 2024 12:33

Kostis-S-Z requested a review from daavoo November 27, 2024 12:33

Fix package imports

bf6abae

daavoo assigned Kostis-S-Z Nov 27, 2024

Update comment docs

e64aa88

Kostis-S-Z and others added 13 commits November 28, 2024 10:00

Add pydantic in project requirements

a0eec86

Use Python's wave to save audio file instead of scipy

04033fc

Update from 6-audio-generation-component

610ac9a

Update comment

f213613

Improve the TTS model loading process

9d03672

Add parler_tts to project dependencies

c0ef8b8

Update TTS part of demo

2e6c8b2

Fix wave module saving wav file

b8a5eeb

Use soundfile instead of wave for saving .wav file

3bdbf78

Fix script format

c35135c

fix(demo/app.py): Drop nested button. (#24)

9ee2c91

* fix(demo/app.py): Drop nested button. Background in https://discuss.streamlit.io/t/how-to-do-nested-button-and-print-both-button-outputs/33741
* pre-commit

Merge branch 'main' into 6-audio-generation-component

efa5708

fix(parse_script_to_waveform): Remove extra quote

dae21b3
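
For context on the `wave` and `soundfile` commits above: the standard-library `wave` module writes raw PCM bytes, so a float waveform has to be scaled and packed by hand, whereas soundfile can write float arrays directly. A minimal sketch of the manual route, assuming mono samples in [-1.0, 1.0]. `save_wav`, `read_wav_info` and any sample rate passed to them are illustrative and not taken from this PR.

```python
import struct
import wave


def save_wav(path: str, samples: list[float], sample_rate: int) -> None:
    """Write mono float samples in [-1.0, 1.0] as 16-bit PCM."""
    # Clip, then scale each float to the signed 16-bit integer range.
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(1)  # mono
        wav_file.setsampwidth(2)  # 2 bytes = 16 bits per sample
        wav_file.setframerate(sample_rate)
        # Pack as little-endian signed shorts, the layout WAV PCM expects.
        wav_file.writeframes(struct.pack(f"<{len(pcm)}h", *pcm))


def read_wav_info(path: str) -> tuple[int, int]:
    """Return (sample_rate, frame_count) of a WAV file."""
    with wave.open(path, "rb") as wav_file:
        return wav_file.getframerate(), wav_file.getnframes()
```

Skipping the clip-and-scale step is one way a saved file can end up unplayable or distorted, though whether that is what the "Fix wave module saving wav file" commit addressed is not stated here.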

Kostis-S-Z mentioned this pull request Dec 2, 2024

Updates to demo to include audio part #26

Merged

daavoo mentioned this pull request Dec 2, 2024

7 blueprint guidance docs #30

Merged

Updates to demo to include audio part (#26)

be82900

* Updates to generate audio in chunks
* Update spinner
* wip demo structure
* Use forked parler-tts. Use setup.sh
* Update demo
* Fix input_text
* Use cache_resource
* Drop print
* Add dividers
* Lint

daavoo reviewed Dec 3, 2024

src/opennotebookllm/inference/model_loaders.py (outdated, resolved)

Kostis-S-Z added 6 commits December 3, 2024 13:45

Merge from main

0dd8b78

Add minimum codespaces machine specifications

42c1a44

Add text_to_speech reference in API.md docs

96d88a8

Add note in README about cold start of demo taking long time

76a92f8

Add Troubleshooting section in README

43d9383

Use sample rate from model config instead of hardcoded

b3501c3

Kostis-S-Z requested a review from daavoo December 4, 2024 09:28

Update install instructions readme and docs

fa2225a

daavoo approved these changes Dec 4, 2024

src/document_to_podcast/inference/text_to_speech.py (outdated, resolved)

Kostis-S-Z added 2 commits December 4, 2024 11:39

Fix outdated docstring example

3b84710

Fix imports and references of old repo name

35ab5c7

Kostis-S-Z merged commit dccd551 into main Dec 4, 2024
2 checks passed

Kostis-S-Z deleted the 6-audio-generation-component branch December 4, 2024 10:10
