
Commit

minor changes to docs and readme
stefanfrench committed Dec 11, 2024
1 parent 7116394 commit 483d44d
Showing 4 changed files with 18 additions and 23 deletions.
8 changes: 2 additions & 6 deletions README.md
@@ -98,17 +98,13 @@ The architecture of this codebase focuses on modularity and adaptability, meanin
### text-to-text
We are using the [llama.cpp](https://github.com/ggerganov/llama.cpp) library, which supports open source models optimized for local inference and minimal hardware requirements.
We are using the [llama.cpp](https://github.com/ggerganov/llama.cpp) library, which supports open source models optimized for local inference and minimal hardware requirements. Our default text-to-text model is the open source [OLMoE-7B-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct) from [AllenAI](https://allenai.org/).
For the complete list of models supported out-of-the-box, visit this [link](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#text-only).
Our default text-to-text model is the fully open source [OLMoE-7B-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct) from [AllenAI](https://allenai.org/).
### text-to-speech
We support models from the [OuteAI](https://github.com/edwko/OuteTTS) and [Parler_tts](https://github.com/huggingface/parler-tts) packages.
For a complete list of models visit [Oute HF](https://huggingface.co/collections/OuteAI/outetts-6728aa71a53a076e4ba4817c) (only the GGUF versions) and [Parler HF](https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c).
We support models from the [OuteAI](https://github.com/edwko/OuteTTS) and [Parler_tts](https://github.com/huggingface/parler-tts) packages. For a complete list of models visit [Oute HF](https://huggingface.co/collections/OuteAI/outetts-6728aa71a53a076e4ba4817c) (only the GGUF versions) and [Parler HF](https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c).
**Important note:** In order to keep the package dependencies as lightweight as possible, only the Oute interface is installed by default. If you want to use the parler models, please also run:
3 changes: 1 addition & 2 deletions docs/customization.md
@@ -54,8 +54,7 @@ Customizing the app:
Example:

```python
PODCAST_PROMPT = """
SPEAKER_DESCRIPTIONS = {
SPEAKER_DESCRIPTIONS_OUTE = {
"1": "A cheerful and animated voice with a fast-paced delivery.",
"2": "A calm and deep voice, speaking with authority and warmth."
}
12 changes: 4 additions & 8 deletions docs/getting-started.md
@@ -15,24 +15,20 @@ python -m streamlit run demo/app.py


### 💻 **Option 2: Local Installation**
1.**Clone the Repository**

1. **Clone the Repository**

Inside your terminal, run:

Inside your terminal, run:
```bash
git clone https://github.com/mozilla-ai/document-to-podcast.git
cd document-to-podcast
git clone https://github.com/mozilla-ai/document-to-podcast.git
cd document-to-podcast
```

2. **Install Dependencies**

Inside your terminal, run:

```bash
pip install -e .
```

3. **Run the Demo**

Inside your terminal, start the Streamlit demo by running:
18 changes: 11 additions & 7 deletions docs/step-by-step-guide.md
@@ -56,14 +56,10 @@ In this step, the pre-processed text is transformed into a conversational podcas

**1 - Model Loading**

- The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) module is responsible for loading the `text-to-text` and `text-to-speech` models using the `llama_cpp`, `outetts` and `parler_tts` libraries.
- The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) module is responsible for loading the `text-to-text` models using the `llama_cpp` library.

- The function `load_llama_cpp_model` takes a model ID in the format `{org}/{repo}/{filename}` and loads the specified model. This approach of using the `llama_cpp` library supports efficient CPU-based inference, making language models accessible even on machines without GPUs.

- The function `load_outetts_model` takes a model ID in the format `{org}/{repo}/{filename}` and loads the specified model, either on CPU or GPU, based on the `device` parameter. The parameter `language` also enables to swap between the languages the Oute package supports (as of Dec 2024: `en, zh, ja, ko`)

- The function `load_parler_tts_model_and_tokenizer` takes a model ID in the format `{repo}/{filename}` and loads the specified model and tokenizer, either on CPU or GPU, based on the `device` parameter.

**2 - Text-to-Text Generation**

- The [`text_to_text.py`](api.md/#document_to_podcast.inference.text_to_text) script manages the interaction with the language model, converting input text into a structured conversational podcast script.
@@ -81,7 +77,15 @@ In this final step, the generated podcast transcript is brought to life as an au

### ⚙️ **Key Components in this Step**

**1 - Text-to-Speech Audio Generation**
**1 - Model Loading**

- The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) module is responsible for loading the `text-to-speech` models using the `outetts` and `parler_tts` libraries.

- The function `load_outetts_model` takes a model ID in the format `{org}/{repo}/{filename}` and loads the specified model, either on CPU or GPU, based on the `device` parameter. The parameter `language` also enables to swap between the languages the Oute package supports (as of Dec 2024: `en, zh, ja, ko`)

- The function `load_parler_tts_model_and_tokenizer` takes a model ID in the format `{repo}/{filename}` and loads the specified model and tokenizer, either on CPU or GPU, based on the `device` parameter.
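
The `{org}/{repo}/{filename}` model ID format described above can be illustrated with a small sketch. This is only an assumption about how such an ID might be split into a repository ID and a filename: `split_model_id`, `ModelId`, and the example ID are hypothetical and not part of `document_to_podcast`, and the actual loaders may parse IDs differently.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    """Hypothetical container for the two parts of a model ID."""

    repo_id: str   # "{org}/{repo}"
    filename: str  # "{filename}"


def split_model_id(model_id: str) -> ModelId:
    """Split '{org}/{repo}/{filename}' into a repo ID and a filename.

    Illustrative helper only; not the project's implementation.
    """
    parts = model_id.split("/")
    # Require exactly three non-empty segments.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected '{{org}}/{{repo}}/{{filename}}', got {model_id!r}"
        )
    org, repo, filename = parts
    return ModelId(repo_id=f"{org}/{repo}", filename=filename)


# Example with a made-up model ID.
parsed = split_model_id("example-org/example-repo/model.gguf")
```

Validating the segment count up front gives a clear error for IDs in the shorter `{repo}/{filename}` form, which would need separate handling.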

**2 - Text-to-Speech Audio Generation**

- The [`text_to_speech.py`](api.md/#document_to_podcast.inference.text_to_speech) script converts text into audio using a specified TTS model.

@@ -125,7 +129,7 @@ This demo uses [Streamlit](https://streamlit.io/), an open-source Python framewo

- The script uses `load_llama_cpp_model` from `model_loader.py` to load the LLM for generating the podcast script.

- Similarly, `load_parler_tts_model_and_tokenizer` is used to prepare the TTS model and tokenizer for audio generation.
- Similarly, `load_outetts_model` is used to prepare the TTS model and tokenizer for audio generation.

- These models are cached using `@st.cache_resource` to ensure fast and efficient reuse during app interactions.

