minor changes to docs and readme

mozilla-ai · Dec 11, 2024 · 483d44d
1 parent 7116394

Showing 4 changed files with 18 additions and 23 deletions.
diff --git a/README.md b/README.md
@@ -98,17 +98,13 @@ The architecture of this codebase focuses on modularity and adaptability, meanin
 
 ### text-to-text
 
-We are using the [llama.cpp](https://github.com/ggerganov/llama.cpp) library, which supports open source models optimized for local inference and minimal hardware requirements.
+We are using the [llama.cpp](https://github.com/ggerganov/llama.cpp) library, which supports open source models optimized for local inference and minimal hardware requirements. Our default text-to-text model is the open source [OLMoE-7B-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct) from [AllenAI](https://allenai.org/).
 
 For the complete list of models supported out-of-the-box, visit this [link](https://github.com/ggerganov/llama.cpp?tab=readme-ov-file#text-only).
 
-Our default text-to-text model is the fully open source [OLMoE-7B-Instruct](https://huggingface.co/allenai/OLMoE-1B-7B-0924-Instruct) from [AllenAI](https://allenai.org/).
-
 ### text-to-speech
 
-We support models from the [OuteAI](https://github.com/edwko/OuteTTS) and [Parler_tts](https://github.com/huggingface/parler-tts) packages.
-
-For a complete list of models visit [Oute HF](https://huggingface.co/collections/OuteAI/outetts-6728aa71a53a076e4ba4817c) (only the GGUF versions) and [Parler HF](https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c).
+We support models from the [OuteAI](https://github.com/edwko/OuteTTS) and [Parler_tts](https://github.com/huggingface/parler-tts) packages. For a complete list of models visit [Oute HF](https://huggingface.co/collections/OuteAI/outetts-6728aa71a53a076e4ba4817c) (only the GGUF versions) and [Parler HF](https://huggingface.co/collections/parler-tts/parler-tts-fully-open-source-high-quality-tts-66164ad285ba03e8ffde214c).
 
 **Important note:** In order to keep the package dependencies as lightweight as possible, only the Oute interface is installed by default. If you want to use the parler models, please also run:
 

diff --git a/docs/customization.md b/docs/customization.md
@@ -54,8 +54,7 @@ Customizing the app:
 Example:
 
 ```python
-PODCAST_PROMPT = """
-SPEAKER_DESCRIPTIONS = {
+SPEAKER_DESCRIPTIONS_OUTE = {
     "1": "A cheerful and animated voice with a fast-paced delivery.",
     "2": "A calm and deep voice, speaking with authority and warmth."
 }

diff --git a/docs/getting-started.md b/docs/getting-started.md
@@ -15,24 +15,20 @@ python -m streamlit run demo/app.py
 
 
 ### 💻  **Option 2: Local Installation**
+1. **Clone the Repository**
 
-1. **Clone the Repository**
-
-   Inside your terminal, run:
-
+Inside your terminal, run:
 ```bash
-git clone https://github.com/mozilla-ai/document-to-podcast.git
-cd document-to-podcast
+   git clone https://github.com/mozilla-ai/document-to-podcast.git
+   cd document-to-podcast
 ```
-
 2. **Install Dependencies**
 
    Inside your terminal, run:
 
 ```bash
 pip install -e .
 ```
-
 3. **Run the Demo**
 
    Inside your terminal, start the Streamlit demo by running:

diff --git a/docs/step-by-step-guide.md b/docs/step-by-step-guide.md
@@ -56,14 +56,10 @@ In this step, the pre-processed text is transformed into a conversational podcas
 
  **1 - Model Loading**
 
-   - The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) module is responsible for loading the `text-to-text` and `text-to-speech` models using the `llama_cpp`, `outetts` and `parler_tts` libraries.
+   - The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) module is responsible for loading the `text-to-text` models using the `llama_cpp` library.
 
    - The function `load_llama_cpp_model` takes a model ID in the format `{org}/{repo}/{filename}` and loads the specified model. This approach of using the `llama_cpp` library supports efficient CPU-based inference, making language models accessible even on machines without GPUs.
 
-   - The function `load_outetts_model` takes a model ID in the format `{org}/{repo}/{filename}` and loads the specified model, either on CPU or GPU, based on the `device` parameter. The parameter `language` also enables to swap between the languages the Oute package supports (as of Dec 2024: `en, zh, ja, ko`)
-
-   - The function `load_parler_tts_model_and_tokenizer` takes a model ID in the format `{repo}/{filename}` and loads the specified model and tokenizer, either on CPU or GPU, based on the `device` parameter.
-
  **2 - Text-to-Text Generation**
 
    - The [`text_to_text.py`](api.md/#document_to_podcast.inference.text_to_text) script manages the interaction with the language model, converting input text into a structured conversational podcast script.
@@ -81,7 +77,15 @@ In this final step, the generated podcast transcript is brought to life as an au
 
 ### ⚙️ **Key Components in this Step**
 
-**1 - Text-to-Speech Audio Generation**
+ **1 - Model Loading**
+
+   - The [`model_loader.py`](api.md/#document_to_podcast.inference.model_loaders) module is responsible for loading the `text-to-speech` models using the `outetts` and `parler_tts` libraries.
+
+   - The function `load_outetts_model` takes a model ID in the format `{org}/{repo}/{filename}` and loads the specified model, either on CPU or GPU, based on the `device` parameter. The `language` parameter switches between the languages the Oute package supports (as of Dec 2024: `en`, `zh`, `ja`, `ko`).
+
+   - The function `load_parler_tts_model_and_tokenizer` takes a model ID in the format `{repo}/{filename}` and loads the specified model and tokenizer, either on CPU or GPU, based on the `device` parameter.
+
+**2 - Text-to-Speech Audio Generation**
 
    - The [`text_to_speech.py`](api.md/#document_to_podcast.inference.text_to_speech) script converts text into audio using a specified TTS model.
 
@@ -125,7 +129,7 @@ This demo uses [Streamlit](https://streamlit.io/), an open-source Python framewo
 
 - The script uses `load_llama_cpp_model` from `model_loader.py` to load the LLM for generating the podcast script.
 
-- Similarly, `load_parler_tts_model_and_tokenizer` is used to prepare the TTS model and tokenizer for audio generation.
+- Similarly, `load_outetts_model` is used to prepare the TTS model for audio generation.
 
 - These models are cached using `@st.cache_resource` to ensure fast and efficient reuse during app interactions.
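
The loaders touched by this commit (`load_llama_cpp_model`, `load_outetts_model`) are documented as taking a model ID in the format `{org}/{repo}/{filename}`. As a reading aid, here is a minimal sketch of how such an ID could be split into its parts. The helper `split_model_id`, the `ModelId` type, and the example ID are hypothetical and are not taken from the repository; the actual loaders may parse the ID differently.

```python
from typing import NamedTuple


class ModelId(NamedTuple):
    """The three parts of an ID written as {org}/{repo}/{filename}."""

    org: str
    repo: str
    filename: str


def split_model_id(model_id: str) -> ModelId:
    """Split an ID of the form {org}/{repo}/{filename} (hypothetical helper)."""
    parts = model_id.split("/")
    # Require exactly three non-empty segments so malformed IDs fail early.
    if len(parts) != 3 or not all(parts):
        raise ValueError(
            f"expected '{{org}}/{{repo}}/{{filename}}', got {model_id!r}"
        )
    return ModelId(*parts)


# Placeholder ID for illustration only; it does not name a real model.
example = split_model_id("some-org/some-repo/model.gguf")
print(example.org, example.repo, example.filename)
```

Note that the docs describe `load_parler_tts_model_and_tokenizer` as taking the shorter `{repo}/{filename}` form, so a two-segment ID would need separate handling rather than this helper.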