This project demonstrates how to build a website summarization pipeline using Indexify. The pipeline scrapes a website, summarizes its content, and generates an audio version of the summary.
- Website content scraping
- Content summarization using OpenAI's GPT-4
- Text-to-speech generation using ElevenLabs
The workflow defines one function for each of the following tasks:
- Website Scraping:
  - Uses `httpx` to fetch the content of a given URL.
- Content Summarization:
  - Utilizes OpenAI's GPT-4 to generate a concise summary of the website content.
- Text-to-Speech Generation:
  - Employs ElevenLabs' API to convert the summary into an audio file.
The functions are laid out in the graph as follows:
scrape_website -> summarize_website -> generate_tts
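
The linear layout above can be sketched in plain Python as a chain in which each stage consumes the previous stage's output. This is only an illustration of the data flow: the stage bodies are hypothetical stand-ins that make no network, OpenAI, or ElevenLabs calls, and `run_pipeline` is an illustrative helper rather than part of Indexify. The real functions are defined in `workflow.py`.

```python
from functools import reduce
from typing import Any, Callable, Sequence


def scrape_website(url: str) -> str:
    # Stand-in: the real stage fetches the page content over HTTP.
    return f"<content of {url}>"


def summarize_website(content: str) -> str:
    # Stand-in: the real stage asks a language model for a concise summary.
    return f"summary({content})"


def generate_tts(summary: str) -> bytes:
    # Stand-in: the real stage returns synthesized audio bytes.
    return summary.encode("utf-8")


def run_pipeline(stages: Sequence[Callable[[Any], Any]], initial: Any) -> Any:
    """Feed `initial` through each stage in order, like edges in a linear graph."""
    return reduce(lambda value, stage: stage(value), stages, initial)


PIPELINE = (scrape_website, summarize_website, generate_tts)
audio = run_pipeline(PIPELINE, "https://example.com")
```

Keeping each stage a single-argument function is what makes the chain composable: a stage can be swapped or inserted without touching its neighbors.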

Prerequisites:
- Python 3.9+
- Docker and Docker Compose (for containerized setup)
- OpenAI API key
- ElevenLabs API key

To run the pipeline locally, in-process:
- Clone this repository:
      git clone https://github.com/tensorlakeai/indexify
      cd indexify/examples/website_audio_summary
- Create a virtual environment and activate it:
      python -m venv venv
      source venv/bin/activate
- Install the required dependencies:
      pip install -r requirements.txt
- Set up environment variables:
      export OPENAI_API_KEY=your_openai_api_key
      export ELEVENLABS_API_KEY=your_elevenlabs_api_key
- Run the main script:
      python workflow.py --mode in-process-run
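
Because the run depends on both API keys being exported, a quick check before starting can save a failed run partway through the pipeline. The following is a minimal sketch, not part of the example project; `missing_keys` and `REQUIRED_KEYS` are hypothetical names introduced here for illustration.

```python
import os
from typing import Mapping, Sequence

# The two variables the setup steps ask you to export.
REQUIRED_KEYS = ("OPENAI_API_KEY", "ELEVENLABS_API_KEY")


def missing_keys(
    env: Mapping[str, str], required: Sequence[str] = REQUIRED_KEYS
) -> list[str]:
    """Return the required variable names that are unset or blank in `env`."""
    return [name for name in required if not env.get(name, "").strip()]


# Check the current shell environment and report anything that is missing.
missing = missing_keys(os.environ)
if missing:
    print("Missing environment variables: " + ", ".join(missing))
else:
    print("All required API keys are set.")
```

Passing the environment in as a mapping, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.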

To run the pipeline with Docker:
- Clone this repository:
      git clone https://github.com/tensorlakeai/indexify
      cd indexify/examples/website_audio_summary
- Build the Docker images:
      indexify-cli build-image workflow.py scrape_website
      indexify-cli build-image workflow.py summarize_website
      indexify-cli build-image workflow.py generate_tts
- Create a `.env` file in the project directory and add your API keys:
      OPENAI_API_KEY=your_openai_api_key
      ELEVENLABS_API_KEY=your_elevenlabs_api_key
- Start the services:
      docker-compose up --build
- Deploy the graph:
      python workflow.py --mode remote-deploy
- Run the workflow:
      python workflow.py --mode remote-run
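
The three `--mode` values used in the steps above suggest a small command-line dispatcher. The sketch below shows one way such a flag could be parsed with `argparse`; it is an assumption about structure, not the actual contents of `workflow.py`, and `parse_mode` and `describe_mode` are hypothetical helpers.

```python
import argparse
from typing import Optional, Sequence

# The mode names that appear in the setup instructions.
MODES = ("in-process-run", "remote-deploy", "remote-run")


def parse_mode(argv: Optional[Sequence[str]] = None) -> str:
    """Parse `--mode` from `argv` (defaults to sys.argv) and return its value."""
    parser = argparse.ArgumentParser(description="Website audio summary workflow")
    parser.add_argument(
        "--mode",
        choices=MODES,
        required=True,
        help="how to execute the workflow",
    )
    return parser.parse_args(argv).mode


def describe_mode(mode: str) -> str:
    """Map a mode to a short description taken from the setup steps."""
    descriptions = {
        "in-process-run": "run the workflow in the current process",
        "remote-deploy": "deploy the graph",
        "remote-run": "run the deployed workflow",
    }
    return descriptions[mode]


print(describe_mode(parse_mode(["--mode", "remote-deploy"])))
```

Restricting the flag with `choices` makes `argparse` reject a mistyped mode with a usage message instead of silently doing nothing.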

To customize the pipeline:
- Modify the `url` variable in the `run_workflow()` function to summarize different websites.
- Adjust the summarization prompt in `summarize_website()` for different summary styles.
- Change the voice in `generate_tts()` to use different ElevenLabs voices.
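
One way to keep these three knobs together is a small settings object passed to the functions that need it. The sketch below is illustrative only: `SummaryConfig`, `build_prompt`, the prompt wording, and the placeholder voice value are all assumptions and do not come from `workflow.py` or the ElevenLabs API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SummaryConfig:
    """Hypothetical bundle of the three customizable settings."""

    url: str = "https://example.com"
    summary_style: str = "a concise summary"
    voice: str = "your_voice_name"  # placeholder, not a real ElevenLabs voice


def build_prompt(content: str, config: SummaryConfig) -> str:
    """Build a summarization prompt whose style comes from the config."""
    return (
        f"Write {config.summary_style} of the following website content:\n\n"
        f"{content}"
    )


# Override only the style; the URL and voice keep their defaults.
bullets = SummaryConfig(summary_style="three short bullet points")
prompt = build_prompt("Indexify is a workflow engine.", bullets)
```

With this shape, trying a different summary style or voice is a one-line change at the call site rather than an edit inside each function.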