Speaking Bot

This repository expands upon Pipecat's Python framework for building voice and multimodal conversational agents. Our implementation creates AI meeting agents that can join and participate in Google Meet and Microsoft Teams meetings with distinct personalities and capabilities defined in Markdown files.

Overview

This project extends Pipecat's WebSocket server implementation to create:

Meeting agents that can join Google Meet or Microsoft Teams through the MeetingBaas API
Customizable personas with unique context
Support for running multiple instances locally or at scale

Architecture

Core Framework: Pipecat Integration

Pipecat provides the foundational framework with:

Real-time audio processing pipeline
WebSocket communication
Voice activity detection
Message context management

In this implementation, Pipecat is integrated with Cartesia for speech generation (text-to-speech), Gladia or Deepgram for speech-to-text conversion, and OpenAI's GPT-4 as the underlying LLM.

Project Extensions

Building upon Pipecat, we've added:

Persona system with Markdown-based configuration for:
- Core personality traits and behaviors
- Knowledge base and domain expertise
- Additional contextual information (websites formatted to MD, technical documentation, etc.)
CLI-based creation tool
AI image generation via Replicate
Image hosting through UploadThing (UTFS)
MeetingBaas integration for video meeting platform support
Multi-agent orchestration

Required API Keys

For Pipecat-related Services

OpenAI (LLM)
Cartesia (text-to-speech)
Gladia or Deepgram (speech-to-text)
MeetingBaas (video meeting platform integration)

For Project-specific Add-ons

OpenAI (LLM to complete the user prompt and match to a Cartesia Voice ID)
Replicate (AI image generation)
UploadThing (UTFS) (image hosting)

For speech-related services (TTS/STT) and LLM choice (like Claude, GPT-4, etc), you can freely choose and swap between any of the integrations available in Pipecat's supported services.

Important Note

OpenAI's GPT-4, UploadThing (UTFS), and Replicate are currently hard-coded specifically for the CLI-based persona generation features: matching personas to available voices from Cartesia, generating AI avatars, and creating initial personality descriptions and knowledge bases. You do not need a Replicat or UTFS API key to run the project if you're not using the CLI-based persona creation feature and edit Markdowns manually.

Persona System

Bot Service

Real-time audio processing pipeline
WebSocket-based communication
Tool integration (weather, time)
Voice activity detection
Message context management
Dynamic persona loading from markdown files
Customizable personality traits and behaviors
Support for multiple languages
Voice characteristic customization
Image generation for persona avatars
Metadata management for each persona

Persona Structure

Each persona is defined in the @personas directory with:

A README.md defining their personality
Space for additional markdown files to expand knowledge and behaviour

Example Persona Structure

@personas/
└── quantum_physicist/
    ├── README.md
    └── (additional beVhavior files)

Prerequisites

Python 3.x
grpc_tools for protocol buffer compilation
Ngrok (for local deployment)
Poetry for dependency management

Installation

1. Set Up Poetry Environment

# Install Poetry (Unix/macOS)
curl -sSL https://install.python-poetry.org | python3 -

# Install Poetry (Windows)
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -

# Install dependencies
poetry install

# Activate virtual environment
poetry shell

2. Compile Protocol Buffers

poetry run python -m grpc_tools.protoc --proto_path=./protobufs --python_out=./protobufs frames.proto

3. Configure Environment

cp env.example .env

Edit .env with your MeetingBaas credentials.

Running Meeting Agents

Single Agent Deployment

To launch one agent into a meeting:

poetry run python scripts/batch.py -c 1 --meeting-url <your-meeting-url>

Multiple Agent Deployment

To launch two agents simultaneously:

poetry run python scripts/batch.py -c 2 --meeting-url <your-meeting-url>

Local Deployment with Ngrok

For 1-2 agents, use Ngrok to expose your local server:

ngrok start --all --config ~/.config/ngrok/ngrok.yml,./config/ngrok/config.yml

Web Deployment

For more than 2 agents, deploy to a web server to avoid Ngrok limitations.

Future Extensibility

The persona architecture is designed to support:

Scrapping the websites given by the user to MD for the bot knowledge base
Containerizing this nicely

Troubleshooting

Verify Poetry environment is activated
Check Ngrok connection status
Validate environment variables
Ensure unique Ngrok URLs for multiple agents

For more detailed information about specific personas or deployment options, check the respective documentation in the @personas directory.

Troubleshooting WebSocket Connections

Handling Timing Issues with ngrok and Meeting Baas Bots

Sometimes, due to WebSocket connection delays through ngrok, the Meeting Baas bots may join the meeting before your local bot connects. If this happens:

Simply press Enter to respawn your bot
This will reinitiate the connection and allow your bot to join the meeting

This is a normal occurrence and can be easily resolved with a quick bot respawn.

Name		Name	Last commit message	Last commit date
Latest commit History 91 Commits
.github		.github
config		config
images		images
meetingbaas_pipecat		meetingbaas_pipecat
protobufs		protobufs
scripts		scripts
tests		tests
.cursorrules		.cursorrules
.env.example		.env.example
.gitignore		.gitignore
Dockerfile		Dockerfile
LICENSE		LICENSE
MULTIPLE.md		MULTIPLE.md
README.md		README.md
TODO.md		TODO.md
__init__.py		__init__.py
poetry.lock		poetry.lock
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Speaking Bot

Overview

Architecture

Core Framework: Pipecat Integration

Project Extensions

Required API Keys

For Pipecat-related Services

For Project-specific Add-ons

Important Note

Persona System

Bot Service

Persona Structure

Example Persona Structure

Prerequisites

Installation

1. Set Up Poetry Environment

2. Compile Protocol Buffers

3. Configure Environment

Running Meeting Agents

Single Agent Deployment

Multiple Agent Deployment

Local Deployment with Ngrok

Web Deployment

Future Extensibility

Troubleshooting

Troubleshooting WebSocket Connections

Handling Timing Issues with ngrok and Meeting Baas Bots

About

Releases

Packages

Languages

License

Allenmylath/speaking-meeting-bot

Folders and files

Latest commit

History

Repository files navigation

Speaking Bot

Overview

Architecture

Core Framework: Pipecat Integration

Project Extensions

Required API Keys

For Pipecat-related Services

For Project-specific Add-ons

Important Note

Persona System

Bot Service

Persona Structure

Example Persona Structure

Prerequisites

Installation

1. Set Up Poetry Environment

2. Compile Protocol Buffers

3. Configure Environment

Running Meeting Agents

Single Agent Deployment

Multiple Agent Deployment

Local Deployment with Ngrok

Web Deployment

Future Extensibility

Troubleshooting

Troubleshooting WebSocket Connections

Handling Timing Issues with ngrok and Meeting Baas Bots

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages