This repository expands upon Pipecat's Python framework for building voice and multimodal conversational agents. Our implementation creates AI meeting agents that can join and participate in Google Meet and Microsoft Teams meetings with distinct personalities and capabilities defined in Markdown files.
This project extends Pipecat's WebSocket server implementation to create:
- Meeting agents that can join Google Meet or Microsoft Teams through the MeetingBaas API
- Customizable personas with unique context
- Support for running multiple instances locally or at scale
Pipecat provides the foundational framework with:
- Real-time audio processing pipeline
- WebSocket communication
- Voice activity detection
- Message context management
In this implementation, Pipecat is integrated with Cartesia for speech generation (text-to-speech), Gladia or Deepgram for speech-to-text conversion, and OpenAI's GPT-4 as the underlying LLM.
Building upon Pipecat, we've added:
- Persona system with Markdown-based configuration for:
- Core personality traits and behaviors
- Knowledge base and domain expertise
- Additional contextual information (websites formatted to MD, technical documentation, etc.)
- CLI-based creation tool
- AI image generation via Replicate
- Image hosting through UploadThing (UTFS)
- MeetingBaas integration for video meeting platform support
- Multi-agent orchestration
- OpenAI (LLM)
- Cartesia (text-to-speech)
- Gladia or Deepgram (speech-to-text)
- MeetingBaas (video meeting platform integration)
- OpenAI (LLM to complete the user prompt and match to a Cartesia Voice ID)
- Replicate (AI image generation)
- UploadThing (UTFS) (image hosting)
For speech-related services (TTS/STT) and the choice of LLM (Claude, GPT-4, etc.), you can freely swap between any of the integrations available in Pipecat's supported services.
OpenAI's GPT-4, UploadThing (UTFS), and Replicate are currently hard-coded for the CLI-based persona generation features: matching personas to available voices from Cartesia, generating AI avatars, and creating initial personality descriptions and knowledge bases. You do not need a Replicate or UTFS API key to run the project if you skip the CLI-based persona creation feature and edit the Markdown files manually.
- Real-time audio processing pipeline
- WebSocket-based communication
- Tool integration (weather, time)
- Voice activity detection
- Message context management
- Dynamic persona loading from Markdown files
- Customizable personality traits and behaviors
- Support for multiple languages
- Voice characteristic customization
- Image generation for persona avatars
- Metadata management for each persona
Each persona is defined in the @personas directory with:
- A README.md defining their personality
- Space for additional markdown files to expand knowledge and behaviour
@personas/
└── quantum_physicist/
    ├── README.md
    └── (additional behavior files)
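The layout above can be read with a few lines of Python. The following is a minimal sketch, not the project's actual loader: the function name `load_persona` and the keys of the returned dictionary are assumptions made for illustration. It only relies on the structure described above (a README.md plus optional extra Markdown files per persona directory).

```python
from pathlib import Path


def load_persona(persona_dir: Path) -> dict:
    """Collect a persona's Markdown files into one dictionary (hypothetical helper)."""
    readme = persona_dir / "README.md"
    if not readme.is_file():
        raise FileNotFoundError(f"{persona_dir} has no README.md")

    # Every other Markdown file in the directory is treated as extra
    # knowledge or behavior for the persona.
    extras = sorted(p for p in persona_dir.glob("*.md") if p.name != "README.md")

    return {
        "name": persona_dir.name,  # the directory name doubles as the persona name
        "personality": readme.read_text(encoding="utf-8"),
        "additional_context": {p.name: p.read_text(encoding="utf-8") for p in extras},
    }
```

Calling `load_persona(Path("@personas/quantum_physicist"))` would then return the README text under `personality` and any other Markdown files under `additional_context`, keyed by file name.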
- Python 3.x
- grpc_tools for protocol buffer compilation
- Ngrok (for local deployment)
- Poetry for dependency management
# Install Poetry (Unix/macOS)
curl -sSL https://install.python-poetry.org | python3 -
# Install Poetry (Windows)
(Invoke-WebRequest -Uri https://install.python-poetry.org -UseBasicParsing).Content | py -
# Install dependencies
poetry install
# Activate virtual environment
poetry shell
poetry run python -m grpc_tools.protoc --proto_path=./protobufs --python_out=./protobufs frames.proto
cp env.example .env
Edit .env with your MeetingBaas credentials.
To launch one agent into a meeting:
poetry run python scripts/batch.py -c 1 --meeting-url <your-meeting-url>
To launch two agents simultaneously:
poetry run python scripts/batch.py -c 2 --meeting-url <your-meeting-url>
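For readers curious how flags like these are typically handled, here is a minimal `argparse` sketch. It is not the contents of scripts/batch.py: the function name `parse_args`, the long option `--count`, the default of 1, and the validation rule are all assumptions; only `-c` and `--meeting-url` come from the commands above.

```python
import argparse


def parse_args(argv=None) -> argparse.Namespace:
    """Parse an agent count and a meeting URL (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Launch meeting agents.")
    # "-c" appears in the commands above; "--count" is a hypothetical long form.
    parser.add_argument("-c", "--count", type=int, default=1,
                        help="number of agents to launch")
    parser.add_argument("--meeting-url", required=True,
                        help="Google Meet or Microsoft Teams meeting URL")
    args = parser.parse_args(argv)
    if args.count < 1:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--count must be at least 1")
    return args
```

With this sketch, `parse_args(["-c", "2", "--meeting-url", "<your-meeting-url>"])` yields a namespace whose `count` is 2 and whose `meeting_url` is the given string.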
For 1-2 agents, use Ngrok to expose your local server:
ngrok start --all --config ~/.config/ngrok/ngrok.yml,./config/ngrok/config.yml
For more than 2 agents, deploy to a web server to avoid Ngrok limitations.
The persona architecture is designed to support:
- Scraping the websites provided by the user into Markdown for the bot's knowledge base
- Containerizing this nicely
- Verify Poetry environment is activated
- Check Ngrok connection status
- Validate environment variables
- Ensure unique Ngrok URLs for multiple agents
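The environment variable check in the list above can be automated with a small helper. This is a sketch under stated assumptions: the function name `missing_env_vars` is hypothetical, and the variable names in the usage note are placeholders, so consult env.example for the names the project actually expects.

```python
import os
from typing import Iterable, List, Mapping, Optional


def missing_env_vars(required: Iterable[str],
                     env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required variable names that are unset or blank."""
    if env is None:
        env = os.environ  # default to the real process environment
    # A variable counts as missing if it is absent or contains only whitespace.
    return [name for name in required if not env.get(name, "").strip()]
```

For example, `missing_env_vars(["EXAMPLE_MEETING_KEY", "EXAMPLE_LLM_KEY"])` (both names are placeholders) returns the subset of those names that still need a value, which can be printed before launching any agents.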
For more detailed information about specific personas or deployment options, check the respective documentation in the @personas directory.
Sometimes, due to WebSocket connection delays through Ngrok, the MeetingBaas bots may join the meeting before your local bot connects. If this happens:
- Simply press Enter to respawn your bot
- This will reinitiate the connection and allow your bot to join the meeting
This is a normal occurrence and can be easily resolved with a quick bot respawn.