Skip to content

lhl/voicechat2

Repository files navigation

voicechat2

A fast, fully local AI Voicechat using WebSockets

voicechat2.webm

Unmute to hear the audio

On an 7900-class AMD RDNA3 card, voice-to-voice latency is in the 1 second range:

On a 4090, using Faster Whisper with faster-distil-whisper-large-v2 we can cut the latency down to as low as 300ms:

voicechat2-fw.webm

You can of course run any model or swap out any of the SRT, LLM, TTS components as you like. For example, you can run whisper.cpp for SRT, or we have a StyleTTS2 server in the test folder for an alternative TTS. For a bit more about this project, see my Hackster.io writeup.

Install

These installation instructions are for Ubuntu LTS and assume you've setup your ROCm or CUDA already.

I recommend you use conda or (my preferred), mamba for environment management. It will make your life easier.

System Prereqs

sudo apt update

# Not strictly required but the helpers we use
sudo apt install byobu curl wget

# Audio processing
sudo apt install espeak-ng ffmpeg libopus0 libopus-dev 

Checkout code

# Create env
mamba create -y -n voicechat2 python=3.11

# Setup
mamba activate voicechat2
git clone https://github.com/lhl/voicechat2
cd voicechat2
pip install -r requirements.txt

llama.cpp

# Build llama.cpp
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
# AMD version
make GGML_HIPBLAS=1 -j 
# Nvidia version
make GGML_CUDA=1 -j 

# Grab your preferred GGUF model
wget https://huggingface.co/bartowski/Meta-Llama-3-8B-Instruct-GGUF/resolve/main/Meta-Llama-3-8B-Instruct-Q4_K_M.gguf

# If you're going to go to the next instruction
cd ..

Some extra convenience scripts for launching:

run-voicechat2.sh - on your GPU machine, tries to launch all servers in separate byobu sessions; update the MODEL variables
remote-tunnel.sh - connect your GPU machine to a jump machine
local-tunnel.sh - connect to the GPU machine via a jump machine

Other AI Voicechat Projects

Speech To Speech

A project released after voicechat2 that uses a similar modular approach but is local device oriented

webrtc-ai-voice-chat

The demo shows a fair amount of latency (~10s) but this project isn't the closest to what we're doing (it uses WebRTC not websockets) from voicechat2 (HF Transformers, Ollama)

june

A console-based local client (HF Transformers, Ollama, Coqui TTS, PortAudio)

GlaDOS

This is a very responsive console-based local-client app that also has VAD and interruption support, plus a really clever hook! (whisper.cpp, llama.cpp, piper, espeak)

local-talking-llm

Another console-based local client, more of a proof of concept but with w/ blog writeup.

BUD-E - natural_voice_assistant

Another console-based local client (FastConformer, HF Transformers, StyleTTS2, espeak)

LocalAIVoiceChat

KoljaB has a number of interesting projects around console-based local clients like RealtimeSTT, RealtimeTTS, Linguflex, etc. (faster_whisper, llama.cpp, Coqui XTTS)

rtvi-web-demo

This is not a local voicechat client, but it does have a neat WebRTC front-end, so might be worth poking around into (Vite/React, Tailwind, Radix)