Audio transcription in supervisor route #19

Open: wants to merge 8 commits into main


Luisotee (Collaborator)

Audio Transcription Support for Supervisor Route

Overview

Add audio transcription capabilities to the supervisor route using Groq's Whisper V3 Turbo API. This centralizes audio processing in the AI API service, so client integrations (simulator, WhatsApp, Telegram) no longer need to handle transcription individually.

Technical Changes

  • Added support for multiple audio formats:
    • Direct support: mp3, mp4, mpeg, mpga, m4a, wav, webm
    • Conversion support: ogg -> mp3
  • Implemented audio file handling with temporary storage
  • Integrated Groq's Whisper V3 Turbo for transcription
  • Added content type detection and validation
  • Centralized error handling for audio processing
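
The format handling listed above might look roughly like the following minimal sketch. It assumes the upload arrives as raw bytes plus a filename and MIME type. The MIME-to-format table and the helper names (`detect_format`, `plan_for`, `prepare_audio`, `staged_audio`) are hypothetical. To avoid guessing at the python-ffmpeg wrapper's API, it calls the ffmpeg CLI through `subprocess` instead, which requires the `ffmpeg` binary to be on PATH.

```python
from __future__ import annotations

import shutil
import subprocess
import tempfile
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator

# Formats the PR lists as sent to Whisper without conversion.
DIRECT_FORMATS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
# Formats the PR converts to mp3 before transcription.
CONVERT_TO_MP3 = {"ogg"}

# Hypothetical MIME-type table; unknown types fall back to the file extension.
MIME_TO_FORMAT = {
    "audio/mpeg": "mp3",
    "audio/mp3": "mp3",
    "audio/mp4": "mp4",
    "audio/x-m4a": "m4a",
    "audio/wav": "wav",
    "audio/x-wav": "wav",
    "audio/webm": "webm",
    "audio/ogg": "ogg",
}


def detect_format(filename: str, content_type: str | None) -> str:
    """Return a lowercase format name from the MIME type, else from the extension."""
    if content_type:
        # Drop parameters such as "; codecs=opus" before the lookup.
        base = content_type.split(";", 1)[0].strip().lower()
        if base in MIME_TO_FORMAT:
            return MIME_TO_FORMAT[base]
    return Path(filename).suffix.lstrip(".").lower()


def plan_for(fmt: str) -> str:
    """Return 'direct' or 'convert', or raise ValueError for unsupported formats."""
    if fmt in DIRECT_FORMATS:
        return "direct"
    if fmt in CONVERT_TO_MP3:
        return "convert"
    raise ValueError(f"Unsupported audio format: {fmt or 'unknown'}")


def ffmpeg_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg call that transcodes src into dst (format taken from dst's extension)."""
    return ["ffmpeg", "-y", "-i", str(src), str(dst)]


def prepare_audio(data: bytes, filename: str, content_type: str | None, workdir: Path) -> Path:
    """Write the upload into workdir, converting to mp3 if needed; return the file to transcribe."""
    fmt = detect_format(filename, content_type)
    plan = plan_for(fmt)  # validate before anything touches disk
    src = workdir / f"input.{fmt}"
    src.write_bytes(data)
    if plan == "direct":
        return src
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg binary not found on PATH")
    dst = workdir / "input.mp3"
    subprocess.run(ffmpeg_command(src, dst), check=True, capture_output=True)
    return dst


@contextmanager
def staged_audio(data: bytes, filename: str, content_type: str | None) -> Iterator[Path]:
    """Yield a transcribable file inside a temporary directory that is removed on exit."""
    with tempfile.TemporaryDirectory() as tmp:
        yield prepare_audio(data, filename, content_type, Path(tmp))
```

Everything written under the temporary directory, including a converted mp3, is deleted when the `with staged_audio(...)` block exits. The transcription call therefore has to happen inside that block.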

Dependencies

  • Added python-multipart for form data handling
  • Added python-ffmpeg for audio conversion
  • Added groq for Whisper API access

Configuration

Requires the GROQ_API_KEY environment variable.
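
Here is a hedged sketch of the configuration check and the Whisper call, using the groq SDK's OpenAI-style `client.audio.transcriptions.create`. It assumes the model id `whisper-large-v3-turbo` is the "Whisper V3 Turbo" model the PR refers to. `require_groq_api_key` and `transcribe` are hypothetical helper names.

```python
from __future__ import annotations

import os
from pathlib import Path

# Assumed Groq model id for Whisper Large V3 Turbo.
WHISPER_MODEL = "whisper-large-v3-turbo"


def require_groq_api_key() -> str:
    """Fail fast with a clear message when GROQ_API_KEY is missing or blank."""
    key = os.environ.get("GROQ_API_KEY", "").strip()
    if not key:
        raise RuntimeError("GROQ_API_KEY environment variable is not set")
    return key


def transcribe(path: Path, model: str = WHISPER_MODEL) -> str:
    """Send an audio file to Groq's transcription endpoint and return the text."""
    api_key = require_groq_api_key()
    # Imported lazily so the key check above runs even without the SDK installed.
    from groq import Groq

    client = Groq(api_key=api_key)
    with path.open("rb") as f:
        result = client.audio.transcriptions.create(
            file=(path.name, f.read()),  # (filename, bytes) tuple
            model=model,
        )
    return result.text
```

The key is checked before the client is created. As a result, a missing variable produces one readable error instead of a failure deeper inside the SDK.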

Benefits

  • Centralized audio processing
  • Consistent transcription quality
  • Reduced implementation complexity in clients
  • Unified error handling

Luisotee added the feature (New feature) label on Nov 27, 2024
This was linked to issues on Nov 27, 2024
luandro (Contributor) commented Nov 27, 2024

@Luisotee, it's working great, amazing job! Just don't forget to add the packages you use. For example:

uv add ffmpeg python-multipart

This automatically adds them to the pyproject.toml file, so they are installed on every run of the project. I added them in my commit.

A second comment: the route /api/supervisor/supervisor isn't ideal. This might be a good opportunity to rename it to /api/classifier or something similar.

Luisotee (Collaborator, Author)

@luandro should be fine now

luandro (Contributor) commented Nov 28, 2024

@Luisotee when testing on the docs page, "Send empty value" works when set for message, but for some reason an error is thrown when setting an empty value for message:

[Screenshot of the error]
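
The screenshot doesn't show the cause, so this is only a guess. The docs page's "Send empty value" option can submit an empty string for a form field or a zero-byte file part. The minimal sketch below treats such values as absent and rejects requests that carry neither a message nor audio. It assumes the route takes an optional `message` form field and an optional audio upload that has already been read into bytes (for example with `await audio.read()` on a FastAPI `UploadFile`). `SupervisorInput` and `normalize_request` are hypothetical names.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SupervisorInput:
    """Hypothetical normalized request: only the inputs the route should act on."""
    message: str | None
    audio: bytes | None


def normalize_request(message: str | None, audio: bytes | None) -> SupervisorInput:
    """Treat blank messages and zero-byte uploads as missing; require at least one input."""
    text = message.strip() if message else ""
    clean_message = text or None
    clean_audio = audio if audio else None
    if clean_message is None and clean_audio is None:
        raise ValueError("Provide a non-empty message or an audio file")
    return SupervisorInput(message=clean_message, audio=clean_audio)
```

With this approach, the route can return a 4xx response on `ValueError` instead of failing further down the pipeline.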

Labels: feature (New feature)
Projects: None yet

Development: successfully merging this pull request may close these issues:
  • Integrate text-to-speech
  • Intent classification

2 participants