A speech-to-text (STT) service with a Remix frontend and a FastAPI backend. The project combines Whisper speech recognition and PyAnnote speaker diarization with a responsive, animated user interface.
- Audio transcription (real-time transcription is planned but not yet implemented)
- Speaker diarization (speaker identification)
- Beautiful, animated UI with a modern dark theme
- Multi-language support
- Configurable number of speakers
- Responsive design
- Remix - Modern web framework
- TypeScript - Type-safe JavaScript
- Tailwind CSS - Utility-first CSS framework
- React - UI library
- FastAPI - Modern Python web framework
- Whisper - OpenAI's speech recognition model
- PyAnnote - Speaker diarization
- PyTorch - Machine learning framework
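Whisper produces timestamped text segments and PyAnnote produces timestamped speaker turns, so the two outputs have to be combined to get a per-speaker transcript. The sketch below shows one common way to do that: give each segment the speaker whose turn overlaps it the most. The data shapes (dicts with `start`, `end`, `text`; tuples of `(start, end, speaker)`) and the function names are assumptions for illustration, not this project's actual code.

```python
from typing import Any, Dict, List, Optional, Tuple

# Assumed shapes, for illustration only:
#   segment: {"start": float, "end": float, "text": str}
#   turn:    (start_seconds, end_seconds, speaker_label)
Segment = Dict[str, Any]
Turn = Tuple[float, float, str]


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Return the length in seconds of the intersection of two intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def assign_speakers(segments: List[Segment], turns: List[Turn]) -> List[Segment]:
    """Label each segment with the speaker whose turn overlaps it the most.

    Segments that overlap no turn get speaker None.
    """
    labeled = []
    for seg in segments:
        best_speaker: Optional[str] = None
        best_overlap = 0.0
        for start, end, speaker in turns:
            shared = overlap(seg["start"], seg["end"], start, end)
            if shared > best_overlap:
                best_overlap = shared
                best_speaker = speaker
        labeled.append({**seg, "speaker": best_speaker})
    return labeled


segments = [
    {"start": 0.0, "end": 2.0, "text": "Hello"},
    {"start": 2.0, "end": 5.0, "text": "Hi there"},
]
turns = [(0.0, 2.2, "SPEAKER_00"), (2.2, 5.0, "SPEAKER_01")]
labeled = assign_speakers(segments, turns)
```

Maximum overlap is a simple heuristic; a segment that spans a speaker change is still assigned to a single speaker.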
- Node.js >= 20.0.0
- Python >= 3.8
- pip
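The version requirements above can be checked before installing anything. The following is a minimal sketch; the helper names are hypothetical and it assumes `node --version` prints a string such as `v20.11.1`.

```python
import shutil
import subprocess
import sys
from typing import Tuple


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as 'v20.11.1' into (20, 11, 1)."""
    return tuple(int(part) for part in text.strip().lstrip("v").split("."))


def meets_minimum(found: str, minimum: Tuple[int, ...]) -> bool:
    """Return True if the found version is at least the minimum."""
    return parse_version(found) >= minimum


def check_prerequisites() -> bool:
    """Check Python >= 3.8 and Node.js >= 20.0.0 on the current machine."""
    python_ok = sys.version_info >= (3, 8)
    node_ok = False
    if shutil.which("node") is not None:
        result = subprocess.run(
            ["node", "--version"], capture_output=True, text=True, check=True
        )
        node_ok = meets_minimum(result.stdout, (20, 0, 0))
    return python_ok and node_ok
```

Calling `check_prerequisites()` returns `False` if Node.js is missing from `PATH` or either version is too old.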
- Navigate to the backend directory:
cd backend
- Install dependencies:
pip install -r requirements.txt
- Create a .env file with your configuration
- Start the backend server:
uvicorn src.main:app --reload
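The backend steps mention a `.env` file but do not list its settings. As a rough illustration of the file format only, the sketch below parses `KEY=VALUE` lines into a dictionary. The key names `HF_TOKEN` and `WHISPER_MODEL` are hypothetical placeholders; check the backend source for the settings it actually reads.

```python
from typing import Dict


def parse_env(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Hypothetical keys, for illustration only.
SAMPLE_ENV = """
# Example .env contents
HF_TOKEN=your-hugging-face-token
WHISPER_MODEL="base"
"""

config = parse_env(SAMPLE_ENV)
```

In a real project a library such as python-dotenv usually handles this; the hand-written parser here only shows what the file is expected to look like.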
- Navigate to the frontend directory:
cd frontend
- Install dependencies:
npm install
- Start the frontend server:
npm run dev
The application will be available at http://localhost:5173
The project uses modern development tools and practices:
- ESLint for JavaScript/TypeScript linting
- Ruff for Python linting
- Tailwind CSS for styling
- TypeScript for type safety
- Vite for fast development and building
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Feel free to submit issues and pull requests.
- Fork the repository
- Create your feature branch:
git checkout -b feature/amazing-feature
- Commit your changes:
git commit -m 'Add some amazing feature'
- Push to the branch:
git push origin feature/amazing-feature
- Open a Pull Request
- OpenAI's Whisper model for speech recognition
- PyAnnote for speaker diarization
- The Remix team for their excellent web framework