Vision-AI is a project designed to promote independence and inclusivity for visually impaired individuals. By integrating the Llava model, a camera, and a Raspberry Pi, it aims to change the way visually impaired users interact with computers, offering a technology-driven solution that enhances accessibility and well-being.
Features
- Audio Interaction 🔊: Speech recognition and text-to-speech technologies let users operate the system entirely by voice.
- Firebase Integration: Firebase services store captured images, generate download URLs, and analyze user feedback through Firebase Analytics.
- Llava Model 📈: The Llava model processes captured images and answers questions about them.
- Real-time Image Capture 📷: Vision-AI captures real-time images with the Raspberry Pi camera module.
How It Works
- Audio Input: Users wake the system with a voice command such as "Hey Vision" or "Hello Vision" (see the wake-word sketch below).
- Image Capture: Upon receiving the command, the system captures a real-time image with the Raspberry Pi camera (see the camera sketch below).
- Prompt Recording: The user is then prompted to ask a question or provide additional context about the captured image.
- Firebase Integration: The captured image is uploaded to Firebase Storage, and a download URL is generated (see the upload sketch below).
- Llava Model Processing 📈: The Llava model is invoked with the image URL and the user's prompt to gather relevant information (see the model-call sketch below).
- Text-to-Speech Output: The model's answer is converted to speech and read back to the user (see the text-to-speech sketch below).
- User Feedback and Analytics: User satisfaction and interaction data are collected through feedback forms and Firebase Analytics.
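The repository's exact implementation isn't shown here, but a minimal wake-word loop might look like the following sketch, assuming the `SpeechRecognition` library (which needs PyAudio for microphone access) and its free Google web recognizer; the function names are illustrative:

```python
import speech_recognition as sr

recognizer = sr.Recognizer()

def listen_for_phrase() -> str:
    """Record one utterance from the default microphone and transcribe it."""
    with sr.Microphone() as source:
        recognizer.adjust_for_ambient_noise(source)  # calibrate for background noise
        audio = recognizer.listen(source)
    try:
        # Google's free web recognizer; any SpeechRecognition backend would do
        return recognizer.recognize_google(audio).lower()
    except sr.UnknownValueError:
        return ""  # speech was unintelligible; treat as no phrase

def wait_for_wake_word() -> None:
    """Block until the user says one of the wake phrases."""
    while True:
        phrase = listen_for_phrase()
        if "hey vision" in phrase or "hello vision" in phrase:
            return
```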
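Capturing a still frame could use the `picamera2` library, the current camera stack on Raspberry Pi OS; the project may use the legacy `picamera` API instead. A sketch:

```python
from picamera2 import Picamera2

def capture_image(path: str = "capture.jpg") -> str:
    """Take a still photo with the Pi camera module and save it to disk."""
    picam2 = Picamera2()
    picam2.configure(picam2.create_still_configuration())
    picam2.start()
    picam2.capture_file(path)  # writes a JPEG to the given path
    picam2.stop()
    return path
```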
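Uploading the image and obtaining a download URL might go through the `firebase_admin` SDK. The service-account path and bucket name below are placeholders for your own Firebase project's values:

```python
import firebase_admin
from firebase_admin import credentials, storage

# Placeholder credentials and bucket -- substitute your Firebase project's values.
cred = credentials.Certificate("serviceAccountKey.json")
firebase_admin.initialize_app(cred, {"storageBucket": "your-project.appspot.com"})

def upload_image(local_path: str) -> str:
    """Upload a file to Firebase Storage and return a shareable URL."""
    blob = storage.bucket().blob(f"captures/{local_path}")
    blob.upload_from_filename(local_path)
    blob.make_public()  # public URL; a signed URL would work for private buckets
    return blob.public_url
```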
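Invoking Llava could use Replicate's Python client, which reads `REPLICATE_API_TOKEN` from the environment (see Setup below). `yorickvp/llava-13b` is one public Llava deployment on Replicate, and the `image`/`prompt` input names follow it; the project may use a different model or pin a specific version:

```python
import replicate  # reads REPLICATE_API_TOKEN from the environment

def ask_llava(image_url: str, prompt: str) -> str:
    """Ask a hosted Llava model a question about the uploaded image."""
    output = replicate.run(
        "yorickvp/llava-13b",  # assumed model slug; pin a version in production
        input={"image": image_url, "prompt": prompt},
    )
    # The model streams tokens, so join them into one answer string.
    return "".join(output)
```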
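Speaking the answer back could use an offline engine such as `pyttsx3`, which runs on the Raspberry Pi without a network connection:

```python
import pyttsx3

def speak(text: str) -> None:
    """Read text aloud through the default audio output."""
    engine = pyttsx3.init()
    engine.say(text)
    engine.runAndWait()  # blocks until playback finishes
```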
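Tying the hypothetical helpers above together, the whole pipeline reduces to a short loop; again, this is a sketch of the flow described above, not the project's actual wiring:

```python
def main() -> None:
    while True:
        wait_for_wake_word()                   # 1. audio input
        image_path = capture_image()           # 2. image capture
        speak("What would you like to know about this image?")
        prompt = listen_for_phrase()           # 3. prompt recording
        image_url = upload_image(image_path)   # 4. Firebase upload
        answer = ask_llava(image_url, prompt)  # 5. Llava model processing
        speak(answer)                          # 6. text-to-speech output

if __name__ == "__main__":
    main()
```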
Setup
Before running the project, set your Replicate API token as an environment variable: run `export REPLICATE_API_TOKEN=<REPLICATE_API_TOKEN>` in your terminal (on Windows Command Prompt, use `set` instead of `export`).
Contributors
- Thejas Raja Elandassery
- Abhay Gupta
- Aditya Kumar
- Sanika Kalaskar