1320-blind

In this project, we demonstrate a prototype of smart glasses that helps blind people navigate and identify objects more easily.

components:

  • smart glasses: stream video and audio to the internet by connecting to a wifi network. For portability, the glasses can connect to a mobile hotspot instead.
  • backend / server / computer: receives the video and audio streams, analyzes the audio stream for voice commands, and sends the video data to several AI models that convert it to useful text.
  • in the prototype: the text is read aloud by the computer and the output audio is streamed to the blind person's phone through the internet (a text-to-speech sketch follows this list)
  • ideally: the text is sent to a custom app on the blind person's phone, and the app reads the text aloud
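A minimal sketch of the prototype's read-aloud step, assuming gTTS and playsound from the pip list below handle the text-to-speech; the speak_text helper is hypothetical:

```python
from gtts import gTTS
from playsound import playsound

def speak_text(text: str, mp3_path: str = "speech.mp3") -> None:
    """Convert the generated text to speech and play it on the backend."""
    gTTS(text=text, lang="en").save(mp3_path)  # synthesize the sentence with Google TTS
    playsound(mp3_path)                        # play the saved mp3

if __name__ == "__main__":
    speak_text("I see a chair about two meters ahead.")
```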

note on image / video:

  • in the prototype: an image is taken from the video stream every 10 seconds and sent to several single-image machine learning models (a capture-loop sketch follows this list)
  • ideally: the video stream itself would be fed to more sophisticated machine learning models in order to take advantage of the temporal relations between frames
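A minimal sketch of the 10-second capture loop, assuming the prototype's screenshot approach (see the instructions below) with pyautogui; analyze_image is a hypothetical stand-in for the model calls:

```python
import time
import pyautogui

def analyze_image(path: str) -> str:
    """Hypothetical placeholder: forward the frame to the detection/captioning models."""
    return f"analysis of {path}"

def capture_loop(interval_s: int = 10) -> None:
    """Grab a frame every interval_s seconds and hand it to the models."""
    while True:
        frame_path = "frame.png"
        pyautogui.screenshot(frame_path)   # screenshot the fullscreen camera view
        print(analyze_image(frame_path))   # in the prototype this text is then read aloud
        time.sleep(interval_s)
```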

voice commands (a dispatch sketch follows this list):

  • [nothing]: every 10 seconds the system reads aloud the objects it detected, following the flowchart.
  • "help": reads aloud the available commands
  • "start": turns the system on
  • "stop": turns the system off
  • "remind": saves a recording of what the user said
  • "reminder": plays the most recent "remind" recording
  • "describe": sends the image to a natural language processing model to generate a coherent description of the image
  • "picture": saves the image; in the prototype it is uploaded to a Google Drive folder for the blind person's future use; ideally the image would be sent to the custom app
  • "crop" (todo): crops the image to a rectangular object, such as a poster or a sheet of paper the user is holding; describes the cropped image and saves it as well
  • "point" (todo): reads aloud the object the user is pointing at with the blind cane

instructions

the prototype literally screenshots the screen each cycle and sends the frame to several AI models, so open your camera app in fullscreen to test it (https://webcamtests.com/)

install libraries (for huggingface):

  1. pip install sounddevice soundfile gtts transformers pyautogui pillow speechrecognition scipy torch timm playsound

for gcloud:

  1. install the Google Cloud SDK: https://cloud.google.com/sdk/docs/install-sdk
  2. set up authentication: https://googleapis.dev/python/google-api-core/latest/auth.html
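A minimal sketch of the Google Cloud path once authentication is set up; this assumes the Vision API's label detection is what the project calls (the README does not name the exact endpoint), and it additionally requires the google-cloud-vision package:

```python
from google.cloud import vision  # pip install google-cloud-vision

def gcloud_labels(image_path: str) -> list[str]:
    """Return label descriptions for one frame (each request is billed against the credits)."""
    client = vision.ImageAnnotatorClient()     # uses the gcloud application-default credentials
    with open(image_path, "rb") as f:
        image = vision.Image(content=f.read())
    response = client.label_detection(image=image)
    return [label.description for label in response.label_annotations]
```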

for imageai:

  1. download the 3 .pt model files linked on this page (scroll down a bit): https://github.com/OlafenwaMoses/ImageAI/tree/master/imageai/Detection
  2. put them in the same folder as the python scripts
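A minimal sketch of the ImageAI path, assuming yolov3.pt is one of the three downloaded weight files:

```python
from imageai.Detection import ObjectDetection

detector = ObjectDetection()
detector.setModelTypeAsYOLOv3()
detector.setModelPath("yolov3.pt")   # one of the three downloaded .pt files
detector.loadModel()

# each detection is a dict with "name", "percentage_probability" and "box_points"
detections = detector.detectObjectsFromImage(
    input_image="frame.png",
    output_image_path="frame_annotated.png",
    minimum_percentage_probability=30,
)
print(", ".join(d["name"] for d in detections))
```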

for huggingface:

  1. no setup required; we will use this as the main backend
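A minimal sketch of the Hugging Face path using transformers pipelines; the exact model checkpoints are not stated in this README, so facebook/detr-resnet-50 and Salesforce/blip-image-captioning-base are placeholder choices:

```python
from transformers import pipeline

# checkpoints are downloaded on first run
detector = pipeline("object-detection", model="facebook/detr-resnet-50")
captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")

def detected_objects(image_path: str) -> str:
    """Objects announced every 10 seconds."""
    return ", ".join(d["label"] for d in detector(image_path))

def describe(image_path: str) -> str:
    """Coherent description for the "describe" voice command."""
    return captioner(image_path)[0]["generated_text"]
```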

IMPORTANT: press Ctrl+C to stop the program; please stop it after a few tries, because every Google Cloud request costs money from my account (currently running on free credits given by Google)

job division:

(job division image)

credits