With the pandemic continuing to affect our daily lives, online educational videos have become increasingly popular. Whether a course is hybrid or fully asynchronous, many lectures and meetings are pre-recorded or delivered over Zoom with a recording available. But watching and rewatching video lectures can be painstaking and demotivating, given their length and lack of interaction. What if we could turn a video lecture into clean, illustrated notes and search for content across a whole pile of videos? What if you could find exactly which lecture you missed covered a specific topic? That is what this project brings to you.
- Slices a video into clips according to its content (for example, slide changes)
- Transcribes speech for each video slice
- Extracts the most representative image from each video slice
- Extracts text from each image with OCR (Optical Character Recognition)
- Displays slides and transcripts side by side
- Enables content searching in the video through transcribed speech and recognized text from images
- Front-end: We used Material UI and React.js to make requests to the backend and present the output in an accessible manner.
- Back-end: We used Flask in Python to handle HTTP requests from the front-end and run our video-processing pipeline (sketch below).
- Scene Detection: We used the PySceneDetect package as a starting point and, on top of it, designed and implemented a custom detector that relies heavily on NumPy and OpenCV (sketch below).
- Speech Recognition: We used the Google Cloud Speech-to-Text API to transcribe the videos (sketch below).
- OCR: We used Tesseract to extract text from the screenshots generated by Scene Detection (sketch below).
- Search Indexing: We used Whoosh, an open-source, pure-Python search engine library inspired by Lucene, to index all the text generated by Speech Recognition and OCR (sketch below).
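None of the snippets below are our actual source code; they are minimal sketches of how each back-end component above can be wired up, with routes, file paths, and parameters filled in as placeholders. For the Flask back end, a video-upload endpoint might look roughly like this:

```python
# Minimal Flask sketch: accept an uploaded video and hand it to the pipeline.
# The route, folder layout, and process_video() helper are placeholders.
import os

from flask import Flask, jsonify, request
from werkzeug.utils import secure_filename

app = Flask(__name__)
UPLOAD_DIR = "videos"
os.makedirs(UPLOAD_DIR, exist_ok=True)


def process_video(path):
    """Stand-in for the real pipeline (scene detection, transcription, OCR, indexing)."""
    return {"video": os.path.basename(path), "status": "queued"}


@app.route("/upload", methods=["POST"])
def upload():
    # The front end sends the file under the form field "video".
    file = request.files["video"]
    path = os.path.join(UPLOAD_DIR, secure_filename(file.filename))
    file.save(path)
    return jsonify(process_video(path))


if __name__ == "__main__":
    app.run(debug=True)
```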
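For scene detection, our customized detector is not reproduced here; the sketch below uses PySceneDetect's stock ContentDetector (the 0.6+ API) to split a lecture into scenes and grabs the middle frame of each scene with OpenCV. The paths and threshold are placeholders.

```python
# Scene-detection sketch using PySceneDetect's stock ContentDetector
# (our customized detector is not shown here). Requires scenedetect 0.6+.
import os

import cv2
from scenedetect import ContentDetector, detect

VIDEO_PATH = "videos/lecture.mp4"  # placeholder
OUTPUT_DIR = "frames"
os.makedirs(OUTPUT_DIR, exist_ok=True)

# Each entry is a (start, end) pair of FrameTimecodes for one detected scene.
scenes = detect(VIDEO_PATH, ContentDetector(threshold=27.0))

cap = cv2.VideoCapture(VIDEO_PATH)
for i, (start, end) in enumerate(scenes):
    # Save the middle frame of each scene as its representative slide image.
    mid_frame = (start.get_frames() + end.get_frames()) // 2
    cap.set(cv2.CAP_PROP_POS_FRAMES, mid_frame)
    ok, frame = cap.read()
    if ok:
        cv2.imwrite(os.path.join(OUTPUT_DIR, f"scene_{i:03d}.png"), frame)
cap.release()
```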
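For transcription, a minimal Google Cloud Speech-to-Text call looks roughly like this. Long recordings have to be uploaded to Cloud Storage and transcribed asynchronously; the bucket URI and audio format below are placeholders.

```python
# Google Cloud Speech-to-Text sketch. Assumes the audio track has already been
# extracted as mono FLAC and uploaded to a GCS bucket, and that the
# GOOGLE_APPLICATION_CREDENTIALS environment variable points at a valid key.
from google.cloud import speech

client = speech.SpeechClient()

audio = speech.RecognitionAudio(uri="gs://your-bucket/lecture.flac")  # placeholder URI
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.FLAC,
    language_code="en-US",
    enable_automatic_punctuation=True,
)

# Recordings longer than about a minute must use the asynchronous API.
operation = client.long_running_recognize(config=config, audio=audio)
response = operation.result(timeout=600)

transcript = " ".join(r.alternatives[0].transcript for r in response.results)
print(transcript)
```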
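OCR over the saved frames is a direct pytesseract call. Note that the Tesseract binary itself must be installed separately from the pip package.

```python
# Run Tesseract OCR over every frame image saved by scene detection.
# The "frames" directory matches the placeholder used in the sketch above.
import glob

import pytesseract
from PIL import Image

for path in sorted(glob.glob("frames/*.png")):
    text = pytesseract.image_to_string(Image.open(path))
    print(path, "->", text.strip()[:80])  # preview the first 80 characters
```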
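Finally, the transcripts and OCR text can be indexed and searched with Whoosh. The schema fields and index directory below are placeholders, not our actual schema.

```python
# Whoosh sketch: index per-scene text (transcript + OCR) and search it.
# Field names and the index directory are placeholders.
import os

from whoosh.fields import ID, TEXT, Schema
from whoosh.index import create_in
from whoosh.qparser import QueryParser

schema = Schema(scene=ID(stored=True), content=TEXT(stored=True))
os.makedirs("indexdir", exist_ok=True)
ix = create_in("indexdir", schema)

writer = ix.writer()
writer.add_document(scene="3", content="Dynamic programming: optimal substructure and memoization.")
writer.add_document(scene="7", content="Graph traversal: BFS and DFS examples.")
writer.commit()

with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("memoization")
    for hit in searcher.search(query):
        print("Found in scene", hit["scene"])
```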
It was not an easy project. We initially wanted to build a native JavaScript program that calls our pipeline written in Python. That did not work out, so we pivoted to a full-stack web application with less than 12 hours left. Miscellaneous bugs kept appearing as we pushed everything through such a tight time limit, especially when we were doing things we were not familiar with. We spent a lot of time figuring out how to use multi-threading and multi-processing to speed up video processing, and it took a lot of fine-tuning to make our scene detector work across different kinds of videos. We are glad we made it!
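As an illustration of the kind of parallelism we mean (not our actual code), a CPU-heavy step such as OCR over many frames can be spread across a process pool:

```python
# Sketch: parallelize OCR over saved frames with a process pool
# (not our actual parallelization code).
import glob
from multiprocessing import Pool

import pytesseract
from PIL import Image


def ocr_frame(path):
    return path, pytesseract.image_to_string(Image.open(path))


if __name__ == "__main__":
    frames = sorted(glob.glob("frames/*.png"))
    with Pool() as pool:
        for path, text in pool.map(ocr_frame, frames):
            print(path, len(text), "characters")
```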
The following libraries are required:
- `pip install ffmpeg`
- `pip install opencv-python`
- `pip install scenedetect`
- `pip install pyinstaller`
- `pip install pillow`
- `pip install pytesseract`
To run `video_to_text.py` (team members only):
- `cd` into `thoth`
- `mkdir audios && mkdir videos`
- If you don't have them already:
  - `pip3 install tinytag`
  - `pip3 install google-cloud-speech`
  - `pip3 install google-cloud-storage`
- Log into GCP with your Rice email. Generate your credential in the Thoth project in GCP (in the dashboard's left menu: IAM & Admin > Service Accounts > Actions (the dots to the right of the only service-account entry) > Manage Keys > Create new key > JSON).
- Set up your Google auth credentials by following this section: https://cloud.google.com/docs/authentication/getting-started#setting_the_environment_variable (i.e., point the `GOOGLE_APPLICATION_CREDENTIALS` environment variable at the JSON key you just created).
- Place a `.mp4` or `.mov` file into the `videos` folder and call `get_speech_from_video("video name")`. Keep the test video short, since transcription takes some time (see the sketch below).
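For reference, here is a rough sketch of what the audio-extraction step behind a helper like `get_speech_from_video` might look like; the real implementation in `video_to_text.py` may differ.

```python
# Hypothetical sketch of the audio-extraction step; not the actual code in
# video_to_text.py. Assumes ffmpeg is on PATH and the videos/ and audios/
# folders from the steps above exist.
import os
import subprocess

from tinytag import TinyTag


def extract_audio(video_name):
    """Pull a mono FLAC track out of videos/<video_name> into audios/."""
    video_path = os.path.join("videos", video_name)
    audio_path = os.path.join("audios", os.path.splitext(video_name)[0] + ".flac")
    subprocess.run(
        ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1", audio_path],
        check=True,
    )
    # Duration (in seconds) helps decide between synchronous and asynchronous transcription.
    duration = TinyTag.get(audio_path).duration
    print(f"Extracted {audio_path} ({duration:.0f}s)")
    return audio_path
```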