This project is a web application that allows users to upload videos, process them to extract subtitles, and then search for specific keywords within the subtitles. The subtitles are extracted using the `ccextractor` binary and stored in AWS DynamoDB, while the videos are stored in AWS S3. The backend is built with Django and uses Celery for background tasks.
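To make the pipeline concrete, here is a hedged sketch of how a Celery task can tie `ccextractor` and DynamoDB together. The task name, file paths, table name, and item shape are illustrative assumptions, not the project's actual code.

```python
# Illustrative sketch only -- the task name, paths, table name, and item
# shape are assumptions, not the project's actual code.
import subprocess

import boto3
from celery import shared_task

@shared_task
def extract_subtitles(video_path: str, video_id: str) -> None:
    """Run ccextractor on a local video file and store the output."""
    srt_path = f"/tmp/{video_id}.srt"
    # ccextractor reads the video and writes the subtitles to the -o output file
    subprocess.run(["ccextractor", video_path, "-o", srt_path], check=True)

    # Persist the raw subtitle text in DynamoDB
    table = boto3.resource("dynamodb").Table("Subtitle")
    with open(srt_path) as f:
        table.put_item(Item={"video_id": video_id, "content": f.read()})
```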
The application is deployed on an AWS EC2 instance and can be accessed at http://65.2.179.201/.
Latency was tested repeatedly; each request completed in under one second, and the application handled multiple simultaneous requests without any noticeable increase in response time.
Note: Because the deployment uses free tiers of AWS services, there are occasional delays in the application's response times, especially during video uploads.
Note: The `main` branch of this repository contains the code for the deployed application, which is also the Linux version of the code (mirrored in the `linux` branch). The `windows` branch contains the Windows version.
- Video Upload: Users can upload videos through the web interface.
- Thumbnail Generation: Generates thumbnails for uploaded videos.
- Subtitle Extraction: Extracts subtitles from uploaded videos using `ccextractor`.
- Background Processing: Uses Celery to handle video processing in the background.
- AWS Integration: Stores videos in AWS S3 and subtitles in AWS DynamoDB.
- Video Streaming: Provides pre-signed URLs for streaming videos from S3 (see the sketch after this list).
- API Endpoints: Provides API endpoints for interacting with the application programmatically.
- Web Interface: Provides a user-friendly web interface for interacting with the application, built by rendering the REST API endpoints with Django REST Framework's `TemplateHTMLRenderer`.
- Search Functionality: Allows users to search for keywords in subtitles and videos.
- Keyword Search: Allows users to search for keywords in subtitles and retrieves the relevant time segments in the video.
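Pre-signed URLs are what let the browser stream a video straight from S3 without making the bucket public. A minimal sketch of generating one with boto3; the bucket name and object key below are placeholders:

```python
import boto3

# Bucket name and object key are placeholders -- use your own values
s3 = boto3.client("s3")
url = s3.generate_presigned_url(
    "get_object",
    Params={"Bucket": "your-bucket-name", "Key": "videos/sample.mp4"},
    ExpiresIn=3600,  # the URL expires after one hour
)
print(url)
```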
- Python 3.9+
- Django
- An AWS account with S3 and DynamoDB set up
- `ccextractor` installed
- Redis (for the Celery message broker)
- Celery
```
git clone https://github.com/lordgrim18/VideoSearch.git
cd VideoSearch
```
Create a virtual environment and install the required dependencies:
```
python -m venv venv
```
Now activate the virtual environment:
On Windows:

```
venv\Scripts\activate
```

On Linux:

```
source venv/bin/activate
```
Install the dependencies:
```
pip install -r requirements.txt
```
Create a `.env` file in the project root directory and add the variables given in the sample `.env.sample` file.
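The authoritative variable names are in `.env.sample`; purely as an illustration, the file might look something like this (all names and values below are hypothetical):

```
# Hypothetical example -- copy the real variable names from .env.sample
AWS_ACCESS_KEY_ID=your-access-key
AWS_SECRET_ACCESS_KEY=your-secret-key
AWS_S3_BUCKET_NAME=your-bucket-name
AWS_REGION=your-region
REDIS_URL=redis://localhost:6379/0
```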
- Create an AWS account and set up an S3 bucket and DynamoDB table. It is recommended to create a separate IAM user with limited permissions for this project.
- Use the AWS CLI to configure your credentials:
- Install the AWS CLI by following the instructions in the official AWS documentation.
- After installing the AWS CLI, run the following command to configure your credentials:

```
aws configure
```
Enter your AWS Access Key ID and Secret Access Key when prompted. Then set the default region to the region where you created your S3 bucket and DynamoDB table. You can find the region code in the AWS Management Console in the top left. Set the output format to `json`.
- In the AWS Management Console, create an S3 bucket to store the videos.
- Keep in mind that the bucket must be in the same region as the one you configured in the AWS CLI.
- Set the bucket name in the `.env` file.
- The DynamoDB tables can be created by running the `core/dynamo_migrator.py` script.
- Run the script with the following command:

```
python3 core/dynamo_migrator.py
```

- This will create two tables in DynamoDB: `Video` and `Subtitle`.
- The `Video` table stores information about the uploaded videos, while the `Subtitle` table stores the extracted subtitles.
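For reference, creating one of these tables with boto3 looks roughly like the sketch below; the key schema and billing mode are assumptions, so consult `core/dynamo_migrator.py` for the actual definitions.

```python
import boto3

# Key schema and billing mode are assumptions -- see core/dynamo_migrator.py
dynamodb = boto3.resource("dynamodb")
table = dynamodb.create_table(
    TableName="Video",
    KeySchema=[{"AttributeName": "video_id", "KeyType": "HASH"}],
    AttributeDefinitions=[{"AttributeName": "video_id", "AttributeType": "S"}],
    BillingMode="PAY_PER_REQUEST",  # no need to provision capacity up front
)
table.wait_until_exists()  # blocks until the table is ACTIVE
```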
- Install Redis:

```
sudo apt update
sudo apt install redis-server
```

- Start the Redis server:

```
sudo service redis-server start
```

- Check the status of the Redis server:

```
sudo service redis-server status
```

- Verify that Redis is running:

```
redis-cli ping
```

The server should respond with `PONG`.
- Alternatively, set up Redis with Docker:

```
docker pull redis
docker run --name redis -p 6379:6379 -d redis
```

If you are running Redis on a different host or port, you can specify the connection string in the `.env` file.
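Celery reads the broker address from the Django settings. A minimal sketch of the relevant configuration, assuming the connection string is exposed through a `REDIS_URL` environment variable (the variable name is an assumption):

```python
# settings.py (sketch) -- the REDIS_URL variable name is an assumption
import os

CELERY_BROKER_URL = os.environ.get("REDIS_URL", "redis://localhost:6379/0")
CELERY_RESULT_BACKEND = CELERY_BROKER_URL
```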
Start the Celery worker to handle background tasks.

On Linux:

```
celery -A VideoSearch worker --loglevel=info
```

On Windows:

```
celery -A VideoSearch worker -l info --pool=solo -E
```
Start the Django development server:
```
python manage.py runserver
```
The application should now be running at http://127.0.0.1:8000/.
- Upload a video through the web interface.
- The video will be processed in the background to extract subtitles.
- Once the subtitles are extracted, you can search for keywords in the subtitles.
- The application will display the relevant time segments in the video where the keywords appear.
Users can also interact with the API endpoints directly; they are located at http://127.0.0.1:8000/api/v1/.
The API endpoints allow users to interact with the application programmatically. The available endpoints are:
- `GET /api/v1/videos/`: List all videos.
- `POST /api/v1/videos/create/`: Upload a video.
- `GET /api/v1/videos/<video_id>/`: Retrieve a specific video.
- `PATCH /api/v1/videos/update/<video_id>/`: Update a video.
- `DELETE /api/v1/videos/delete/<video_id>/`: Delete a video.
- `GET /api/v1/videos/search/`: Search for videos by keyword.

- `GET /api/v1/subtitles/<video_id>`: Retrieve subtitles for a specific video.
- `GET /api/v1/subtitles/<video_id>/search/`: Search for keywords in the subtitles of a video.
- `GET /api/v1/subtitles/search/`: Search for keywords in all subtitles.

- `GET /api/v1/storage/`: List all videos stored in S3.
- `GET /api/v1/storage/object_name/`: Retrieve a specific video stored in S3.
- `POST /api/v1/storage/url/`: Obtain a pre-signed URL from S3 for streaming the video.

- `GET /api/v1/search/`: Search for keywords in all subtitles and videos.
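As an example, a global keyword search from Python might look like the sketch below; the query-parameter name and the response shape are assumptions based on the feature description:

```python
import requests

BASE = "http://127.0.0.1:8000/api/v1"

# Search all subtitles and videos for a keyword
# (the "keyword" parameter name is an assumption)
resp = requests.get(f"{BASE}/search/", params={"keyword": "hello"})
resp.raise_for_status()

# Conceptually, each hit identifies a video and the time segments where
# the keyword appears; the exact JSON shape is project-specific.
print(resp.json())
```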
- The application has been deployed to an AWS EC2 instance and can be accessed at http://65.2.179.201/.
- The deployment process involves setting up the EC2 instance, installing the necessary dependencies, configuring the environment variables, and running the application.
- The application is served using Gunicorn and Nginx.
- The Celery worker also runs on the EC2 instance to handle background tasks; it was daemonized so it keeps running in the background.
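As an illustration, Gunicorn can serve the project's WSGI application with a command along these lines (Nginx then reverse-proxies to it); the bind address is an example, and the `VideoSearch.wsgi` module name is inferred from the `celery -A VideoSearch` commands above rather than confirmed:

```
gunicorn VideoSearch.wsgi:application --bind 127.0.0.1:8000
```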