Video Audio Enhancer with Azure OpenAI

This project enhances the audio quality of videos by extracting the audio, converting it into a transcript, correcting grammar, and eliminating filler words using Azure OpenAI. The modified transcript is then converted back into audio and precisely mapped to the original video, ensuring seamless synchronization.

Features

Automatic Speech-to-Text: Extracts audio from the input video and converts it into text using a speech recognition engine.
Grammar Correction: Corrects grammatical errors in the transcript using Azure OpenAI.
Filler Word Removal: Removes common filler words such as "uh", "um", and "hmm" from the transcript to improve clarity.
Text-to-Speech: Converts the cleaned transcript back into audio.
Seamless Audio-Video Synchronization: Ensures the new audio is perfectly synchronized with the original video, without any delay or mismatch.

Flow

https://skstanwar.github.io/Curious-PM-/

Installation

Clone the repository:

git clone https://github.com/skstanwar/Curious-PM-.git
cd Curious-PM-

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # For Windows: venv\Scripts\activate

Install dependencies:
```
pip install -r requirements.txt
```
Set up Azure OpenAI API:
- Create an account and get your API key from Azure OpenAI.
- Set up your API key in the environment file .env:
```
azure_openai_key="**********"
```

Usage

Provide your input video: Place the video file in the input directory or specify the path in the script.
Run the script:
```
python main.py input_video_path.mp4
```
- wait for 20 to 30 secs

Project Workflow

Audio Extraction: The script extracts the audio track from the input video using MoviePy and saves it as a separate audio file.
Speech-to-Text Conversion: The extracted audio is processed using a speech recognition engine to convert it into a transcript. This step generates a text version of the spoken content.
Grammar Correction and Filler Word Removal: The transcript is sent to Azure OpenAI, where grammatical errors are corrected, and filler words such as "umm", "uh", and "hmm" are removed for a more professional-sounding transcript.
Text-to-Speech Conversion: The cleaned transcript is converted back into an audio file using a text-to-speech engine.
Remapping Audio to Video: The newly generated audio is remapped back to the original video. The script ensures perfect synchronization between the new audio and the video, with no delays or mismatches.

Dependencies

Python: 3.8+
MoviePy: For video processing
Azure OpenAI: For interacting with Azure OpenAI API
deepgram: For converting audio to text and text to audio
streamlit: For web app view application

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Name		Name	Last commit message	Last commit date
Latest commit History 16 Commits
audio		audio
sample		sample
uploaded_video		uploaded_video
video		video
.env.example		.env.example
.gitignore		.gitignore
.gitpod.yml		.gitpod.yml
README.md		README.md
Report PDF.pdf		Report PDF.pdf
index.html		index.html
main.py		main.py
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Video Audio Enhancer with Azure OpenAI

Table of Contents

Features

Flow

Installation

Usage

Project Workflow

Dependencies

License

About

Releases

Packages

Languages

skstanwar/Curious-PM-

Folders and files

Latest commit

History

Repository files navigation

Video Audio Enhancer with Azure OpenAI

Table of Contents

Features

Flow

Installation

Usage

Project Workflow

Dependencies

License

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages