This project enhances the audio quality of videos by extracting the audio, converting it into a transcript, correcting grammar, and eliminating filler words using Azure OpenAI. The modified transcript is then converted back into audio and precisely mapped to the original video, ensuring seamless synchronization.
- Automatic Speech-to-Text: Extracts audio from the input video and converts it into text using a speech recognition engine.
- Grammar Correction: Corrects grammatical errors in the transcript using Azure OpenAI.
- Filler Word Removal: Removes common filler words such as "uh", "um", and "hmm" from the transcript to improve clarity.
- Text-to-Speech: Converts the cleaned transcript back into audio.
- Seamless Audio-Video Synchronization: Ensures the new audio is perfectly synchronized with the original video, without any delay or mismatch.
-
Clone the repository:
git clone https://github.com/skstanwar/Curious-PM-.git cd Curious-PM-
-
Create and activate a virtual environment:
python -m venv venv source venv/bin/activate # For Windows: venv\Scripts\activate
-
Install dependencies:
pip install -r requirements.txt
-
Set up Azure OpenAI API:
- Create an account and get your API key from Azure OpenAI.
- Set up your API key in the environment file .env:
azure_openai_key="**********"
-
Provide your input video: Place the video file in the
input
directory or specify the path in the script. -
Run the script:
python main.py input_video_path.mp4
- wait for 20 to 30 secs
-
Audio Extraction: The script extracts the audio track from the input video using
MoviePy
and saves it as a separate audio file. -
Speech-to-Text Conversion: The extracted audio is processed using a speech recognition engine to convert it into a transcript. This step generates a text version of the spoken content.
-
Grammar Correction and Filler Word Removal: The transcript is sent to Azure OpenAI, where grammatical errors are corrected, and filler words such as "umm", "uh", and "hmm" are removed for a more professional-sounding transcript.
-
Text-to-Speech Conversion: The cleaned transcript is converted back into an audio file using a text-to-speech engine.
-
Remapping Audio to Video: The newly generated audio is remapped back to the original video. The script ensures perfect synchronization between the new audio and the video, with no delays or mismatches.
- Python: 3.8+
- MoviePy: For video processing
- Azure OpenAI: For interacting with Azure OpenAI API
- deepgram: For converting audio to text and text to audio
- streamlit: For web app view application
This project is licensed under the MIT License. See the LICENSE file for more details.