Scripts to download a video from a URL, extract its audio, and transcribe it with OpenAI's whisper-large-v3 speech-to-text model. The transcribed text is then used as context for answering questions about the video with an LLM via the ChatGPT API.
- Create conda environment
conda create -n video-transcription python=3.12
- Activate conda environment
conda activate video-transcription
- Install requirements
pip install -r requirements.txt
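The contents of requirements.txt are not reproduced here; as a rough guide, a pipeline like this one typically needs a downloader, a Whisper runtime, and the OpenAI client. The list below is an assumption, not the repository's actual file:

```
yt-dlp          # video download (assumed backend)
torch           # required to run whisper-large-v3 locally
transformers    # Hugging Face pipeline for openai/whisper-large-v3
openai          # ChatGPT API client for the question-answering step
```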
- Run the video download script with the URL of the video as an argument
python scripts/video_downloader.py --url <URL>
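The downloader script itself is not shown; below is a minimal sketch of what scripts/video_downloader.py could look like, assuming yt-dlp as the backend (an assumption, not confirmed by the repo):

```python
# Hypothetical sketch of scripts/video_downloader.py; assumes yt-dlp.
import argparse

from yt_dlp import YoutubeDL


def main():
    parser = argparse.ArgumentParser(description="Download a video from a URL.")
    parser.add_argument("--url", required=True, help="URL of the video to download")
    args = parser.parse_args()

    # Save under download/ so the later pipeline steps can find the file;
    # yt-dlp fills in %(ext)s with the container format it downloads.
    opts = {"outtmpl": "download/video.%(ext)s"}
    with YoutubeDL(opts) as ydl:
        ydl.download([args.url])


if __name__ == "__main__":
    main()
```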
- Run the audio-extraction script with the filepath of the video and the output path for the audio file as arguments
python scripts/audio_from_video.py --filepath download/video --output_path download/audio.wav
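A minimal sketch of scripts/audio_from_video.py, assuming ffmpeg is installed and on PATH (the real script may use a library such as moviepy instead):

```python
# Hypothetical sketch of scripts/audio_from_video.py; assumes ffmpeg on PATH.
import argparse
import subprocess


def main():
    parser = argparse.ArgumentParser(description="Extract audio from a video file.")
    parser.add_argument("--filepath", required=True, help="Path to the input video")
    parser.add_argument("--output_path", required=True, help="Path for the output WAV file")
    args = parser.parse_args()

    # 16 kHz mono WAV matches what Whisper models expect as input.
    subprocess.run(
        ["ffmpeg", "-y", "-i", args.filepath, "-vn", "-ac", "1", "-ar", "16000", args.output_path],
        check=True,
    )


if __name__ == "__main__":
    main()
```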
- Transcribe the audio file using the OpenAI whisper-large-v3 model
python scripts/audio_transcription.py --audio_path download/audio.wav --transcription_path download/transcription.json --timestamps True
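One common way to run whisper-large-v3 is through the Hugging Face transformers ASR pipeline; the sketch below assumes that approach and mirrors the flags above, but the repository's actual implementation may differ:

```python
# Hypothetical sketch of scripts/audio_transcription.py; assumes the
# Hugging Face transformers pipeline for openai/whisper-large-v3.
import argparse
import json

from transformers import pipeline


def main():
    parser = argparse.ArgumentParser(description="Transcribe audio with Whisper.")
    parser.add_argument("--audio_path", required=True)
    parser.add_argument("--transcription_path", required=True)
    parser.add_argument("--timestamps", default="False",
                        help="Pass True to include segment timestamps")
    args = parser.parse_args()
    want_timestamps = args.timestamps.lower() == "true"  # handles --timestamps True

    # chunk_length_s splits long audio into 30 s windows; add device=0 to use a GPU.
    asr = pipeline(
        "automatic-speech-recognition",
        model="openai/whisper-large-v3",
        chunk_length_s=30,
    )
    result = asr(args.audio_path, return_timestamps=want_timestamps)

    # The result is a dict with a "text" field (plus "chunks" when timestamps are on).
    with open(args.transcription_path, "w") as f:
        json.dump(result, f, indent=2)


if __name__ == "__main__":
    main()
```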
- Add OPENAI_API_KEY to the environment variables
export OPENAI_API_KEY=<API_KEY>
- Ask questions based on the context of the transcription using the ChatGPT API
python scripts/question_answer.py --transcription_path download/transcription.json --questions_path download/questions.txt --answers_path download/answers.txt
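The question-answering script is likewise not shown; below is a minimal sketch assuming the openai>=1.0 Python SDK, one question per line in the questions file, and a hypothetical model choice (gpt-4o-mini is not specified by the repo):

```python
# Hypothetical sketch of scripts/question_answer.py; assumes openai>=1.0 and
# the JSON transcription format produced by the transcription step above.
import argparse
import json

from openai import OpenAI


def main():
    parser = argparse.ArgumentParser(description="Answer questions from a transcription.")
    parser.add_argument("--transcription_path", required=True)
    parser.add_argument("--questions_path", required=True)
    parser.add_argument("--answers_path", required=True)
    args = parser.parse_args()

    # Read the "text" field written by the transcription step.
    with open(args.transcription_path) as f:
        transcription = json.load(f)["text"]
    with open(args.questions_path) as f:
        questions = [line.strip() for line in f if line.strip()]

    # The client picks up OPENAI_API_KEY from the environment (set above).
    client = OpenAI()
    answers = []
    for question in questions:
        response = client.chat.completions.create(
            model="gpt-4o-mini",  # hypothetical model choice; not specified by the repo
            messages=[
                {"role": "system",
                 "content": "Answer using only the provided transcription."},
                {"role": "user",
                 "content": f"Transcription:\n{transcription}\n\nQuestion: {question}"},
            ],
        )
        answers.append(response.choices[0].message.content)

    with open(args.answers_path, "w") as f:
        f.write("\n".join(answers) + "\n")


if __name__ == "__main__":
    main()
```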
- Execute the bash script transcribe_url.sh to run the entire pipeline
./transcribe_url.sh <URL>