This really messy script combines whisper from OpenAI and a pytorch tutorial on forced alignment to generate word-level .srt captions
Install dependencies, put a video.wav
file with a 16000 bitrate in the folder, run the script and pray it works.
https://twitter.com/thejohnfish/status/1574204788408926208?s=20&t=IeCezzbsOso508wPYH9ZXg