Uses VideoSubFinder and Google Cloud Vision to extract hardsubs and OCR them into an SRT file. The main purpose is quick dictionary lookups with MPVacious and use with subs2srs. All code in the .bat and .py files was written by ChatGPT. I used PyInstaller to turn the Python script into an exe; the source code is in BatchProcessor.py.
- Download this repository and extract anywhere.
- Make sure you have Python installed (version 3.13.0 was used here).
- Install VideoSubFinder and extract Release_x64 to the main directory.
- It's important that the folder name matches Release_x64 exactly:

/Hardsub-Extract-OCR-main/
├── /BatchProcessor/
├── /PutVideosInHere/
└── /Release_x64/ <-- here
- Put the videos you want to extract hardsubs from in the /PutVideosInHere/ folder.
- Run the ExtractHardsubs.bat.
- Check the RGBImages folder so you can tweak the crop settings. A closer crop reduces noise in the final result and speeds up both the extraction and OCR steps. You may also need to close VideoSubFinder using Task Manager (Ctrl+Shift+Esc) before running the .bat again.
This is where the RGBImages folder is:

/Hardsub-Extract-OCR-main/
├── /BatchProcessor/
├── /PutVideosInHere/
│   ├── /output/
│   │   └── /VideoTitle/
│   │       └── RGBImages
│   └── VideoTitle.mp4
└── /Release_x64/
Example of a -be value that is too high: it crops too much off the bottom.
This is how you can tweak the crop values in ExtractHardsubs.bat. You can edit the file by right-clicking it and selecting Edit.
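To get a feel for what a crop value does before rerunning the extraction, here is a rough sketch that converts crop fractions into pixel rows. It assumes that -be and its top counterpart (called `te` here) are fractions of the frame height measured up from the bottom edge, which matches the caption above but is not confirmed by this README. The function name `crop_rows` is made up for illustration.

```python
def crop_rows(frame_height: int, te: float = 1.0, be: float = 0.0) -> tuple[int, int]:
    """Return (top_row, bottom_row) of the kept band, counted from the top of the frame.

    ASSUMPTION: te and be are fractions of the frame height measured up from
    the bottom edge, so be=0.0 keeps everything down to the last row.
    """
    if not (0.0 <= be < te <= 1.0):
        raise ValueError("expected 0.0 <= be < te <= 1.0")
    top_row = round(frame_height * (1.0 - te))
    bottom_row = round(frame_height * (1.0 - be))
    return top_row, bottom_row


# On a 1080p frame, te=0.3 and be=0.05 keep rows 756 to 1026.
print(crop_rows(1080, te=0.3, be=0.05))
```

Under that assumption, raising `be` moves the bottom row upward, which is why a value that is too high cuts off the lower part of the subtitles.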
Once your crop values are tweaked, you can let the program run. It will leave you with the final images in TXTImages. They should look like this:
The OCR we will be using is Google Cloud Vision, the same one used in Google Lens. I'd put it at 99% accuracy, but I've never seen it make a mistake with high quality source images. It does pick up some noise, though. At the time of writing, Google offers a $300 free trial for 3 months. In my experience, that's enough to OCR over 100 episodes of anime.
- Make a Google Cloud account.
- Create a project.
- You can just enter random text or "Other" for the details.
- Enable billing, enter payment information.
- It will not charge you unless you go over the trial amount.
- In APIs & Services in your project, go to Credentials.
- Click + Create Credentials, then Service Account.
- Enter info.
- Inside the Service Account, click Keys.
- Click ADD KEY, Create new key, make sure JSON is selected, then click Create.
- Drag this JSON file into the main directory.
- In your project, go to APIs and Services, then Enabled APIs and Services, then + Enable APIs and Services.
- Search for Cloud Vision.
- Enable it.
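If you want to confirm that the key and the API are working before processing a whole batch, a single image can be sent to Cloud Vision with the `google-cloud-vision` Python package. This is a minimal sketch, not necessarily how BatchProcessor.py does it; the function name `ocr_image` and both paths are placeholders.

```python
from pathlib import Path


def ocr_image(image_path: str, key_path: str) -> str:
    """Send one image to Cloud Vision and return the detected text ("" if none)."""
    content = Path(image_path).read_bytes()

    # Imported here so the helper can be defined even before the
    # google-cloud-vision package has been installed.
    from google.cloud import vision

    client = vision.ImageAnnotatorClient.from_service_account_file(key_path)
    response = client.text_detection(image=vision.Image(content=content))
    if response.error.message:
        raise RuntimeError(response.error.message)

    # The first annotation, when present, holds the full text of the image.
    annotations = response.text_annotations
    return annotations[0].description.strip() if annotations else ""
```

Calling `ocr_image("some_frame.jpeg", "your-key.json")` with one of your TXTImages and your downloaded key should return the subtitle text. An error at this point usually means billing or the Cloud Vision API is not enabled for the project.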
Now you can use the Batch Processor.
- Run 01 InstallRequirements.bat.
- Run 02 BatchProcessor.exe.
- Select your folder with images.
  - If you're processing multiple videos at once, this will be output, and you should check the Process Multiple Folders checkbox.
  - If you're processing a single video, this will be TXTImages.
- Select the output folder, where the SRT files will be generated.
- Select your JSON file, which should be in the main directory. Ex. example-473422-7e6ba2cacb95.json.
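For a sense of how a folder of images can become an SRT file, here is a small sketch that reads the start and end times from each image's filename and pairs them with the OCR text. It assumes the images are named like `0_00_02_502__0_00_05_171_....jpeg` (hours, minutes, seconds, milliseconds for start and end); that pattern and the helper names are assumptions for illustration, not taken from BatchProcessor.py.

```python
import re
from pathlib import Path

# ASSUMED filename pattern: H_MM_SS_mmm__H_MM_SS_mmm followed by anything.
NAME_RE = re.compile(
    r"^(\d+)_(\d{2})_(\d{2})_(\d{3})__(\d+)_(\d{2})_(\d{2})_(\d{3})"
)


def srt_time(hours: str, minutes: str, seconds: str, millis: str) -> str:
    """Format one timestamp the way SRT expects: HH:MM:SS,mmm."""
    return f"{int(hours):02d}:{minutes}:{seconds},{millis}"


def cue_times(filename: str) -> tuple[str, str]:
    """Extract (start, end) SRT timestamps from an image filename."""
    match = NAME_RE.match(Path(filename).name)
    if match is None:
        raise ValueError(f"unexpected filename: {filename}")
    parts = match.groups()
    return srt_time(*parts[:4]), srt_time(*parts[4:])


def build_srt(entries: list[tuple[str, str]]) -> str:
    """Build SRT text from (image filename, OCR text) pairs, skipping empty text."""
    cues = []
    for filename, text in entries:
        text = text.strip()
        if text:
            start, end = cue_times(filename)
            cues.append((start, end, text))
    cues.sort()  # zero-padded timestamps sort chronologically as strings
    blocks = [
        f"{number}\n{start} --> {end}\n{text}\n"
        for number, (start, end, text) in enumerate(cues, 1)
    ]
    return "\n".join(blocks)
```

Images where the OCR returns nothing are dropped rather than written as empty cues, since empty lines tend to cause trouble in tools like subs2srs.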
Once you have your subtitle files, you may want to use Subtitle Edit to merge lines with the same text. For example, when the scene changes while the same subtitle line stays on screen, it gets counted as two lines, which can cause problems when using subs2srs.
- To do this, open Subtitle Edit.
- Click on Tools, then Batch Convert.
- Drag in files.
- Select Merge lines with same text.
- Select Overwrite files.
- Click Convert.
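The merge itself is simple enough to sketch in Python if you would rather script it. The version below joins consecutive cues that have identical text and sit close together in time. It is an illustration of the idea, not Subtitle Edit's actual implementation, and the `Cue` class and the `max_gap_ms` default are made up for this example.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start_ms: int
    end_ms: int
    text: str


def merge_same_text(cues: list[Cue], max_gap_ms: int = 250) -> list[Cue]:
    """Merge consecutive cues with identical text separated by at most max_gap_ms."""
    merged: list[Cue] = []
    for cue in sorted(cues, key=lambda c: c.start_ms):
        previous = merged[-1] if merged else None
        if (
            previous is not None
            and previous.text == cue.text
            and cue.start_ms - previous.end_ms <= max_gap_ms
        ):
            # Same line continuing across a scene change: extend the previous cue.
            previous.end_ms = max(previous.end_ms, cue.end_ms)
        else:
            # Copy so the caller's cues are left untouched.
            merged.append(Cue(cue.start_ms, cue.end_ms, cue.text))
    return merged
```

The gap limit keeps a line that is genuinely repeated later in the episode from being merged into one very long cue.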
Discord: furretar
If you'd rather use a locally hosted OCR instead of Google Cloud Vision, consider RapidVideOCR Desktop. It uses PaddleOCR, which has about 95% accuracy in my experience. You can also convert the TXTImages from the first step directly into a SUP file, without OCR, using Images to PGS SUP.