Tentacule/PgsToSrt

PgsToSrt

Convert PGS subtitles to SRT using OCR.

Prerequisites

Tesseract language data files (.traineddata) must be placed in the tessdata folder inside the PgsToSrt folder, or their location can be given on the command line with the --tesseractdata parameter.

You only need data files for the language(s) you want to convert.
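
Before starting a conversion, it can help to confirm that the data file for the chosen language is actually present. The following is a minimal sketch, assuming Tesseract's usual `<language>.traineddata` file naming; the function name `has_tessdata` is hypothetical and not part of PgsToSrt.

```shell
# Hypothetical helper (not part of PgsToSrt): succeeds if the Tesseract data
# file for a language exists in the given tessdata directory.
# Assumes the usual Tesseract naming scheme <language>.traineddata.
has_tessdata() {
    tessdata_dir=$1
    language=$2
    [ -f "$tessdata_dir/$language.traineddata" ]
}

# Example use:
#   has_tessdata ./tessdata fra || echo "fra.traineddata is missing" >&2
```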

Usage

dotnet PgsToSrt.dll [parameters]

Parameter Description
--input Input filename; can be an .mkv file or a PGS subtitle extracted to a .sup file with mkvextract.
--output Output SubRip (.srt) filename. Auto generated from input filename if not set.
--track Track number of the subtitle to process in an .mkv file (only required when the input is a Matroska file). This can be obtained with mkvinfo.
--tracklanguage Convert all tracks of the specified language (only works with .mkv input)
--tesseractlanguage Tesseract language to use if multiple languages are available in the tesseract data directory.
--tesseractdata Path of tesseract language data files, by default tessdata in the executable directory.
--tesseractversion libtesseract version; supports 4 and 5 (default: 4) (ignored on Windows)
--libleptname Leptonica library name, usually lept or leptonica; the 'lib' prefix is added automatically (default: lept) (ignored on Windows)
--libleptversion Leptonica library version (default: 5) (ignored on Windows)

Example (Command Line)

dotnet PgsToSrt.dll --input video1.fr.sup --output video1.fr.srt --tesseractlanguage fra
dotnet PgsToSrt.dll --input video1.mkv --output video1.srt --track 4
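
To convert several extracted subtitle files in one go, the first command above can be wrapped in a loop. This is only a sketch: `convert_all_sup` is a hypothetical helper, and it assumes PgsToSrt.dll is in the current directory and that the data file for the chosen language is available.

```shell
# Sketch: convert every .sup file in a directory, writing each .srt next to it.
# convert_all_sup is a hypothetical helper, not part of PgsToSrt.
convert_all_sup() {
    dir=$1
    language=$2
    for sup in "$dir"/*.sup; do
        [ -e "$sup" ] || continue   # the glob matched nothing
        dotnet PgsToSrt.dll --input "$sup" --output "${sup%.sup}.srt" \
            --tesseractlanguage "$language" || return 1
    done
}

# Example use:
#   convert_all_sup ./subtitles fra
```

The loop stops at the first failed conversion so that a missing data file or library is noticed immediately rather than after the whole batch.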

Example (Docker)

Examine entrypoint.sh for the full list of available arguments.

docker run -it --rm \
    -v /data:/data \
    -e INPUT=/data/myImageSubtitle.sup \
    -e OUTPUT=/data/myTextSubtitle.srt \
    -e LANGUAGE=eng \
    tentacule/pgstosrt

Hint: The default arguments coming from the Dockerfile are INPUT=/input.sup and OUTPUT=/output.srt, so you can mount your files directly at those paths:

touch output-file.srt  # This needs to be a file, otherwise Docker will just assume it's a directory mount and it will fail.
docker run -it --rm \
    -v "$(pwd)/source-file.sup":/input.sup \
    -v "$(pwd)/output-file.srt":/output.srt \
    -e LANGUAGE=eng \
    tentacule/pgstosrt
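
The steps in the hint (create the output file first, then mount both files at the image's default paths) can be combined into one function. This is a sketch under assumptions: `pgs_docker_convert` is a hypothetical name, and host paths are made absolute because `-v` treats a bare relative name as a named volume rather than a file.

```shell
# Hypothetical wrapper (not part of PgsToSrt) around the image's default
# /input.sup and /output.srt paths.
pgs_docker_convert() {
    input=$1
    output=$2
    language=$3
    # Pre-create the output so Docker bind-mounts a file, not a directory.
    [ -e "$output" ] || : > "$output"
    # Resolve both host paths to absolute paths for the -v bind mounts.
    input_abs=$(cd "$(dirname "$input")" && pwd)/$(basename "$input")
    output_abs=$(cd "$(dirname "$output")" && pwd)/$(basename "$output")
    # -it is left out here so the function also works without a terminal.
    docker run --rm \
        -v "$input_abs:/input.sup" \
        -v "$output_abs:/output.srt" \
        -e LANGUAGE="$language" \
        tentacule/pgstosrt
}

# Example use:
#   pgs_docker_convert source-file.sup output-file.srt eng
```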

Dependencies

  • Windows : none, tesseract/leptonica libraries are included in the release package.
  • Linux : libtesseract4 (sudo apt install libtesseract4 or whatever your distro requires)

Build

To build PgsToSrt.dll execute the following commands in the src/ directory:

dotnet restore
dotnet publish -c Release -o out --framework net6.0
# The file produced is PgsToSrt/out/PgsToSrt.dll

To build a Docker image for all languages:

make build-all

To build a Docker image for a single language:

make build-single LANGUAGE=eng  # or any other Tesseract-available language code

Built With