calamari/calamari_ocr/scripts/predict.py, line 114 (commit f0139d6):
This merely post-processes some of the command-line choices. It never actually instantiates a `CTCDecoderProcessor`, nor does it replace the default one in the postprocessor pipeline.

How is this supposed to have worked in the first place?
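For illustration only, here is a rough sketch of the kind of wiring described as missing. The class names mirror those mentioned above but are hypothetical stand-ins, not Calamari's actual API: build decoder parameters from the parsed CLI choices, instantiate the processor, and swap it into the post-processing pipeline.

```python
# Hypothetical sketch -- stand-in classes, NOT Calamari's real API.
from dataclasses import dataclass, field
from typing import List


@dataclass
class CTCDecoderParams:          # stand-in, not the real class
    dictionary: List[str] = field(default_factory=list)
    non_word_chars: str = "!?.,;:"
    word_separator: str = " "


class CTCDecoderProcessor:       # stand-in, not the real class
    def __init__(self, params: CTCDecoderParams):
        self.params = params


def apply_cli_choices(pipeline: list, args) -> list:
    """Replace the default decoder processor with one built from the CLI choices."""
    params = CTCDecoderParams(
        dictionary=list(getattr(args, "dictionary", []) or []),
        non_word_chars=getattr(args, "non_word_chars", "!?.,;:"),
        word_separator=getattr(args, "word_separator", " "),
    )
    # Drop any default decoder processor and prepend the configured one.
    pipeline = [p for p in pipeline if not isinstance(p, CTCDecoderProcessor)]
    return [CTCDecoderProcessor(params)] + pipeline
```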
Also, the parameterization of this postprocessor raises further questions. Assuming some testing has been done with the `dictionary` feature (word beam search):

- Why is `non_word_chars` not automatically configured to all the punctuation characters in the entire charset during training? (All the public models I have seen contain only the default characters, i.e. ASCII punctuation.) See the sketch after this list.
- Why is `word_separator` only whitespace by default? Shouldn't that also allow more cases, like the hyphen (especially in German)?
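To make the `non_word_chars` point concrete, here is a minimal sketch (plain Python, independent of Calamari's internals; the function name is mine) of deriving the set from a model's charset instead of relying on an ASCII-only default:

```python
import unicodedata


def punctuation_in_charset(charset):
    """Return every Unicode punctuation character (categories P*) found in the charset.

    Deriving non_word_chars like this during training would cover e.g. German
    quotation marks or dashes whenever they occur in the charset, rather than
    only the default ASCII punctuation.
    """
    return "".join(sorted(c for c in charset if unicodedata.category(c).startswith("P")))


# Example: a charset with German quotation marks and an en dash
charset = set("abcdefghijklmnopqrstuvwxyz .,;:-–„“")
print(punctuation_in_charset(charset))   # -> ,-.:;–“„ (sorted by code point)
```

The same kind of charset inspection could also inform whether characters like the hyphen should be treated as additional word separators for languages such as German.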