Calamari's network specs do not contain or require a reshaping/projection operation before the first LSTM layer; this seems to be added automatically.
However, other traditional CNN-RNN implementations offer an alternative element: an LSTM which takes the height axis as sequence and summarises into a single output vector per width position:
- Tesseract traditionally uses an `Lfys<h>` layer, e.g. in the default VGSL spec `1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx192`
- Kraken offers that as well (but usually just reshapes via an explicit `S1(1x<h>)1,3` element)
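
To make the two alternatives concrete, here is a minimal pure-Python sketch. It is not the Tesseract, Kraken, or Calamari implementation: `TinyLSTM`, `reshape_height_into_depth` and `summarize_height_with_lstm` are hypothetical names, the weights are random and untrained, and the shapes are only what I read off the VGSL spec above (input height 36, 3x3 max pooling leaving 12 rows, 16 channels, 48 summary outputs).

```python
import math
import random


def reshape_height_into_depth(fmap):
    """Reshape variant: fmap[y][x] is a list of C channel values.

    Returns one vector of length H*C per width position."""
    height, width = len(fmap), len(fmap[0])
    return [[v for y in range(height) for v in fmap[y][x]] for x in range(width)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class TinyLSTM:
    """Single LSTM cell with random weights (illustration only, untrained)."""

    def __init__(self, n_in, n_out, seed=0):
        rng = random.Random(seed)
        self.n_out = n_out
        # One weight row per gate unit, in the order: input, forget, output, candidate.
        self.w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in + n_out + 1)]
                  for _ in range(4 * n_out)]

    def step(self, x, h, c):
        z = x + h + [1.0]  # current input, previous output, bias term
        pre = [sum(wi * zi for wi, zi in zip(row, z)) for row in self.w]
        n = self.n_out
        i = [sigmoid(p) for p in pre[0:n]]
        f = [sigmoid(p) for p in pre[n:2 * n]]
        o = [sigmoid(p) for p in pre[2 * n:3 * n]]
        g = [math.tanh(p) for p in pre[3 * n:4 * n]]
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
        return h, c

    def summarize(self, seq):
        """Run over the whole sequence and keep only the final output."""
        h = [0.0] * self.n_out
        c = [0.0] * self.n_out
        for x in seq:
            h, c = self.step(x, h, c)
        return h


def summarize_height_with_lstm(fmap, lstm):
    """Summarising variant: treat each column as a sequence along the height axis."""
    height, width = len(fmap), len(fmap[0])
    return [lstm.summarize([fmap[y][x] for y in range(height)])
            for x in range(width)]


# Feature map after the convolution and pooling stage: 12 rows, 5 columns, 16 channels.
H, W, C = 12, 5, 16
rng = random.Random(1)
fmap = [[[rng.random() for _ in range(C)] for _ in range(W)] for _ in range(H)]

reshaped = reshape_height_into_depth(fmap)
summarized = summarize_height_with_lstm(fmap, TinyLSTM(C, 48))

print(len(reshaped), len(reshaped[0]))      # 5 192
print(len(summarized), len(summarized[0]))  # 5 48
```

Both variants hand the following horizontal LSTM one vector per width position. The reshape keeps every row as a fixed slot in the feature vector, so vertical position is hard-wired, whereas the summarising LSTM shares its weights across rows and has to learn which vertical context to keep.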
Is it perhaps expected that the combination of reshape and `CenterNormalizer` will do a better job? I wonder whether this has ever been thoroughly investigated. Also, `CenterNormalizer` might degrade rather than improve horizontal statistics, especially for handwriting (where some have even argued a need for deslanting) or with grayscale input.