Request for info: support for multi-page tiffs #136
@evu Thanks. Could you share a multi-page TIFF file I can use as an example for development?
ping @evu
thx
otiai10 added a commit that referenced this issue on Nov 5, 2018.
I'm having the same problem: when I try to extract text from a multi-page .tiff file, only the first page is extracted. The same problem also occurs with a .png file. I'd appreciate any help :)
Summary
Please confirm whether support for multi-page TIFF files already exists, perhaps through an option I have not found, or whether this would require an enhancement.
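A multi-page TIFF stores each page as a separate Image File Directory (IFD), and the IFDs are chained through a "next IFD" offset, so any reader that decodes only the first IFD will only ever see page 1. Before digging into the OCR side, it can help to confirm how many pages a file really contains. The Go sketch below walks that IFD chain. It assumes a classic (non-BigTIFF) file, decodes no pixel data, and the helper name countTIFFPages is hypothetical, not part of any project's API.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"os"
)

// countTIFFPages (hypothetical helper) walks the IFD chain of a classic TIFF
// and returns how many IFDs, i.e. pages/images, the file contains.
func countTIFFPages(data []byte) (int, error) {
	if len(data) < 8 {
		return 0, errors.New("too short to be a TIFF")
	}
	// Bytes 0-1: byte order, "II" = little-endian, "MM" = big-endian.
	var order binary.ByteOrder
	switch string(data[0:2]) {
	case "II":
		order = binary.LittleEndian
	case "MM":
		order = binary.BigEndian
	default:
		return 0, errors.New("missing TIFF byte-order mark")
	}
	// Bytes 2-3: magic number 42 for classic TIFF (BigTIFF uses 43).
	if order.Uint16(data[2:4]) != 42 {
		return 0, errors.New("not a classic TIFF (magic != 42)")
	}
	// Bytes 4-7: offset of the first IFD; an offset of 0 ends the chain.
	offset := order.Uint32(data[4:8])
	seen := map[uint32]bool{}
	pages := 0
	for offset != 0 {
		if seen[offset] {
			return 0, errors.New("IFD chain contains a loop")
		}
		seen[offset] = true
		if uint64(offset)+2 > uint64(len(data)) {
			return 0, errors.New("IFD offset out of range")
		}
		off := int(offset)
		// IFD layout: 2-byte entry count, 12 bytes per entry, 4-byte next-IFD offset.
		n := int(order.Uint16(data[off : off+2]))
		next := off + 2 + 12*n
		if next+4 > len(data) {
			return 0, errors.New("truncated IFD")
		}
		pages++
		offset = order.Uint32(data[next : next+4])
	}
	return pages, nil
}

func main() {
	if len(os.Args) != 2 {
		fmt.Fprintln(os.Stderr, "usage: tiffpages FILE.tif")
		os.Exit(2)
	}
	data, err := os.ReadFile(os.Args[1])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	n, err := countTIFFPages(data)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Printf("%s: %d page(s)\n", os.Args[1], n)
}
```

Running it as go run tiffpages.go multipage.tif (file name assumed) and getting a count greater than 1 confirms the input really is multi-page, which separates a problem with the file from first-page-only extraction.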
When I extract text from a multi-page TIFF using Text(), it only extracts the text from the first page of the TIFF. When I extract text from the same file using the tesseract command-line client with default settings, it extracts the text of all pages.
I looked at some of the tesseract source code around pixReadMem(), and it looks like tesseract might do some additional preprocessing on the image before calling pixReadMem().
Reproducibility
Reproducibility frequency
How to reproduce
(attached file: multipage.tif.txt) … and notice that text has been extracted from all pages of the TIFF.
… Notice that only the first page's text is returned.
Environment