It's been ages since I last used Spreads and I am glad to see that the project is still in active development and offering a lot of new features!
However, I cannot figure out how to use Spreads to post-process (with tesseract and pdfbeads) existing images not captured using the program. I have tried two different routes, both failed:
1-) post-processing the output of scantailor (.tif files):
$ spread --loglevel debug --verbose postprocess out1
Workflow: Initializing workflow out1
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Copying path out1/IMG_0117_right.tif to paylod directory
bagit: Copying path out1/IMG_0022_right.tif to paylod directory
.
.
.
bagit: Copying path out1/IMG_0182_left.tif to paylod directory
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Adding path /home/asartori/bookscan/JConch/vol1/out1/bag-info.txt to payload
spreads encountered an error:
Traceback (most recent call last):
File "/usr/local/lib/python2.7/dist-packages/spreads/main.py", line 321, in main
run()
File "/usr/local/lib/python2.7/dist-packages/spreads/main.py", line 308, in run
args.subcommand(config)
File "/usr/local/lib/python2.7/dist-packages/spreads/cli.py", line 358, in postprocess
workflow = spreads.workflow.Workflow(config=config, path=path)
File "/usr/local/lib/python2.7/dist-packages/spreads/workflow.py", line 445, in __init__
for img in (self.path/'data'/'raw').iterdir()]
File "/usr/local/lib/python2.7/dist-packages/pathlib.py", line 982, in iterdir
for name in self._accessor.listdir(self):
File "/usr/local/lib/python2.7/dist-packages/pathlib.py", line 346, in wrapped
return strfunc(str(pathobj), *args)
OSError: [Errno 2] No such file or directory: 'out1/data/raw'
Trying to fix this by generating the missing folder does not work:
$ mkdir out1/data/raw
$ spread --loglevel debug --verbose postprocess out1
Workflow: Initializing workflow out1
bagit: Adding path /home/asartori/bookscan/JConch/vol1/out1/bag-info.txt to payload
bagit: Adding path out1/config.yml to payload
Workflow: Starting postprocessing...%
Workflow: Running 'process' hooks
spreadsplug.tesseract: Performing OCR
spreadsplug.tesseract: Language is "chi_sim"
bagit: Path out1/data/done is an empty directory , will be skipped.
bagit: Adding path /home/asartori/bookscan/JConch/vol1/out1/bag-info.txt to payload
bagit: Adding path out1/pagemeta.json to payload
Workflow: Done with postprocessing!
OCR was not performed, but Spreads exits without error.
2-) post-processing the JPGs from my cameras trying to invoke scantailor via spreads:
$ spread --verbose postprocess vol2
This results in the same error as scenario 1 (OSError: [Errno 2] No such file or directory: 'vol2/data/raw')
After creating the missing folder, the new output reveals that scantailor is not being invoked correctly by Spreads:
$ spread --verbose postprocess vol2
Workflow: Initializing workflow vol2
bagit: Adding path /home/asartori/bookscan/JConch/vol2/bag-info.txt to payload
bagit: Adding path /home/asartori/bookscan/JConch/vol2/bag-info.txt to payload
bagit: Adding path /home/asartori/bookscan/JConch/vol2/bag-info.txt to payload
bagit: Adding path vol2/config.yml to payload
Workflow: Starting postprocessing...%
Workflow: Running 'process' hooks
spreadsplug.scantailor: Generating ScanTailor configuration
spreadsplug.scantailor: /usr/bin/scantailor-cli --start-filter=2 --end-filter=5 --layout=1.5 -o=/tmp/tmpX_hohD.ScanTailor --margins-top=2.5 --margins-right=2.5 --margins-bottom=2.5 --margins-left=2.5 /tmp/st-out4zVnEq
Scan Tailor is a post-processing tool for scanned pages.
Version: 0.9.11.1
ScanTailor usage:
1) scantailor
2) scantailor <project_file>
3) scantailor-cli [options] <image, image, ...> <output_directory>
4) scantailor-cli [options] <project_file> [output_directory]
1)
start ScanTailor's GUI interface
2)
start ScanTailor's GUI interface and load project file
3)
batch processing images from command line; no GUI
4)
batch processing project from command line; no GUI
if output_directory is specified as last argument, it overwrites the one in project file
Options:
--help, -h
--verbose, -v
--layout=, -l=<0|1|1.5|2> -- default: 0
0: auto detect
1: one page layout
1.5: one page layout but cutting is needed
2: two page layout
--layout-direction=, -ld=<lr|rl> -- default: lr
--orientation=<left|right|upsidedown|none>
-- default: none
--rotate=<0.0...360.0> -- it also sets deskew to manual mode
--deskew=<auto|manual> -- default: auto
--content-detection=<cautious|normal|aggressive>
-- default: normal
--content-box=<<left_offset>x<top_offset>:<width>x<height>>
-- if set the content detection is se to manual mode
example: --content-box=100x100:1500x2500
--margins=<number> -- sets left, top, right and bottom margins to same number.
--margins-left=<number>
--margins-right=<number>
--margins-top=<number>
--margins-bottom=<number>
--alignment=center -- sets vertical and horizontal alignment to center
--alignment-vertical=<top|center|bottom>
--alignment-horizontal=<left|center|right>
--dpi=<number> -- sets x and y dpi. default: 600
--dpi-x=<number>
--dpi-y=<number>
--output-dpi=<number> -- sets x and y output dpi. default: 600
--output-dpi-x=<number>
--output-dpi-y=<number>
--color-mode=<black_and_white|color_grayscale|mixed>
-- default: black_and_white
--white-margins -- default: false
--normalize-illumination -- default: false
--threshold=<n> -- n<0 thinner, n>0 thicker; default: 0
--despeckle=<off|cautious|normal|aggressive>
-- default: normal
--dewarping=<off|auto> -- default: off
--depth-perception=<1.0...3.0> -- default: 2.0
--start-filter=<1...6> -- default: 4
--end-filter=<1...6> -- default: 6
--output-project=, -o=<project_name>
spreadsplug.tesseract: Performing OCR%
spreadsplug.tesseract: Language is "chi_sim"
bagit: Path vol2/data/done is an empty directory , will be skipped.
bagit: Adding path /home/asartori/bookscan/JConch/vol2/bag-info.txt to payload
bagit: Adding path vol2/pagemeta.json to payload
Workflow: Done with postprocessing!
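In the log above, the `scantailor-cli` call ends with a single positional argument (the output directory) and lists no image paths, whereas usage form 3 expects `<image, image, ...> <output_directory>`. That would be consistent with an empty `data/raw`, though this is an inference from the log, not confirmed behaviour. Below is a minimal sketch of that argument order. The helper `build_scantailor_args` is hypothetical (it is not Spreads' plugin code) and uses only flags that appear in the log:

```python
def build_scantailor_args(images, output_dir, project_file):
    """Assemble a scantailor-cli argument list following usage form 3:
    scantailor-cli [options] <image, image, ...> <output_directory>
    """
    image_args = [str(p) for p in images]
    if not image_args:
        # With no images, only the output directory would be passed,
        # which matches the call that printed the usage text in the log.
        raise ValueError("no input images to pass to scantailor-cli")
    options = [
        "--start-filter=2",
        "--end-filter=5",
        "--layout=1.5",
        "-o={}".format(project_file),
    ]
    # Options first, then every image, then the output directory last.
    return ["scantailor-cli"] + options + image_args + [str(output_dir)]
```

The resulting list could be handed to `subprocess.check_call`. Raising early on an empty image list makes the failure visible instead of letting the workflow finish silently.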
I suspect the problem lies in the project folder structure that Spreads expects to find (normally created during the capture workflow, which I am skipping).
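The traceback shows the workflow iterating over `<workflow>/data/raw`, so one thing to try is populating that directory with the existing images before running `spread postprocess`. This is a minimal sketch under that assumption. The helper name `stage_images` and the suffix list are hypothetical, and it is not confirmed that Spreads needs nothing more than this (for example, page metadata) to process the files:

```python
import shutil
from pathlib import Path

# Assumed set of image types; adjust to whatever the cameras or ScanTailor produced.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".png"}


def stage_images(source_dir, workflow_dir):
    """Copy images from source_dir into workflow_dir/data/raw.

    Returns the list of destination paths, sorted by file name.
    """
    source = Path(source_dir)
    raw_dir = Path(workflow_dir) / "data" / "raw"
    # Create data/ and data/raw/ if they do not exist yet.
    raw_dir.mkdir(parents=True, exist_ok=True)
    copied = []
    for img in sorted(source.iterdir()):
        if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
            dest = raw_dir / img.name
            shutil.copy2(str(img), str(dest))
            copied.append(dest)
    return copied
```

For example, `stage_images("camera_jpgs", "vol2")` followed by `spread --verbose postprocess vol2`, where `camera_jpgs` is a placeholder for wherever the images currently live.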
Any ideas on how I could fix this would be greatly appreciated!
I am also trying to use Spreads strictly for its post-processing toolchain. As in the issue described above, I cannot find a way to start post-processing. I have manually created a /data/raw directory for the unprocessed .tif files. How does the capture process signal Spreads to start post-processing? Is there a way to spoof this signal so that I can start post-processing without actually using Spreads' capture step?
I would really like to see this as a supported feature. I am not a developer, but it seems to me this would be fairly simple to implement. Could there be an option to upload images to a workflow's raw folder and then start from the post-processing step, skipping the capture step?
EDIT: I just realized I can initiate the processing stage via the API, but it still does not process anything. What is the minimum amount of metadata necessary for post-processing to proceed correctly?