Using spreads for postprocessing only #195

Open
afsartori opened this issue Jun 8, 2015 · 3 comments
@afsartori

Hi,

It's been ages since I last used Spreads and I am glad to see that the project is still in active development and offering a lot of new features!

However, I cannot figure out how to use Spreads to post-process (with tesseract and pdfbeads) existing images that were not captured with the program. I have tried two different routes; both failed:
1-) post-processing the output of scantailor (.tif files):

$ spread --loglevel debug --verbose postprocess out1
Workflow: Initializing workflow out1
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Copying path out1/IMG_0117_right.tif to paylod directory
bagit: Copying path out1/IMG_0022_right.tif to paylod directory
.
.
.
bagit: Copying path out1/IMG_0182_left.tif to paylod directory
bagit: Adding path /tmp/tmpxaqkKY/bag-info.txt to payload
bagit: Adding path /home/asartori/bookscan/JConch/vol1/out1/bag-info.txt to payload
spreads encountered an error:
Traceback (most recent call last):
  File "/usr/local/lib/python2.7/dist-packages/spreads/main.py", line 321, in main
    run()
  File "/usr/local/lib/python2.7/dist-packages/spreads/main.py", line 308, in run
    args.subcommand(config)
  File "/usr/local/lib/python2.7/dist-packages/spreads/cli.py", line 358, in postprocess
    workflow = spreads.workflow.Workflow(config=config, path=path)
  File "/usr/local/lib/python2.7/dist-packages/spreads/workflow.py", line 445, in __init__
    for img in (self.path/'data'/'raw').iterdir()]
  File "/usr/local/lib/python2.7/dist-packages/pathlib.py", line 982, in iterdir
    for name in self._accessor.listdir(self):
  File "/usr/local/lib/python2.7/dist-packages/pathlib.py", line 346, in wrapped
    return strfunc(str(pathobj), *args)
OSError: [Errno 2] No such file or directory: 'out1/data/raw'

Trying to fix this by generating the missing folder does not work:

$ mkdir out1/data/raw
$ spread --loglevel debug --verbose postprocess out1
Workflow: Initializing workflow out1
bagit: Adding path /home/asartori/bookscan/JConch/vol1/out1/bag-info.txt to payload
bagit: Adding path out1/config.yml to payload
Workflow: Starting postprocessing...%
Workflow: Running 'process' hooks
spreadsplug.tesseract: Performing OCR
spreadsplug.tesseract: Language is "chi_sim"
bagit: Path out1/data/done is an empty directory , will be skipped.
bagit: Adding path /home/asartori/bookscan/JConch/vol1/out1/bag-info.txt to payload
bagit: Adding path out1/pagemeta.json to payload
Workflow: Done with postprocessing!

OCR was not performed, but Spreads exits without error.
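
One possible reading of the two runs above: the traceback shows spreads building its page list from `<workflow>/data/raw`, and the directory created by `mkdir` was empty, so there may simply have been nothing to OCR. Below is a minimal Python 3 sketch for staging existing images into that location. The helper name `stage_raw_images` and the suffix list are illustrative assumptions, not part of spreads, and this thread does not establish whether spreads needs additional metadata beyond the files themselves.

```python
import shutil
from pathlib import Path

# Assumption: these are the image types of interest (.tif from ScanTailor,
# .jpg from cameras); adjust as needed.
IMAGE_SUFFIXES = {".tif", ".tiff", ".jpg", ".jpeg"}


def stage_raw_images(source_dir, workflow_dir):
    """Copy images from source_dir into <workflow_dir>/data/raw.

    Returns the list of copied destination paths, sorted by file name.
    """
    source = Path(source_dir)
    raw_dir = Path(workflow_dir) / "data" / "raw"
    # Create data/ and data/raw if they do not exist yet.
    raw_dir.mkdir(parents=True, exist_ok=True)

    copied = []
    for img in sorted(source.iterdir()):
        # Skip subdirectories and non-image files such as bag-info.txt.
        if img.is_file() and img.suffix.lower() in IMAGE_SUFFIXES:
            dest = shutil.copy2(str(img), str(raw_dir / img.name))
            copied.append(Path(dest))
    return copied


if __name__ == "__main__":
    import tempfile

    # Demonstration on throwaway directories with empty placeholder files.
    src = Path(tempfile.mkdtemp())
    workflow = Path(tempfile.mkdtemp())
    for name in ("IMG_0117_right.tif", "IMG_0022_right.tif"):
        (src / name).touch()
    for path in stage_raw_images(src, workflow):
        print(path)
```

The files are copied rather than moved so the originals stay untouched if the experiment does not work out.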

2-) post-processing the JPGs from my cameras trying to invoke scantailor via spreads:

$ spread --verbose postprocess vol2

This results in the same error as scenario 1 (OSError: [Errno 2] No such file or directory: 'vol2/data/raw')

After creating the missing folder, the new output reveals that scantailor is not being invoked correctly by Spreads:

$ spread --verbose postprocess vol2
Workflow: Initializing workflow vol2
bagit: Adding path /home/asartori/bookscan/JConch/vol2/bag-info.txt to payload
bagit: Adding path /home/asartori/bookscan/JConch/vol2/bag-info.txt to payload
bagit: Adding path /home/asartori/bookscan/JConch/vol2/bag-info.txt to payload
bagit: Adding path vol2/config.yml to payload
Workflow: Starting postprocessing...%
Workflow: Running 'process' hooks
spreadsplug.scantailor: Generating ScanTailor configuration
spreadsplug.scantailor: /usr/bin/scantailor-cli --start-filter=2 --end-filter=5 --layout=1.5 -o=/tmp/tmpX_hohD.ScanTailor --margins-top=2.5 --margins-right=2.5 --margins-bottom=2.5 --margins-left=2.5 /tmp/st-out4zVnEq

Scan Tailor is a post-processing tool for scanned pages.
Version: 0.9.11.1

ScanTailor usage: 
    1) scantailor
    2) scantailor <project_file>
    3) scantailor-cli [options] <image, image, ...> <output_directory>
    4) scantailor-cli [options] <project_file> [output_directory]

1)
    start ScanTailor's GUI interface
2)
    start ScanTailor's GUI interface and load project file
3)
    batch processing images from command line; no GUI
4)
    batch processing project from command line; no GUI
    if output_directory is specified as last argument, it overwrites the one in project file

Options:
    --help, -h
    --verbose, -v
    --layout=, -l=<0|1|1.5|2>       -- default: 0
              0: auto detect
              1: one page layout
            1.5: one page layout but cutting is needed
              2: two page layout
    --layout-direction=, -ld=<lr|rl>    -- default: lr
    --orientation=<left|right|upsidedown|none>
                        -- default: none
    --rotate=<0.0...360.0>          -- it also sets deskew to manual mode
    --deskew=<auto|manual>          -- default: auto
    --content-detection=<cautious|normal|aggressive>
                        -- default: normal
    --content-box=<<left_offset>x<top_offset>:<width>x<height>>
                        -- if set the content detection is se to manual mode
                           example: --content-box=100x100:1500x2500
    --margins=<number>          -- sets left, top, right and bottom margins to same number.
        --margins-left=<number>
        --margins-right=<number>
        --margins-top=<number>
        --margins-bottom=<number>
    --alignment=center          -- sets vertical and horizontal alignment to center
        --alignment-vertical=<top|center|bottom>
        --alignment-horizontal=<left|center|right>
    --dpi=<number>              -- sets x and y dpi. default: 600
        --dpi-x=<number>
        --dpi-y=<number>
    --output-dpi=<number>           -- sets x and y output dpi. default: 600
        --output-dpi-x=<number>
        --output-dpi-y=<number>
    --color-mode=<black_and_white|color_grayscale|mixed>
                        -- default: black_and_white
    --white-margins             -- default: false
    --normalize-illumination        -- default: false
    --threshold=<n>             -- n<0 thinner, n>0 thicker; default: 0
    --despeckle=<off|cautious|normal|aggressive>
                        -- default: normal
    --dewarping=<off|auto>          -- default: off
    --depth-perception=<1.0...3.0>      -- default: 2.0
    --start-filter=<1...6>          -- default: 4
    --end-filter=<1...6>            -- default: 6
    --output-project=, -o=<project_name>

spreadsplug.tesseract: Performing OCR%
spreadsplug.tesseract: Language is "chi_sim"
bagit: Path vol2/data/done is an empty directory , will be skipped.
bagit: Adding path /home/asartori/bookscan/JConch/vol2/bag-info.txt to payload
bagit: Adding path vol2/pagemeta.json to payload
Workflow: Done with postprocessing!
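
In the logged `scantailor-cli` command, the only positional argument is the output directory (`/tmp/st-out4zVnEq`); no image paths precede it, which would explain why ScanTailor prints its usage text instead of processing anything. The usage text gives form 3 as `scantailor-cli [options] <image, image, ...> <output_directory>`. The following Python sketch assembles a command of that shape and refuses to build one when the image list is empty. The function name `build_scantailor_command` and the example paths are hypothetical; this is not how the spreads plugin is implemented.

```python
def build_scantailor_command(images, output_dir, options=()):
    """Return an argv list following usage form 3 of the help text:

        scantailor-cli [options] <image, image, ...> <output_directory>
    """
    images = [str(image) for image in images]
    if not images:
        # With no images, only the output directory would be passed,
        # as in the log above, and scantailor-cli prints its usage text.
        raise ValueError("no input images to pass to scantailor-cli")
    return ["scantailor-cli", *options, *images, str(output_dir)]


if __name__ == "__main__":
    # Options copied from the logged command; the paths are placeholders.
    cmd = build_scantailor_command(
        ["vol2/data/raw/IMG_0001.jpg", "vol2/data/raw/IMG_0002.jpg"],
        "/tmp/st-out",
        options=["--start-filter=2", "--end-filter=5", "--layout=1.5"],
    )
    print(" ".join(cmd))
```

Returning a list rather than a single string lets the result be handed to `subprocess.call` without shell quoting issues.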

I suspect the problem lies in the project folder structure that Spreads expects to find, which is normally created during the capture workflow that I am skipping.
Any ideas on how I could fix this would be greatly appreciated!

@adongy
Contributor

adongy commented Jun 12, 2015

There was a recent patch to use spreads with older workflows; can you try it? I can't find it again, but it was fairly recent.

@afsartori
Author

Thank you for your suggestion. I ended up adapting parts of spreads' source
code to achieve what I need via a separate python script.

On 12 June 2015 at 12:23, Anthony Dong wrote:

There was a recent patch to use spreads with older workflows; can you try
it? I can't find it again, but it was fairly recent.



jbaiter added the bug label Jun 19, 2015
@alclary

alclary commented Nov 4, 2015

I am also trying to use spreads strictly for its post-processing tool chain. As in the issue described above, I cannot find a way to start post-processing. I have manually created a /data/raw directory for the unprocessed .tif files. How does the capture process signal spreads to start post-processing? Is there a way I can spoof this signal so that I can start post-processing without actually using spreads' capture?

I would really like to see this as a supported feature. I am not a developer, but it looks to me like a fairly simple thing to implement. Could there be an option to upload images to a workflow's raw folder and then start from the post-processing step, skipping the capture step?

EDIT: I just realized I can start the processing stage via the API, but it still does not process anything. What is the minimum amount of metadata necessary for post-processing to proceed correctly?
