There are two pieces to this:

1. Crawl through the IAR site and grab the PDFs, which get sent to our FTP server.
2. Parse the PDFs and turn them into JSON.
What I think we can do is have two jobs:

1. One that triggers/schedules the first script to fetch the PDFs.
2. Another that watches the directory on the FTP server and runs the Ruby parsing script when the directory changes. It would then get the JSON output somehow and push it through the normal data processing pipeline that roach typically uses (i.e. crawler -> Redis -> RabbitMQ). A rough sketch of both jobs follows this list.
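Here's a rough sketch of what those two jobs might look like, assuming the rufus-scheduler and listen gems and that the FTP directory is mounted locally. The schedule, paths, and script names (`fetch_iar_pdfs.rb`, `parse_iar_pdf.rb`, `/var/ftp/iar`) are placeholders, not actual project conventions:

```ruby
require 'rufus-scheduler'
require 'listen'

scheduler = Rufus::Scheduler.new

# Job 1: run the crawler script on a schedule (daily at 2am is a placeholder).
scheduler.cron '0 2 * * *' do
  system('ruby', 'fetch_iar_pdfs.rb')
end

# Job 2: watch the (assumed locally mounted) FTP directory and parse new PDFs.
listener = Listen.to('/var/ftp/iar', only: /\.pdf\z/) do |_modified, added, _removed|
  added.each do |pdf_path|
    system('ruby', 'parse_iar_pdf.rb', pdf_path)
  end
end
listener.start

scheduler.join # block so both the scheduler and the watcher keep running
```

If the FTP directory can't be mounted on the box running roach, the watcher would have to poll over FTP instead of relying on filesystem events.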
We could just use the Ruby parser that has already been written and try to integrate it into roach as a job; a sketch of the Redis handoff is below.
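For the "push the JSON into the pipeline" part, something like this might work as a first pass. It's a minimal sketch assuming the redis gem; the `roach:crawled` key and the `output/*.json` glob are made-up placeholders, since I don't know the exact keys or payload shape roach expects:

```ruby
require 'redis'
require 'json'

# Push each parsed record onto the list the pipeline consumes.
# "roach:crawled" is a guessed key name, not a known roach convention.
redis = Redis.new(host: 'localhost', port: 6379)

Dir.glob('output/*.json').each do |path|
  record = JSON.parse(File.read(path))
  redis.lpush('roach:crawled', record.to_json)
end
```

The idea being that the existing Redis -> RabbitMQ stage of the pipeline would then pick these records up the same way it does normal crawler output.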