Easy off on crawling #36

Closed
2 tasks done
ojongerius opened this issue Oct 30, 2017 · 1 comment

Comments

@ojongerius
Member

ojongerius commented Oct 30, 2017

We crawl once a day. For bigger sites this can mean visiting all the job URLs, most of which we might have seen already. Some solutions:

  • Fetch only new jobs (newer than 24 or 28 hours).
  • Save job URLs (we can search, or add functionality to search on URL to the REST API), and check when each was last seen.
  • Merge PR with a solution
  • Re-enable scheduled jobs
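
The first two options could look roughly like the sketch below. It is only an illustration: `is_new_job`, `should_visit`, the `last_seen` mapping and the 24-hour cutoff are hypothetical names and values, not part of the project's code or REST API.

```python
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(hours=24)  # assumed cutoff, matching the daily crawl


def is_new_job(posted_at, now, max_age=MAX_AGE):
    """Option 1: keep a job only if it was posted within max_age."""
    return now - posted_at <= max_age


def should_visit(url, last_seen, now, max_age=MAX_AGE):
    """Option 2: visit a URL if it was never seen or was last seen too long ago."""
    seen_at = last_seen.get(url)
    return seen_at is None or now - seen_at > max_age


# Hypothetical data: one URL that was seen three hours before this crawl.
now = datetime(2017, 10, 30, 12, 0, tzinfo=timezone.utc)
last_seen = {"https://example.com/jobs/1": now - timedelta(hours=3)}
```

Option 1 relies on the site exposing a posting date, while option 2 needs the last-seen timestamp to be stored somewhere the crawler can query.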
@ojongerius
Member Author

Until we can update objects in #26, I decided not to revisit URLs that are already associated with an existing job. This will solve most of our use cases; we save a lot of unnecessary traffic, and I don't expect objects to change often.
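
A minimal sketch of that skip rule, assuming the URLs of existing jobs can be loaded into memory before a crawl. The function name `urls_to_crawl` and both parameters are hypothetical, not taken from the merged change.

```python
def urls_to_crawl(listing_urls, known_job_urls):
    """Return the listing URLs that are not yet tied to an existing job.

    Order is preserved, and a URL that appears twice in the listing is
    returned only once.
    """
    known = set(known_job_urls)
    fresh = []
    for url in listing_urls:
        if url not in known:
            known.add(url)  # also guards against duplicates within one listing
            fresh.append(url)
    return fresh
```

The trade-off is the one noted in the comment: a job whose content changes after the first visit is not refreshed until objects can be updated.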
