[capitol-words] ported crec scraper to work as a celery task in django #17
This ports the crec scraper to run in the django app as a celery task. I've also included a couple of extensions to django + celery: the first lets you schedule cron-like events for a celery task in the django admin ui, and the second lets you return a value from that task and store it in django's db (both of these go through django's orm). Right now the task just reports whether or not it succeeded, but we can include more detailed info in that result data so a maintainer can use the django admin ui to inspect the scraper's status.
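The task that plugs into this is shaped roughly like the sketch below. This is a minimal sketch only; the module path and function names here are hypothetical, not the actual ones in this PR.

```python
# hedged sketch: rough shape of the celery task, not the actual PR code.
# scraper.crec_scraper / scrape_crecs are hypothetical names.
from celery import shared_task

from scraper.crec_scraper import scrape_crecs


@shared_task
def run_crec_scraper(start_date=None):
    """Run the crec scraper; the returned dict is stored in django's db
    by the results extension, so it shows up in the admin ui."""
    try:
        num_docs = scrape_crecs(start_date=start_date)
    except Exception as exc:
        return {'success': False, 'message': str(exc)}
    return {'success': True, 'num_docs': num_docs}
```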
I've updated the pip requirements file, but looking at how much got added, I think I may have run pip freeze while outside the virtualenv. Let me know if that looks weird.
You'll also need a local instance of rabbit running; if you don't already have one, you can just install it via brew (no other config stuff needed).
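Assuming homebrew on macOS, that's roughly:

```sh
brew install rabbitmq
brew services start rabbitmq  # or just run rabbitmq-server in a shell
```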
After installing the new dependencies you'll need to run a couple migrations:
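These should be the migrations that ship with django-celery-beat and django-celery-results (the app labels below are my assumption; a plain `python manage.py migrate` will pick them all up either way):

```sh
python manage.py migrate django_celery_beat
python manage.py migrate django_celery_results
```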
I think that's all of them; let me know if it doesn't work.
Next up you need to run three processes:

```sh
python manage.py runserver
```

as usual, plus the celery worker and the beat scheduler:

```sh
celery -A capitolweb worker -l info
celery -A capitolweb beat -l debug -S django
```
Finally, you can test running the script via the django admin ui: navigate to 127.0.0.1:8000/admin. From there, click on "Periodic Tasks" under "Django Celery Beat", then "Add Periodic Task". In the form, select the only option in the drop-down menu next to "Task (Registered)". Check the "Enabled" box, then set a schedule (you'll need to add one via the plus sign icon; that form is self-explanatory). Set it to something short so it'll execute soon.

Collisions are one thing I haven't accounted for. I'm thinking a tracking table in rds or something (see the sketch below), but for now you'll want to disable that periodic task entry once the worker starts executing.

@butlern We'll need to add something to the cloudformation template to install rabbit and set up credentials for it. I don't think we need a remote instance of it, so credentials may not really be necessary if we can block that port for any external requests.
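For the collision guard, something in this direction might work. This is just a sketch of the tracking-table idea, all names hypothetical; a unique row in the db acts as a lock so overlapping runs bail out early.

```python
# hedged sketch of the "tracking table" idea, not part of this PR.
# ScraperRun / try_acquire / release are hypothetical names.
from django.db import models, IntegrityError


class ScraperRun(models.Model):
    """One row per in-flight run; the unique constraint acts as a lock."""
    task_name = models.CharField(max_length=255, unique=True)
    started_at = models.DateTimeField(auto_now_add=True)


def try_acquire(task_name):
    """Return True if this run got the lock, False if one is in flight."""
    try:
        ScraperRun.objects.create(task_name=task_name)
        return True
    except IntegrityError:
        return False


def release(task_name):
    ScraperRun.objects.filter(task_name=task_name).delete()
```

The task would call try_acquire() at the start and release() in a finally block, which would also remove the need to manually disable the periodic task entry.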