[capitol-words] ported crec scraper to work as a celery task in django #17

will-horning · 2017-07-28T18:42:50Z

This ports the crec scraper to run in the Django app as a Celery task. I've also included a couple of extensions to Django+Celery. The first lets you schedule cron-like events for a Celery task from the Django admin UI. The second lets you return a value from that task and store it in Django's database (both extensions go through Django's ORM). Right now the task only reports whether it succeeded, but we can put more detailed info in that result data so a maintainer can inspect the scraper status from the Django admin UI.
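A minimal sketch of the kind of result the task could return, assuming a hypothetical `run_scrape_with_status` wrapper around the scraper callable. With django_celery_results as the result backend, whatever the task returns is serialized (JSON by default) and stored in the database, so the dict should hold only JSON-friendly values. The Celery decorator is left out so the sketch runs on its own.

```python
import json
import traceback
from datetime import datetime, timezone


def run_scrape_with_status(scrape, *args, **kwargs):
    """Run a scrape callable and return a JSON-serializable status dict.

    In the real task this body would sit under Celery's task decorator;
    the returned dict is what the result backend would store.
    """
    status = {"success": False, "started": datetime.now(timezone.utc).isoformat()}
    try:
        details = scrape(*args, **kwargs)
        status["success"] = True
        # Hypothetical: the scraper may return extra info (e.g. files fetched).
        if details is not None:
            status["details"] = details
    except Exception as exc:
        # Record the failure in the payload instead of letting it propagate.
        status["error"] = repr(exc)
        status["traceback"] = traceback.format_exc()
    status["finished"] = datetime.now(timezone.utc).isoformat()
    return status


if __name__ == "__main__":
    # Demo with a stand-in scraper; prints the JSON the backend would store.
    print(json.dumps(run_scrape_with_status(lambda day: {"day": day}, "2017-07-27")))
```

Catching the exception means Celery records the task as SUCCESS with `"success": false` in the payload; letting the exception propagate would instead give a FAILURE state with Celery's own traceback. Which is preferable depends on how the admin view should read.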

I've updated the pip requirements file, but given how much got added, I think I may have run pip freeze outside the virtualenv. Let me know if that looks weird.
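One way to avoid freezing the global site-packages by accident is to check the interpreter before running pip freeze. A small sketch (the `in_virtualenv` helper is hypothetical): venv and recent virtualenv releases make `sys.prefix` differ from `sys.base_prefix`, while older virtualenv releases set `sys.real_prefix` instead.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter runs inside a venv/virtualenv."""
    # Old virtualenv sets sys.real_prefix; venv and newer virtualenv point
    # sys.prefix at the env while sys.base_prefix keeps the base install.
    return hasattr(sys, "real_prefix") or sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    if in_virtualenv():
        print("In a virtualenv; pip freeze will list only its packages.")
    else:
        print("Not in a virtualenv; pip freeze would list global packages.")
```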

You'll also need a local RabbitMQ instance running. If you don't already have one, you can install it via brew (no other configuration needed).
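Celery's default broker URL already points at a RabbitMQ on localhost using the guest account, which is likely why no extra setup is needed. Spelling it out in settings makes the dependency explicit. The fragment below assumes `capitolweb/celery.py` loads its config with `app.config_from_object("django.conf:settings", namespace="CELERY")`, so each `CELERY_*` name maps to the matching lowercase Celery setting; that wiring is an assumption, not something this PR states.

```python
# capitolweb/settings.py (fragment, assuming the CELERY_ settings namespace)

# Local RabbitMQ on the default port and vhost. RabbitMQ's built-in guest
# account only accepts connections from localhost by default.
CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"

# Store task return values in Django's database (django_celery_results).
CELERY_RESULT_BACKEND = "django-db"

# Equivalent to passing "-S django" to celery beat (django_celery_beat).
CELERY_BEAT_SCHEDULER = "django_celery_beat.schedulers:DatabaseScheduler"
```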

After installing the new dependencies, you'll need to run a couple of migrations:

python manage.py migrate
python manage.py migrate django_celery_results

I think that's all of them; let me know if it doesn't work.

Next, you need to run three processes:

The Django server: start it with python manage.py runserver as usual.
The Celery workers: celery -A capitolweb worker -l info
The Celery scheduler: celery -A capitolweb beat -l debug -S django
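If juggling three terminals gets tedious, a small launcher can start all three commands and stop them together. This is a hypothetical dev helper, not part of the PR; it uses `sys.executable` so the server runs under the active virtualenv's Python, and it assumes `celery` is on the PATH.

```python
import subprocess
import sys

# The three processes above, as argument lists for subprocess.
DEV_COMMANDS = [
    [sys.executable, "manage.py", "runserver"],
    ["celery", "-A", "capitolweb", "worker", "-l", "info"],
    ["celery", "-A", "capitolweb", "beat", "-l", "debug", "-S", "django"],
]


def run_all(commands=None):
    """Start each command, wait for them, and stop the rest on Ctrl-C or error."""
    procs = []
    try:
        for cmd in commands or DEV_COMMANDS:
            procs.append(subprocess.Popen(cmd))
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        pass
    finally:
        # Terminate anything still running (after Ctrl-C or a failed start).
        for proc in procs:
            if proc.poll() is None:
                proc.terminate()
        for proc in procs:
            proc.wait()
```

Calling `run_all()` from the project root starts everything; Ctrl-C shuts all three down.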

Finally, you can test running the script from the Django admin UI at 127.0.0.1:8000/admin. Click "Periodic Tasks" under "Django Celery Beat", then "Add Periodic Task". In the form, select the only option in the drop-down menu next to "Task (Registered)". Check the "Enabled" box, then set a schedule (you'll need to add one via the plus-sign icon; that form is self-explanatory). Set it to something short so it executes soon. One thing I haven't accounted for is collisions, i.e. a new run starting while a previous one is still going. I'm thinking of a tracking table in RDS or something similar, but for now you'll want to disable that periodic task entry once the worker starts executing.
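Until a tracking table exists, a single-run guard is one possible stopgap for collisions. The sketch below swaps in an exclusive lock file in place of the RDS table suggested above; `single_run_lock` and `AlreadyRunning` are hypothetical names. It only works when all workers share one filesystem (true for the local setup described here), and a hard-killed run leaves a stale lock that must be removed by hand. The Celery docs' task cookbook describes a cache-based variant of the same idea.

```python
import os
import tempfile
from contextlib import contextmanager


class AlreadyRunning(Exception):
    """Raised when another run of the scraper holds the lock."""


@contextmanager
def single_run_lock(name, lock_dir=None):
    """Hold an exclusive lock file for the duration of a run.

    O_CREAT | O_EXCL makes creation atomic: if the file already exists,
    os.open raises FileExistsError and the second run backs off.
    """
    path = os.path.join(lock_dir or tempfile.gettempdir(), f"{name}.lock")
    try:
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise AlreadyRunning(path) from None
    try:
        os.write(fd, str(os.getpid()).encode())  # record the owner for debugging
        yield path
    finally:
        os.close(fd)
        os.remove(path)
```

Inside the task, wrapping the scrape in `with single_run_lock("crec_scraper"):` and catching `AlreadyRunning` would let an overlapping run exit early instead of scraping twice.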

@butlern We'll need to add something to the CloudFormation template to install RabbitMQ and set up credentials for it. I don't think we need a remote instance, so credentials may not really be necessary if we can block that port to external requests.


will-horning added 2 commits July 28, 2017 14:16

[capitol-words] ported crec scraper to work as a celery task in django

35366be

[capitol-words] factored out some config stuff, added support for override arguments from the ui to scrape all days within a datetime range

a6a63cd
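A minimal sketch of how the override arguments might expand into the days to scrape. It assumes hypothetical `start_date`/`end_date` keys entered as JSON in the periodic task's keyword-arguments field in the admin (for example `{"start_date": "2017-07-01", "end_date": "2017-07-03"}`), which Celery hands to the task as keyword arguments; the actual names in the commit may differ. Both ISO dates and ISO datetimes are accepted, and the range is inclusive.

```python
from datetime import date, datetime, timedelta


def days_to_scrape(default_day, start_date=None, end_date=None):
    """Return every day from start_date to end_date inclusive.

    Without overrides, fall back to the single default_day. A missing
    end_date means "just the start day".
    """
    if not start_date:
        return [default_day]
    # datetime.fromisoformat accepts "2017-07-01" and "2017-07-01T09:30:00".
    start = datetime.fromisoformat(start_date).date()
    end = datetime.fromisoformat(end_date).date() if end_date else start
    if end < start:
        raise ValueError(f"end_date {end} is before start_date {start}")
    return [start + timedelta(days=n) for n in range((end - start).days + 1)]
```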

rmangi force-pushed the master branch from c4ce53d to 1119f83 July 30, 2017 22:25
