[capitol-words] ported crec scraper to work as a celery task in django #17

Open: wants to merge 2 commits into master


will-horning

This ports the CREC scraper to run in the Django app as a Celery task. I've also included a couple of extensions to Django + Celery. The first (django_celery_beat) lets you schedule cron-like runs of a Celery task from the Django admin UI. The second (django_celery_results) lets the task return a value and stores that value in Django's database. Both go through Django's ORM. Right now the task only reports whether it succeeded, but we can include more detailed info in that result so a maintainer can use the Django admin UI to inspect the scraper's status.
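
As a sketch of what "more detailed info" could look like, the task might return a small JSON-serializable dict instead of a bare success flag; django_celery_results stores whatever the task returns, so the dict would show up with the task results in the admin UI. The field names, build_result, and scrape_crec (a stand-in for the scraper's real entry point) are all hypothetical, and the ImportError fallback exists only so the snippet runs without Celery installed.

```python
import traceback
from datetime import datetime, timezone

try:
    from celery import shared_task
except ImportError:
    # Fallback so this sketch runs without Celery; in the project,
    # import shared_task from celery directly.
    def shared_task(func):
        return func


def build_result(succeeded, files_processed=0, error=None):
    """Build the (hypothetical) status payload the task returns.

    The result backend serializes this dict (JSON by default), so every
    value stays a plain str/int/bool/None.
    """
    return {
        "succeeded": succeeded,
        "files_processed": files_processed,
        "error": error,
        "finished_at": datetime.now(timezone.utc).isoformat(),
    }


def scrape_crec(date):
    """Hypothetical stand-in for the real CREC scraper entry point.

    Returns the number of files processed.
    """
    return 0


@shared_task
def run_crec_scraper(date=None):
    try:
        count = scrape_crec(date)
    except Exception:
        # Record the traceback instead of re-raising, so the failure
        # details land in the stored result.
        return build_result(False, error=traceback.format_exc())
    return build_result(True, files_processed=count)
```

One trade-off: because the exception is caught, Celery records the task state as SUCCESS even when the scrape failed. Re-raising instead would mark it FAILURE and store the exception, at the cost of losing the structured payload.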

I've updated the pip requirements file, but judging by how much got added, I may have run pip freeze outside the virtualenv. Let me know if that looks weird.
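
One way to check before running pip freeze is to ask the interpreter itself whether it's inside a virtualenv. This is a small sketch; in_virtualenv is a hypothetical helper name.

```python
import sys


def in_virtualenv():
    """Return True if this interpreter is running inside a virtual environment."""
    # The stdlib venv module and virtualenv >= 20 point sys.prefix at the
    # environment while sys.base_prefix still points at the base install.
    if sys.prefix != sys.base_prefix:
        return True
    # Legacy virtualenv (< 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix")


if __name__ == "__main__":
    print("inside a virtualenv:", in_virtualenv())
```

Running python -m pip freeze with the environment's own python, rather than a bare pip that may resolve to the system one, also avoids the mix-up.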

You'll also need a local RabbitMQ instance running. If you don't already have one, you can install it via Homebrew (brew install rabbitmq); no other configuration is needed.
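
For reference, a minimal sketch of the broker and result settings, assuming the Celery app loads Django settings with namespace="CELERY" (the common pattern from the Celery docs; an assumption, not necessarily how this PR configures it). Celery's default broker URL already points at a local RabbitMQ using the guest account on port 5672, which is why a stock install needs no extra configuration; spelling it out just makes that explicit.

```python
# capitolweb/settings.py (fragment; assumes namespace="CELERY", so each
# CELERY_* setting maps to the lowercase Celery option, e.g. broker_url).

# Local RabbitMQ, default guest account, standard AMQP port, default vhost "/".
CELERY_BROKER_URL = "amqp://guest:guest@localhost:5672//"

# Persist task return values in Django's database (django_celery_results).
CELERY_RESULT_BACKEND = "django-db"
```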

After installing the new dependencies, you'll need to run a couple of migrations:

python manage.py migrate
python manage.py migrate django_celery_results

I think that's all of them; let me know if it doesn't work.

Next, you need to run three processes:

  1. The Django server: start it with python manage.py runserver as usual.
  2. The Celery workers: celery -A capitolweb worker -l info
  3. The Celery scheduler (beat): celery -A capitolweb beat -l debug -S django
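
If juggling three terminals gets tedious, a small dev-only launcher could start the three commands listed above and stop them together on Ctrl-C. This is a hypothetical helper (the file name run_dev.py and the function names are made up), not part of this PR; it uses sys.executable for runserver so the virtualenv's Python is the one that runs it.

```python
# run_dev.py: hypothetical dev helper, not part of this PR.
import subprocess
import sys

# The three processes from the list above.
COMMANDS = [
    [sys.executable, "manage.py", "runserver"],
    ["celery", "-A", "capitolweb", "worker", "-l", "info"],
    ["celery", "-A", "capitolweb", "beat", "-l", "debug", "-S", "django"],
]


def start_all(commands=COMMANDS):
    """Start each command as a child process; return the Popen handles."""
    return [subprocess.Popen(cmd) for cmd in commands]


def stop_all(procs):
    """Ask every still-running child to exit, then wait for each one."""
    for proc in procs:
        if proc.poll() is None:
            proc.terminate()
    for proc in procs:
        proc.wait()


if __name__ == "__main__":
    procs = start_all()
    try:
        for proc in procs:
            proc.wait()
    except KeyboardInterrupt:
        stop_all(procs)
```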

Finally, you can test the task from the Django admin UI at 127.0.0.1:8000/admin. Click "Periodic Tasks" under "Django Celery Beat", then "Add Periodic Task". In the form, select the only option in the drop-down menu next to "Task (Registered)", check the "Enabled" box, and set a schedule (you'll need to add one via the plus-sign icon; that form is self-explanatory). Set it to something short so it runs soon.

One thing I haven't accounted for yet is collisions, i.e. a new run starting while a previous one is still going. I'm thinking of a tracking table in RDS or something similar, but for now you'll want to disable the periodic task entry once the worker starts executing.
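
One possible shape for the tracking-table idea, sketched with Python's built-in sqlite3 as a stand-in for a table in RDS. The table name task_lock, the six-hour staleness window, and the helper names are hypothetical. The primary key on the task name turns the INSERT into an atomic "claim": a second run that tries to claim the same name gets an IntegrityError and can skip its work.

```python
import sqlite3
import time

# Hypothetical: a lock older than this is assumed to belong to a crashed run.
LOCK_TTL_SECONDS = 6 * 60 * 60


def init_lock_table(conn):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS task_lock ("
        " name TEXT PRIMARY KEY,"
        " started_at REAL NOT NULL)"
    )
    conn.commit()


def try_acquire(conn, name, now=None):
    """Claim the lock for `name`; return False if another run holds it."""
    now = time.time() if now is None else now
    with conn:  # commits when the block exits normally, rolls back on an exception
        # Clear a stale lock so a crashed run can't block the task forever.
        conn.execute(
            "DELETE FROM task_lock WHERE name = ? AND started_at < ?",
            (name, now - LOCK_TTL_SECONDS),
        )
        try:
            # The PRIMARY KEY makes this INSERT the atomic "claim".
            conn.execute(
                "INSERT INTO task_lock (name, started_at) VALUES (?, ?)",
                (name, now),
            )
        except sqlite3.IntegrityError:
            return False
    return True


def release(conn, name):
    """Drop the lock so the next scheduled run can proceed."""
    with conn:
        conn.execute("DELETE FROM task_lock WHERE name = ?", (name,))
```

In the task, try_acquire would run first (returning early if it fails) and release would go in a finally block. The same claim-by-unique-key idea should carry over to a Django model with a unique name field backed by RDS, which would keep it inside the ORM like the rest of the PR.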

@butlern We'll need to add something to the CloudFormation template to install RabbitMQ and set up credentials for it. I don't think we need a remote instance of it, so credentials may not really be necessary if we block its port (5672 by default) to external requests.
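
To confirm the port is reachable locally but blocked from outside, a small stdlib check could be run once on the instance and once from another machine against the instance's address. The function name is hypothetical. It also helps that RabbitMQ (since 3.3) only lets the default guest account log in over localhost, so even an exposed port would not accept guest logins from elsewhere.

```python
import socket


def port_accepts_connections(host, port, timeout=1.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable.
        return False


if __name__ == "__main__":
    # On the instance itself this should print True once RabbitMQ is running.
    print("127.0.0.1:5672 open:", port_accepts_connections("127.0.0.1", 5672))
```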
