# UW-Madison Course Guide Crawler

A web scraper that reads information from the public UW-Madison course guide and saves it in a SQLite database. It is built on Scrapy, an open-source web scraping framework written in Python. To learn more, check out the Scrapy documentation.

## Installation

Before you start, you'll need to install a few system libraries:

* Python 2.7
* libxml2 / libxslt
* libffi
* OpenSSL
* SQLite 3

You'll also need a C compiler. The Scrapy installation notes will help you get started.

Then, clone the repo and install the project's dependencies with pip:

```
pip install -r requirements.txt
```

## Getting started

Currently the only way to run the crawler is with the scrapy command-line tool. It must be run from inside the UWMadCrawler directory, like so:

```
cd UWMadCrawler
scrapy crawl UWMad
```

This will crawl the course guide and print out the courses as it finds them.

It will create a SQLite database inside this directory, named classes.db. Courses and their sections are saved into different tables, along with (almost) all of the information available about them in the course guide. Sections are related to courses by sharing the same department, course number, and course title.
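
Once a crawl has finished you can inspect the database with Python's built-in sqlite3 module. The table and column names below (`courses`, `sections`, `department`, `course_number`, `course_title`) are assumptions for illustration; check the actual schema with `.schema` in the sqlite3 shell before relying on them.

```python
import sqlite3

# Open the database produced by the crawl (created in the UWMadCrawler directory).
conn = sqlite3.connect("classes.db")

# Hypothetical schema: join sections to their course on department,
# course number, and course title, as described above.
query = """
SELECT courses.department, courses.course_number, courses.course_title, sections.*
FROM sections
JOIN courses
  ON sections.department    = courses.department
 AND sections.course_number = courses.course_number
 AND sections.course_title  = courses.course_title
"""

for row in conn.execute(query):
    print(row)

conn.close()
```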

Courses are not expected to change very often, so the scraper will avoid re-adding them if they are already in the database. However, sections are saved along with the time of last modification found on the course guide itself. This means that you can scrape multiple times and (hopefully!) have the right thing happen.
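
A minimal sketch of that idea, assuming a hypothetical schema with a unique key on courses and a `last_modified` column on sections (the real pipeline may differ):

```python
import sqlite3

conn = sqlite3.connect("classes.db")

def save_course(course):
    # Courses rarely change, so skip the insert if the row already exists.
    # Assumes a UNIQUE constraint on (department, course_number, course_title).
    conn.execute(
        "INSERT OR IGNORE INTO courses (department, course_number, course_title) "
        "VALUES (?, ?, ?)",
        (course["department"], course["course_number"], course["course_title"]),
    )
    conn.commit()

def save_section(section):
    # Sections carry the last-modified timestamp shown on the course guide,
    # so a repeated scrape only rewrites rows whose timestamp has changed.
    row = conn.execute(
        "SELECT last_modified FROM sections WHERE section_id = ?",
        (section["section_id"],),
    ).fetchone()
    if row is None or row[0] != section["last_modified"]:
        conn.execute(
            "INSERT OR REPLACE INTO sections (section_id, department, course_number, "
            "course_title, last_modified) VALUES (?, ?, ?, ?, ?)",
            (section["section_id"], section["department"], section["course_number"],
             section["course_title"], section["last_modified"]),
        )
    conn.commit()
```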

## Notes

This program is released under the MIT license. Copyright (c) 2014 The Badger Herald, Inc.

This program is a fork of the UWMadCrawler by Joe Kelley, originally released under the MIT license (see NOTICE).