A web scraper that reads information from the public UW-Madison course guide and saves it to a SQLite database. It uses Scrapy, an open-source web scraping framework written in Python. To learn more about it, check out the Scrapy documentation.
Before you start, you'll need to install a few system libraries:
- Python 2.7
- libxml2 / libxslt
- libffi
- OpenSSL
- SQLite 3
You'll also need a C compiler. The Scrapy installation notes will help you get started.
Then, clone the repo and install the project's dependencies with pip:
pip install -r requirements.txt
Currently the only way to run the crawler is with the scrapy command-line tool. It must be run inside the UWMadCrawler directory, like so:
cd UWMadCrawler
scrapy crawl UWMad
This will crawl the course guide and print out the courses it finds.
It will create a SQLite database inside this directory named classes.db. Courses and their sections are saved into different tables, along with (almost) all of the information available about them in the course guide. Sections are related to courses by sharing the same department, course number, and course title.
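For example, once the crawler has run you can reconstruct the course/section relationship with an ordinary SQL join. The table and column names below (courses, sections, department, course_number, course_title, section_number) are assumptions based on the description above rather than the project's actual schema, so treat this as a sketch and check the real layout with `.schema` in the sqlite3 shell first:

    import sqlite3

    # Open the database produced by the crawler (run this from the UWMadCrawler directory).
    conn = sqlite3.connect("classes.db")
    cursor = conn.cursor()

    # Join sections back to their parent courses on the shared identifying fields.
    # NOTE: table and column names here are illustrative assumptions.
    cursor.execute("""
        SELECT c.department, c.course_number, c.course_title, s.section_number
        FROM sections AS s
        JOIN courses AS c
          ON s.department    = c.department
         AND s.course_number = c.course_number
         AND s.course_title  = c.course_title
        LIMIT 10
    """)

    for row in cursor.fetchall():
        print(row)

    conn.close()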
Courses are not expected to change very often, so the scraper will avoid re-adding them if they are already in the database. However, sections are saved along with the time of last modification found on the course guide itself. This means that you can scrape multiple times and (hopefully!) have the right thing happen.
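The idea behind that behavior is roughly the sketch below. This is not the project's actual pipeline code; the table layout and the last_modified column are assumed purely for illustration:

    import sqlite3

    def save_course(conn, department, number, title):
        # Courses rarely change, so skip any course that is already in the table.
        exists = conn.execute(
            "SELECT 1 FROM courses "
            "WHERE department = ? AND course_number = ? AND course_title = ?",
            (department, number, title),
        ).fetchone()
        if exists is None:
            conn.execute(
                "INSERT INTO courses (department, course_number, course_title) "
                "VALUES (?, ?, ?)",
                (department, number, title),
            )
            conn.commit()

    def save_section(conn, department, number, title, section_number, last_modified):
        # Sections keep the course guide's own last-modified timestamp, so a
        # repeated scrape refreshes the row instead of duplicating it.
        conn.execute(
            "DELETE FROM sections WHERE department = ? AND course_number = ? "
            "AND course_title = ? AND section_number = ?",
            (department, number, title, section_number),
        )
        conn.execute(
            "INSERT INTO sections "
            "(department, course_number, course_title, section_number, last_modified) "
            "VALUES (?, ?, ?, ?, ?)",
            (department, number, title, section_number, last_modified),
        )
        conn.commit()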
This program is released under the MIT license. Copyright (c) 2014 The Badger Herald, Inc.
This program is a fork of the UWMadCrawler by Joe Kelley, originally released under the MIT license (see NOTICE).