badgerherald/Course-Guide-Crawler

UW-Madison Course Guide Crawler

A web scraper that reads the public UW-Madison course guide and saves its contents to a SQLite database. It uses Scrapy, an open source web scraping framework written in Python. To learn more about Scrapy, check out its documentation.

Installation

Before you start, you'll need to install a few system dependencies:

  • Python 2.7
  • libxml2 / libxslt
  • libffi
  • OpenSSL
  • SQLite 3

You'll also need a C compiler. The Scrapy installation notes will help you get started.

Then, clone the repo and install the project's dependencies with pip:

pip install -r requirements.txt

Getting started

Currently, the only way to run the crawler is with the scrapy command-line tool. It must be run from inside the UWMadCrawler directory, like so:

cd UWMadCrawler
scrapy crawl UWMad

This crawls the course guide website and prints out the courses on the register.

The crawler also creates a SQLite database named classes.db inside this directory. Courses and their sections are saved into separate tables, along with (almost) all of the information available about them in the course guide. A section is related to its course by sharing the same department, course number, and course title.

Courses are not expected to change very often, so the scraper will avoid re-adding them if they are already in the database. However, sections are saved along with the time of last modification found on the course guide itself. This means that you can scrape multiple times and (hopefully!) have the right thing happen.
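One way to picture this behaviour is the sketch below. It is not the crawler's actual pipeline code: the function names `save_course` and `save_section`, the table and column names, and the choice to update a section only when its stored timestamp differs are all assumptions made for illustration.

```python
import sqlite3


def save_course(conn, department, number, title):
    """Insert a course only if it is not already stored. Returns True if added."""
    key = (department, number, title)
    exists = conn.execute(
        "SELECT 1 FROM courses "
        "WHERE department = ? AND number = ? AND title = ?",
        key,
    ).fetchone()
    if exists:
        return False
    conn.execute(
        "INSERT INTO courses (department, number, title) VALUES (?, ?, ?)", key
    )
    return True


def save_section(conn, department, number, title, section_id, last_modified):
    """Insert a new section, or update it when the guide's timestamp changed."""
    key = (department, number, title, section_id)
    row = conn.execute(
        "SELECT last_modified FROM sections "
        "WHERE department = ? AND number = ? AND title = ? AND section_id = ?",
        key,
    ).fetchone()
    if row is None:
        conn.execute(
            "INSERT INTO sections "
            "(department, number, title, section_id, last_modified) "
            "VALUES (?, ?, ?, ?, ?)",
            key + (last_modified,),
        )
        return "inserted"
    if row[0] == last_modified:
        return "unchanged"  # same timestamp as last crawl, nothing to do
    conn.execute(
        "UPDATE sections SET last_modified = ? "
        "WHERE department = ? AND number = ? AND title = ? AND section_id = ?",
        (last_modified,) + key,
    )
    return "updated"


# Simulate repeated crawls against an in-memory database with made-up data.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE courses (department TEXT, number TEXT, title TEXT);
    CREATE TABLE sections (department TEXT, number TEXT, title TEXT,
                           section_id TEXT, last_modified TEXT);
    """
)
course = ("EXAMPLE DEPT", "101", "Example Course")
course_results = [save_course(conn, *course), save_course(conn, *course)]
section_results = [
    save_section(conn, *course + ("LEC 001", "2014-01-10 09:00")),
    save_section(conn, *course + ("LEC 001", "2014-01-10 09:00")),
    save_section(conn, *course + ("LEC 001", "2014-02-01 12:00")),
]
section_count = conn.execute("SELECT COUNT(*) FROM sections").fetchone()[0]
print(course_results, section_results, section_count)
```

Under these assumptions, a second crawl leaves existing courses untouched and only rewrites sections whose modification time has moved, so the row counts stay stable across runs.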

Notes

This program is released under the MIT license. Copyright (c) 2014 The Badger Herald, Inc.

This program is a fork of the UWMadCrawler by Joe Kelley, originally released under the MIT license (see NOTICE).
