This directory contains scripts for collecting and processing UIUC course and building data. The process is split into multiple stages to collect, transform, and load the data.
-
oneshotscraper.py
- Scrapes all course data from courses.illinois.edu (currently only scraping 2024 fall potB in person courses with fixed meeting location and times)
- Output:
subject_data.json
-
subject_to_buildings.py
- Transforms subject-sorted data into building-sorted data
- Input:
subject_data.json
- Output:
building_data.json
-
filterbuildings.py
- Filters buildings based on criteria (minimum rooms, exclusion list)
- Input:
building_data.json
- Output:
filtered_buildings.json
-
addbuildinghours.py
- Adds operating hours to each building
- Input:
filtered_buildings.json
,buildinghours.json
- Updates:
filtered_buildings.json
-
addbuildingcoordinates.py
- Adds geographical coordinates to each building
- Input:
filtered_buildings.json
,uiuc_buildings.geojson
- Updates:
filtered_buildings.json
-
load_to_postgres.py
- Loads the final data into PostgreSQL database
- Input:
filtered_buildings.json
- Creates and populates database tables
Web Data → subject_data.json → building_data.json → filtered_buildings.json
↓
[+ building hours & coordinates]
↓
Database Load
- Install required packages:
pip install curl-cffi==0.7.3 beautifulsoup4==4.12.3 supabase==1.0.4 python-dotenv==1.0.0
- Run the scripts in sequence:
python3 oneshotscraper.py
python3 subject_to_buildings.py
python3 filterbuildings.py
python3 addbuildinghours.py
python3 addbuildingcoordinates.py
python3 load_to_postgres.py
- Minimum 7 rooms per building
- Certain buildings explicitly excluded (see filterbuildings.py)
CREATE TABLE buildings (
name TEXT PRIMARY KEY,
latitude DECIMAL(10,8),
longitude DECIMAL(10,8),
monday_open TIME,
monday_close TIME,
...
sunday_open TIME,
sunday_close TIME
);
CREATE TABLE rooms (
building_name TEXT REFERENCES buildings(name),
room_number TEXT,
PRIMARY KEY (building_name, room_number)
);
CREATE TABLE class_schedule (
building_name TEXT,
room_number TEXT,
course_code TEXT NOT NULL,
course_title TEXT NOT NULL,
start_time TIME NOT NULL,
end_time TIME NOT NULL,
day_of_week CHAR(1) NOT NULL,
FOREIGN KEY (building_name, room_number) REFERENCES rooms(building_name, room_number)
);
buildinghours.json
: Building operating hoursuiuc_buildings.geojson
: Building coordinate data
subject_data.json
: Raw course data organized by subjectbuilding_data.json
: Data reorganized by building and roomfiltered_buildings.json
: Final processed building data including hours and coordinates