Commit

updates

gboeing committed Jan 8, 2024
1 parent 634b9ab commit 77a8d6e
Showing 32 changed files with 1,312 additions and 2,051 deletions.
19 changes: 9 additions & 10 deletions .github/workflows/tests.yml
Original file line number Diff line number Diff line change
@@ -7,9 +7,7 @@ on:
branches: [main]

jobs:

build:

name: ${{ matrix.os }}
runs-on: ${{ matrix.os }}
strategy:
@@ -19,24 +17,21 @@ jobs:

defaults:
run:
shell: bash -l {0}
shell: bash -elo pipefail {0}

steps:

- name: Checkout repo
uses: actions/checkout@v3
with:
fetch-depth: 2

- name: Setup Conda environment with Micromamba
uses: mamba-org/provision-with-micromamba@v14
- name: Create environment with Micromamba
uses: mamba-org/setup-micromamba@v1
with:
cache-downloads: true
cache-env: true
channels: conda-forge
channel-priority: strict
cache-environment: true
environment-file: environment.yml
environment-name: ppde642
post-cleanup: none

- name: Test environment
run: |
@@ -45,3 +40,7 @@ jobs:
conda info --all
jupyter kernelspec list
ipython -c "import osmnx; print('OSMnx version', osmnx.__version__)"
- name: Lint
run: |
SKIP=no-commit-to-branch pre-commit run --all-files
1 change: 1 addition & 0 deletions .gitignore
@@ -2,6 +2,7 @@
data/*
modules/*/*.gal
modules/*/*.png
modules/*/cache/*
modules/*/keys.py
syllabus/pdf/*.pdf

50 changes: 50 additions & 0 deletions .pre-commit-config.yaml
@@ -0,0 +1,50 @@
repos:
- repo: https://github.com/pre-commit/pre-commit-hooks
rev: "v4.5.0"
hooks:
- id: check-added-large-files
args: [--maxkb=50]
- id: check-ast
- id: check-builtin-literals
- id: check-case-conflict
- id: check-docstring-first
- id: check-json
- id: check-merge-conflict
args: [--assume-in-merge]
- id: check-yaml
- id: debug-statements
- id: detect-private-key
- id: end-of-file-fixer
- id: fix-byte-order-marker
- id: mixed-line-ending
- id: no-commit-to-branch
args: [--branch, main]
- id: trailing-whitespace

- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.0.3"
hooks:
- id: prettier
types_or: [markdown, yaml]

- repo: https://github.com/nbQA-dev/nbQA
rev: "1.7.1"
hooks:
- id: nbqa-isort
additional_dependencies: [isort]
args: [--line-length=100, --sl]
- id: nbqa-black
additional_dependencies: [black]
args: [--line-length=100]
- id: nbqa-flake8
additional_dependencies: [flake8]
args: [--max-line-length=100]

- repo: local
hooks:
- id: nbconvert
name: clear notebook output
entry: jupyter nbconvert
language: system
types: [jupyter]
args: ["--clear-output", "--inplace"]
4 changes: 0 additions & 4 deletions README.md
@@ -1,14 +1,12 @@
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/gboeing/ppde642/main?urlpath=lab)
[![Build Status](https://github.com/gboeing/ppde642/workflows/tests/badge.svg?branch=main)](https://github.com/gboeing/ppde642/actions?query=workflow%3A%22tests%22)


# PPDE642: Advanced Urban Analytics

This is the second part of a two-course series on **urban data science** that I teach at the **University of Southern California**'s Department of Urban Planning and Spatial Analysis.

This course series takes a computational social science approach to working with urban data. It uses Python and Jupyter notebooks to introduce coding and statistical methods that students can reproduce and experiment with in the cloud. The series as a whole presumes no prior knowledge as it introduces coding, stats, spatial analysis, and applied machine learning from the ground up, but PPDE642 assumes you have completed [PPD534](https://github.com/gboeing/ppd534) or its equivalent.


## Urban Data Science course series

### PPD534: Data, Evidence, and Communication for the Public Good
@@ -17,14 +15,12 @@ The first course in the series, **PPD534**, starts with the basics of coding wit

**PPD534**'s lecture materials are available on [GitHub](https://github.com/gboeing/ppd534) and interactively on [Binder](https://mybinder.org/v2/gh/gboeing/ppd534/main).


### PPDE642: Advanced Urban Analytics

The second course, **PPDE642**, assumes you have completed PPD534 (or its equivalent) and builds on its topics. It introduces spatial analysis, network analysis, spatial models, and applied machine learning. It also digs deeper into the tools and workflows of urban data science in both research and practice.

**PPDE642**'s lecture materials are available in this repo and interactively on [Binder](https://mybinder.org/v2/gh/gboeing/ppde642/main).


## Not a USC student?

Did you discover this course on GitHub? Come study with us: [consider applying](https://geoffboeing.com/lab/) to the urban planning master's or PhD programs at USC.
6 changes: 3 additions & 3 deletions assignments/assignment2.md
@@ -6,9 +6,9 @@ You will clean, organize, describe, and visualize the data you downloaded in Ass

Create a new Jupyter notebook. The first cell of your notebook should be markdown explaining what your research question and hypotheses are, where you found your data set, and what it contains. Given your proposed project:

1. Load your data set and clean/process it as needed.
1. Identify at least two variables of interest and calculate relevant descriptive statistics.
1. Using the techniques we learned in class, visualize interesting aspects of your data set. Create at least 4 visualizations using at least 3 different visualization types (e.g., scatterplots, barplots, maps, etc).

Make sure your code is well-commented throughout for explanatory clarity. Your notebook should be well-organized into high-level sections using markdown headers representing the steps above, plus subheaders as needed. Each visualization should be followed by a markdown cell that explains what you are visualizing, why it is interesting, and why you made your specific graphical design decisions. What story does each visual tell? How does it enrich, confirm, or contradict the descriptive statistics you calculated earlier?

6 changes: 3 additions & 3 deletions assignments/assignment4.md
@@ -6,9 +6,9 @@ You will conduct a spatial analysis using a spatial dataset (ideally the same on

Create a new Jupyter notebook. The first cell of your notebook should be markdown explaining what your research question and hypotheses are, where you found your data set, and what it contains. Use geopandas to load your data set and clean/process it as needed. Make sure your code is well-commented throughout for explanatory clarity. Using the techniques we learned in class, do the following:

1. conduct a spatial analysis to look for hot/cold spots and assess spatial autocorrelation
1. compute spatial diagnostics to pick an appropriate spatial regression model
1. estimate and interpret a spatial regression model

Your notebook should be separated into high-level sections using markdown headers representing the steps above. Each section should conclude with a markdown cell that succinctly explains your analysis/visuals, why you set it up the way you did, and how you interpret its results. Your notebook should conclude with a markdown cell that explains 1) what evidence does this analysis provide for your research question and hypothesis, 2) what is the "big picture" story, and 3) how can planners or policymakers use this finding.

26 changes: 13 additions & 13 deletions assignments/final-project.md
@@ -8,27 +8,27 @@ The final project is a cumulative assignment that requires you to use the skills

Identify a conference of interest and familiarize yourself with their paper submission requirements. You might consider the following conferences, among others:

- Transportation Research Board (TRB)
- Association of Collegiate Schools of Planning (ACSP)
- American Planning Association's National Planning Conference (APA)
- American Association of Geographers (AAG)
- Urban Affairs Association (UAA)

Develop an urban research question that fits with the themes of your chosen conference. Develop a research design to answer this question, then collect data, clean and organize it, visualize it, and analyze it.

Write a conference paper organized into five sections:

1. introduction: provide a short (3 paragraph) summary of the study's importance, methods, and findings/implications (1 paragraph each)
2. background: explain the context of your study and provide a short lit review of relevant related work to establish what is known and what urgent open questions remain
3. methods: present your data and your analysis methods with sufficient detail that a reader could reproduce your study
4. results: present your findings and include supporting visuals
5. discussion: circle back to your research question, interpret your findings, and discuss their importance and how planners or policymakers could use them to improve some aspect of urban living

Format your paper according to the conference's guidelines. For the purposes of this course, your paper must be at least 3000 words in length (not including tables, figures, captions, or references). It must include the following, at a minimum:

- a table of descriptive statistics
- a table of spatial regression or machine learning model results
- 4 aesthetically pleasing figures containing data visualizations, including at least 1 map

You are strongly encouraged, but not required, to actually submit this paper to the conference.

22 changes: 11 additions & 11 deletions assignments/mini-lecture.md
@@ -8,21 +8,21 @@ This exercise is intended to be informal and an opportunity for self-discovery.

Instructions:

- Pick a method listed in the syllabus for those weeks or covered in the reading material.
- Learn how the method works by reading the week's reading material.
- Practice the method in your own notebook on your own data.
- Google for additional usage examples and further information.
- Prepare a mini-lecture notebook that would take 8-10 minutes to present that 1) briefly introduces why someone would use the method and how it works (~2 minutes), 2) demonstrates in code how to use the method for a simple data analysis (~5 minutes), 3) summarizes what the analysis revealed (~2 minutes).

8 minutes is not a lot of time, so keep your lecture notebook simple and brief. Have a clean dataset ready to go at the beginning of your lecture. Do not show us a lot of preparatory steps setting things up in your notebook. Jump right into the analysis that demonstrates your method.

You will be graded according to the following. In your notebook, did you:

- summarize why someone would use this method and how it works, at a high level
- demonstrate the method with a simple data analysis
- summarize what your analysis revealed
- keep it all succinct

Make sure your notebook runs from the top without any errors (i.e., restart the kernel and run all cells) and that all the output can be seen inline without me having to re-run your notebook. Via Blackboard, submit your notebook and data files, all zipped as a single file, named `LastName_FirstName_Lecture.zip`. If your submission file exceeds Blackboard's maximum upload size limit, you may provide a Google Drive link to your zipped data in the comment field when you submit.

Note that if you pick a supervised learning method, your assignment is due prior to class in module 11. If you pick an unsupervised learning method, your assignment is due prior to class in module 12. The "presentation" is pretend: you are just creating the lecture notebook you would have presented, and submitting it via Blackboard *before that module's class session* begins.
19 changes: 10 additions & 9 deletions environment.yml
@@ -5,32 +5,33 @@ channels:

dependencies:
- beautifulsoup4
- black
- cartopy
- cenpy
- conda
- contextily
- dill
- flake8
- folium
- gensim
- geopandas
- isort
- jupyterlab
- mapclassify
- osmnx=1.8.1
- nbqa
- nltk
- pandana
- pandas
- pre-commit
- pysal
- python=3.11.*
- rasterio
- rtree
- seaborn
- scikit-learn
- scipy
- statsmodels

# computer vision and NLP
- gensim
- nltk
- pillow
- pytorch
- torchvision

# others (unused)
# bokeh
# datashader
# holoviews
6 changes: 0 additions & 6 deletions format.sh

This file was deleted.

19 changes: 6 additions & 13 deletions modules/01-introduction/readme.md
@@ -2,29 +2,24 @@

In this module, we introduce the course, the syllabus, the semester's expectations and schedule, and set up the computing environment for coursework. Then we introduce the foundational tools underlying much of the modern data science world: package management, version control, and computational notebooks.


## Syllabus

The syllabus is in the [syllabus](../../syllabus) folder.


## Computing environment

Make sure that you have already completed the course's initial [software](../../software) setup before proceeding.


## Package management

A Python **module** is a file of Python code containing variables, classes, functions, etc. A Python **package** is a collection of modules, kind of like a folder of files and subfolders. A package can be thought of as a computer program.

**Package management** is the process of installing, uninstalling, configuring, and upgrading packages on a computer. A **package manager** is a software tool for package management, retrieving information and installing packages from a software repository. The most common Python package managers are `conda` and `pip`. These tools are typically used in the terminal.
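In terminal usage the two managers look similar (shown here with `osmnx`, one of this course's packages, purely as an illustrative example):

```shell
# conda installs prebuilt binary packages, here from the conda-forge channel
conda install -c conda-forge osmnx

# pip installs wheels (or source distributions) from PyPI
pip install osmnx
```

The difference is under the hood: conda resolves and installs precompiled, possibly non-Python dependencies, while pip may fall back to compiling from source.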


### pip

`pip` installs Python packages from [PyPI](https://pypi.org/) in the form of wheels or source code. The latter often requires that you have library dependencies and compatible compilers already installed on your system to install the Python package. This often requires some expertise when installing complicated toolkits, such as the Python geospatial data science ecosystem. For that reason, I recommend using `conda` unless you have to use `pip`.


### conda

`conda` installs packages from Anaconda's software repositories. These packages are binaries, so no compilation is required of the user, and they are multi-language: a package could include Python, C, C++, R, Julia, or other languages. Anaconda software repositories are organized by **channel**. Beyond the "default" channel, the [conda-forge](https://conda-forge.org/) channel includes thousands of community-led packages. `conda` is the recommended way to get started with the Python geospatial data science ecosystem.
@@ -46,25 +41,23 @@ conda env remove -n ox

Read the `conda` [documentation](https://conda.io/) for more details.
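Putting this together, a typical environment lifecycle for this course might look as follows (a sketch using the `ppde642` environment name from the course's CI workflow — follow the official [software](../../software) setup for the authoritative steps):

```shell
# Create an environment from the course's environment file
conda env create -n ppde642 -f environment.yml

# Activate it, then verify a key package imports
conda activate ppde642
python -c "import osmnx; print('OSMnx version', osmnx.__version__)"

# List all environments; remove this one when finished with it
conda env list
conda env remove -n ppde642
```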


## Urban data science in a computational notebook

During the course's initial software setup, you created a conda environment with all the required packages. The required packages are defined in the course's [environment file](../../environment.yml). These are the tools we will use all semester.

All of the lectures and coursework will utilize Jupyter notebooks. These notebooks provide an interactive environment for working with code and have become standard in the data science world. [Read more](https://doi.org/10.22224/gistbok/2021.1.2).


## Version control

Distributed version control is central to modern analytics work in both research and practice. It allows (multiple) people to collaboratively develop source code while tracking changes. Today, git is the standard tool for version control and source code management. Sites like GitHub provide hosting for git repositories.

GitHub Guides provides an excellent [introduction](https://guides.github.com/) to distributed version control with git, so I will not duplicate it here. Take some time to work through their lessons. You need to understand, at a minimum, how to:

- fork a repo
- clone a repo
- work with branches
- add/commit changes
- push and pull to/from a remote repo
- merge a feature branch into the main branch

Start with their guides on the Git Handbook, Understanding the GitHub flow, Forking Projects, Mastering Markdown, and then explore from there.
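The local parts of that workflow can be sketched end-to-end in a throwaway repo (forking and pushing/pulling require a remote such as GitHub, so those steps are only noted in comments; the file name and messages are illustrative):

```shell
# Create a throwaway repo to practice in
mkdir demo-repo && cd demo-repo
git init -q
git config user.email "student@example.com"
git config user.name "Student"

# Add and commit a change on the default branch, then name it main
echo "# demo" > readme.md
git add readme.md
git commit -q -m "initial commit"
git branch -M main

# Work on a feature branch
git checkout -b my-feature
echo "a change" >> readme.md
git add readme.md
git commit -q -m "describe the change"

# Merge the feature branch back into main
git checkout main
git merge my-feature
git log --oneline   # both commits are now on main

# With a remote, you would continue with:
#   git push -u origin my-feature   (then open a pull request)
#   git pull                        (to fetch and merge remote changes)
```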