CYPLAN255

course repo for material related to CYPLAN 255 at UC Berkeley, Spring 2022

Getting started

Syllabus

Department of City and Regional Planning, U.C. Berkeley

CYPLAN255: Urban Informatics and Visualization Spring 2022

Instructor: Max Gardner / [email protected]
Office hours: Mon 3-4pm (IRL @ 410b Wurster), Tue 2:30-3:30pm (remote, sign-up here)

GSI: Irene Farah / [email protected]
Office hours: Thu 2-4pm / sign-up here

Details:
Meeting times: Mon/Wed 9:30-11am ("Berkeley time")
Meeting location: 102 Bauer Wurster Auditorium / Zoom (as needed)
Course website: https://bcourses.berkeley.edu/courses/1511685
Course GitHub repository: https://github.com/mxndrwgrdnr/CYPLAN255
Prerequisites: CP201A, CP204C, or equivalent experience
Grading: out of 100 pts – attendance (10%) / assignments (15%) / project (75%)

Overview

The goal of this course is to train students to analyze urban data, derive insights, and create effective visualizations using open source software tools and public data. The course will first introduce the fundamentals of programming in Python before moving on to a survey of data analysis/visualization tools and technologies. Sessions will include lectures and practice exercises. Assignments will reinforce the skills and topics being presented. A final project will provide an opportunity for students to use these skills to complete an end-to-end data analysis of their own design, the results of which will be published on GitHub and presented in class.

This is a "hands-on" course. It requires some tolerance for experimentation, self-directed trial and error, and an interest in learning to write code. If you are willing to roll up your sleeves and embrace some uncertainty, you'll learn the fundamentals of urban data analysis and visualization, and might discover an entirely new lens through which to study, plan, and design neighborhoods, cities, and regions.

Course Materials and Attendance

All required readings will be provided via bCourses or hyperlinks on this electronic syllabus. Lecture slides, example code, and demos/exercises will all be made available via GitHub.

We'll write code in Jupyter notebooks using the Anaconda Python distribution plus some additional software libraries. In some cases you may want to use a Berkeley service called DataHub instead of your own computer, but in general we encourage you to install Python and its tools on your own machine. You'll become far more comfortable with them that way, and whatever you learn and install you can take with you when the class is over. We will use only free, open source software in this course. You'll be surprised how far you can go with it.

You should plan to bring a laptop** to all class sessions.

**NOTE TO WINDOWS USERS: Most exercises and lectures will be OS-agnostic, but command-line tools will be demonstrated in a Unix-like terminal. Windows users can use the Windows command prompt if they choose, but instructor support will be limited. Instead, I recommend installing one of the following to gain access to a Unix-like shell: Windows Subsystem for Linux, Cygwin, Git Bash, or PyCharm.

Campus Resources Related to this Course

Join the Course Slack Channel

D-Lab:

Assignments (15 pts)

Students will develop skills gradually through exercises paced over the semester. These will typically involve writing some code and documenting it, using Jupyter Notebooks that can be shared and interactively run inside a web browser, and providing a writeup discussing the assignment and its results.

Assignments will be posted on the course GitHub repository, and students will need to pull them down from there. Assignments will generally be due one week from the day they are assigned, by 11:59pm Pacific time. Students will submit their completed assignments by opening a pull request on the course repo.

Assignments are designed to build a degree of mastery of skills and will be used as a means of ensuring that students are keeping up with the material and not falling behind. All assignments will be marked down 10% for each day late, so please submit on time.
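
As a rough illustration of the late policy above, the arithmetic can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not the official grading procedure: the function name `late_adjusted_score` is hypothetical, and it assumes that any fraction of a day counts as a full day late, that the 10% is taken from the score you earned, and that a score cannot drop below zero.

```python
import math


def late_adjusted_score(raw_score, days_late, penalty_per_day=0.10):
    """Return a score after a per-day late deduction (hypothetical helper)."""
    # Assumption: any fraction of a day counts as one full day late.
    whole_days = max(0, math.ceil(days_late))
    # Assumption: the deduction scales with the earned score and floors at zero.
    multiplier = max(0.0, 1.0 - penalty_per_day * whole_days)
    return round(raw_score * multiplier, 2)


# An assignment that earned 90 points, submitted two days late.
print(late_adjusted_score(90, 2))
```

Under these assumptions, a submission that earned 90 points and arrives two days late would be recorded as 72 points. If in doubt about how a specific case is handled, ask the instructors.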

Readings and Other Assignments (ungraded)

This course has readings associated with nearly every class meeting. These are suggested readings, unless otherwise specified. You will not be quizzed on them, and they may or may not be referenced in class. They are, however, strongly recommended. They have been thoughtfully compiled over the many years this course has been taught, and are designed to help you get the most out of this course and make your final projects a success.

In addition to the readings, the Course Schedule (see below) specifies several other exercises which, unless explicitly stated, will not be collected. They will, however, be used for the following: 1) to facilitate discussion/break-out groups in class; 2) to inspire final project ideas; and 3) to ensure that you make steady progress on your final projects throughout the course of the semester. In some cases there will be class time designated for working on these ungraded exercises, but not always. It is in your own best interest, and that of your fellow students, that you keep up with them.

Final Projects (75 pts)

Final projects will require harnessing the skills practiced in the exercises and developing a more independent work plan to accomplish an analysis of data. More details will be provided later in the semester.

Project components and due dates:

  1. Project proposal + initial analysis (10 pts)
    • Due Sunday, Mar 13
  2. Final presentation (20 pts)
    • Slides, etc. presented during the last week of class.
  3. github.io project page (45 pts)
    • Due Monday, May 9 (first day of Finals week)

Self-Reliance, Collaboration, and Academic Integrity

This course requires a lot of experimentation and trial-and-error. Google and StackOverflow will be your best friends! Google your questions, Google any error messages, and if you can't find an answer, talk to your classmates, and if you still can't sort it out, e-mail Max and Irene. When you e-mail us, tell us what you've searched and what you've discovered, and include screenshots, links, and error messages. 99% of the time, somebody else has encountered the exact issue you are having and has documented the solution.

That being said, you are welcome — in fact, encouraged — to work on the homework exercises and your semester project together with other students. Discussing code is a great way to understand it better, and can make tracking down bugs less frustrating. If you copy an entire substantive piece of code (i.e., several lines or more) from the internet or from another student, we ask that you indicate this in a code comment. Otherwise, we will expect everything you submit to be your own original work. Details of the U.C. Berkeley Academic Honor Code can be found here.

Campus Policies and Guidelines

https://teaching.berkeley.edu/campus-policies

Accommodations for Students with Disabilities

UC Berkeley is committed to creating a learning environment that meets the needs of its diverse student body. If you anticipate or experience any barriers to learning in this course, please feel welcome to discuss your concerns with me.

If you have a disability, or think you may have a disability, you can work with the Disabled Students' Program (DSP) to request an official accommodation. The Disabled Students' Program (DSP) is the campus office responsible for authorizing disability-related academic accommodations, in cooperation with the students themselves and their instructors. You can find more information about DSP, including contact information and the application process here. If you have already been approved for accommodations through DSP, please meet with me so we can develop an implementation plan together.

Students who need academic accommodations or have questions about their accommodations should contact DSP, located at 260 César Chávez Student Center. Students may call 510-642-0518 (voice), 510-642-6376 (TTY), or email [email protected].

Department Climate Statement

The Department of City and Regional Planning in the College of Environmental Design is committed to an equitable and inclusive educational environment for all. As students, staff, and faculty, we strive to foster a community in which we celebrate our diversity and affirm the dignity of each person by respecting the identities, perspectives, and experiences of those with whom we work. As a member of the UC Berkeley community, the Department of City and Regional Planning is committed to a safe work environment for all.

The following campus-wide resources are available to assist with this effort:

Reading Material and Web Resources

The following books and websites may be helpful resources, and we will draw material from many of them during the semester. (All readings assigned for class will be available online or as PDFs in bCourses.) Each piece of software we'll use also has official documentation online.

  • Adhikari, Ani and John DeNero, Computational and Inferential Thinking, 2019 (https://inferentialthinking.com)

    • Online textbook developed for Berkeley's Foundations of Data Science class.
  • Downey, Allen, Think Python, 2nd Edition, O'Reilly Media, 2015 (https://greenteapress.com/wp/think-python-2e/)

    • Introduction to programming using Python. All the material is online.
  • Foster, Ian, et al., Big Data and Social Science, CRC Press, 2017

    • A practical guide to gathering data and working with it in various ways.
  • Lutz, Mark, Learning Python, 5th Edition, O'Reilly Media, 2013 (https://learning-python.com/about-lp5e.html)

    • Much more depth than you need for this class, but a great reference.
  • McKinney, Wes, Python for Data Analysis, 2nd Edition, O'Reilly Media, 2017

    • More depth about Pandas than Python Data Science Handbook, but less readable.
  • Pilgrim, Mark, Dive into Python 3, Apress, 2009 (https://diveintopython3.net)

    • Nice tutorials and reference for aspects of core Python syntax and programming concepts, but missing some topics that are in Think Python. All the material is online.
  • VanderPlas, Jake, Python Data Science Handbook, O'Reilly Media, 2016 (https://jakevdp.github.io/PythonDataScienceHandbook)

    • Excellent – working with data, making graphs and charts, machine learning. All the material is online.
  • Real Python (https://realpython.com) — Great Python tutorials on numerous topics.

  • Rey, Sergio, et al., Geographic Data Science with Python, 2020 (https://geographicdata.science/book/intro.html)

    • Great resource for geospatial data analysis in Python from the creators of PySAL.
  • Software Carpentry (https://software-carpentry.org/lessons)

    • Tutorials about scientific computing.
  • Stack Overflow (https://stackoverflow.com) — Best website for user-contributed coding Q&As.

Topics + Course Schedule

The topics covered by this course are organized into the following seven (7) modules:

  1. Fundamentals of Programming
  2. Intro to Data Analysis in Python
  3. Intro to Data Visualization
  4. APIs + Open Data
  5. Working with Geospatial Data
  6. Visualizing Spatial Data
  7. Statistical Analysis + Machine Learning

MODULE 1: FUNDAMENTALS OF PROGRAMMING

  • Wed, Jan 19 -- Course Introduction: Overview of the course, expectations, prerequisites, learning objectives, assignments and projects.

  • Mon, Jan 24 -- Intro to the Command-line: Using a command-line interpreter; common syntax, programs, and arguments; accessing and navigating the file system; Python interpreters; conda environments; starting/stopping a Jupyter server; using Git; text editors

  • Wed, Jan 26 -- Git and GitHub: Principles of distributed version control; repositories; commits; branches; forks; making a GitHub Pages website

    • Exercises

      • Create your own github.io website by following this helpful tutorial from the Data89 class at Cal. Advanced users can take it one step further with the more advanced version here.
        • NOTE: although this is only listed as an "exercise" and not an "assignment", your final project will be submitted as a GitHub Pages website, so it would be wise to get started on this sooner rather than later.
    • Readings

  • Mon, Jan 31 -- Python at the Command-line: Anaconda distro; Python vs. IPython vs. Jupyter; virtual environments; intro to the Jupyter Notebook

    • Assignments

      • Assignment 1 released (due Sun, Feb 6)
    • Exercises

      • Re-read and work your way through "notebooks/lecture_03_intro_python_jupyter.ipynb"
      • Continue to work your way through the GitHub Pages website tutorial.
    • Readings

  • Wed, Feb 2 -- The Python Standard Library: Variables, expressions, and assignment; built-in functions and data types; the math module; working with strings, lists, and dicts.
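
For a flavor of the standard-library topics that close this module, here is a minimal sketch using only built-in Python. It is not taken from the course notebooks. Every variable name and number in it is made up for illustration.

```python
import math

# Variables, expressions, and assignment (hypothetical block dimensions in meters)
block_length_m = 400.0
block_width_m = 300.0
block_area_m2 = block_length_m * block_width_m

# The math module: straight-line distance across the block
diagonal_m = math.hypot(block_length_m, block_width_m)

# Strings: tidy up a messy street name
street_clean = "  shattuck ave ".strip().title()

# Lists and built-in functions (hypothetical daily rider counts)
daily_riders = [1200, 950, 1430, 1010]
average_riders = sum(daily_riders) / len(daily_riders)

# Dicts: bundle related values under descriptive keys
stop = {"name": street_clean, "peak_riders": max(daily_riders)}
stop["block_area_m2"] = block_area_m2

print(stop)
print(diagonal_m, average_riders)
```

Running cells like these one at a time in a Jupyter notebook and inspecting each variable is a good way to build intuition before the data analysis modules.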


MODULE 2: INTRO TO DATA ANALYSIS IN PYTHON

  • Mon, Feb 7 -- Programming Logic: Procedural programming; control flow in Python (conditional logic, loops, functions)

  • Wed, Feb 9 -- Object-oriented Programming: Modules, classes, methods, and functions; namespaces and scopes; lambda functions and map() for iteration

    • Readings
    • Assignments
      • Assignment 2 released (due Tues, Feb 20)
  • Mon, Feb 14 💘 -- Data Analysis in Python: NumPy arrays and matrices; Pandas Series and DataFrames; loading, displaying and exporting data; descriptive statistics; indexing and filtering

  • Wed, Feb 16 -- More Pandas: Vectorized operations; merge, join, concatenate; group by and aggregations; cleaning and imputing missing data

    • Readings
    • Exercises
      • Spend 2-3 hours working through notebooks 7 and 8 on your own

MODULE 3: INTRO TO DATA VISUALIZATION

  • Mon, Feb 21 -- NO CLASS (Presidents' Day)

    • Exercises
      • Spend 2-3 hours working through notebooks 7 and 8 on your own
    • Assignments
      • Assignment 3 released (due Tues, March 1)
  • Wed, Feb 23 -- Data Visualization Pt. I: Data viz. for good and evil; use Matplotlib and Seaborn to create static images; dimensionality of data; continuous vs. categorical data; univariate distributions

    • Exercises

      • Find three (3) examples of interesting data visualizations and describe in 2-3 sentences what makes each of them good, bad, or misleading. Be prepared to talk about them in class.
    • Readings

  • Mon, Feb 28 -- Data Visualization Pt. II: Interactive plots, widgets, and apps.

    • Readings

MODULE 4: APIs + OPEN DATA

  • Wed, Mar 2 -- Intro to APIs: What's in an API; performing queries; authentication; Socrata

    • Assignments

      • Project proposal assignment (Assignment 4) released (due Sun, Mar 13)
    • Readings

  • Mon, Mar 7 -- APIs and Beyond: Geocoding; web scraping; parsing XML


MODULE 5: WORKING WITH GEOSPATIAL DATA

  • Wed, Mar 9 -- Intro to Geospatial Data Analysis: Vector vs. raster; coordinate reference systems and projections; spatial data types and file formats; spatial indexing; common spatial transformations

  • Mon, Mar 14 -- FOSS tools for Geospatial Data Analysis: Survey of open source tools for manipulating geospatial data from the command-line, a Python session, a browser, or your desktop.

  • Wed, Mar 16 -- Advanced Spatial Statistics with PySAL:

    • Guest speaker: Irene!

    • Readings

      • Chapters 6, 8, and 9 of Geographic Data Science with Python
  • Mon, Mar 21 -- NO CLASS (🏄 Spring Break 🏄)

  • Wed, Mar 23 -- NO CLASS (🏄 Spring Break 🏄)

  • Mon, Mar 28 -- Intro to Network Analysis: Graph theory; GTFS; Python tools for working with networks

    • Readings
      • Boeing, Geoff. "OSMnx: New methods for acquiring, constructing, analyzing, and visualizing complex street networks." Computers, Environment and Urban Systems 65 (2017): 126-139. https://doi.org/10.1016/j.compenvurbsys.2017.05.004
      • Blanchard SD, Waddell P. Assessment of Regional Transit Accessibility in the San Francisco Bay Area of California with UrbanAccess. Transportation Research Record. 2017;2654(1):45-54. https://doi.org/10.3141%2F2654-06
      • Foti, Fletcher, Paul Waddell, and Dennis Luxen. "A generalized computational framework for accessibility: from the pedestrian to the metropolitan scale." Proceedings of the 4th TRB Conference on Innovations in Travel Modeling. Transportation Research Board. 2012. http://onlinepubs.trb.org/onlinepubs/conferences/2012/4thITM/Papers-A/0117-000062.pdf
      • https://www.mapzen.com/blog/animating-transitland/
      • Li, Yang, and Wei "David" Fan. "Modeling and evaluating public transit equity and accessibility by integrating general transit feed specification data: Case study of the City of Charlotte." Journal of Transportation Engineering, Part A: Systems 146.10 (2020): 04020112. Available here.

MODULE 6: VISUALIZING SPATIAL DATA


MODULE 7: STATISTICAL ANALYSIS + MACHINE LEARNING

  • Mon, Apr 11 -- Causal Inference Methods in Urban Science: Deep dive into two examples from the recent literature

    • Readings
      • Greenstone, M. & Gallagher, J. Does Hazardous Waste Matter? Evidence from the Housing Market and the Superfund Program. No. w11790. National Bureau of Economic Research, 2005. Available here.
      • Frank, L.D. et al. Stepping towards causation: Do built environments or neighborhood and travel preferences explain physical activity, driving, and obesity? Social Science & Medicine, 2007. Available here.
      • Krizek, Kevin. Residential Relocation and Changes in Urban Travel: Does Neighborhood-Scale Urban Form Matter? Journal of the American Planning Association, 2003. Available here.
      • Gardner, Max. The Effect of Rent Control on Eviction Rates: Causal Evidence from San Francisco. Forthcoming, 2022.
  • Wed, Apr 13 -- Special Topics I: Visualizing Transit Data

    • Guest speaker: Kuan Butts (Mapbox)
  • Mon, Apr 18 -- Special Topics II: Geospatial Data Activism

    • Guest speaker: Erin McElroy (cofounder of the Anti-Eviction Mapping Project, the Radical Housing Journal, and Assistant Prof. of American Studies at UT Austin) and Mary Shi (Ph.D. Candidate in Sociology at Cal)
  • Wed, Apr 20 -- Presentations I

  • Mon, Apr 25 -- Presentations II

  • Wed, Apr 27 -- Presentations III


UC Berkeley sits on the territory of xučyun (Huichin), the ancestral and unceded land of the Chochenyo speaking Ohlone people, the successors of the sovereign Verona Band of Alameda County. This land was and continues to be of great importance to the Muwekma Ohlone Tribe and other familial descendants of the Verona Band.

We recognize that every member of the Berkeley community has, and continues to benefit from, the use and occupation of this land, since the institution's founding in 1868. Consistent with our values of community, inclusion and diversity, we have a responsibility to acknowledge and make visible the university's relationship to Native peoples.

It is vitally important that we not only recognize the history of the land on which we stand, but also, we recognize that the Muwekma Ohlone people are alive and flourishing members of the Berkeley and broader Bay Area communities today.

Read more on the Centers for Educational Justice & Community Engagement website.