This repository contains exercise files for the second- and third-semester data journalism courses taught at the Craig Newmark Graduate School of Journalism. The files cover scrapers, Python 101, pandas and machine learning.
- Please send questions to [email protected].
Data sets are everywhere: in public sources, like election results, budgets and census reports; in semi-public and private datasets, like hidden company information; in documents and databases that can be cross-referenced to discover conflicts of interest among people and organizations; and in social media updates, images and video uploads. Data has become an invaluable resource for journalists, who use it to expose stories buried in the numbers and to find the facts that make those stories newsworthy. Today, whether your goal is to cover a daily beat or to do enterprise or investigative stories, you are expected to be able to use it.
In this course, you will build the skills you need to do data journalism:
- Understand data journalism history and principles.
- Find and acquire data using automated means (scraping!), and negotiate access to data with officials by using FOIA/FOIL.
- Work with common data formats and different types of data, and understand what sort of data are in rows and columns.
- Spot errors and deal with missing values and messy data.
- Clean data, normalize it, analyze it and test your results using basic math, statistics and data journalism tools.
- Mix data skills with on-the-ground reporting to discover newsworthy stories in data and answer the questions that drive accountability journalism in the public interest.
Most importantly, we want to give you the skills you need to find stories in data and come to your editor with data-driven pitches.
All code in this repository is available under the MIT License. The data file in the output/ directory is available under the Creative Commons Attribution 4.0 International (CC BY 4.0) license. All files in the data/ directory are released into the public domain.
Contact Lam Thuy Vo at [email protected].