As before, the content in the data labs for subunits 3-1 and 3-2 is emotionally charged. Please consider offering content warnings before starting this conversation with students. If you feel your students need it, consider cloning this lab and removing the charged content or swapping it for less charged material.
This is also the first lesson where students will likely need a COUNTIF function or a pivot table to process the data presented. You may want to model this for students.
Your teacher will tell you whether and how you need to submit your discussion notes from this lab.
- How have you seen crime data used in the media?
- Where do you think crime data comes from?
- How might crime data be biased?
As before, please consider omitting or replacing this example depending on your comfort level presenting this to students.
From an editorial in the Chicago Tribune (May 25, 2018) |
---|
Only two thirds of murders result in arrests, which means that the homicide data are missing at least a third of actual incidents. And murders are unusual in that we typically have the body, so we know a crime actually occurred. That's not the case with assaults, rapes, thefts or illegal gun possession. There's no reason to think that the majority of these crimes lead to arrests, or that all arrests are related to actual crimes. Reports aren't much better. People decide whether to report in a cultural context. For example, they're more likely to do so if they trust the police, and the level of trust can vary sharply over time. A year after Donald Trump was elected president, the number of reported rapes among the Latino population of Houston declined by 40 percent, a strong indication that people became afraid to report the crimes. Police often don't take rape victims' reports seriously, a problem that is probably even worse for male victims. |
- What forms of bias are discussed in the passage above?
- What do you think the author believes about crime data?
- How does the author consider missing or absent data in their analysis?
- What assumptions does this author make in analyzing the data? Do you see these assumptions as forms of bias?
The popular magazine Literary Digest commissioned a pre-election survey in 1936 to predict the winner of the upcoming presidential election between Alfred Landon, the Republican Governor of Kansas, and Franklin D. Roosevelt, the incumbent Democratic President. Although you probably know who ultimately won the election, we're going to examine the survey results collected by the Literary Digest.
Make a copy of this Google Sheet, a representative subset of the full dataset: https://docs.google.com/spreadsheets/d/1e7aqBiZEgnJV0pjK0yA9J4DPgRNAuwVeVuTmBJJqGgI/edit?usp=sharing
Notes about the data collection:
- A mailing list of about 10 million names was created from every telephone directory in the United States, lists of magazine subscribers, rosters of clubs and associations, and other sources.
- The list of 10 million names represented every county in the United States.
- Every name on this list was mailed a mock ballot and asked to return the marked ballot to the magazine.
- Around 2.4 million ballots were returned to the Literary Digest.
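
The response rate implied by these notes can be worked out directly. The following is a minimal Python sketch, not part of the original lab: the function name `response_rate` is made up for illustration, and the two counts are the approximate figures given in the notes above.

```python
def response_rate(returned: int, mailed: int) -> float:
    """Return the fraction of mailed ballots that were sent back."""
    if mailed <= 0:
        raise ValueError("mailed must be positive")
    return returned / mailed


# Approximate figures taken from the data collection notes above.
MAILED = 10_000_000
RETURNED = 2_400_000

rate = response_rate(RETURNED, MAILED)
print(f"Response rate: {rate:.0%}")
```

The same calculation works in a spreadsheet by dividing the number of returned ballots by the number mailed.
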
- Based on the data, who did the Literary Digest predict would win the 1936 election?
- By what margin did the Literary Digest predict the winner would win?
(The subset presented here includes only 1,000 ballots, but the distribution is about the same as in the 2.4 million actual responses.)
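
If you would rather tally the ballots in code than with COUNTIF or a pivot table, the sketch below shows one way to do the same count in Python. It is only an illustration: the `ballots` list is a tiny made-up sample rather than the lab's data, and the names `tally_ballots` and `margin_in_points` are hypothetical. With the real sheet, you would read the candidate column from your own copy instead.

```python
from collections import Counter


def tally_ballots(ballots):
    """Count ballots per candidate, like COUNTIF over one column."""
    return Counter(ballots)


def margin_in_points(counts):
    """Return (leader, margin), where margin is the gap in percentage
    points between the top two candidates' vote shares."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no ballots to tally")
    ranked = counts.most_common()
    leader, leader_votes = ranked[0]
    runner_up_votes = ranked[1][1] if len(ranked) > 1 else 0
    return leader, 100 * (leader_votes - runner_up_votes) / total


# Made-up sample for illustration only; NOT the lab's data.
ballots = ["A", "B", "A", "A", "B"]

counts = tally_ballots(ballots)
leader, margin = margin_in_points(counts)
print(counts["A"], counts["B"])
print(f"{leader} leads by {margin:.0f} points")
```

A margin in points is the difference between the two candidates' vote shares, each expressed as a percentage of all ballots counted.
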
The actual result of the election was 62% for Roosevelt vs. 38% for Landon, a 24-point spread.
- How could the Literary Digest have gotten their poll so wrong?
- What sources of bias can you see in the data collection?
Hint: How were participants selected, and what was the response rate?
- How would those sources of bias have impacted the data collected?
- Why didn't the Literary Digest anticipate or see these biases?
- What could the Literary Digest have done to mitigate the impacts of those biases?
- This section is based on Case Study 1: 1936 Literary Digest Poll. That article includes proposed answers to all the questions asked above.
- What similarities can you identify between the ways that crime data is collected and how data was collected for the Literary Digest poll?
- What differences can you identify between how crime data is collected and how the Literary Digest poll data was collected?
- Today you've seen two ways that unintentional bias can skew the insight we get from data. Make a list of ways that you will avoid unintentional bias in your data project(s).