NOTE: This lab, unlike the React.js labs, does not have a file for your results or responses. Your teacher will specify whether/how to submit your work on this lab.
Some people say data is neutral. Some people say data can never be neutral. In this lab, you'll build an argument using data in order to fight for state funding.
In early 2019, New York State announced it had lost $2.3 billion because of recent changes to federal tax laws. As a result, the decision-makers in Albany have been looking for ways to cut the state budget. A frequent target for revenue cuts is the public library system.
Your Task: pick a library and defend it from these budget cuts!
You've received this NY Library Dataset (adapted from librarydatavisual.blogspot.com) which contains the following fields:
- Zip Code
- Children's Circulation Per Capita
- Children's Programming Attendance Per Capita
- Circulation Per Capita
- Electronic Materials Circulation Per Capita
- FTE MLS Per 10000 Pop.: Full-Time Employees with a Masters in Library Science per 10,000 People of Population
- FTE Staff Per 10000: Full-Time Employees on Staff per 10,000 People of Population
- ILL Per 100 Pop.: Inter-Library Loan Requests per 100 People of Population
- Internet Computers Per 1000 Pop.: Internet Computers per 1,000 People of Population
- Internet Sessions Per Capita
- Local Gov't Op. Income Per Capita
- Materials Expenditure Per Capita
- Population Groups
- Programming Attendance Per Capita
- Salaries Expenditure Per Capita
- State Gov't Operating Income Per Capita
- Total Operating Income Per Capita
- Undup Pop.: Unduplicated Population where each person is only counted once
- Visits Per Capita
- Wireless Sessions Per Capita
BEFORE YOU START be sure to talk over with your lab partner and with your teacher what each of the columns mean - there's no point analyzing data in a column if you aren't sure what's being measured. If you can identify at least 3-6 columns that you understand clearly, and at least two that are significantly interesting to you, you are prepared to move ahead.
Additionally, the data has been formatted so that values above the average are green, and values below the average are red.
- Take a few moments to get to know the data. Can you find your local public library? How does it compare to other libraries?
Do any of the average values look like they might be inaccurate?
Using the library you selected (whether your local library, or just a library that's interesting to you), think about what that library is (or might be) like - is it in a large building? a small one? Is it crowded? does it have a large children's section?
- Using the dataset, find a fact in the dataset for your library that could be used as a reason to cut that library's budget.
e.g. Really low book circulation, so their materials must not be used.
- Now using the dataset, find a fact in the dataset that could be used as a reason not to cut the same library's budget.
e.g. Really low book circulation, so they must need money to buy new materials.
- Test your arguments on 5 other libraries and see whether your arguments would apply consistently across all libraries.
- Qualitatively, how much would the cuts you suggested affect the overall $2.3 billion budget deficit?
Having heard that Albany was considering targeting NY State libraries, the American Library Association has offered a series of grants totalling $500,000 to bridge the budget gap for the libraries of the state.
- Use the dataset to make an argument for which library/libraries ought to receive additional funding:
- What are the criteria a library should meet to get funding?
- How much funding would you give to each (qualified) library?
- Which other libraries meet the same criteria? Would all of them receive funding?
- If so, how much money would be given to each?
- If not, what criteria could be used to further filter the results?
- Find a fact in the dataset that is absolutely neutral and could not be spun one way or another. The fact should not help either the argument to cut money from libraries or the argument to provide additional money to libraries.
- What words would you use to describe the fact you found?
Too often, someone will ask a data scientist to use a dataset to support a conclusion they've already made. They've already pre-determined what they want the data to say before they've ever gotten to know whether or not the data does in fact say that.
- Why is it irresponsible to pre-determine the outcome you want to see when analyzing data?
- Who's right when there are different conclusions drawn from the same data? How do you decide?