In this assignment you will use SQL to query the 2003-2015 American Time Use Survey (ATUS). You can download the database here.
Note that this is the full dataset (not the sample used in class) from about 1GB of CSV files which would be strain pandas to load in python. (The sqlite file is smaller, 200MB, because it stores the data more efficiently than a CSV.) You will find the ATUS Data Dictionary and Lexicon useful.
Please submit sql scripts q1.sql, q2.sql, q3.sql, q4.sql, q6.sql
, a python script q5.py
and the image specified below.
-
What is the state whose proportion of respondents who are high school graduates is the lowest? Put your answer in a comment at the end of q1.sql.
Hint: You can look up the state codes here.
-
What proportion of respondents who are eligible to work did work last week? (4 points)
Hint: Use the
worked_last_week
column in therespondents
table and seeTUFWK
in the dictionary. Whenworked_last_week
> 2, the respondent is retired/disabled/unable to work and so should be excluded in this question. -
Group your answer to (2) by whether or not the respondent has a spouse or partner in their household (three possibilities: spouse, partner, neither). (4 points)
Hint:
spouse_or_partner_present
, seeTRSPPRES
in the codebook. -
How many hours on average are spent playing video games (code 120307) by men and women (
edited_sex
1 and 2) and by age decade (10-19, 20-29, 30-39, etc.)?Hint: You can
GROUP BY
multiple columns by separating them with commas. To group by age decade, use the fact that the default integer division in SQL is floor division. (4 points) -
Use Python to run your script from (4) and plot the results with age decade on the x axis and average hours gaming on the y axis and two lines, one for men and one for women,. Label the axes appropriately and use a legend to identify the two lines. Save the plot in
gaming.png
. (4 points)Hint: Use
set_index()
andunstack()
similar to the last assignment. -
How many hours are spent on average on household activities, by sex and educational attainment? (4 points)
Hint: Household actitivity codes start with
02
. This will require twoJOIN
s becauseedited_sex
is in theroster
table whileeducational_attainment
is incps
. See the last query in the slides for an example.