Skip to content

Commit

Permalink
add exercises
Browse files Browse the repository at this point in the history
  • Loading branch information
Hua-Zhou committed Jun 18, 2024
1 parent 7d536e7 commit 797e0c2
Show file tree
Hide file tree
Showing 2 changed files with 368 additions and 177 deletions.
381 changes: 244 additions & 137 deletions data-science-tutorials/02-wrangle/wrangle.html

Large diffs are not rendered by default.

164 changes: 124 additions & 40 deletions data-science-tutorials/02-wrangle/wrangle.qmd
Original file line number Diff line number Diff line change
Expand Up @@ -86,11 +86,10 @@ Sys.setenv("CENSUS_KEY" = "PUT YOUR KEY HERE")
To see a current table of every available endpoint, run `listCensusApis()`:

```{r}
#| eval: true
censusapi::listCensusApis() |>
# convert data.frame to tibble
as_tibble() |>
# only keep the columns needed
select(title, name, vintage, type, temporal, url)
```

Expand All @@ -99,8 +98,6 @@ censusapi::listCensusApis() |>
Now we are interested in Food Security Supplement. We can search for the keyword "Food Security" in the column `title` of the table above, and see which year of data is available.

```{r}
#| eval: true
censusapi::listCensusApis() |>
filter(str_detect(title, "Food Security"), vintage >= 2019) |>
select(title, name, vintage, type, description)
Expand All @@ -122,14 +119,13 @@ The example below makes a request to API for the CPS Food Security Supplement De
- HHSUPWGT: Household Supplemental Weight

```{r}
#| eval: true
fss_21_status <- censusapi::getCensus(
name = "cps/foodsec/dec",
vintage = 2021,
# vars is required
vars = c("HRHHID", "HRHHID2", "PERRP",
"GESTFIPS", "GTCO", "HRFS12M1",
"HHSUPWGT")
"HHSUPWGT")
) |>
as_tibble() |>
print()
Expand All @@ -138,51 +134,46 @@ fss_21_status <- censusapi::getCensus(
Notice that some columns are not in the format we want. For example, **HRFS12M1** (Food Security Status) should be a categorical variable, and we want to convert it to a factor with meaningful labels. Also, **HHSUPWGT** (Household Supplemental Weight) was ingested as a character variable, and we want to convert it to a numeric variable.

```{r}
#| eval: true
fss_21_status |>
filter(HRFS12M1 != "-1") |>
mutate(HRFS12M1 = factor(HRFS12M1,
levels = c("1", "2", "3", "-9"),
labels = c("Food Security",
"Low Food Security",
"Very Low Food Security",
"No Response")),
HHSUPWGT = as.numeric(HHSUPWGT),
PERRP = as.numeric(PERRP),
HRHHID2 = as.character(HRHHID2))
mutate(
HRFS12M1 = factor(HRFS12M1,
levels = c("1", "2", "3", "-9"),
labels = c("Food Security",
"Low Food Security",
"Very Low Food Security",
"No Response")),
HHSUPWGT = as.numeric(HHSUPWGT),
PERRP = as.numeric(PERRP),
HRHHID2 = as.character(HRHHID2)
)
```

In addition to checking the document and encoding labels manually, `listCensusMetadata()` offers a way to get the value labels of specific variables. This can be useful for understanding the meaning of variables and their values.

```{r}
#| eval: true
HRFS12M1_lb <- censusapi::listCensusMetadata(
name = "cps/foodsec/dec",
vintage = 2021,
type = "values",
variable = "HRFS12M1"
)
HRFS12M1_lb
) |>
as_tibble() |>
print()
```

Therefore, we can use the following code to convert the variable `HRFS12M1` to a factor with meaningful labels.

```{r}
#| eval: true
fss_21_status <- fss_21_status |>
filter(HRFS12M1 != "-1") |>
mutate(HRFS12M1 = factor(HRFS12M1,
levels = HRFS12M1_lb$code,
labels = HRFS12M1_lb$label),
HHSUPWGT = as.numeric(HHSUPWGT),
PERRP = as.numeric(PERRP),
HRHHID2 = as.character(HRHHID2))
fss_21_status
HRHHID2 = as.character(HRHHID2)) |>
print()
```

## Constructing household characteristics from person records
Expand All @@ -192,8 +183,6 @@ To compute some household characteristics (such as household size, presence of c
**HRFS12M1** (Summary Food Security Status, 12-Month) is one of the household characteristics. This is the variable used for most food security statistics in USDA’s annual food security report series. In order to compute the prevalence of food insecurity, we need to aggregate the food security status of all persons in the same household. `tbl_summary()` in the `gtsummary` package can generate beautiful summary tables.

```{r}
#| eval: true
fss_21_status <- fss_21_status |>
# Filter observations with PERRP 40 or 41 (reference person in the household)
filter(PERRP %in% c(40, 41)) |>
Expand All @@ -211,15 +200,19 @@ fss_21_status |>
gtsummary::tbl_summary()
```

**Exercise**: display the bar plot of variable **`HRFS12M1`**.

```{r}
# geom_bar
```

Notice that there are 27,357 + 1,844 + 1,093 + 49 = 30,343 households who attended the interview for Food Security Supplement in 2021, which matches the number in the technical documentation.

The CPS is a complex probability sample, and interviewed households, as well as persons in those households, are assigned weights so that the full interviewed sample represents the total national non-institutionalized civilian population. Initial weights are assigned based on probability of selection into the sample, and weights are then adjusted iteratively to match population controls for selected demographic characteristics at State and national levels. There are two sets of household and person weights in this data file: (1) labor force survey weights, and (2) Food Security Supplement weights.

We can use `makeVarlist()` function in `censusapi` package to get the list of variables in the dataset. In addition to the description and the type of each variable, we can also check column **suggested_weight** to see which weight should be used for the analysis.

```{r}
#| eval: true
censusapi::makeVarlist(
name = "cps/foodsec/dec",
vintage = 2021,
Expand Down Expand Up @@ -256,8 +249,6 @@ Now we are interested in how much percentage of households with low food securit
where $L_i$ is an indicator variable that equals 1 if household $i$ is in low food security or very low food security, and 0 otherwise; $H_i$ is an indicator variable that equals 1 if household $i$ attended the interview for Food Security Supplement and had a response to this question, and 0 otherwise; and $w_i$ is the weight of household $i$.

```{r}
#| eval: true
fss_21_status <- fss_21_status |>
mutate(HRFS12M1_low = ifelse(HRFS12M1 %in% c("Low Food Security",
"Very Low Food Security"),
Expand All @@ -279,8 +270,6 @@ Since we have already computed the rate of low food security in 2021. We can fur
We first need to match the FIPS code to the state names using `fips_codes()` function in `tigris` package.

```{r}
#| eval: true
# Get the state names
state_names <- tigris::fips_codes |>
select(state, state_code, state_name) |>
Expand All @@ -292,8 +281,6 @@ state_names <- tigris::fips_codes |>
Now we can compute the rate of low food security in each state.

```{r}
#| eval: true
fss_21_status_state <- fss_21_status |>
# Group by state FIPS code and calculate the rate by state
group_by(GESTFIPS) |>
Expand All @@ -314,8 +301,6 @@ fss_21_status_state <- fss_21_status |>
Then we can visualize the rate of low food security in each state.

```{r}
#| eval: true
ggplot2::map_data("state") |>
merge(fss_21_status_state, by.x = "region",
by.y = "state_name", all.x = TRUE) |>
Expand All @@ -332,3 +317,102 @@ ggplot2::map_data("state") |>
legend.position = "bottom") +
coord_fixed(ratio = 1.5)
```

## Exercises

1. Display the food security disparity in 2021 in terms of race, household income, education level, or other social-economic determinants.

```{r}
#| eval: false
#| code-fold: true
# Inf about race variable PTDTRACE
PTDTRACE_lb <- censusapi::listCensusMetadata(
name = "cps/foodsec/dec",
vintage = 2021,
type = "values",
variable = "PTDTRACE"
) |>
as_tibble() |>
mutate(code = as.numeric(as.character(code))) |>
arrange(code) |>
print(n = Inf)
```

```{r}
#| eval: false
#| code-fold: true
# Ingest FSS 2021 data with race info. on household
fss_21_race <- censusapi::getCensus(
name = "cps/foodsec/dec",
vintage = 2021,
# vars is required
vars = c("HRHHID", "HRHHID2", "PERRP",
"GESTFIPS", "GTCO", "HRFS12M1",
"HHSUPWGT", "PTDTRACE")
) |>
as_tibble() |>
# remove non-interviewed ones
filter(HRFS12M1 != "-1") |>
# enforce column data types
mutate(
HRFS12M1 = factor(HRFS12M1,
levels = c("1", "2", "3", "-9"),
labels = c("Food Security",
"Low Food Security",
"Very Low Food Security",
"No Response")),
HHSUPWGT = as.numeric(HHSUPWGT),
PERRP = as.numeric(PERRP),
HRHHID2 = as.character(HRHHID2),
PTDTRACE = factor(as.numeric(PTDTRACE),
levels = PTDTRACE_lb$code,
labels = PTDTRACE_lb$label) |>
fct_other(keep = c(
"American Indian, Alaskan Native Only",
"White only",
"Black only",
"Asian only",
"Hawaiian/Pacific Islander Only"
))
) |>
# only keep reference person per household
group_by(HRHHID, HRHHID2) |>
slice_min(PERRP) |>
ungroup() |>
print()
```

```{r}
#| eval: false
#| code-fold: true
# Summarize race information
fss_21_race |>
select(PTDTRACE) |>
gtsummary::tbl_summary()
```

```{r}
#| eval: false
#| code-fold: true
# Calculate the rate of low food security by race
fss_21_race |>
group_by(PTDTRACE) |>
summarise(
fd_insec_weight = sum((HRFS12M1 %in% c("Low Food Security", "Very Low Food Security")) * HHSUPWGT),
total_weight = sum(HHSUPWGT),
low_food_security_rate = fd_insec_weight / total_weight * 100
) |>
ggplot() +
geom_col(mapping = aes(y = PTDTRACE, x = low_food_security_rate)) +
labs(
x = "Low Food Security Rate",
y = ""
)
```

2. How does COVID impact the food security status of households in the U.S.? Does the impact differ by state, race, household income, or other social-economic determinants?

0 comments on commit 797e0c2

Please sign in to comment.