Census variable selection? #145

Closed
ericmanning opened this issue Mar 28, 2024 · 5 comments · Fixed by #146

Comments

@ericmanning

Why are race totals assigned using the following variables:

r_whi = "P12I_001N"
r_bla = "P12B_001N"
r_his = "P12H_001N"
r_asi = c("P12D_001N", "P12E_001N")
r_oth = c("P12C_001N", "P12F_001N", "P12G_001N")

which correspond to the following Census tables

| 2020 DHC table | 2010 DHC table | Title |
| --- | --- | --- |
| P12B | P12B | SEX BY AGE FOR SELECTED AGE CATEGORIES (BLACK OR AFRICAN AMERICAN ALONE) |
| P12C | P12C | SEX BY AGE FOR SELECTED AGE CATEGORIES (AMERICAN INDIAN AND ALASKA NATIVE ALONE) |
| P12D | P12D | SEX BY AGE FOR SELECTED AGE CATEGORIES (ASIAN ALONE) |
| P12E | P12E | SEX BY AGE FOR SELECTED AGE CATEGORIES (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE) |
| P12F | P12F | SEX BY AGE FOR SELECTED AGE CATEGORIES (SOME OTHER RACE ALONE) |
| P12G | P12G | SEX BY AGE FOR SELECTED AGE CATEGORIES (TWO OR MORE RACES) |
| P12H | P12H | SEX BY AGE FOR SELECTED AGE CATEGORIES (HISPANIC OR LATINO) |
| P12I | P12I | SEX BY AGE FOR SELECTED AGE CATEGORIES (WHITE ALONE, NOT HISPANIC OR LATINO) |

and not the following tables' variables instead?

| 2020 DHC table | 2010 DHC table | Title |
| --- | --- | --- |
| P12H | P12H | SEX BY AGE FOR SELECTED AGE CATEGORIES (HISPANIC OR LATINO) |
| P12I | P12I | SEX BY AGE FOR SELECTED AGE CATEGORIES (WHITE ALONE, NOT HISPANIC OR LATINO) |
| P12J | N/A | SEX BY AGE FOR SELECTED AGE CATEGORIES (BLACK OR AFRICAN AMERICAN ALONE, NOT HISPANIC OR LATINO) |
| P12K | N/A | SEX BY AGE FOR SELECTED AGE CATEGORIES (AMERICAN INDIAN AND ALASKA NATIVE ALONE, NOT HISPANIC OR LATINO) |
| P12L | N/A | SEX BY AGE FOR SELECTED AGE CATEGORIES (ASIAN ALONE, NOT HISPANIC OR LATINO) |
| P12M | N/A | SEX BY AGE FOR SELECTED AGE CATEGORIES (NATIVE HAWAIIAN AND OTHER PACIFIC ISLANDER ALONE, NOT HISPANIC OR LATINO) |
| P12N | N/A | SEX BY AGE FOR SELECTED AGE CATEGORIES (SOME OTHER RACE ALONE, NOT HISPANIC OR LATINO) |
| P12O | N/A | SEX BY AGE FOR SELECTED AGE CATEGORIES (TWO OR MORE RACES, NOT HISPANIC OR LATINO) |

Using the former yields aggregate population counts that exceed the total population of each geography, apparently because Hispanic or Latino individuals who are not White alone are counted twice: once in P12H and once in their race table (P12B through P12G). Using the latter yields counts that match the totals.
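To make the double-counting concrete, here is a minimal sketch that aggregates toy counts under both groupings. The counts and the total are invented for one hypothetical geography and are not Census data. The `P12J` through `P12O` variable names are an assumption that follows the `<table>_001N` pattern of the variables quoted above, and `sum_by_group()` is a hypothetical helper, not part of the package.

```r
# Groupings quoted in this issue: the race-alone tables (P12B-P12G)
# include Hispanic or Latino members, who are also counted in P12H.
current_vars <- list(
  r_whi = "P12I_001N",
  r_bla = "P12B_001N",
  r_his = "P12H_001N",
  r_asi = c("P12D_001N", "P12E_001N"),
  r_oth = c("P12C_001N", "P12F_001N", "P12G_001N")
)

# Suggested groupings (ASSUMED names, following the "<table>_001N" pattern):
# every race table is restricted to "not Hispanic or Latino".
suggested_vars <- list(
  r_whi = "P12I_001N",
  r_bla = "P12J_001N",
  r_his = "P12H_001N",
  r_asi = c("P12L_001N", "P12M_001N"),
  r_oth = c("P12K_001N", "P12N_001N", "P12O_001N")
)

# Made-up counts for one hypothetical geography; NOT real Census data.
toy_counts <- c(
  P12B_001N = 150, P12C_001N = 15, P12D_001N = 52,  P12E_001N = 4,
  P12F_001N = 47,  P12G_001N = 20, P12H_001N = 200, P12I_001N = 600,
  P12J_001N = 130, P12K_001N = 10, P12L_001N = 50,  P12M_001N = 3,
  P12N_001N = 7,   P12O_001N = 8
)
toy_total <- 1008  # total population of the hypothetical geography

# Hypothetical helper: sum the counts of every variable in each group.
sum_by_group <- function(groups, counts) {
  vapply(groups, function(v) sum(counts[v]), numeric(1))
}

current_sum   <- sum(sum_by_group(current_vars, toy_counts))
suggested_sum <- sum(sum_by_group(suggested_vars, toy_counts))

current_sum - toy_total    # overcount under the current grouping
suggested_sum - toy_total  # difference under the suggested grouping
```

With these toy numbers the current grouping overshoots the total by exactly the number of Hispanic or Latino individuals who are not White alone, while the suggested grouping sums to the total.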

@ericmanning
Author

Might be related to #138

@1beb
Collaborator

1beb commented Mar 28, 2024

Thank you, getting these tables right is a challenge sometimes. I'm checking in with the team on this one.

@ericmanning
Author

The Census Bureau did not publish the P12J through P12O tables in any summary file for the 2010 census. So if my suggestion is correct, you can't actually tabulate age and sex by block, Hispanic/Latino origin, and race for 2010.

@1beb
Collaborator

1beb commented Mar 28, 2024

Correct, we need to set up warnings so that people use an older version of the package, as the current version is not backwards compatible with pre-2020 data.
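A warning of the kind described might look like the sketch below. `warn_if_pre_2020()` is a hypothetical helper invented for illustration; it is not the package's actual API, and the message wording is only an assumption based on this thread.

```r
# Hypothetical helper (not part of the package): warn when the requested
# census year predates 2020, since P12J-P12O are listed as N/A for 2010.
warn_if_pre_2020 <- function(year) {
  year <- as.integer(year)
  stopifnot(length(year) == 1L, !is.na(year))
  if (year < 2020L) {
    warning(
      "Census year ", year, " predates 2020: tables P12J through P12O ",
      "are not available, so race totals by age or sex may be inaccurate.",
      call. = FALSE
    )
  }
  invisible(year)
}

# Usage: capture the warning text instead of printing it.
msg <- tryCatch(warn_if_pre_2020(2010), warning = function(w) conditionMessage(w))
```

Returning the year invisibly lets the check sit at the top of a data-fetching function without changing its flow.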

@ericmanning
Author

ericmanning commented Mar 28, 2024

FWIW, the package has always used the current set of variables for sex and age, which are incorrect -- so (correct me if I'm wrong) any version will produce inaccurate estimates for 2010 if either age or sex is TRUE.

From wru-0.1-12/R/census_geo_api.R:

if (age == F & sex == F) {
  num <- ifelse(3:10 != 10, paste("0", 3:10, sep = ""), "10")
  vars <- paste("P0050", num, sep = "")
}

if (age == F & sex == T) {
  eth.let <- c("I", "B", "H", "D", "E", "F", "C")
  num <- as.character(c("01", "02", "26"))
  vars <- NULL
  for (e in 1:length(eth.let)) {
    vars <- c(vars, paste("P012", eth.let[e], "0", num, sep = ""))
  }
}

if (age == T & sex == F) {
  eth.let <- c("I", "B", "H", "D", "E", "F", "C")
  num <- as.character(c(c("01", "03", "04", "05", "06", "07", "08", "09"), seq(10, 25), seq(27, 49)))
  vars <- NULL
  for (e in 1:length(eth.let)) {
    vars <- c(vars, paste("P012", eth.let[e], "0", num, sep = ""))
  }
}

if (age == T & sex == T) {
  eth.let <- c("I", "B", "H", "D", "E", "F", "C")
  num <- as.character(c(c("01", "03", "04", "05", "06", "07", "08", "09"), seq(10, 25), seq(27, 49)))
  vars <- NULL
  for (e in 1:length(eth.let)) {
    vars <- c(vars, paste("P012", eth.let[e], "0", num, sep = ""))
  }
}
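To see which variable names the quoted code requests, the `age == F & sex == T` branch can be run on its own. The sketch below reproduces that loop and compares it with a version that does not grow a vector inside the loop. The names `vars_loop` and `vars_vec` are introduced here for illustration only. Note that the letters come from the I, B, H, D, E, F and C tables, that is, the race-alone tables rather than "not Hispanic or Latino" ones, and that the last two branches of the quoted code are identical as shown.

```r
eth.let <- c("I", "B", "H", "D", "E", "F", "C")
num <- c("01", "02", "26")

# Loop as quoted above: append three variable names per table letter.
vars_loop <- NULL
for (e in 1:length(eth.let)) {
  vars_loop <- c(vars_loop, paste("P012", eth.let[e], "0", num, sep = ""))
}

# Same names, in the same order, built without growing a vector.
vars_vec <- unlist(lapply(eth.let, function(e) paste0("P012", e, "0", num)))

identical(vars_loop, vars_vec)
head(vars_vec, 3)
```

The first three names are `P012I001`, `P012I002` and `P012I026`, and the full vector has 21 entries (7 tables times 3 variables).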
