Add support for ZCTAs and 2020 age and sex data #123

Closed
68 commits
de9dcce
Use predict_race instead of predict_race_new for initial race values
solivella Sep 6, 2023
10a650c
Fix bug caused by tibble's type safety when accessing columns by posi…
solivella Sep 6, 2023
8f6a522
For race.init, run prediction function quietly
solivella Sep 6, 2023
f52ccf5
docs: use README.Rmd
rossellhayes Nov 15, 2023
a3c8e96
docs(README): move badges to their own line
rossellhayes Nov 15, 2023
4d116d0
docs(README): use thumbnail to link to The Who song
rossellhayes Nov 15, 2023
4246b74
docs(README): add installation instructions
rossellhayes Nov 15, 2023
cedf57e
docs(README): link DOI
rossellhayes Nov 15, 2023
71cf85e
docs: enable markdown in Roxygen comments
rossellhayes Nov 15, 2023
4aa6019
chore: `use_tidy_description()`
rossellhayes Nov 15, 2023
2998d97
docs(predict_race): document that `census.key` will be retrieved from…
rossellhayes Nov 15, 2023
ff8fe3a
docs(README): move instructions for storing API key after instruction…
rossellhayes Nov 15, 2023
616c2c6
docs: inherit from `predict_race()` in other documentation
rossellhayes Nov 15, 2023
ede8ad0
docs(get_census_data): document use of `CENSUS_API_KEY` envvar
rossellhayes Nov 15, 2023
60dda21
fix(voters): use tract NY-061-014900 instead of 015100
rossellhayes Nov 15, 2023
9693562
fix(voters): use tract NY-061-014900 instead of 015100
rossellhayes Nov 15, 2023
e411414
chore: `use_tidy_description()`
rossellhayes Nov 15, 2023
13265c6
chore: add dependencies on `cli` and `rlang` (already indirect depend…
rossellhayes Nov 15, 2023
60c1cd2
feat: add helper function `validate_key()`
rossellhayes Nov 16, 2023
7f5c0b6
feat: all functions that take a census API key now check `CENSUS_API_…
rossellhayes Nov 16, 2023
9fb7fe0
docs(README): update README
rossellhayes Nov 15, 2023
cf3a3d3
docs(README): add hex sticker
rossellhayes Nov 16, 2023
57fbc2d
refactor(validate_key): use `rlang::caller_arg()` to determine `argum…
rossellhayes Nov 17, 2023
926c2c1
feat: add CITATION file and use it to generate package startup message
rossellhayes Nov 17, 2023
30ce8f2
feat(.onAttach): add startup message about change to 2020 data in wru…
rossellhayes Nov 17, 2023
e30ab8c
Merge pull request #119 from kosukeimai/bugfix
1beb Nov 28, 2023
05c6147
Merge pull request #116 from rossellhayes/docs/improvements
1beb Nov 28, 2023
5a9a1bd
Merge branch 'dev' into fix/update-voters-tract
1beb Nov 28, 2023
80a1ea3
Merge pull request #118 from rossellhayes/fix/update-voters-tract
1beb Nov 28, 2023
47274f2
Merge branch 'dev' into feat/key-envvar
1beb Nov 28, 2023
462fa38
Merge pull request #117 from rossellhayes/feat/key-envvar
1beb Nov 28, 2023
b55cc86
Fixes broken DESCRIPTION file
1beb Nov 28, 2023
fc03dc0
Merge pull request #121 from kosukeimai/description-fix-in-dev
1beb Nov 28, 2023
a1d7aaa
updates for description
1beb Nov 28, 2023
f51341a
Merge pull request #122 from kosukeimai/bump-version
1beb Nov 28, 2023
74c641c
chore: rename `validate_key.R` to `utils_validate_key.R`
rossellhayes Nov 16, 2023
5eca735
feat: add `state_fips` dataset
rossellhayes Nov 28, 2023
1e1b57d
feat: add `as_fips_code()` helper function
rossellhayes Nov 27, 2023
b582980
refactor(census_geo_api): use `as_fips_code()`
rossellhayes Nov 17, 2023
975fa1c
refactor(census_geo_api): enumerate possible options for `geo` argume…
rossellhayes Nov 28, 2023
4a99c87
chore: add `.DS_Store` to `.gitignore`
rossellhayes Nov 22, 2023
33e9b00
refactor(census_geo_api): enumerate possible values for `year` argume…
rossellhayes Nov 22, 2023
06347ca
feat: add `assert_boolean()` helper function
rossellhayes Nov 22, 2023
34a9faa
feat: add `as_state_abbreviation()` helper function
rossellhayes Nov 27, 2023
f47576e
feat: add `census_geo_api_zcta()` function
rossellhayes Nov 19, 2023
0a4ec05
feat(census_geo_api): support `geo = "zcta"` by passing to `census_ge…
rossellhayes Nov 19, 2023
4561854
feat(get_census_data): add support for `census.geo = "zcta"`
rossellhayes Nov 28, 2023
817c7fe
docs: rebuild documentation
rossellhayes Nov 28, 2023
62d22a7
feat(predict_race): add support for `census.geo = "zcta"`
rossellhayes Nov 28, 2023
8d5655e
refactor(census_helper_new): use an `else` block to handle `geo == "p…
rossellhayes Nov 28, 2023
cab4a3a
refactor(census_helper_new): improve efficiency of identification of …
rossellhayes Nov 29, 2023
9576ca1
feat: add `census_geo_api_names()` and `census_geo_api_url()` helper …
rossellhayes Nov 30, 2023
984b6ce
refactor(census_geo_api_zcta): use `census_geo_api_names()`
rossellhayes Nov 30, 2023
0fa0953
refactor(census_geo_api): use `census_geo_api_names()`
rossellhayes Nov 30, 2023
bc8fa3c
refactor(census_geo_api): don't create geographic variables that are …
rossellhayes Dec 1, 2023
f138b22
refactor(census_geo_api): move main logic from `census_geo_api_zcta()…
rossellhayes Dec 1, 2023
0121c37
test(census_geo_api): add tests of `census_geo_api()`
rossellhayes Nov 30, 2023
11c4372
fix: add `census_geo_api_names_legacy()` to support census data with …
rossellhayes Dec 1, 2023
250c1cd
refactor(census_data_preflight): use `census_geo_api_names()`
rossellhayes Dec 1, 2023
dd4c6a6
feat(census_helper_new): add support for ZCTAs
rossellhayes Nov 28, 2023
4e691b6
feat: add `determine_geo_id_names()` helper function
rossellhayes Dec 1, 2023
8762f06
feat(predict_race_new): add support for ZCTAs
rossellhayes Nov 28, 2023
c256a4a
geo_id_names new
rossellhayes Dec 1, 2023
03dfd54
feat(predict_race_me): add support for ZCTAs
rossellhayes Dec 1, 2023
6333398
fix(validate_key): if `key` is `NULL`, replace it with `Sys.getenv("C…
rossellhayes Dec 1, 2023
bcd0f45
test(get_census_data): add test with ZCTAs
rossellhayes Dec 1, 2023
b374610
chore: add `.lazytest` to `.gitignore` (see https://lazytest.cynkra.c…
rossellhayes Dec 1, 2023
ec10a95
fix(census_helper_new): pass `year` to `census_geo_api()`
rossellhayes Dec 1, 2023
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -12,3 +12,5 @@ ChangeLog
 
 ^cran-comments\.md$
 ^CRAN-SUBMISSION$
+^README\.Rmd$
+^data-raw$
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
+.DS_Store
+
 # History files
 .Rhistory
 .Rapp.history
@@ -8,6 +10,7 @@
 # RStudio files
 .Rproj.user/
 .Rproj
+.lazytest
 
 # produced vignettes
 vignettes/*.html
@@ -21,4 +24,4 @@ vignettes/*.pdf
 src/RcppExports.o
 src/aux_funs.o
 src/sample_me.o
-src/wru.so
+src/wru.so
59 changes: 34 additions & 25 deletions DESCRIPTION
@@ -1,47 +1,56 @@
 Package: wru
-Version: 2.0.0
-Date: 2023-07-12
-Title: Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and
-    Geolocation
+Title: Who are You? Bayesian Prediction of Racial Category Using Surname,
+    First Name, Middle Name, and Geolocation
+Version: 3.0.0
+Date: 2023-12-09
 Authors@R: c(
-    person("Kabir", "Khanna", email = "[email protected]", role = c("aut")),
-    person("Brandon", "Bertelsen", email = "[email protected]", role = c("aut","cre")),
-    person("Santiago", "Olivella", email = "[email protected]", role = c("aut")),
-    person("Evan", "Rosenman", email = "[email protected]", role = c("aut")),
-    person("Kosuke", "Imai", email = "[email protected]", role = c("aut"))
+    person("Kabir", "Khanna", , "[email protected]", role = "aut"),
+    person("Brandon", "Bertelsen", , "[email protected]", role = c("aut", "cre")),
+    person("Santiago", "Olivella", , "[email protected]", role = "aut"),
+    person("Evan", "Rosenman", , "[email protected]", role = "aut"),
+    person("Kosuke", "Imai", , "[email protected]", role = "aut")
   )
-Description: Predicts individual race/ethnicity using surname, first name, middle name, geolocation,
-    and other attributes, such as gender and age. The method utilizes Bayes'
-    Rule (with optional measurement error correction) to compute the posterior probability of each racial category for any given
-    individual. The package implements methods described in Imai and Khanna (2016)
-    "Improving Ecological Inference by Predicting Individual Ethnicity from Voter
-    Registration Records" Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and Rosenman (2022)
-    "Addressing census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements"
-    <DOI:10.1126/sciadv.adc9824>. The package also incorporates the data described in Rosenman, Olivella, and Imai (2023)
-    "Race and ethnicity data for first, middle, and surnames" <DOI:10.1038/s41597-023-02202-2>.
+Description: Predicts individual race/ethnicity using surname, first name,
+    middle name, geolocation, and other attributes, such as gender and
+    age. The method utilizes Bayes' Rule (with optional measurement error
+    correction) to compute the posterior probability of each racial
+    category for any given individual. The package implements methods
+    described in Imai and Khanna (2016) "Improving Ecological Inference by
+    Predicting Individual Ethnicity from Voter Registration Records"
+    Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and
+    Rosenman (2022) "Addressing census data problems in race imputation
+    via fully Bayesian Improved Surname Geocoding and name supplements"
+    <DOI:10.1126/sciadv.adc9824>. The package also incorporates the data
+    described in Rosenman, Olivella, and Imai (2023) "Race and ethnicity
+    data for first, middle, and surnames"
+    <DOI:10.1038/s41597-023-02202-2>.
+License: GPL (>= 3)
 URL: https://github.com/kosukeimai/wru
 BugReports: https://github.com/kosukeimai/wru/issues
 Depends:
     R (>= 4.1.0),
     utils
 Imports:
+    cli,
     dplyr,
     furrr,
     future,
+    piggyback (>= 0.1.4),
+    PL94171,
     purrr,
     Rcpp,
-    piggyback (>= 0.1.4),
-    PL94171
+    rlang
 Suggests:
+    covr,
     testthat (>= 3.0.0),
-    covr
+    tidycensus
 LinkingTo:
     Rcpp,
     RcppArmadillo
-LazyLoad: yes
+Config/testthat/edition: 3
+Encoding: UTF-8
 LazyData: yes
 LazyDataCompression: xz
-License: GPL (>= 3)
+LazyLoad: yes
+Roxygen: list(markdown = TRUE)
 RoxygenNote: 7.2.3
-Encoding: UTF-8
-Config/testthat/edition: 3
2 changes: 2 additions & 0 deletions NAMESPACE
@@ -6,9 +6,11 @@ export(predict_race)
 import(PL94171)
 importFrom(Rcpp,evalCpp)
 importFrom(dplyr,coalesce)
+importFrom(dplyr,pull)
 importFrom(furrr,future_map_dfr)
 importFrom(piggyback,pb_download)
 importFrom(purrr,map_dfr)
+importFrom(rlang,"%||%")
 importFrom(stats,rmultinom)
 importFrom(utils,setTxtProgressBar)
 importFrom(utils,txtProgressBar)
24 changes: 4 additions & 20 deletions R/census_data_preflight.R
@@ -1,31 +1,15 @@
 #' Preflight census data
 #'
-#' @param census.data See documentation in \code{race_predict}.
-#' @param census.geo See documentation in \code{race_predict}.
-#' @param year See documentation in \code{race_predict}.
+#' @inheritParams predict_race
 #' @keywords internal
 
 census_data_preflight <- function(census.data, census.geo, year) {
-
-  if (year != "2020"){
-    vars_ <- c(
-      pop_white = 'P005003', pop_black = 'P005004',
-      pop_aian = 'P005005', pop_asian = 'P005006',
-      pop_nhpi = 'P005007', pop_other = 'P005008',
-      pop_two = 'P005009', pop_hisp = 'P005010'
-    )
-  } else {
-    vars_ <- c(
-      pop_white = 'P2_005N', pop_black = 'P2_006N',
-      pop_aian = 'P2_007N', pop_asian = 'P2_008N',
-      pop_nhpi = 'P2_009N', pop_other = 'P2_010N',
-      pop_two = 'P2_011N', pop_hisp = 'P2_002N'
-    )
-  }
+  vars_ <- unlist(census_geo_api_names(year = year))
+  legacy_vars <- unlist(census_geo_api_names_legacy(year = year))
 
   test <- lapply(census.data, function(x) {
     nms_to_test <- names(x[[census.geo]])
-    all(vars_ %in% nms_to_test)
+    all(vars_ %in% nms_to_test) || all(legacy_vars %in% nms_to_test)
   })
   missings <- names(test)[!unlist(test)]
 
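For readers following the `census_data_preflight()` change, the relaxed check can be tried in isolation with the rough sketch below. It rests on several assumptions. The legacy codes are copied from the block the diff deletes. `new_vars`, `make_state()`, `preflight()` and the toy `census.data` list are hypothetical stand-ins, since the return value of `census_geo_api_names()` and the rest of the function body are not visible in this hunk.

```r
# Stand-in for unlist(census_geo_api_names_legacy(year = "2020")).
# These codes come from the block that the diff deletes.
legacy_vars <- c(
  pop_white = "P2_005N", pop_black = "P2_006N",
  pop_aian  = "P2_007N", pop_asian = "P2_008N",
  pop_nhpi  = "P2_009N", pop_other = "P2_010N",
  pop_two   = "P2_011N", pop_hisp  = "P2_002N"
)

# HYPOTHETICAL stand-in for unlist(census_geo_api_names(year = "2020")).
# The real variable names are not shown in this diff.
new_vars <- c(r_whi = "NEW_001", r_bla = "NEW_002")

# Toy helper (assumption): build one state's entry, a list holding a
# one-row data frame under the geography name "zcta".
make_state <- function(cols) {
  m <- matrix(0, nrow = 1, ncol = length(cols), dimnames = list(NULL, cols))
  list(zcta = as.data.frame(m))
}

census.data <- list(
  NY = make_state(unname(legacy_vars)),    # complete legacy column set
  NJ = make_state(unname(new_vars)),       # complete new column set
  CT = make_state(c("P2_005N", "NEW_001")) # partial mix of both sets
)

# Same logic as the visible hunk: a state passes when it has ALL of the new
# names or ALL of the legacy names; anything else is reported as missing.
preflight <- function(census.data, census.geo, vars_, legacy_vars) {
  test <- lapply(census.data, function(x) {
    nms_to_test <- names(x[[census.geo]])
    all(vars_ %in% nms_to_test) || all(legacy_vars %in% nms_to_test)
  })
  names(test)[!unlist(test)]
}

missings <- preflight(census.data, "zcta", new_vars, legacy_vars)
print(missings)
```

In this toy input only `CT` is flagged, because a partial mix of old and new column names satisfies neither condition. What the real function does with `missings` afterwards is outside the visible hunk.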