Development Release 3.0.0 #120

Merged 108 commits on Feb 28, 2024
Commits
de9dcce
Use predict_race instead of predict_race_new for initial race values
solivella Sep 6, 2023
10a650c
Fix bug caused by tibble's type safety when accessing columns by posi…
solivella Sep 6, 2023
8f6a522
For race.init, run prediction function quietly
solivella Sep 6, 2023
f52ccf5
docs: use README.Rmd
rossellhayes Nov 15, 2023
a3c8e96
docs(README): move badges to their own line
rossellhayes Nov 15, 2023
4d116d0
docs(README): use thumbnail to link to The Who song
rossellhayes Nov 15, 2023
4246b74
docs(README): add installation instructions
rossellhayes Nov 15, 2023
cedf57e
docs(README): link DOI
rossellhayes Nov 15, 2023
71cf85e
docs: enable markdown in Roxygen comments
rossellhayes Nov 15, 2023
4aa6019
chore: `use_tidy_description()`
rossellhayes Nov 15, 2023
2998d97
docs(predict_race): document that `census.key` will be retrieved from…
rossellhayes Nov 15, 2023
ff8fe3a
docs(README): move instructions for storing API key after instruction…
rossellhayes Nov 15, 2023
616c2c6
docs: inherit from `predict_race()` in other documentation
rossellhayes Nov 15, 2023
ede8ad0
docs(get_census_data): document use of `CENSUS_API_KEY` envvar
rossellhayes Nov 15, 2023
60dda21
fix(voters): use tract NY-061-014900 instead of 015100
rossellhayes Nov 15, 2023
9693562
fix(voters): use tract NY-061-014900 instead of 015100
rossellhayes Nov 15, 2023
e411414
chore: `use_tidy_description()`
rossellhayes Nov 15, 2023
13265c6
chore: add dependencies on `cli` and `rlang` (already indirect depend…
rossellhayes Nov 15, 2023
60c1cd2
feat: add helper function `validate_key()`
rossellhayes Nov 16, 2023
7f5c0b6
feat: all functions that take a census API key now check `CENSUS_API_…
rossellhayes Nov 16, 2023
9fb7fe0
docs(README): update README
rossellhayes Nov 15, 2023
cf3a3d3
docs(README): add hex sticker
rossellhayes Nov 16, 2023
57fbc2d
refactor(validate_key): use `rlang::caller_arg()` to determine `argum…
rossellhayes Nov 17, 2023
926c2c1
feat: add CITATION file and use it to generate package startup message
rossellhayes Nov 17, 2023
30ce8f2
feat(.onAttach): add startup message about change to 2020 data in wru…
rossellhayes Nov 17, 2023
e30ab8c
Merge pull request #119 from kosukeimai/bugfix
1beb Nov 28, 2023
05c6147
Merge pull request #116 from rossellhayes/docs/improvements
1beb Nov 28, 2023
5a9a1bd
Merge branch 'dev' into fix/update-voters-tract
1beb Nov 28, 2023
80a1ea3
Merge pull request #118 from rossellhayes/fix/update-voters-tract
1beb Nov 28, 2023
47274f2
Merge branch 'dev' into feat/key-envvar
1beb Nov 28, 2023
462fa38
Merge pull request #117 from rossellhayes/feat/key-envvar
1beb Nov 28, 2023
b55cc86
Fixes broken DESCRIPTION file
1beb Nov 28, 2023
fc03dc0
Merge pull request #121 from kosukeimai/description-fix-in-dev
1beb Nov 28, 2023
a1d7aaa
updates for description
1beb Nov 28, 2023
f51341a
Merge pull request #122 from kosukeimai/bump-version
1beb Nov 28, 2023
74c641c
chore: rename `validate_key.R` to `utils_validate_key.R`
rossellhayes Nov 16, 2023
5eca735
feat: add `state_fips` dataset
rossellhayes Nov 28, 2023
1e1b57d
feat: add `as_fips_code()` helper function
rossellhayes Nov 27, 2023
b582980
refactor(census_geo_api): use `as_fips_code()`
rossellhayes Nov 17, 2023
975fa1c
refactor(census_geo_api): enumerate possible options for `geo` argume…
rossellhayes Nov 28, 2023
4a99c87
chore: add `.DS_Store` to `.gitignore`
rossellhayes Nov 22, 2023
33e9b00
refactor(census_geo_api): enumerate possible values for `year` argume…
rossellhayes Nov 22, 2023
06347ca
feat: add `assert_boolean()` helper function
rossellhayes Nov 22, 2023
34a9faa
feat: add `as_state_abbreviation()` helper function
rossellhayes Nov 27, 2023
f47576e
feat: add `census_geo_api_zcta()` function
rossellhayes Nov 19, 2023
0a4ec05
feat(census_geo_api): support `geo = "zcta"` by passing to `census_ge…
rossellhayes Nov 19, 2023
4561854
feat(get_census_data): add support for `census.geo = "zcta"`
rossellhayes Nov 28, 2023
817c7fe
docs: rebuild documentation
rossellhayes Nov 28, 2023
62d22a7
feat(predict_race): add support for `census.geo = "zcta"`
rossellhayes Nov 28, 2023
8d5655e
refactor(census_helper_new): use an `else` block to handle `geo == "p…
rossellhayes Nov 28, 2023
cab4a3a
refactor(census_helper_new): improve efficiency of identification of …
rossellhayes Nov 29, 2023
9576ca1
feat: add `census_geo_api_names()` and `census_geo_api_url()` helper …
rossellhayes Nov 30, 2023
984b6ce
refactor(census_geo_api_zcta): use `census_geo_api_names()`
rossellhayes Nov 30, 2023
0fa0953
refactor(census_geo_api): use `census_geo_api_names()`
rossellhayes Nov 30, 2023
bc8fa3c
refactor(census_geo_api): don't create geographic variables that are …
rossellhayes Dec 1, 2023
f138b22
refactor(census_geo_api): move main logic from `census_geo_api_zcta()…
rossellhayes Dec 1, 2023
0121c37
test(census_geo_api): add tests of `census_geo_api()`
rossellhayes Nov 30, 2023
11c4372
fix: add `census_geo_api_names_legacy()` to support census data with …
rossellhayes Dec 1, 2023
250c1cd
refactor(census_data_preflight): use `census_geo_api_names()`
rossellhayes Dec 1, 2023
dd4c6a6
feat(census_helper_new): add support for ZCTAs
rossellhayes Nov 28, 2023
4e691b6
feat: add `determine_geo_id_names()` helper function
rossellhayes Dec 1, 2023
8762f06
feat(predict_race_new): add support for ZCTAs
rossellhayes Nov 28, 2023
c256a4a
geo_id_names new
rossellhayes Dec 1, 2023
03dfd54
feat(predict_race_me): add support for ZCTAs
rossellhayes Dec 1, 2023
6333398
fix(validate_key): if `key` is `NULL`, replace it with `Sys.getenv("C…
rossellhayes Dec 1, 2023
bcd0f45
test(get_census_data): add test with ZCTAs
rossellhayes Dec 1, 2023
b374610
chore: add `.lazytest` to `.gitignore` (see https://lazytest.cynkra.c…
rossellhayes Dec 1, 2023
ec10a95
fix(census_helper_new): pass `year` to `census_geo_api()`
rossellhayes Dec 1, 2023
1dc2a52
Update to census_helper, drops rows with geos not found in census dat…
mdblocker Dec 2, 2023
8ae3efb
telling wru which package contains drop_na
mdblocker Dec 2, 2023
23ded7c
declaring dependencies
mdblocker Dec 2, 2023
612ed1c
Merge pull request #126 from rossellhayes/feat/update-zcta
1beb Dec 4, 2023
50065ba
Adding 'skip_bad_geos' option to allow partial data sets of succesful…
mdblocker Dec 4, 2023
fe20c3a
docs
mdblocker Dec 4, 2023
4692683
Merge branch 'dev' into drop_bad_geos
mdblocker Dec 4, 2023
5aaea19
Merge pull request #125 from mdblocker/drop_bad_geos
1beb Dec 5, 2023
839f7ef
Updates for CPP17
1beb Dec 5, 2023
cdf548c
Merge pull request #127 from kosukeimai/update-cpp-makevars
1beb Dec 6, 2023
e1cbd71
fix(census_helper_new): fix bug when checking if `geo` is "precinct"
rossellhayes Dec 8, 2023
59a340d
fix(.predict_race_old): use `all(is.na())` in `if` statements
rossellhayes Dec 8, 2023
4ff2f0f
test(census_geo_api): add comment explaining snapshots
rossellhayes Dec 8, 2023
7be05c0
todo: add TODO comments
rossellhayes Dec 8, 2023
4943cff
refactor: replace loops in `census_helper_new()` and `predict_race_me…
rossellhayes Dec 8, 2023
1ebbdcc
Merge pull request #128 from rossellhayes/fixes
1beb Dec 8, 2023
29b0b1b
Adding legacy table test for census_helper_new, added message if old …
mdblocker Dec 14, 2023
64af499
providing new table, not hitting api
mdblocker Dec 14, 2023
74ba902
imports
mdblocker Dec 14, 2023
8c5f68a
Removing stringr dependency
mdblocker Dec 14, 2023
f492893
removing api key call
mdblocker Dec 14, 2023
f2d31bb
fix test path
mdblocker Dec 14, 2023
b4a7da1
skip if no key
mdblocker Dec 15, 2023
296e55c
More coverage tests
mdblocker Dec 15, 2023
daab4cd
Add files via upload
Dec 18, 2023
d7a7057
Merge pull request #131 from mdblocker/dev
1beb Dec 20, 2023
2e3d0aa
Update test-rollup.R
Dec 20, 2023
f5f8ee3
Merge pull request #133 from carlson9/dev
1beb Dec 20, 2023
b8013d7
Update test-rollup.R
Dec 20, 2023
e1148f6
fix(census_helper_new): remove check that errors if `age` or `sex` ar…
rossellhayes Jan 19, 2024
4da02e2
Merge pull request #134 from carlson9/dev
1beb Jan 23, 2024
3e6c7c8
Merge pull request #136 from rossellhayes/fix/enable_age_and_sex
1beb Jan 23, 2024
5e68d24
Adjustments for piggyback failures
1beb Feb 13, 2024
08ada3e
Typo in DESCRIPTION
1beb Feb 13, 2024
9d8626a
Adding progress bars to tract level pulls, fixing tests
1beb Feb 14, 2024
7541cbb
Fixes for all the things
1beb Feb 14, 2024
c103d5b
Adjustments for CRAN notes
1beb Feb 15, 2024
c8adb85
Adjustments for CRAN notes re: Rcpp docs
1beb Feb 15, 2024
40d79c3
Reduced test data size
mdblocker Feb 16, 2024
33fbbea
Merge pull request #137 from mdblocker/dev
1beb Feb 16, 2024
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -12,3 +12,5 @@ ChangeLog

^cran-comments\.md$
^CRAN-SUBMISSION$
^README\.Rmd$
^data-raw$
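
For reviewers unfamiliar with `.Rbuildignore`: each line is a Perl-style regular expression matched case-insensitively against paths relative to the package root. The sketch below checks the two patterns added here against a few paths; the path list is a hypothetical example and is not taken from the repository.

```r
# Patterns copied from the two lines this PR adds to .Rbuildignore
patterns <- c("^README\\.Rmd$", "^data-raw$")

# Hypothetical top-level paths, used only for illustration
paths <- c("README.Rmd", "README.md", "data-raw", "data")

# A path is excluded from the build when any pattern matches it
ignored <- vapply(
  paths,
  function(p) {
    any(vapply(patterns, grepl, logical(1), x = p, perl = TRUE, ignore.case = TRUE))
  },
  logical(1)
)

print(ignored)
```

The anchors (`^`, `$`) and the escaped dot keep `README.md` and the `data` directory in the built package while excluding `README.Rmd` and `data-raw`.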
5 changes: 4 additions & 1 deletion .gitignore
@@ -1,3 +1,5 @@
.DS_Store

# History files
.Rhistory
.Rapp.history
@@ -8,6 +10,7 @@
# RStudio files
.Rproj.user/
.Rproj
.lazytest

# produced vignettes
vignettes/*.html
@@ -21,4 +24,4 @@ vignettes/*.pdf
src/RcppExports.o
src/aux_funs.o
src/sample_me.o
src/wru.so
src/wru.so
1 change: 1 addition & 0 deletions ChangeLog
@@ -16,3 +16,4 @@ Date Version Comment
2022-06-17 1.0.0 Updates to BISG, inclusion of fBISG and other package improvements
2022-10-04 1.0.1 Bug fixes for census url and census year
2023-06-12 2.0.0 Updated defaults to 2020 data, specifiy as next major version 2.0.
2024-02-15 3.0.0 Adding back age and sex functionality. Other improvements.
61 changes: 36 additions & 25 deletions DESCRIPTION
@@ -1,47 +1,58 @@
Package: wru
Version: 2.0.0
Date: 2023-07-12
Title: Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and
Geolocation
Title: Who are You? Bayesian Prediction of Racial Category Using Surname,
First Name, Middle Name, and Geolocation
Version: 3.0.0
Date: 2024-02-14
Authors@R: c(
person("Kabir", "Khanna", email = "[email protected]", role = c("aut")),
person("Brandon", "Bertelsen", email = "[email protected]", role = c("aut","cre")),
person("Santiago", "Olivella", email = "[email protected]", role = c("aut")),
person("Evan", "Rosenman", email = "[email protected]", role = c("aut")),
person("Kosuke", "Imai", email = "[email protected]", role = c("aut"))
person("Kabir", "Khanna", , "[email protected]", role = "aut"),
person("Brandon", "Bertelsen", , "[email protected]", role = c("aut", "cre")),
person("Santiago", "Olivella", , "[email protected]", role = "aut"),
person("Evan", "Rosenman", , "[email protected]", role = "aut"),
person("Alex", "Rossell Hayes", , "[email protected]", role = "aut"),
person("Kosuke", "Imai", , "[email protected]", role = "aut")
)
Description: Predicts individual race/ethnicity using surname, first name, middle name, geolocation,
and other attributes, such as gender and age. The method utilizes Bayes'
Rule (with optional measurement error correction) to compute the posterior probability of each racial category for any given
individual. The package implements methods described in Imai and Khanna (2016)
"Improving Ecological Inference by Predicting Individual Ethnicity from Voter
Registration Records" Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and Rosenman (2022)
"Addressing census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements"
<DOI:10.1126/sciadv.adc9824>. The package also incorporates the data described in Rosenman, Olivella, and Imai (2023)
"Race and ethnicity data for first, middle, and surnames" <DOI:10.1038/s41597-023-02202-2>.
Description: Predicts individual race/ethnicity using surname, first name,
middle name, geolocation, and other attributes, such as gender and
age. The method utilizes Bayes' Rule (with optional measurement error
correction) to compute the posterior probability of each racial
category for any given individual. The package implements methods
described in Imai and Khanna (2016) "Improving Ecological Inference by
Predicting Individual Ethnicity from Voter Registration Records"
Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and
Rosenman (2022) "Addressing census data problems in race imputation
via fully Bayesian Improved Surname Geocoding and name supplements"
<DOI:10.1126/sciadv.adc9824>. The package also incorporates the data
described in Rosenman, Olivella, and Imai (2023) "Race and ethnicity
data for first, middle, and surnames"
<DOI:10.1038/s41597-023-02202-2>.
License: GPL (>= 3)
URL: https://github.com/kosukeimai/wru
BugReports: https://github.com/kosukeimai/wru/issues
Depends:
R (>= 4.1.0),
utils
Imports:
cli,
dplyr,
tidyr,
furrr,
future,
piggyback (>= 0.1.4),
PL94171,
purrr,
Rcpp,
piggyback (>= 0.1.4),
PL94171
rlang
Suggests:
covr,
testthat (>= 3.0.0),
covr
tidycensus
LinkingTo:
Rcpp,
RcppArmadillo
LazyLoad: yes
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: yes
LazyDataCompression: xz
License: GPL (>= 3)
LazyLoad: yes
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Encoding: UTF-8
Config/testthat/edition: 3
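
The `Authors@R` rewrite above switches from `email = ...` to a positional form with an empty third argument. A minimal sketch of why the two spellings should be equivalent, assuming the standard `utils::person()` argument order (`given`, `family`, `middle`, `email`); the name and address below are placeholders, not package authors.

```r
# Old style: email passed by name, role wrapped in c()
named <- utils::person("Ada", "Example", email = "ada@example.org", role = c("aut"))

# New style: the empty third slot leaves `middle` at its default,
# so the fourth positional argument is taken as `email`
positional <- utils::person("Ada", "Example", , "ada@example.org", role = "aut")

# Both calls should describe the same person
print(format(named))
print(format(positional))
```

The positional form looks like the output of the `use_tidy_description()` commits listed above, so it reads as a formatting change rather than a change to the author metadata.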
4 changes: 4 additions & 0 deletions NAMESPACE
@@ -1,14 +1,18 @@
# Generated by roxygen2: do not edit by hand

export(as_fips_code)
export(as_state_abbreviation)
export(format_legacy_data)
export(get_census_data)
export(predict_race)
import(PL94171)
importFrom(Rcpp,evalCpp)
importFrom(dplyr,coalesce)
importFrom(dplyr,pull)
importFrom(furrr,future_map_dfr)
importFrom(piggyback,pb_download)
importFrom(purrr,map_dfr)
importFrom(rlang,"%||%")
importFrom(stats,rmultinom)
importFrom(utils,setTxtProgressBar)
importFrom(utils,txtProgressBar)
Expand Down
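
The new `importFrom(rlang,"%||%")` line brings in the null-default operator. The sketch below shows how such an operator can express a fallback to the `CENSUS_API_KEY` environment variable mentioned in the commit messages. `resolve_key()` is a hypothetical helper written for this illustration, not the package's `validate_key()`, and `%||%` is redefined locally so the example runs without rlang.

```r
# Local stand-in for rlang's `%||%`: return `y` only when `x` is NULL
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical helper (an assumption, not wru's API):
# use the supplied key, otherwise fall back to the environment variable
resolve_key <- function(key = NULL) {
  key %||% Sys.getenv("CENSUS_API_KEY")
}

# Placeholder value so the sketch is reproducible
Sys.setenv(CENSUS_API_KEY = "placeholder-key")

from_env <- resolve_key()            # no argument: environment variable is used
explicit <- resolve_key("explicit")  # an explicit argument takes precedence

print(c(from_env = from_env, explicit = explicit))
```

Note that `Sys.getenv()` returns an empty string when the variable is unset, so a real validator would still need to reject `""`.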
7 changes: 1 addition & 6 deletions R/RcppExports.R
@@ -5,21 +5,16 @@
#'
#' @param last_name Integer vector of last name identifiers for each record (zero indexed; as all that follow). Must match columns numbers in M_rs.
#' @param first_name See last_name
#' @param middle_name See last_name
#' @param mid_name See last_name
#' @param geo Integer vector of geographic units for each record. Must match column number in N_rg
#' @param N_rg Integer matrix of race | geography counts in census (geograpgies in columns).
#' @param M_rs Integer matrix of race | surname counts in dictionary (surnames in columns).
#' @param M_rf Same as `M_rs`, but for first names (can be empty matrix for surname only models).
#' @param M_rm Same as `M_rs`, but for middle names (can be empty matrix for surname, or surname and first name only models).
#' @param alpha Numeric matrix of race | geography prior probabilities.
#' @param pi_s Numeric matrix of race | surname prior probabilities.
#' @param pi_f Same as `pi_s`, but for first names.
#' @param pi_m Same as `pi_s`, but for middle names.
#' @param pi_nr Matrix of marginal probability distribution over missing names; non-keyword names default to this distribution.
#' @param which_names Integer; 0=surname only. 1=surname + first name. 2= surname, first, and middle names.
#' @param samples Integer number of samples to take after (in total)
#' @param burnin Integer number of samples to discard as burn-in of Markov chain
#' @param me_race Boolean; should measurement error in race | geography be corrected?
#' @param race_init Integer vector of initial race assignments
#' @param verbose Boolean; should informative messages be printed?
#'
Expand Down
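
The parameter docs above require zero-indexed integer identifiers that line up with the columns of `M_rs`. One way to build such a vector in R is sketched below; the matrix, its dimension names, and the records are invented for illustration, and the diff does not show how wru itself constructs these inputs.

```r
# Hypothetical race | surname count matrix: one column per surname
M_rs <- matrix(
  1:6,
  nrow = 2,
  dimnames = list(c("race_a", "race_b"), c("garcia", "smith", "nguyen"))
)

# Hypothetical records to be classified
records <- c("smith", "garcia", "smith", "nguyen")

# match() gives 1-based column positions; subtract 1 for zero-indexed C++ code
last_name <- match(records, colnames(M_rs)) - 1L

print(last_name)
```

Names missing from the dictionary would come back as `NA` from `match()` and need separate handling before being passed to compiled code.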
24 changes: 4 additions & 20 deletions R/census_data_preflight.R
@@ -1,31 +1,15 @@
#' Preflight census data
#'
#' @param census.data See documentation in \code{race_predict}.
#' @param census.geo See documentation in \code{race_predict}.
#' @param year See documentation in \code{race_predict}.
#' @inheritParams predict_race
#' @keywords internal

census_data_preflight <- function(census.data, census.geo, year) {

if (year != "2020"){
vars_ <- c(
pop_white = 'P005003', pop_black = 'P005004',
pop_aian = 'P005005', pop_asian = 'P005006',
pop_nhpi = 'P005007', pop_other = 'P005008',
pop_two = 'P005009', pop_hisp = 'P005010'
)
} else {
vars_ <- c(
pop_white = 'P2_005N', pop_black = 'P2_006N',
pop_aian = 'P2_007N', pop_asian = 'P2_008N',
pop_nhpi = 'P2_009N', pop_other = 'P2_010N',
pop_two = 'P2_011N', pop_hisp = 'P2_002N'
)
}
vars_ <- unlist(census_geo_api_names(year = year))
legacy_vars <- unlist(census_geo_api_names_legacy(year = year))

test <- lapply(census.data, function(x) {
nms_to_test <- names(x[[census.geo]])
all(vars_ %in% nms_to_test)
all(vars_ %in% nms_to_test) || all(legacy_vars %in% nms_to_test)
})
missings <- names(test)[!unlist(test)]

Expand Down
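
The reworked check accepts a state's table when it carries either the full set of current column names or the full set of legacy names. The sketch below reproduces that logic on toy data. The column names stand in for the output of `census_geo_api_names()` and `census_geo_api_names_legacy()`, which are not shown in this diff, and the nested list layout of `census.data` is inferred from the `x[[census.geo]]` access above.

```r
# Hypothetical stand-ins for the two helper outputs (assumptions)
vars_ <- c("new_a", "new_b")
legacy_vars <- c("old_a", "old_b")

# Toy census.data: one entry per state, each holding a table per geography
census.data <- list(
  NY = list(tract = data.frame(new_a = 1, new_b = 2)),  # current names
  NJ = list(tract = data.frame(old_a = 1, old_b = 2)),  # legacy names
  CT = list(tract = data.frame(new_a = 1, old_b = 2))   # mixed, so incomplete
)
census.geo <- "tract"

# Same test as in the diff: pass if either naming scheme is fully present
test <- lapply(census.data, function(x) {
  nms_to_test <- names(x[[census.geo]])
  all(vars_ %in% nms_to_test) || all(legacy_vars %in% nms_to_test)
})
missings <- names(test)[!unlist(test)]

print(missings)
```

A table that mixes the two schemes still fails, because neither `all()` condition is satisfied on its own.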