Replace tract in voters dataset that is not present in 2020 census data #111

Closed
23 commits
de9dcce
Use predict_race instead of predict_race_new for initial race values
solivella Sep 6, 2023
10a650c
Fix bug caused by tibble's type safety when accessing columns by posi…
solivella Sep 6, 2023
8f6a522
For race.init, run prediction function quietly
solivella Sep 6, 2023
f52ccf5
docs: use README.Rmd
rossellhayes Nov 15, 2023
a3c8e96
docs(README): move badges to their own line
rossellhayes Nov 15, 2023
4d116d0
docs(README): use thumbnail to link to The Who song
rossellhayes Nov 15, 2023
4246b74
docs(README): add installation instructions
rossellhayes Nov 15, 2023
cedf57e
docs(README): link DOI
rossellhayes Nov 15, 2023
71cf85e
docs: enable markdown in Roxygen comments
rossellhayes Nov 15, 2023
4aa6019
chore: `use_tidy_description()`
rossellhayes Nov 15, 2023
2998d97
docs(predict_race): document that `census.key` will be retrieved from…
rossellhayes Nov 15, 2023
ff8fe3a
docs(README): move instructions for storing API key after instruction…
rossellhayes Nov 15, 2023
616c2c6
docs: inherit from `predict_race()` in other documentation
rossellhayes Nov 15, 2023
ede8ad0
docs(get_census_data): document use of `CENSUS_API_KEY` envvar
rossellhayes Nov 15, 2023
60dda21
fix(voters): use tract NY-061-014900 instead of 015100
rossellhayes Nov 15, 2023
9693562
fix(voters): use tract NY-061-014900 instead of 015100
rossellhayes Nov 15, 2023
9fb7fe0
docs(README): update README
rossellhayes Nov 15, 2023
cf3a3d3
docs(README): add hex sticker
rossellhayes Nov 16, 2023
926c2c1
feat: add CITATION file and use it to generate package startup message
rossellhayes Nov 17, 2023
30ce8f2
feat(.onAttach): add startup message about change to 2020 data in wru…
rossellhayes Nov 17, 2023
e30ab8c
Merge pull request #119 from kosukeimai/bugfix
1beb Nov 28, 2023
05c6147
Merge pull request #116 from rossellhayes/docs/improvements
1beb Nov 28, 2023
5a9a1bd
Merge branch 'dev' into fix/update-voters-tract
1beb Nov 28, 2023
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -12,3 +12,5 @@ ChangeLog

^cran-comments\.md$
^CRAN-SUBMISSION$
^README\.Rmd$
^data-raw$
56 changes: 31 additions & 25 deletions DESCRIPTION
@@ -1,24 +1,30 @@
Package: wru
Title: Who are You? Bayesian Prediction of Racial Category Using Surname,
First Name, Middle Name, and Geolocation
Version: 2.0.0
Date: 2023-07-12
Title: Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and
Geolocation
Authors@R: c(
person("Kabir", "Khanna", email = "[email protected]", role = c("aut")),
person("Brandon", "Bertelsen", email = "[email protected]", role = c("aut","cre")),
person("Santiago", "Olivella", email = "[email protected]", role = c("aut")),
person("Evan", "Rosenman", email = "[email protected]", role = c("aut")),
person("Kosuke", "Imai", email = "[email protected]", role = c("aut"))
person("Kabir", "Khanna", , "[email protected]", role = "aut"),
person("Brandon", "Bertelsen", , "[email protected]", role = c("aut", "cre")),
person("Santiago", "Olivella", , "[email protected]", role = "aut"),
person("Evan", "Rosenman", , "[email protected]", role = "aut"),
person("Kosuke", "Imai", , "[email protected]", role = "aut")
)
Description: Predicts individual race/ethnicity using surname, first name, middle name, geolocation,
and other attributes, such as gender and age. The method utilizes Bayes'
Rule (with optional measurement error correction) to compute the posterior probability of each racial category for any given
individual. The package implements methods described in Imai and Khanna (2016)
"Improving Ecological Inference by Predicting Individual Ethnicity from Voter
Registration Records" Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and Rosenman (2022)
"Addressing census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements"
<DOI:10.1126/sciadv.adc9824>. The package also incorporates the data described in Rosenman, Olivella, and Imai (2023)
"Race and ethnicity data for first, middle, and surnames" <DOI:10.1038/s41597-023-02202-2>.
Description: Predicts individual race/ethnicity using surname, first name,
middle name, geolocation, and other attributes, such as gender and
age. The method utilizes Bayes' Rule (with optional measurement error
correction) to compute the posterior probability of each racial
category for any given individual. The package implements methods
described in Imai and Khanna (2016) "Improving Ecological Inference by
Predicting Individual Ethnicity from Voter Registration Records"
Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and
Rosenman (2022) "Addressing census data problems in race imputation
via fully Bayesian Improved Surname Geocoding and name supplements"
<DOI:10.1126/sciadv.adc9824>. The package also incorporates the data
described in Rosenman, Olivella, and Imai (2023) "Race and ethnicity
data for first, middle, and surnames"
<DOI:10.1038/s41597-023-02202-2>.
License: GPL (>= 3)
URL: https://github.com/kosukeimai/wru
BugReports: https://github.com/kosukeimai/wru/issues
Depends:
@@ -28,20 +34,20 @@ Imports:
dplyr,
furrr,
future,
piggyback (>= 0.1.4),
PL94171,
purrr,
Rcpp,
piggyback (>= 0.1.4),
PL94171
Rcpp
Suggests:
testthat (>= 3.0.0),
covr
covr,
testthat (>= 3.0.0)
LinkingTo:
Rcpp,
RcppArmadillo
LazyLoad: yes
Config/testthat/edition: 3
Encoding: UTF-8
LazyData: yes
LazyDataCompression: xz
License: GPL (>= 3)
LazyLoad: yes
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.2.3
Encoding: UTF-8
Config/testthat/edition: 3
1 change: 1 addition & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ export(predict_race)
import(PL94171)
importFrom(Rcpp,evalCpp)
importFrom(dplyr,coalesce)
importFrom(dplyr,pull)
importFrom(furrr,future_map_dfr)
importFrom(piggyback,pb_download)
importFrom(purrr,map_dfr)
4 changes: 1 addition & 3 deletions R/census_data_preflight.R
@@ -1,8 +1,6 @@
#' Preflight census data
#'
#' @param census.data See documentation in \code{race_predict}.
#' @param census.geo See documentation in \code{race_predict}.
#' @param year See documentation in \code{race_predict}.
#' @inheritParams predict_race
#' @keywords internal

census_data_preflight <- function(census.data, census.geo, year) {
9 changes: 7 additions & 2 deletions R/get_census_data.R
@@ -4,8 +4,13 @@
#' for specified state(s). Using this function to download Census data in advance
#' can save considerable time when running \code{predict_race} and \code{census_helper}.
#'
#' @param key A required character object containing a valid Census API key,
#' which can be requested \href{https://api.census.gov/data/key_signup.html}{here}.
#' @param key A character string containing a valid U.S. Census API key,
#' which can be requested from the
#' [U.S. Census API key signup page](https://api.census.gov/data/key_signup.html).
#'
#' If [`NULL`], the default, attempts to find a census key stored in an
#' [environment variable][Sys.getenv] named `CENSUS_API_KEY`.
#'
#' @param states which states to extract Census data for, e.g., \code{c("NJ", "NY")}.
#' @param age A \code{TRUE}/\code{FALSE} object indicating whether to condition on
#' age or not. If \code{FALSE} (default), function will return Pr(Geolocation | Race).
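The `key` documentation above (and the matching `census.key` text in `predict_race()`) describes a fallback to the `CENSUS_API_KEY` environment variable when the argument is `NULL`. The R sketch below illustrates that lookup order only; `resolve_census_key()` is a hypothetical helper written for illustration, not a function exported by wru, and wru's internal implementation (including its error message) may differ.

```r
# Hypothetical helper illustrating the documented lookup order;
# it is NOT part of wru's API.
resolve_census_key <- function(key = NULL) {
  # An explicitly supplied key always takes precedence.
  if (is.null(key)) {
    # Sys.getenv() returns "" (not NULL) when the variable is unset.
    key <- Sys.getenv("CENSUS_API_KEY")
  }
  if (!nzchar(key)) {
    stop("No Census API key supplied and `CENSUS_API_KEY` is not set.",
         call. = FALSE)
  }
  key
}

# Example: set the variable for the current R session only.
Sys.setenv(CENSUS_API_KEY = "example-key")
resolve_census_key()                # "example-key"
resolve_census_key("explicit-key")  # "explicit-key"
```

To keep the key across sessions, the variable can instead be placed in the user's `.Renviron` file (e.g. a line `CENSUS_API_KEY=your-key-here`), which R reads at startup.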
39 changes: 23 additions & 16 deletions R/predict_race.R
@@ -42,9 +42,14 @@
#' must have column named \code{place}.
#' Specifying \code{\var{census.geo}} will call \code{census_helper} function
#' to merge Census geographic data at specified level of geography.
#' @param census.key A character object specifying user's Census API
#' key. Required if \code{\var{census.geo}} is specified, because
#' a valid Census API key is required to download Census geographic data.
#'
#' @param census.key A character object specifying user's Census API key.
#' Required if `census.geo` is specified, because a valid Census API key is
#' required to download Census geographic data.
#'
#' If [`NULL`], the default, attempts to find a census key stored in an
#' [environment variable][Sys.getenv] named `CENSUS_API_KEY`.
#'
#' @param census.data A list indexed by two-letter state abbreviations,
#' which contains pre-saved Census geographic data.
#' Can be generated using \code{get_census_data} function.
@@ -225,19 +230,21 @@ predict_race <- function(voter.file, census.surname = TRUE, surname.only = FALSE
if(ctrl$verbose){
message("Using `predict_race` to obtain initial race prediction priors with BISG model")
}
race.init <- predict_race_new(voter.file = voter.file,
names.to.use = names.to.use,
year = year,
age = age, sex = sex, # not implemented, default to F
census.geo = census.geo,
census.key = census.key,
name.dictionaries = name.dictionaries,
surname.only=surname.only,
census.data = census.data,
retry = retry,
impute.missing = TRUE,
census.surname = census.surname,
use.counties = use.counties)
race.init <- predict_race(voter.file = voter.file,
names.to.use = names.to.use,
year = year,
age = age, sex = sex, # not implemented, default to F
census.geo = census.geo,
census.key = census.key,
name.dictionaries = name.dictionaries,
surname.only=surname.only,
census.data = census.data,
retry = retry,
impute.missing = TRUE,
census.surname = census.surname,
use.counties = use.counties,
model = "BISG",
control = list(verbose=FALSE))
race.init <- max.col(
race.init[, paste0("pred.", c("whi", "bla", "his", "asi", "oth"))],
ties.method = "random"
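In this hunk the initial race values now come from a quiet BISG run of `predict_race()`, after which `max.col()` picks, for each voter, the `pred.*` column with the highest probability. The self-contained base-R sketch below shows only that last step on a made-up probability table; the numbers are illustrative and are not wru output.

```r
# Toy posterior probabilities, one row per voter (illustrative values only).
probs <- data.frame(
  pred.whi = c(0.70, 0.10, 0.20, 0.45),
  pred.bla = c(0.10, 0.60, 0.20, 0.45),
  pred.his = c(0.10, 0.10, 0.50, 0.05),
  pred.asi = c(0.05, 0.10, 0.05, 0.03),
  pred.oth = c(0.05, 0.10, 0.05, 0.02)
)
race_cols <- paste0("pred.", c("whi", "bla", "his", "asi", "oth"))

# max.col() returns, for each row, the column index of the largest value.
# Row 4 has an exact tie between columns 1 and 2, which "random" breaks at
# random, so a seed is set for reproducibility.
set.seed(42)
race.init <- max.col(probs[, race_cols], ties.method = "random")
race.init
# Rows 1-3 are always 1, 2, 3; row 4 is either 1 or 2.
```

With `ties.method = "first"`, a tie would always resolve to the earlier column (here `whi`); `"random"` avoids that systematic bias.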
28 changes: 8 additions & 20 deletions R/race_prediction_funs.R
@@ -1,7 +1,7 @@
#' Internal model fitting functions
#'
#' These functions are intended for internal use only. Users should use the
#' \code{race_predict} interface rather any of these functions directly.
#' [predict_race()] interface rather any of these functions directly.
#'
#' These functions fit different versions of WRU. \code{.predict_race_old} fits
#' the original WRU model, also known as BISG with census-based surname dictionary.
@@ -13,26 +13,11 @@
#' the augmented surname dictionary, and the first and middle name
#' dictionaries when making predictions.
#'
#' @param voter.file See documentation in \code{race_predict}.
#' @param census.surname See documentation in \code{race_predict}.
#' @param surname.only See documentation in \code{race_predict}.
#' @param surname.year See documentation in \code{race_predict}.
#' @param census.geo See documentation in \code{race_predict}.
#' @param census.key See documentation in \code{race_predict}.
#' @param census.data See documentation in \code{race_predict}.
#' @param age See documentation in \code{race_predict}.
#' @param sex See documentation in \code{race_predict}.
#' @param year See documentation in \code{race_predict}.
#' @param party See documentation in \code{race_predict}.
#' @param retry See documentation in \code{race_predict}.
#' @param impute.missing See documentation in \code{race_predict}.
#' @param names.to.use See documentation in \code{race_predict}.
#' @param race.init See documentation in \code{race_predict}.
#' @param name.dictionaries See documentation in \code{race_predict}.
#' @param ctrl See \code{control} in documentation for \code{race_predict}.
#' @inheritParams predict_race
#' @param ctrl See `control` in documentation for [predict_race()].
#' @param use.counties A logical, defaulting to FALSE. Should census data be filtered by counties available in \var{census.data}?
#'
#' @return See documentation in \code{race_predict}.
#' @inherit predict_race return
#'
#' @name modfuns
NULL
@@ -261,6 +246,7 @@ NULL
#' New race prediction function, implementing classical BISG with augmented
#' surname dictionary, as well as first and middle name information.
#' @rdname modfuns
#' @keywords internal
predict_race_new <- function(voter.file, names.to.use, year = "2020",age = FALSE, sex = FALSE,
census.geo, census.key = NULL, name.dictionaries, surname.only=FALSE,
census.data = NULL, retry = 0, impute.missing = TRUE, census.surname = FALSE,
@@ -429,7 +415,9 @@ predict_race_new <- function(voter.file, names.to.use, year = "2020",age = FALSE
#' New race prediction function, implementing fBISG (i.e. measurement
#' error correction, fully Bayesian model) with augmented
#' surname dictionary, as well as first and middle name information.
#' @importFrom dplyr pull
#' @rdname modfuns
#' @keywords internal
predict_race_me <- function(voter.file, names.to.use, year = "2020",age = FALSE, sex = FALSE,
census.geo, census.key, name.dictionaries, surname.only=FALSE,
census.data = NULL, retry = 0, impute.missing = TRUE, census.surname = FALSE,
@@ -604,7 +592,7 @@ predict_race_me <- function(voter.file, names.to.use, year = "2020",age = FALSE,
surname = last_c,
first = first_c,
middle = mid_c)
kw_names <- toupper(ntab[, 1])
kw_names <- toupper(dplyr::pull(ntab, 1))
proc_names_vf <- .name_preproc(voter.file[[ntype]], c(kw_names))
u_vf_names <- unique(proc_names_vf)
kw_in_vf <- kw_names %in% proc_names_vf
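The `kw_names` change replaces positional indexing `ntab[, 1]` with `dplyr::pull(ntab, 1)`, matching the commit "Fix bug caused by tibble's type safety when accessing columns by position". The sketch below, which assumes dplyr is installed, reproduces the difference on a toy table (`ntab` here is made up, not wru's name dictionary): a base data frame drops `[, 1]` to a vector, whereas a tibble returns a one-column tibble, which `toupper()` then coerces into a single deparsed string instead of one string per name.

```r
library(dplyr)

# Toy name table (hypothetical; stands in for a name dictionary).
ntab <- tibble(name = c("smith", "garcia"), prob = c(0.6, 0.4))
ntab_df <- as.data.frame(ntab)

# Base data frame: `[, 1]` drops to a character vector.
class(ntab_df[, 1])         # "character"

# Tibble: `[, 1]` never drops, so the result is still a one-column tibble ...
class(ntab[, 1])            # "tbl_df" "tbl" "data.frame"
# ... and toupper() coerces that list-like object to ONE string.
length(toupper(ntab[, 1]))  # 1, not 2

# pull() returns the column as a plain vector for either class.
toupper(pull(ntab, 1))      # "SMITH" "GARCIA"
toupper(pull(ntab_df, 1))   # "SMITH" "GARCIA"
```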
15 changes: 9 additions & 6 deletions R/wru-internal.R
@@ -1,7 +1,10 @@
.onAttach <-
function(libname, pkgname) {
packageStartupMessage("\nPlease cite as: \n")
packageStartupMessage("Khanna K, Bertelsen B, Olivella S, Rosenman E, Imai K (2022). wru: Who are You?")
packageStartupMessage("Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and Geolocation.")
packageStartupMessage("URL: https://CRAN.R-project.org/package=wru \n")
.onAttach <- function(libname, pkgname) {
packageStartupMessage(
"\n",
"Please cite as:", "\n\n",
format(citation("wru"), style = "text"), "\n\n",
"Note that wru 2.0.0 uses 2020 census data by default.", "\n",
'Use the argument `year = "2010"`, to replicate analyses produced with earlier package versions.',
"\n"
)
}
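The new `.onAttach()` builds its citation text from the package's CITATION file via `citation("wru")` instead of hard-coding it. The sketch below shows the same pattern in isolation; because wru may not be installed where it runs, it uses `citation("base")` as a stand-in, and `startup_text()` is a hypothetical helper, not part of wru.

```r
# Hypothetical helper mirroring the pattern used in .onAttach();
# "base" stands in for "wru" so the sketch runs without wru installed.
startup_text <- function(pkg = "base") {
  # format(..., style = "text") renders each citation entry as plain text;
  # collapse in case the CITATION file defines more than one entry.
  cite <- paste(format(utils::citation(pkg), style = "text"),
                collapse = "\n\n")
  paste0("\n", "Please cite as:", "\n\n", cite, "\n")
}

msg <- startup_text()
packageStartupMessage(msg)

# Users can silence startup messages when attaching a package, e.g.:
# suppressPackageStartupMessages(library(wru))
```

Reading from CITATION keeps the startup message and `citation("wru")` in sync, so the author list and year only need updating in one place.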