All functions that take a Census API key as an argument now use Sys.getenv("CENSUS_API_KEY") by default #112
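
A minimal sketch of the pattern the title describes, using a hypothetical function `fetch_something()` that is not part of wru. A default of `Sys.getenv("CENSUS_API_KEY")` is evaluated lazily on each call made without `key`, so a key stored in the environment is picked up automatically, while an explicitly supplied `key` still takes precedence. The key values below are placeholders.

```r
# Hypothetical function illustrating the default-argument pattern; not wru code.
fetch_something <- function(key = Sys.getenv("CENSUS_API_KEY")) {
  # The default expression is evaluated when `key` is first used, on every call.
  key
}

# Placeholder value for illustration only; a real key comes from the Census Bureau.
Sys.setenv(CENSUS_API_KEY = "placeholder-key")

from_env <- fetch_something()                     # falls back to the envvar
explicit <- fetch_something(key = "another-key")  # explicit argument wins

Sys.unsetenv("CENSUS_API_KEY")
unset <- fetch_something()                        # "" when the envvar is not set
```

To keep a key across sessions, one common option is a `CENSUS_API_KEY=...` line in `~/.Renviron`, which R reads at startup.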

Closed
30 commits
de9dcce
Use predict_race instead of predict_race_new for initial race values
solivella Sep 6, 2023
10a650c
Fix bug caused by tibble's type safety when accessing columns by posi…
solivella Sep 6, 2023
8f6a522
For race.init, run prediction function quietly
solivella Sep 6, 2023
f52ccf5
docs: use README.Rmd
rossellhayes Nov 15, 2023
a3c8e96
docs(README): move badges to their own line
rossellhayes Nov 15, 2023
4d116d0
docs(README): use thumbnail to link to The Who song
rossellhayes Nov 15, 2023
4246b74
docs(README): add installation instructions
rossellhayes Nov 15, 2023
cedf57e
docs(README): link DOI
rossellhayes Nov 15, 2023
71cf85e
docs: enable markdown in Roxygen comments
rossellhayes Nov 15, 2023
4aa6019
chore: `use_tidy_description()`
rossellhayes Nov 15, 2023
2998d97
docs(predict_race): document that `census.key` will be retrieved from…
rossellhayes Nov 15, 2023
ff8fe3a
docs(README): move instructions for storing API key after instruction…
rossellhayes Nov 15, 2023
616c2c6
docs: inherit from `predict_race()` in other documentation
rossellhayes Nov 15, 2023
ede8ad0
docs(get_census_data): document use of `CENSUS_API_KEY` envvar
rossellhayes Nov 15, 2023
60dda21
fix(voters): use tract NY-061-014900 instead of 015100
rossellhayes Nov 15, 2023
9693562
fix(voters): use tract NY-061-014900 instead of 015100
rossellhayes Nov 15, 2023
e411414
chore: `use_tidy_description()`
rossellhayes Nov 15, 2023
13265c6
chore: add dependencies on `cli` and `rlang` (already indirect depend…
rossellhayes Nov 15, 2023
60c1cd2
feat: add helper function `validate_key()`
rossellhayes Nov 16, 2023
7f5c0b6
feat: all functions that take a census API key now check `CENSUS_API_…
rossellhayes Nov 16, 2023
9fb7fe0
docs(README): update README
rossellhayes Nov 15, 2023
cf3a3d3
docs(README): add hex sticker
rossellhayes Nov 16, 2023
57fbc2d
refactor(validate_key): use `rlang::caller_arg()` to determine `argum…
rossellhayes Nov 17, 2023
926c2c1
feat: add CITATION file and use it to generate package startup message
rossellhayes Nov 17, 2023
30ce8f2
feat(.onAttach): add startup message about change to 2020 data in wru…
rossellhayes Nov 17, 2023
e30ab8c
Merge pull request #119 from kosukeimai/bugfix
1beb Nov 28, 2023
05c6147
Merge pull request #116 from rossellhayes/docs/improvements
1beb Nov 28, 2023
5a9a1bd
Merge branch 'dev' into fix/update-voters-tract
1beb Nov 28, 2023
80a1ea3
Merge pull request #118 from rossellhayes/fix/update-voters-tract
1beb Nov 28, 2023
47274f2
Merge branch 'dev' into feat/key-envvar
1beb Nov 28, 2023
2 changes: 2 additions & 0 deletions .Rbuildignore
@@ -12,3 +12,5 @@ ChangeLog
 
 ^cran-comments\.md$
 ^CRAN-SUBMISSION$
+^README\.Rmd$
+^data-raw$
57 changes: 33 additions & 24 deletions DESCRIPTION
@@ -1,47 +1,56 @@
 Package: wru
+Title: Who are You? Bayesian Prediction of Racial Category Using Surname,
+    First Name, Middle Name, and Geolocation
 Version: 2.0.0
 Date: 2023-07-12
-Title: Who are You? Bayesian Prediction of Racial Category Using Surname, First Name, Middle Name, and
-    Geolocation
 Authors@R: c(
-    person("Kabir", "Khanna", email = "[email protected]", role = c("aut")),
-    person("Brandon", "Bertelsen", email = "[email protected]", role = c("aut","cre")),
-    person("Santiago", "Olivella", email = "[email protected]", role = c("aut")),
-    person("Evan", "Rosenman", email = "[email protected]", role = c("aut")),
-    person("Kosuke", "Imai", email = "[email protected]", role = c("aut"))
+    person("Kabir", "Khanna", , "[email protected]", role = "aut"),
+    person("Brandon", "Bertelsen", , "[email protected]", role = c("aut", "cre")),
+    person("Santiago", "Olivella", , "[email protected]", role = "aut"),
+    person("Evan", "Rosenman", , "[email protected]", role = "aut"),
+    person("Kosuke", "Imai", , "[email protected]", role = "aut")
   )
-Description: Predicts individual race/ethnicity using surname, first name, middle name, geolocation,
-    and other attributes, such as gender and age. The method utilizes Bayes'
-    Rule (with optional measurement error correction) to compute the posterior probability of each racial category for any given
-    individual. The package implements methods described in Imai and Khanna (2016)
-    "Improving Ecological Inference by Predicting Individual Ethnicity from Voter
-    Registration Records" Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and Rosenman (2022)
-    "Addressing census data problems in race imputation via fully Bayesian Improved Surname Geocoding and name supplements"
-    <DOI:10.1126/sciadv.adc9824>. The package also incorporates the data described in Rosenman, Olivella, and Imai (2023)
-    "Race and ethnicity data for first, middle, and surnames" <DOI:10.1038/s41597-023-02202-2>.
+Description: Predicts individual race/ethnicity using surname, first name,
+    middle name, geolocation, and other attributes, such as gender and
+    age. The method utilizes Bayes' Rule (with optional measurement error
+    correction) to compute the posterior probability of each racial
+    category for any given individual. The package implements methods
+    described in Imai and Khanna (2016) "Improving Ecological Inference by
+    Predicting Individual Ethnicity from Voter Registration Records"
+    Political Analysis <DOI:10.1093/pan/mpw001> and Imai, Olivella, and
+    Rosenman (2022) "Addressing census data problems in race imputation
+    via fully Bayesian Improved Surname Geocoding and name supplements"
+    <DOI:10.1126/sciadv.adc9824>. The package also incorporates the data
+    described in Rosenman, Olivella, and Imai (2023) "Race and ethnicity
+    data for first, middle, and surnames"
+    <DOI:10.1038/s41597-023-02202-2>.
+License: GPL (>= 3)
 URL: https://github.com/kosukeimai/wru
 BugReports: https://github.com/kosukeimai/wru/issues
 Depends:
     R (>= 4.1.0),
     utils
 Imports:
+    cli,
     dplyr,
     furrr,
     future,
+    piggyback (>= 0.1.4),
+    PL94171,
     purrr,
     Rcpp,
-    piggyback (>= 0.1.4),
-    PL94171
+    rlang
+
 Suggests:
-    testthat (>= 3.0.0),
-    covr
+    covr,
+    testthat (>= 3.0.0)
 LinkingTo:
     Rcpp,
     RcppArmadillo
-LazyLoad: yes
+Config/testthat/edition: 3
+Encoding: UTF-8
 LazyData: yes
 LazyDataCompression: xz
-License: GPL (>= 3)
+LazyLoad: yes
+Roxygen: list(markdown = TRUE)
 RoxygenNote: 7.2.3
-Encoding: UTF-8
-Config/testthat/edition: 3
1 change: 1 addition & 0 deletions NAMESPACE
@@ -6,6 +6,7 @@ export(predict_race)
 import(PL94171)
 importFrom(Rcpp,evalCpp)
 importFrom(dplyr,coalesce)
+importFrom(dplyr,pull)
 importFrom(furrr,future_map_dfr)
 importFrom(piggyback,pb_download)
 importFrom(purrr,map_dfr)
4 changes: 1 addition & 3 deletions R/census_data_preflight.R
@@ -1,8 +1,6 @@
 #' Preflight census data
 #'
-#' @param census.data See documentation in \code{race_predict}.
-#' @param census.geo See documentation in \code{race_predict}.
-#' @param year See documentation in \code{race_predict}.
+#' @inheritParams predict_race
 #' @keywords internal
 
 census_data_preflight <- function(census.data, census.geo, year) {
30 changes: 18 additions & 12 deletions R/census_geo_api.R
@@ -5,8 +5,7 @@
 #' This function allows users to download U.S. Census geographic data (2010 or 2020),
 #' at either the county, tract, block, or place level, for a particular state.
 #'
-#' @param key A required character object. Must contain user's Census API
-#' key, which can be requested \href{https://api.census.gov/data/key_signup.html}{here}.
+#' @inheritParams get_census_data
 #' @param state A required character object specifying which state to extract Census data for,
 #' e.g., \code{"NJ"}.
 #' @param geo A character object specifying what aggregation level to use.
@@ -35,24 +34,31 @@
 #'
 #' @examples
 #' \dontshow{data(voters)}
-#' \dontrun{census_geo_api(key = "...", states = c("NJ", "DE"), geo = "block")}
-#' \dontrun{census_geo_api(key = "...", states = "FL", geo = "tract", age = TRUE, sex = TRUE)}
-#' \dontrun{census_geo_api(key = "...", states = "MA", geo = "place", age = FALSE, sex = FALSE,
+#' \dontrun{census_geo_api(states = c("NJ", "DE"), geo = "block")}
+#' \dontrun{census_geo_api(states = "FL", geo = "tract", age = TRUE, sex = TRUE)}
+#' \dontrun{census_geo_api(states = "MA", geo = "place", age = FALSE, sex = FALSE,
 #' year = "2020")}
 #'
 #' @references
-#' Relies on get_census_api, get_census_api_2, and vec_to_chunk functions authored by Nicholas Nagle,
-#' available \href{https://rstudio-pubs-static.s3.amazonaws.com/19337_2e7f827190514c569ea136db788ce850.html}{here}.
+#' Relies on `get_census_api()`, `get_census_api_2()`, and `vec_to_chunk()` functions authored by Nicholas Nagle,
+#' available [here](https://rstudio-pubs-static.s3.amazonaws.com/19337_2e7f827190514c569ea136db788ce850.html).
 #'
 #' @importFrom furrr future_map_dfr
 #' @importFrom purrr map_dfr
 #' @keywords internal
 
-census_geo_api <- function(key = NULL, state, geo = "tract", age = FALSE, sex = FALSE, year = "2020", retry = 3, save_temp = NULL, counties = NULL) {
-
-  if (missing(key)) {
-    stop('Must enter U.S. Census API key, which can be requested at https://api.census.gov/data/key_signup.html.')
-  }
+census_geo_api <- function(
+  key = Sys.getenv("CENSUS_API_KEY"),
+  state,
+  geo = "tract",
+  age = FALSE,
+  sex = FALSE,
+  year = "2020",
+  retry = 3,
+  save_temp = NULL,
+  counties = NULL
+) {
+  validate_key(key)
 
   census <- NULL
   state <- toupper(state)
26 changes: 17 additions & 9 deletions R/census_helper.R
@@ -7,8 +7,7 @@
 #' at the county, tract, block, or place level. Census data calculated are
 #' Pr(Geolocation | Race) where geolocation is county, tract, block, or place.
 #'
-#' @param key A required character object. Must contain user's Census API
-#' key, which can be requested \href{https://api.census.gov/data/key_signup.html}{here}.
+#' @inheritParams get_census_data
 #' @param voter.file An object of class \code{data.frame}. Must contain field(s) named
 #' \code{\var{county}}, \code{\var{tract}}, \code{\var{block}}, and/or \code{\var{place}}
 #' specifying geolocation. These should be character variables that match up with
@@ -54,34 +53,43 @@
 #' data(voters)
 #' }
 #' \dontrun{
-#' census_helper(key = "...", voter.file = voters, states = "nj", geo = "block")
+#' census_helper(voter.file = voters, states = "nj", geo = "block")
 #' }
 #' \dontrun{
 #' census_helper(
-#' key = "...", voter.file = voters, states = "all", geo = "tract",
+#' voter.file = voters, states = "all", geo = "tract",
 #' age = TRUE, sex = TRUE
 #' )
 #' }
 #' \dontrun{
 #' census_helper(
-#' key = "...", voter.file = voters, states = "all", geo = "county",
+#' voter.file = voters, states = "all", geo = "county",
 #' age = FALSE, sex = FALSE, year = "2020"
 #' )
 #' }
 #'
 #' @keywords internal
 
-census_helper <- function(key, voter.file, states = "all", geo = "tract", age = FALSE, sex = FALSE, year = "2020", census.data = NULL, retry = 3, use.counties = FALSE) {
+census_helper <- function(
+  key = Sys.getenv("CENSUS_API_KEY"),
+  voter.file,
+  states = "all",
+  geo = "tract",
+  age = FALSE,
+  sex = FALSE,
+  year = "2020",
+  census.data = NULL,
+  retry = 3,
+  use.counties = FALSE
+) {
   if (is.null(census.data) || (typeof(census.data) != "list")) {
     toDownload <- TRUE
   } else {
     toDownload <- FALSE
   }
 
   if (toDownload) {
-    if (missing(key)) {
-      stop("Must enter U.S. Census API key, which can be requested at https://api.census.gov/data/key_signup.html.")
-    }
+    validate_key(key)
   }
 
   states <- toupper(states)
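
One plausible reason a value-based check suits the new signature better than `missing(key)`: `missing()` reports whether the caller supplied the argument, not whether a usable value is available. The sketch below contrasts the two with hypothetical functions (`check_by_missing()` and `check_by_value()` are illustrative names, not wru code), and it does not claim to reproduce what `validate_key()` does internally.

```r
# Hypothetical functions for illustration; neither is part of wru.
check_by_missing <- function(key = Sys.getenv("CENSUS_API_KEY")) {
  # TRUE whenever the caller omits `key`, even if the default yields a value.
  missing(key)
}

check_by_value <- function(key = Sys.getenv("CENSUS_API_KEY")) {
  # TRUE only when no single non-empty string is available.
  !is.character(key) || length(key) != 1 || is.na(key) || !nzchar(key)
}

Sys.setenv(CENSUS_API_KEY = "placeholder-key")  # placeholder, not a real key

missing_says_absent <- check_by_missing()            # TRUE: argument not supplied
value_says_absent   <- check_by_value()              # FALSE: envvar provides a value
null_passes_missing <- check_by_missing(key = NULL)  # FALSE: NULL counts as supplied
null_fails_value    <- check_by_value(key = NULL)    # TRUE: NULL is not a string

Sys.unsetenv("CENSUS_API_KEY")
```

With an environment-variable default, a `missing()` test would reject exactly the calls the default is meant to support, and it would let an explicit `key = NULL` through.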
27 changes: 17 additions & 10 deletions R/census_helper_v2.R
@@ -8,8 +8,7 @@
 #' at the county, tract, block, or place level. Census data calculated are
 #' Pr(Geolocation | Race) where geolocation is county, tract, block, or place.
 #'
-#' @param key A required character object. Must contain user's Census API
-#' key, which can be requested \href{https://api.census.gov/data/key_signup.html}{here}.
+#' @inheritParams get_census_data
 #' @param voter.file An object of class \code{data.frame}. Must contain field(s) named
 #' \code{\var{county}}, \code{\var{tract}}, \code{\var{block}}, and/or \code{\var{place}}
 #' specifying geolocation. These should be character variables that match up with
@@ -49,15 +48,25 @@
 #'
 #' @examples
 #' \dontshow{data(voters)}
-#' \dontrun{census_helper_new(key = "...", voter.file = voters, states = "nj", geo = "block")}
-#' \dontrun{census_helper_new(key = "...", voter.file = voters, states = "all", geo = "tract")}
-#' \dontrun{census_helper_new(key = "...", voter.file = voters, states = "all", geo = "place",
+#' \dontrun{census_helper_new(voter.file = voters, states = "nj", geo = "block")}
+#' \dontrun{census_helper_new(voter.file = voters, states = "all", geo = "tract")}
+#' \dontrun{census_helper_new(voter.file = voters, states = "all", geo = "place",
 #' year = "2020")}
 #'
 #' @keywords internal
 
-census_helper_new <- function(key, voter.file, states = "all", geo = "tract", age = FALSE, sex = FALSE, year = "2020", census.data = NULL, retry = 3, use.counties = FALSE) {
-
+census_helper_new <- function(
+  key = Sys.getenv("CENSUS_API_KEY"),
+  voter.file,
+  states = "all",
+  geo = "tract",
+  age = FALSE,
+  sex = FALSE,
+  year = "2020",
+  census.data = NULL,
+  retry = 3,
+  use.counties = FALSE
+) {
   if (geo == "precinct") {
     stop("Error: census_helper_new function does not currently support precinct-level data.")
   }
@@ -76,9 +85,7 @@ census_helper_new <- function(key, voter.file, states = "all", geo = "tract", ag
   }
 
   if (toDownload) {
-    if (missing(key)) {
-      stop('Must enter U.S. Census API key, which can be requested at https://api.census.gov/data/key_signup.html.')
-    }
+    validate_key(key)
   }
 
   states <- toupper(states)
13 changes: 9 additions & 4 deletions R/get_census_api.R
@@ -5,10 +5,9 @@
 #' This function obtains U.S. Census data via the public API. User
 #' can specify the variables and region(s) for which to obtain data.
 #'
+#' @inheritParams get_census_data
 #' @param data_url URL root of the API,
 #' e.g., \code{"https://api.census.gov/data/2020/dec/pl"}.
-#' @param key A required character object containing user's Census API key,
-#' which can be requested \href{https://api.census.gov/data/key_signup.html}{here}.
 #' @param var.names A character vector of variables to get,
 #' e.g., \code{c("P2_005N", "P2_006N", "P2_007N", "P2_008N")}.
 #' If there are more than 50 variables, then function will automatically
@@ -23,7 +22,7 @@
 #' @examples
 #' \dontrun{
 #' get_census_api(
-#' data_url = "https://api.census.gov/data/2020/dec/pl", key = "...",
+#' data_url = "https://api.census.gov/data/2020/dec/pl",
 #' var.names = c("P2_005N", "P2_006N", "P2_007N", "P2_008N"), region = "for=county:*&in=state:34"
 #' )
 #' }
@@ -33,7 +32,13 @@
 #' \href{https://rstudio-pubs-static.s3.amazonaws.com/19337_2e7f827190514c569ea136db788ce850.html}{here}.
 #'
 #' @keywords internal
-get_census_api <- function(data_url, key, var.names, region, retry = 0) {
+get_census_api <- function(
+  data_url,
+  key = Sys.getenv("CENSUS_API_KEY"),
+  var.names,
+  region,
+  retry = 0
+) {
   if (length(var.names) > 50) {
     var.names <- vec_to_chunk(var.names) # Split variables into a list
     get <- lapply(var.names, function(x) paste(x, sep = "", collapse = ","))
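
The context lines above split long variable lists into pieces of at most 50 before building each request. As a rough illustration of that idea, the sketch below uses a hypothetical `chunk_vector()` helper and made-up variable names; the real `vec_to_chunk()` is not shown in this diff and may work differently.

```r
# Hypothetical stand-in for a chunking helper; wru's vec_to_chunk() may differ.
chunk_vector <- function(x, max = 50) {
  # Give each element a group index: 1 for the first `max`, 2 for the next, ...
  unname(split(x, ceiling(seq_along(x) / max)))
}

var_names <- sprintf("V%03d", 1:120)  # made-up variable names
chunks <- chunk_vector(var_names)

# Collapse each chunk into one comma-separated string, as in the diff above.
get <- lapply(chunks, function(x) paste(x, sep = "", collapse = ","))

lengths(chunks)  # 50 50 20
```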
13 changes: 9 additions & 4 deletions R/get_census_api_2.R
@@ -6,10 +6,9 @@
 #' It is used by the \code{get_census_api} function. The user should not need to call this
 #' function directly.
 #'
+#' @inheritParams get_census_data
 #' @param data_url URL root of the API,
 #' e.g., \code{"https://api.census.gov/data/2020/dec/pl"}.
-#' @param key A required character object containing user's Census API key,
-#' which can be requested \href{https://api.census.gov/data/key_signup.html}{here}.
 #' @param get A character vector of variables to get,
 #' e.g., \code{c("P2_005N", "P2_006N", "P2_007N", "P2_008N")}.
 #' If there are more than 50 variables, then function will automatically
@@ -22,15 +21,21 @@
 #' If unsuccessful, function prints the URL query that was constructed.
 #'
 #' @examples
-#' \dontrun{try(get_census_api_2(data_url = "https://api.census.gov/data/2020/dec/pl", key = "...",
+#' \dontrun{try(get_census_api_2(data_url = "https://api.census.gov/data/2020/dec/pl",
 #' get = c("P2_005N", "P2_006N", "P2_007N", "P2_008N"), region = "for=county:*&in=state:34"))}
 #'
 #' @references
 #' Based on code authored by Nicholas Nagle, which is available
 #' \href{https://rstudio-pubs-static.s3.amazonaws.com/19337_2e7f827190514c569ea136db788ce850.html}{here}.
 #'
 #' @keywords internal
-get_census_api_2 <- function(data_url, key, get, region, retry = 3){
+get_census_api_2 <- function(
+  data_url,
+  key = Sys.getenv("CENSUS_API_KEY"),
+  get,
+  region,
+  retry = 3
+){
   if(length(get) > 1) {
     get <- paste(get, collapse=',', sep='')
   }
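
The documentation above mentions that the function prints the URL query it constructed when a request fails. The rest of the function body is collapsed in this diff, so the following is only a sketch of how such a query string could be assembled from the documented arguments. `build_census_query()` is a hypothetical name, the parameter order in the URL is an assumption, and no request is sent.

```r
# Hypothetical helper; the query assembly inside get_census_api_2() is not shown here.
build_census_query <- function(data_url, key, get, region) {
  # Collapse several variable names into one comma-separated value.
  if (length(get) > 1) {
    get <- paste(get, collapse = ",")
  }
  paste0(data_url, "?key=", key, "&get=", get, "&", region)
}

query <- build_census_query(
  data_url = "https://api.census.gov/data/2020/dec/pl",
  key = "placeholder-key",  # placeholder, not a real key
  get = c("P2_005N", "P2_006N"),
  region = "for=county:*&in=state:34"
)
```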
34 changes: 20 additions & 14 deletions R/get_census_data.R
@@ -4,8 +4,13 @@
 #' for specified state(s). Using this function to download Census data in advance
 #' can save considerable time when running \code{predict_race} and \code{census_helper}.
 #'
-#' @param key A required character object containing a valid Census API key,
-#' which can be requested \href{https://api.census.gov/data/key_signup.html}{here}.
+#' @param key A character string containing a valid Census API key,
+#' which can be requested from the
+#' [U.S. Census API key signup page](https://api.census.gov/data/key_signup.html).
+#'
+#' By default, attempts to find a census key stored in an
+#' [environment variable][Sys.getenv] named `CENSUS_API_KEY`.
+#'
 #' @param states which states to extract Census data for, e.g., \code{c("NJ", "NY")}.
 #' @param age A \code{TRUE}/\code{FALSE} object indicating whether to condition on
 #' age or not. If \code{FALSE} (default), function will return Pr(Geolocation | Race).
@@ -32,18 +37,19 @@
 #' @export
 #'
 #' @examples
-#' \dontrun{get_census_data(key = "...", states = c("NJ", "NY"), age = TRUE, sex = FALSE)}
-#' \dontrun{get_census_data(key = "...", states = "MN", age = FALSE, sex = FALSE, year = "2020")}
-get_census_data <- function(key = NULL, states, age = FALSE, sex = FALSE, year = "2020", census.geo = "block", retry = 3, county.list = NULL) {
-
-  if (is.null(key)) {
-    # Matches tidycensus name for env var
-    key <- Sys.getenv("CENSUS_API_KEY")
-  }
-
-  if (missing(key) | key == "") {
-    stop('Must enter valid Census API key, which can be requested at https://api.census.gov/data/key_signup.html.')
-  }
+#' \dontrun{get_census_data(states = c("NJ", "NY"), age = TRUE, sex = FALSE)}
+#' \dontrun{get_census_data(states = "MN", age = FALSE, sex = FALSE, year = "2020")}
+get_census_data <- function(
+  key = Sys.getenv("CENSUS_API_KEY"),
+  states,
+  age = FALSE,
+  sex = FALSE,
+  year = "2020",
+  census.geo = "block",
+  retry = 3,
+  county.list = NULL
+) {
+  validate_key(key)
 
   states <- toupper(states)
 
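
The body of `validate_key()` is not among the files shown here. The following is only a guess at the kind of check such a helper might perform, written in base R under the hypothetical name `validate_key_sketch()`. The commit list mentions `cli` and `rlang::caller_arg()`; this sketch deliberately avoids both so that it runs without extra packages, and its error wording is invented.

```r
# Hypothetical sketch; not the implementation added in this PR.
validate_key_sketch <- function(key, argument = deparse(substitute(key))) {
  # Accept only a single, non-missing, non-empty character string.
  ok <- is.character(key) && length(key) == 1 && !is.na(key) && nzchar(key)
  if (!ok) {
    stop(
      "`", argument, "` must be a non-empty string. ",
      "A Census API key can be requested at ",
      "https://api.census.gov/data/key_signup.html, or stored in the ",
      "`CENSUS_API_KEY` environment variable.",
      call. = FALSE
    )
  }
  invisible(TRUE)
}

ok_result <- validate_key_sketch("placeholder-key")  # placeholder, not a real key
err <- tryCatch(validate_key_sketch(""), error = function(e) conditionMessage(e))
```

Checking the value rather than the presence of the argument means an unset environment variable (which `Sys.getenv()` returns as `""`) and an explicit `NULL` are both rejected with the same message.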