Uninformative error message when exhausting names #83
Thank you for the comment. I will think about how to add better messaging for the circumstance you describe. If you are interested in getting a longer list of unique first.last name combinations, you can set sample.with.replacement = TRUE and then select the unique combinations that occur. The error you report probably arises because the internal data doesn't have enough female or male first names. Since the package is making combinations of first and last names, there are probably millions of those. To get 25,000 first/last name combinations you could do the following:

gender <- rep(c("M", "F"), 15000)
unique_names <- head(unique(names), 25000)

I asked for 30,000 names to begin with to make sure I had 25,000 uniques. I've considered how to add this little trick for creating LONG lists of names, but haven't quite figured out how to put this into the package well.
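For readers who want to run the trick above end to end, here is a minimal sketch. The snippet above does not show the line that creates the sampled vector, so the randomNames() call below is an assumption about what it looked like; `sampled_names` is a name chosen for illustration, and the name.order and name.sep values are borrowed from the function posted later in this thread. The guard simply skips the example when {randomNames} is not installed.

```r
# Sketch only: oversample with replacement, then keep the unique names.
# Assumes {randomNames} is installed; skipped otherwise.
if (requireNamespace("randomNames", quietly = TRUE)) {
  # 30,000 requests (15,000 of each gender) to leave headroom for duplicates
  gender <- rep(c("M", "F"), 15000)

  # Assumed call: the original comment does not show this line
  sampled_names <- randomNames::randomNames(
    gender = gender,
    which.names = "both",
    name.order = "first.last",
    name.sep = " ",
    sample.with.replacement = TRUE
  )

  # Drop duplicates and keep at most the first 25,000
  unique_names <- head(unique(sampled_names), 25000)
  length(unique_names)
}
```

If fewer than 25,000 unique names come back, raising the number of requested names is the obvious knob to turn.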
Thanks for the response. I hadn't realised that
#' Sample names using [randomNames::randomNames()]
#'
#' @description
#' Sample names for specified genders by sampling with replacement, which
#' avoids exhausting the number of names, as can happen when
#' `sample.with.replacement = FALSE`. Duplicated names produced during
#' sampling need to be removed to ensure each individual has a unique name.
#' To have enough unique names, more names than required are sampled from
#' [randomNames()], and the level of oversampling is determined by the
#' `buffer_factor` argument. If `buffer_factor` is too high, more names are
#' sampled than needed, which takes longer; if it is too low, not enough
#' unique names are sampled and `.sample_names()` will need to loop until it
#' has enough unique names.
#'
#' @inheritParams .add_date
#' @param buffer_factor A single `numeric` determining the level of
#' oversampling (or buffer) when creating a vector of unique names from
#' [randomNames()].
#'
#' @return A `character` vector.
#' @keywords internal
.sample_names <- function(.data,
                          buffer_factor = 1.5) {
  # logical indices and counts for each gender code in .data
  m_idx <- .data$gender == "m"
  f_idx <- .data$gender == "f"
  num_m <- sum(m_idx)
  num_f <- sum(f_idx)
  # oversample so enough names remain after duplicates are dropped
  num_sample_m <- ceiling(num_m * buffer_factor)
  num_sample_f <- ceiling(num_f * buffer_factor)
  # create sample of names so there are no duplicates
  names_m <- character(0)
  while (length(names_m) < num_m) {
    names_m <- unique(
      randomNames::randomNames(
        which.names = "both",
        name.sep = " ",
        name.order = "first.last",
        gender = rep("M", num_sample_m),
        sample.with.replacement = TRUE
      )
    )
  }
  names_f <- character(0)
  while (length(names_f) < num_f) {
    names_f <- unique(
      randomNames::randomNames(
        which.names = "both",
        name.sep = " ",
        name.order = "first.last",
        gender = rep("F", num_sample_f),
        sample.with.replacement = TRUE
      )
    )
  }
  # subset to use required number of names
  # (seq_len() avoids the 1:0 pitfall when a gender has no rows)
  names_m <- names_m[seq_len(num_m)]
  names_f <- names_f[seq_len(num_f)]
  # order names with gender codes from .data
  names_mf <- vector(mode = "character", length = nrow(.data))
  names_mf[m_idx] <- names_m
  names_mf[f_idx] <- names_f
  # return vector of names
  names_mf
}
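One possible variation on the loop above, sketched here rather than taken from the function itself: instead of discarding each draw that falls short, accumulate unique names across draws and stop with a clear message if the pool looks too small. `collect_unique()`, its `sampler` and `max_iter` arguments, and the toy pool are all hypothetical names for illustration; the toy sampler is a base R stand-in so the example runs without {randomNames}.

```r
# Hypothetical helper: keep drawing from `sampler` until `n` unique values
# have been collected. `sampler` is any function that takes a count and
# returns a character vector of that many (possibly duplicated) values.
collect_unique <- function(n, sampler, buffer_factor = 1.5, max_iter = 100) {
  out <- character(0)
  iter <- 0
  while (length(out) < n) {
    iter <- iter + 1
    if (iter > max_iter) {
      stop("Could not collect ", n, " unique values after ", max_iter,
           " draws; the pool may be too small.", call. = FALSE)
    }
    # only oversample the shortfall, and keep what was already found
    n_draw <- ceiling((n - length(out)) * buffer_factor)
    out <- unique(c(out, sampler(n_draw)))
  }
  head(out, n)
}

# Stand-in sampler for illustration only (not part of {randomNames}):
# 676 two-part "names" sampled with replacement.
toy_pool <- paste(rep(LETTERS, each = 26), rep(letters, times = 26))
toy_sampler <- function(k) sample(toy_pool, size = k, replace = TRUE)

set.seed(1)
res <- collect_unique(200, toy_sampler)
length(res)
```

With {randomNames}, the sampler could be a small wrapper that calls randomNames() with gender = rep("M", k) and sample.with.replacement = TRUE, as in the function above. The max_iter cap is a design choice to avoid looping forever when more unique names are requested than exist.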
It seems that when the number of names is exhausted using randomNames() (with sample.with.replacement = FALSE), it gives an uninformative error message about sampling. It would be great if the {randomNames} package could provide the user with a custom informative error message when the requested number of names is too large. This error message could also suggest setting sample.with.replacement to TRUE to help.
Here is a reprex to show an example
Created on 2024-01-18 with reprex v2.0.2
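As a rough illustration of the kind of check being requested (not the package's actual internals, and not the reprex itself): base R's sample() fails with "cannot take a sample larger than the population when 'replace = FALSE'" when asked for more items than exist, and a small guard in front of it can say something more helpful. Whether {randomNames} hits exactly this base R error is an assumption here; `sample_names_checked()` and the toy `pool` are hypothetical names for illustration.

```r
# Hypothetical wrapper around base sample(); not part of {randomNames}.
sample_names_checked <- function(pool, n) {
  if (n > length(pool)) {
    stop(
      "Requested ", n, " names but only ", length(pool),
      " are available without replacement. ",
      "Consider setting `sample.with.replacement = TRUE`.",
      call. = FALSE
    )
  }
  sample(pool, size = n, replace = FALSE)
}

# Toy pool for illustration only
pool <- c("Ana", "Ben", "Cleo")

# Capture the message instead of halting the script
msg <- tryCatch(
  sample_names_checked(pool, 5),
  error = function(e) conditionMessage(e)
)
msg
```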