"Error in UPpivotal(sites$ip) : there are missing values in the pik vector" #42

Open
kaitlynstrickfaden opened this issue May 6, 2024 · 7 comments

Comments

@kaitlynstrickfaden

kaitlynstrickfaden commented May 6, 2024

Not sure what this error means. I get it when I run grts(). I'm using an sf object in a projected coordinate system. I get the error when I specify caty_var and caty_n, but it goes away when I comment out those two lines and just use sframe and n_base.

@jasonelaw

jasonelaw commented May 6, 2024

Not an expert on the current code - but UPpivotal is the function that actually selects the sample, the sites$ip vector is the site inclusion probabilities, and it looks like the error is happening when you're selecting a stratified random sample. Do you have missing values or other issues with the stratification variable - caty_var?

@kaitlynstrickfaden
Author

Thanks for your quick response. No, I don't have any NAs in my data. Some more background in case it's helpful: I've split my study area into 1-km grid cells using st_make_grid and then kept only the grid cells that fall within our different subherd boundaries. The resulting sf data consists of 322 polygons. I then create a "Subherd" column based on which subherd boundary each grid cell falls in. This Subherd column is what I'm trying to use as my caty_var, because I need some grid cells in each of our subherd boundaries.

@jasonelaw

I would try to create a reproducible example so that someone can debug the code and see what is happening. See https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example for ways to do that. Once you have a small, runnable chunk of code that gives the error, copy it here.

@jasonelaw

I can reproduce that error with the following code:

library(spsurvey)
data("NRSA_EPA7")

ret <- grts(
  sframe   = NRSA_EPA7, 
  caty_var = "STATE", 
  caty_n   = c("Missouri" = 2, "Kansas" = 2, "Missouri" = 2, "Nebraska" = 2), 
  n_base   = 8
)

The error occurs here because caty_n does not provide a value for every state: I've dropped Iowa. Check the names of your caty_n vector and make sure there is an entry for every value of caty_var. Adding Iowa fixes the issue:

ret <- grts(
  sframe   = NRSA_EPA7, 
  caty_var = "STATE", 
  caty_n   = c("Missouri" = 2, "Kansas" = 2, "Missouri" = 2, "Nebraska" = 2, "Iowa" = 2), 
  n_base   = 10
)
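
A quick way to spot this kind of mismatch before calling grts() is to compare the values of the category column against names(caty_n). The following is only a minimal base R sketch: the data frame `frame` is a hypothetical stand-in for a real sf sampling frame, and the variable names `missing_cats` and `extra_cats` are illustrative, not part of spsurvey.

```r
# Hypothetical stand-in for the sampling frame; a real sframe would be an sf object,
# but the column can be accessed the same way.
frame <- data.frame(STATE = c("Iowa", "Kansas", "Missouri", "Nebraska", "Iowa"))

# A caty_n vector that leaves out one category, as in the failing example.
caty_n <- c("Missouri" = 2, "Kansas" = 2, "Nebraska" = 2)

# Categories present in the data but absent from names(caty_n).
missing_cats <- setdiff(unique(as.character(frame$STATE)), names(caty_n))

# Names in caty_n that match nothing in the data (often a typo or case mismatch).
extra_cats <- setdiff(names(caty_n), unique(as.character(frame$STATE)))

print(missing_cats)
print(extra_cats)
```

If `missing_cats` is non-empty, those categories have no sample size assigned, which matches the situation described above.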

@kaitlynstrickfaden
Author

kaitlynstrickfaden commented May 6, 2024

Ohh, I didn't notice the caty_n vector had to be a named vector. Sorry, that was a simple "look at the R documentation, dummy" error on my part. The error message was really hard to decipher, so I hope you'll forgive me. I was able to get my code to work by making caty_n a named vector. Thank you!
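
For anyone landing here with the same problem, one way to build a named caty_n directly from the category column, so the names cannot drift out of sync with the data, might look like the sketch below. The subherd names and the per-category sample size of 2 are made up for illustration, and the plain data frame stands in for the real sf object. Setting n_base to the sum of caty_n mirrors the examples earlier in this thread.

```r
# Hypothetical stand-in for the gridded sf object with a "Subherd" column.
frame <- data.frame(Subherd = c("North", "South", "East", "North", "South"))

# One entry per distinct category, named after the category itself.
cats <- sort(unique(as.character(frame$Subherd)))
caty_n <- setNames(rep(2, length(cats)), cats)

# In the examples in this thread, n_base equals the sum of the caty_n values.
n_base <- sum(caty_n)

print(caty_n)
print(n_base)
```

The resulting `caty_n` and `n_base` would then be passed to grts() along with `caty_var = "Subherd"`.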

@jasonelaw

No problem. Glad you got it working. spsurvey can be a bit unconventional in how its functions accept arguments: they require you to match up a lot of names, which is very easy to get wrong.

@michaeldumelle
Collaborator

michaeldumelle commented May 9, 2024

Thanks @kaitlynstrickfaden and @jasonelaw for the discussion, and @jasonelaw for providing the help! I plan to add a check in the next version of spsurvey (the current version is 5.5.1) that returns an informative error message when any values in caty_var are not contained in the names of caty_n.

I'll post on this thread when the check has been implemented!


3 participants