Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using predict.gstat function with nsim>0 causes segfault error #141

Open
lea-c opened this issue May 7, 2024 · 9 comments
Open

Using predict.gstat function with nsim>0 causes segfault error #141

lea-c opened this issue May 7, 2024 · 9 comments
Labels

Comments

@lea-c
Copy link

lea-c commented May 7, 2024

Hi,

I've been encountering a problem I don't understand with the predict.gstat function.

I would like to use it to interpolate residuals using a cokriging model. Running the function with nsim=0 works fine but if I increase it to a larger number (even a small one like 10, although I need to run it eventually with nsim=300) it automatically causes the following error, even on computers with large computing power and memory:
*** caught segfault *** address 0x38, cause 'memory not mapped'

Could you help me identify the issue ?
I checked for duplicate coordinates with zerodist, for unmatching CRSs, I tried with many values of maxdist and nmax, as well as with fewer points, so I'm at a loss here.

Below is a minimal example, and here is the link to download data: https://filesender.renater.fr/?s=download&token=911d7d4c-6bf5-4bec-a316-8adc5417b9d8
Thank you very much in advance.
I remain available for any complementary information.

library(gstat)

# Load variogram
load("./var_awc.RData")
awc_sem <- cv.fit

# Load data to predict
val_sp <- readRDS("./point_data.Rdata")

# Works fine
gstat:::predict.gstat(awc_sem, newdata = val_sp)

# Crashes
gstat:::predict.gstat(awc_sem, newdata = val_sp, nsim=10, maxdist=500)

@lea-c
Copy link
Author

lea-c commented May 13, 2024

Hi, I've analysed the problem more in depth and it seems to stem from a projection error. The points in val_sp are projected too far away from the points used to fit the variogram and it may cause a memory overload ? This is my hypothesis here.
Unfortunately, I still have not been able to solve the issue.

Fyi, here is a dataset for which the 300 simulations can be carried out: https://filesender.renater.fr/?s=download&token=c23e2359-6334-42ae-b425-2127bd302b63
It contains a lot more points that val_sp so this is in line with the fact that the number of points in val_sp plays is not responsible here.

This dataset has the same CRS and coordinates of similar magnitude to val_sp's, but it does not look like their extents overlap. Weirdly, when I plot grd and then val_sp, they look like they are in the exact same projection.
grd <- readRDS("./fine_data.Rdata")
plot(grd)
plot(val_sp, add=T)
But when I plot val_sp before grd, an inconsistency appears, and the plots don't overlap.
plot(val_sp)
plot(grd, add=T)

I've dealt with many CRSs issues before but I cannot figure this one out. I've reprojected the datasets over and over to make sure the CRSs match, but without success. Could you just help me make sure that val_sp and grd align? This will certainly solve the issue. Thank you very much for your help! And again, I remain available if you have any question

@edzer
Copy link
Member

edzer commented May 13, 2024

Thanks for the update; I've looked at the issue and can reproduce the error you're getting. I can see where it happens when run in a debugger, but haven't been able to identify its cause.

@lea-c
Copy link
Author

lea-c commented May 15, 2024

Thank you very much for looking into this problem!
Here is another update: actually I don't think this is a CRS issue. I found a dataset very similar to mine and with the same CRS which does not cause any issue. You can download the new data here...
https://filesender.renater.fr/?s=download&token=7bc3178c-2c7a-4fd7-89b4-b15e2afcb419
...and run the following code:

library(sp)
library(gstat)

# Load variogram
variogram <- readRDS("./variogram.Rdata")

# Load the two datasets
spatial_points_problem <- readRDS("./spatial_points_problem.Rdata")
spatial_points_working <- readRDS("./spatial_points_working.Rdata")

# Compare CRS
proj4string(spatial_points_problem)
proj4string(spatial_points_working)

# Compare extents
plot(extent(spatial_points_problem@bbox), col="red")
plot(extent(spatial_points_working@bbox), col="blue",add=T)

# Check for duplicates
zerodist(spatial_points_problem, z=0.9)
zerodist(spatial_points_working, z=0.9)

# Compare number of points - the only difference here but again, it should not cause any issue
print(nrow(spatial_points_problem))
print(nrow(spatial_points_working))

# Plot the two datasets
plot(spatial_points_problem, col="red")
plot(spatial_points_working, col="blue", add=T)

# Works 
gstat:::predict.gstat(variogram, newdata = spatial_points_working, nsim=10) # I get a warning here due to the fact that I had to modify the dataset a bit but I don't think this is important

# Crashes
gstat:::predict.gstat(variogram, newdata = spatial_points_problem, nsim=10)  # crashes also with various maxdist and nmax values

This is very weird...

@lea-c
Copy link
Author

lea-c commented May 29, 2024

Hi again,
I tried to go further in diagnosing the problem and identified the points that may be responsible for the error. Below is a figure where:

  • the points of spatial_points_working are in blue
  • the points of raster::intersect(spatial_points_problem, spatial_points_working) are in green
  • the remaining points of spatial_points_problem are in red
    -> It seems that the extent of blue points is the delimitator between "good" and "bad" points here, but why would it play a role, since:
  • the bounding box is the extent of the points which served to adjust the variogram: it contains all the points
    points_problematiques

The predict.gstat function runs fine with nsim>0 on the green points: however, it crashes as soon as a red point is added.

Also, I have tried to debug the function myself but I have not been able to run it outside the package, because of the C calls.

I hope this piece of additional information is helpful

Best regards

@edzer edzer added the bug label Jun 19, 2024
@edzer
Copy link
Member

edzer commented Jun 19, 2024

This might be a clue:

> zerodist2(spatial_points_problem, variogram$data$awc_0_30_m$data)
     [,1] [,2]
[1,]  519 1711
> zerodist2(spatial_points_working, variogram$data$awc_0_30_m$data)
     [,1] [,2]

@edzer
Copy link
Member

edzer commented Jun 19, 2024

g = gstat:::predict.gstat(variogram, newdata = spatial_points_problem[-519,], nsim=10)

works!

@edzer edzer closed this as completed Jun 25, 2024
@lea-c
Copy link
Author

lea-c commented Jun 26, 2024

Sorry for my late reply. It works, thank you so much for your time and effort !

@edzer
Copy link
Member

edzer commented Jul 9, 2024

Thanks; note that this is a work-around, the bug is still present.

@edzer edzer reopened this Jul 9, 2024
@lea-c
Copy link
Author

lea-c commented Jul 11, 2024

Yes, I've noticed: for now I just removed tens of points from my dataset (larger than the one I sent to you). I would be curious to understand what happens.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants