Replies: 1 comment 1 reply
-
Hey there, the memory issue is of course always a problem with kriging on large data sets. Some thoughts to optimize:
`OK.Z = np.atleast_1d(np.squeeze(np.array(z, copy=True, dtype=np.float64)))`, i.e. the input values are stored on the kriging object as a float64 copy.
Hope this helps.
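One way to keep peak memory bounded is to evaluate the target grid in blocks, since the vectorized solver builds arrays of shape (n_grid_points, n_data). The sketch below only illustrates that idea and is not necessarily the optimization the comment above was pointing to; the sample sizes, grid sizes, and the linear variogram are placeholders.

```python
import numpy as np
from pykrige.ok3d import OrdinaryKriging3D

# Placeholder scattered data (lon, lat, time, temperature) as 1D arrays.
rng = np.random.default_rng(0)
n = 2000
lon = rng.uniform(-10.0, 10.0, n)
lat = rng.uniform(40.0, 60.0, n)
time = rng.uniform(0.0, 100.0, n)
temp = rng.normal(15.0, 5.0, n)

# Building the kriging object needs only data-data computations (at most n x n).
ok3d = OrdinaryKriging3D(lon, lat, time, temp, variogram_model="linear")

lon_reg = np.linspace(-10.0, 10.0, 100)
lat_reg = np.linspace(40.0, 60.0, 100)
time_reg = np.linspace(0.0, 100.0, 100)

# Evaluate the regular grid in blocks of time slices so the grid-data
# matrices stay at (block_points x n) instead of (all_grid_points x n).
block_size = 10  # time slices per block; tune to the available memory
blocks = []
for start in range(0, time_reg.size, block_size):
    t_block = time_reg[start:start + block_size]
    vals, _ss = ok3d.execute("grid", lon_reg, lat_reg, t_block)
    blocks.append(np.asarray(vals))

# PyKrige's gridded 3D output is ordered (time, lat, lon) here, so stack on axis 0.
result = np.concatenate(blocks, axis=0)

# backend="loop" in execute() is another option: slower, but with a much
# smaller memory footprint than the default vectorized backend.
```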
-
Hiya
I am attempting to use OrdinaryKriging3D with a large set of temperature measurements. These are scattered measurements taken at (longitude, latitude, time) positions, resulting in four very long 1D vectors (longitude, latitude, time, temperature). I would then like to interpolate this onto a regular 3D grid of (lon_reg, lat_reg, time_reg) using 3D kriging. This appears to require a huge amount of memory: effectively an array of size len(lon_reg)*len(lat_reg)*len(time_reg)*len(temperature).
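For a sense of scale (the grid sizes here are only illustrative, not my actual grid), even a modest regular grid combined with a few thousand samples already implies a single multi-gigabyte array:

```python
# Back-of-the-envelope size of one grid-by-data float64 array (illustrative sizes).
n_grid = 100 * 100 * 100   # len(lon_reg) * len(lat_reg) * len(time_reg)
n_data = 2000              # len(temperature)
print(n_grid * n_data * 8 / 1e9, "GB")  # -> 16.0 GB
```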
A number of other posts have reported memory issues. To try to get around the large memory requirements, I have been attempting to treat the data arrays lazily and to call the PyKrige routines from within xarray's apply_ufunc functionality. However, I am still running into memory issues.
I have pasted some example code below using randomised data. Can anyone comment on whether I am making a mistake, either in the data creation or in the apply_ufunc call, or whether the problem is more fundamental to the kriging operation itself? For example, are kriging methods simply incompatible with dask chunks and lazy operations? The code works on small datasets (e.g. if sample_length is set to 300), but I keep running out of memory for longer sample lengths (around 2000).
Note that I have also tried running the code below on a dask cluster connected to multiple CPUs, but it didn't solve the memory problems.
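A minimal sketch of the kind of setup described above follows. It is a hypothetical reconstruction rather than the original snippet: the helper name krige_block, the chunk sizes, the grid sizes, and the linear variogram are all assumptions.

```python
import numpy as np
import xarray as xr
from pykrige.ok3d import OrdinaryKriging3D

sample_length = 300  # runs at this size; memory blows up somewhere around 2000

# Randomised scattered samples: four 1D arrays of equal length.
rng = np.random.default_rng(42)
lon = rng.uniform(-10.0, 10.0, sample_length)
lat = rng.uniform(40.0, 60.0, sample_length)
time = rng.uniform(0.0, 100.0, sample_length)
temperature = rng.normal(15.0, 5.0, sample_length)

# Regular target grid (sizes are placeholders), expanded to 3D coordinate
# fields and wrapped as dask-chunked DataArrays so apply_ufunc works block-wise.
lon_reg = np.linspace(-10.0, 10.0, 50)
lat_reg = np.linspace(40.0, 60.0, 50)
time_reg = np.linspace(0.0, 100.0, 50)
dims = ("lon", "lat", "time")
coords = {"lon": lon_reg, "lat": lat_reg, "time": time_reg}
chunks = {"lon": 10, "lat": 10, "time": 10}
lon3d, lat3d, time3d = (
    xr.DataArray(arr, dims=dims, coords=coords).chunk(chunks)
    for arr in np.meshgrid(lon_reg, lat_reg, time_reg, indexing="ij")
)

def krige_block(lon_pts, lat_pts, time_pts):
    """Krige all scattered samples onto one block of target points."""
    # Re-fitting the variogram per block is redundant but keeps the function
    # self-contained; each block only sees a (block_points x n_data) problem.
    ok3d = OrdinaryKriging3D(lon, lat, time, temperature, variogram_model="linear")
    vals, _ss = ok3d.execute(
        "points", lon_pts.ravel(), lat_pts.ravel(), time_pts.ravel()
    )
    return np.asarray(vals).reshape(lon_pts.shape)

interp = xr.apply_ufunc(
    krige_block,
    lon3d, lat3d, time3d,
    dask="parallelized",          # requires dask to be installed
    output_dtypes=[np.float64],
)
result = interp.compute()         # each dask block is kriged independently
print(result.shape)               # (50, 50, 50)
```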
Thank you for any advice.