Skip to content

Optimal interpolation

Thomas Nipen edited this page Oct 14, 2022 · 57 revisions

Optimal interpolation (OI) can be used to create gridded analysis by combining a gridded background field (e.g. from an NWP model) with observations at points. OI weights the background and the observations based on the uncertainty of each data source.

Gridpp's OI method does not perform any quality control (QC) on the observations. Check out Titanlib, which contains suitable QC functions for this purpose. For a complete step-by-step guide on creating gridded analysis with titanlib and gridpp, checkout https://tnipen.github.io/2020/06/15/titanlib-gridpp.html.

The OI function signature looks like this:

gridpp.optimal_interpolation(bgrid, background, points, pobs, pratios,
   pbackground, structure, max_points) 

where bgrid is the background grid, background is the 2D background field, points is the locations of the observations, pobs is a vector of observations, pratios is ratio of observation error variance to background error variance, pbackground is the background interpolated to the observation points, structure is a structure function object, and max_points specifies the maximum number of observations to use for a gridpoint.

Structure function

The structure function specifies how analysis increments are spread spatially. Gridpp supports several structure functions, but typically a gaussian function is used (Barnes 1973):

structure = gridpp.BarnesStructure(100000, 200)

The first argument is the horizontal decorrelation length scale (in meters) and the second is the vertical decorrelation length scale (in meters). Optionally, a third argument specifying the decorrelation length across land area fraction (units 1), and a fourth argumnet specifying the maximum length that an observations will have an effect (in meters, also called the localization radius). If the third is unspecified, land area fractions are ignored in the structure function., If the fourth is unspecified, 3.64 times the horizontal scale is used.

A Cressman structure function is also available, where correlations decrease linearly to 0 at the length scales:

structure = gridpp.CressmanStructure(100000, 200)

Spatially varying structure function

The length scales in the Barnes structure function defined above is static. HOwever in some cases it can be useful to allow the decorrelation lengths to vary with the station density. The BarnesStructure class can also be initialized with a 2D field of decorrelation lengths. The grid you chose to compute the lengths for does not have to be the same grid you will use in the OI. The OI function will use the nearest neighbour in the decorrelation length grid when determining what decorrelation length to use.

h = gridpp.distance(points, grid, 5)
v = 200 * np.ones(h.shape)
w = np.zeros(h.shape)
structure = gridpp.BarnesStructure(grid, h, v, w)

where points are a Points object with the observations, and grid is the desired grid to compute the decorrelation lengths for. In this case, h is the distance to the 5th nearest observation point.

Using a spatially varying structure function is tricky, because it often creates artifacts in the final OI analysis. This is especially true if the decorrelation lengths vary quickly in space. Consider to smooth the field, for example using a neighbourhood filter:

h = gridpp.neighbourhood(h, 5, gridpp.Mean)

Observation operator

Applying the observation operator to the background field results in a vector of values at the observation points. There are several methods to do this, but a typical one is nearest neighbour.

pbackground = gridpp.nearest(bgrid, points, background)

Any of gridpp's downscalers can be used for this, or the users can compute the values using their own preferred method.

Error variances

OI only requires the ratio of the error variance of the observations and background. Since observations are more trustworth than the background, values less than 1 are typically used. The error variance ratios are passed in as vectors, meaning that different observations can be assigned diferent ratios.

Maximum locations

The computation time for OI is dependent on how many observations are available within the localization radius. If the domain has areas with very high density of stations, the calculations in this region can dominate the total computation time. max_points limits the number of observations that can be used for any particular gridpoint. If more observations are available, only the max_points stations with the highest correlation are used.

Example

pbackground = gridpp.bilinear(igrid, points, temp_analysis[:, :, 0])
pratios = np.full(points.size(), 0.1)
h = 100000
v = 200
structure = gridpp.BarnesStructure(h, v)
max_points = 50
analysis = gridpp.optimal_interpolation(igrid, temp_analysis[:, :, 0], points,
        temp_obs, pratios, pbackground, structure, max_points)

Transformation (experimental)

OI can also be applied to transformed variables, by using the gridpp.optimal_interpolation_transform function. This requires a transform object as well as the specification of the background and observation standard errors (i.e., instead of the error variance ratio as in the untransformed OI). Currently a Box-Cox and logarithmic transformations are available:

transform = gridpp.BoxCox(0.1)
transform = gridpp.Log()

The Box-Cox transformation takes an input argument that specifies the exponent in the transformation.

Here is an example:

threshold = 0.1
transform = gridpp.BoxCox(threshold)
bsigma = 2
psigma = np.full(points.size(), 0.5)
analysis = gridpp.optimal_interpolation_transform(igrid, temp_analysis[:, :, 0], bsigma, points,
        temp_obs, psigma, pbackground, structure, max_points, transform)

Ensemble mode (experimental)

Gridpp also supports an Ensemble-based Statistical Interpolation (EnSI; Lussana et al. 2019) scheme that uses spatial structure information from an ensemble of NWP model runs. The gridpp.optimal_interpolation_ensi function can be used for this. It takes a 3D input field (with the ensemble dimension being last) and produces a 3D analysis field. Here, only the observation standard error is needed, since the error variance of the background is deduced from the spread of the ensemble.

pbackground = np.zeros(temp_analysis.shape)
for e in range(temp_analysis.shape[2]):
      pbackground[:, :, e] = gridpp.bilinear(igrid, points, temp_analysis[:, :, e])
psigma = np.full(points.size(), 0.5)
analysis = gridpp.optimal_interpolation_ensi(igrid, temp_analysis, points,
        temp_obs, psigma, pbackground, structure, max_points)

Cross validation experiments

In verification experiements, it is unecessary to run the OI on a full high resolution grid. INstead one wishes to only output the analysis at points where the analysis will be evaluated at. Gridpp provides OI functions that perform OI only for a set of output points.

analysis = gridpp.optimal_interpolation(points, pbackground, points,
        temp_obs, pratios, pbackground, structure, max_points)

Such a setup will generally produce very accurate analyses at the observation points since the same observations have been used as input. Gridpp can compute the analysis in a "leave-one-out" cross-validation fashion. For each output point, the observation at that point is left out of the interpolation. This can be done by defining a cross-validation structure function as follows:

structure = gridpp.Barnes
structure_cv = gridpp.CrossValidation(structure, 2000)

The structure_cv object can then be passed into the OI as before. This will force the correlation of all observations within a 2000 m radius of an output point to be 0, effectively discarding the observation. This approach is highly efficient for finding the optimal parameters in the OI since the computational cost is very low.

Clone this wiki locally