raster_sampling

Compilation of resources and inter-comparison of raster sampling strategies

Introduction and Motivation

Sampling a raster or ndarray at points is a common, basic need, but there is surprisingly little discussion or intercomparison of available options.

This is such an important topic, and the scientific literature is full of errors related to a lack of expertise when sampling (i.e. using nearest neighbor for large grid cells). Most new users will go with rasterio sample function, without considering consequences.

There is value in general implementation of different approaches in a centralized library, potentially with different wrappers depending on input object (DataFrame, numpy arrays, xarray DataSet). This could be installed as a dependency and used for different applications.

Goals for sprint

Continue compiling notes/resources on available approaches
Discuss and prepare test data
Design intercomparison function/notebook to document spread of values/results when using different methods
Discuss potential integration with existing efforts (demquery, icepyx, rioxarray)

Existing approaches

rasterio sample for 2D raster
- @dshean started compiling raster sampling code in the IS2 Hackweek tutorial
- Appendix A of mega-notebook: https://github.com/ICESAT-2HackWeek/2020_ICESat-2_Hackweek_Tutorials/blob/master/05.Geospatial_Analysis/shean_ICESat-2_hackweek_tutorial_GeospatialAnalysis_rendered.ipynb.
Zonal statistics (local window or circular region)
- See @dshean tutorial above for some references for local window (and circular) sampling
  - https://github.com/dshean/demcoreg/blob/master/demcoreg/sample_raster_at_pts.py
  - https://github.com/dshean/pygeotools/blob/master/pygeotools/lib/geolib.py#L1019
- rasterstats - buffer around points
- Simple function to implement sampling using a buffer distance and zonal statistics
- regionmask: https://regionmask.readthedocs.io/en/stable/ (suggested by Emilio)
  - https://github.com/waterhackweek/waterdata
  - https://github.com/waterhackweek/waterdata/blob/master/mashup_waterbudget.ipynb
scipy.ndimage.map_coordinates
scipy interpolation options (for a local window of raster pixels)
demquery https://github.com/kylebarron/demquery (thanks to @scottyhq!)
xarray interpolation and sampling (http://xarray.pydata.org/en/stable/interpolation.html#advanced-interpolation)
- Example for GeoDataFrame of points points (creates new dimension "z" to store output):
```
import xarray, rioxarray, geopandas

#Open raster dataset to sample
dem = rioxarray.open_rasterio("dem.tif")

#Open point vector dataset and reproject
gdf = geopandas.read_file("points.geojson").to_crs(dem.rio.crs)

#Add z dimension to store samples
x = xarray.DataArray(gdf.geometry.x.values, dims=“z”)
y = xarray.DataArray(gdf.geometry.y.values, dims=“z”)

#Sample the raster
samples = dem.interp(x=x, y=y) #default method is linear

#Add to the geodataframe
gdf[‘dem’] = samples.values

#Compute difference with values in another column
gdf[‘h_mean diff’] = gdf[‘h_mean’] - gdf[‘dem’]
```
- Great discussion on these topics (and cross-referenced issues/PRs): pydata/xarray#475
- @scottyhq provided the following for xarray interpolation:
- https://github.com/ICESAT-2HackWeek/pangeo-demo
  - Note that xarray is using numpy behind the scenes, and in this case scipy.interpolate http://xarray.pydata.org/en/stable/generated/xarray.DataArray.interp.html.
  - Note that you can get rendered versions of hvplot if the repo is public and you use an nbviewer link (https://nbviewer.jupyter.org/github/ICESAT-2HackWeek/pangeo-demo/blob/master/atl06-sample-rendered.ipynb)
- Official doc (could use some improvement - another potential task): http://xarray.pydata.org/en/stable/interpolation.html#advanced-interpolation
- Thread from pangeo and IS2 Hackweek Slack: "you can change x and y from lists to DataArrays with a new dimension (‘z’) e.g. x = xr.DataArray([x1, x2, x3], dims='z')"
pangeo-pyinterp: https://github.com/CNES/pangeo-pyinterp
Custom implementations in existing tools from @SmithB and @tsutterley
Build bilinear equations in lsq framework, then solve: https://github.com/SmithB/LSsurf
https://geoviews.org/user_guide/Resampling_Grids.html (suggested by Friedrich)
https://github.com/OGGM/oggm
https://pyresample.readthedocs.io/en/latest/
https://docs.generic-mapping-tools.org/dev/grdtrack.html

Scaling Considerations

scipy ndinterp - create interpolator once, use multiple times
@tsutterley has implemention for MPI tile-based loading/sampling, using scipy splines:
- https://github.com/tsutterley/read-ICESat-2/blob/master/scripts/MPI_DEM_ICESat2_ATL06.py
- challenges around parallel HDF5, https://docs.h5py.org/en/stable/mpi.html

Issues

Handling of raster nodata/nan - many algorithms will propagate nans differently
Performance for large point arrays, large rasters
Input raster cell dimensions vs point sampling density
Handling of different raster data types (e.g., only use nearest for classified rasters)
xarray - great for ndarrays from GCMs, most efficient ways to chunk large point coordinate arrays, perform operations

How to choose?

Functions to assess input raster properties (e.g., classified vs. continuous, large gradients, spatial frequency content) and point properties
Relative point density for points and raster
Potentially add some logic to select best method based on these metrics

Method Comparison

For same input points and raster/ndarray, extract values from all samples
No "truth", but can look at various statistics
ATL06 Already has DEM attributes in ATL06, ArcticDEM/REMA (reverts to GDEM) - details on how sampling is performed unknown (Jeff Lee)

Candidate use cases

Simple use case: starting with point coordinates in NumPy arrays, raster DEM as GeoTiff
More advanced use case: GCM reanalysis data, xyzt point coordinates

User Guide and Documentation

Provide general recommendations based on typical scenarios
Document case where different methods yield very different results

Next steps

Review resources listed above, compile notes
Start creating notebooks with snippet for approach or specific problem, links to documentation

Name		Name	Last commit message	Last commit date
Latest commit History 14 Commits
friedrich		friedrich
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

raster_sampling

Introduction and Motivation

Goals for sprint

Existing approaches

Scaling Considerations

Issues

How to choose?

Method Comparison

Candidate use cases

User Guide and Documentation

Next steps

About

Releases

Packages

Contributors 3

Languages

License

uw-cryo/raster_sampling

Folders and files

Latest commit

History

Repository files navigation

raster_sampling

Introduction and Motivation

Goals for sprint

Existing approaches

Scaling Considerations

Issues

How to choose?

Method Comparison

Candidate use cases

User Guide and Documentation

Next steps

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Contributors 3

Languages

Packages