Update documentation
e-marshall committed Feb 15, 2024
1 parent ffd52ac commit 5af30b1
Showing 12 changed files with 1,245 additions and 1,142 deletions.
13 changes: 11 additions & 2 deletions _sources/accessing_s3_data.ipynb
Original file line number Diff line number Diff line change
@@ -8,8 +8,17 @@
"\n",
"This notebook will demonstrate how to access cloud-hosted Inter-mission Time Series of Land Ice Velocity and Elevation ([ITS_LIVE](https://its-live.jpl.nasa.gov/#access)) data from AWS S3 buckets. Here you will find examples of how to successfully access cloud-hosted data as well as some common errors and issues you may run into along the way, what they mean, and how to resolve them. \n",
"\n",
"*Learning goals:*\n",
"- accessing data stored in S3 buckets\n",
"## Learning goals\n",
"\n",
"### Conceptual\n",
"- Query and access cloud-optimized dataset from cloud object storage\n",
"- Create a vector data object representing the footprint of a raster dataset\n",
"- Preliminary visualization of data extent\n",
" \n",
"### Techniques\n",
"- Use [Xarray](https://xarray.dev/) to open [Zarr](https://zarr.readthedocs.io/en/stable/) datacube stored in [AWS S3 bucket](https://aws.amazon.com/s3/)\n",
"- Interactive data visualization with [hvplot](https://hvplot.holoviz.org/)\n",
"- Create [Geopandas](https://geopandas.org/en/stable/) `geodataframe` from Xarray `xr.Dataset` object\n",
"\n",
"```{note}\n",
"This tutorial was updated Fall 2023 to reflect changes to ITS_LIVE data urls and various software libraries\n",
122 changes: 55 additions & 67 deletions _sources/glacier_analysis_grouped.ipynb
@@ -10,11 +10,23 @@
"\n",
"The previous notebook demonstrated using xarray to analyze surface velocity data for an individual glacier. This notebook will show how we can examine spatial variability in surface velocity within a group of glaciers. To do this we will use **xarray** as well as **geopandas**, **geocube**, and **pandas**. We will start by using `.make_geocube()` to rasterize a vector object in the shape of an **ITS_LIVE** velocity raster object. We will then use the rasterized vector to group the **ITS_LIVE** object by individual glaciers and then calculate summary statistics of surface velocity for each glacier. The goal in this work flow is to end up with a **pandas dataframe** where each row is an individual glacier and columns for various surface velocity summary statistics. \n",
"\n",
"*Learning goals:*\n",
"- rasterizing vector data\n",
"- organizing and re-arranging data with xarray\n",
"- `.groupby()` for zonal statistics\n",
"- converting from xarray to pandas"
"## Learning goals\n",
"\n",
"### Concepts\n",
"- Querying + accessing raster data from cloud object storage\n",
"- Accessing + manipulating vector data\n",
"- Handling coordinate reference information\n",
"- Calculating and visualizing summary statistics\n",
"\n",
"### Techniques \n",
"- Access cloud-hosted [Zarr](https://zarr.readthedocs.io/en/stable/) datacubes using [Xarray](https://xarray.dev/)\n",
"- Reading [GeoParquet](https://geoparquet.org/) vector data using [GeoPandas](https://geopandas.org/en/stable/)\n",
"- Rasterize vector objects using [`make_geocube()`](https://corteva.github.io/geocube/html/geocube.html)\n",
"- Spatial joins of vector datasets using [GeoPandas](https://geopandas.org/en/stable/)\n",
"- Using [dask](https://www.dask.org/) to work with out-of-memory data\n",
"- Calculating summary statistics of [Xarray](https://xarray.dev/) and [Pandas](https://pandas.pydata.org/) data objects\n",
"- Data visualization using [Pandas](https://pandas.pydata.org/)\n",
"- Interactive data visualization with [GeoPandas](https://geopandas.org/en/stable/) \n"
]
},
{
@@ -638,7 +650,7 @@
"id": "20a61c98-901e-425d-ad6c-4627b92bf6df",
"metadata": {},
"source": [
"## Accessing ITS_LIVE data"
"## Accessing + reading in raster data (ITS_LIVE velocity data)"
]
},
{
@@ -5509,7 +5521,7 @@
"id": "9100f674",
"metadata": {},
"source": [
"## Vector data "
"## Reading in vector data (glacier outlines)"
]
},
{
@@ -6245,50 +6257,6 @@
"First, get the bbox of the ITS_LIVE data as a vector"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "9d1e1e5a-b7d0-4819-bed9-8673e0ebc132",
"metadata": {},
"outputs": [],
"source": [
"#def get_bbox_single(input_xr, epsg = input_xr.projection):\n",
"def get_bbox_single(input_xr, epsg):\n",
"\n",
" '''Takes input xr object (from itslive data cube), plots a quick map of the footprint. \n",
" currently only working for granules in crs epsg 32645'''\n",
"\n",
" xmin = input_xr.coords['x'].data.min()\n",
" xmax = input_xr.coords['x'].data.max()\n",
"\n",
" ymin = input_xr.coords['y'].data.min()\n",
" ymax = input_xr.coords['y'].data.max()\n",
"\n",
" pts_ls = [(xmin, ymin), (xmax, ymin),(xmax, ymax), (xmin, ymax), (xmin, ymin)]\n",
"\n",
" #print(input_xr.mapping.spatial_epsg)\n",
" #print(f\"epsg:{input_xr.mapping.spatial_epsg}\")\n",
" crs = epsg\n",
" #crs = {'init':f'epsg:{input_xr.mapping.spatial_epsg}'}\n",
" #crs = 'epsg:32645'\n",
" #print(crs)\n",
"\n",
" polygon_geom = Polygon(pts_ls)\n",
" polygon = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom]) \n",
" #polygon = polygon.to_crs('epsg:4326')\n",
"\n",
" #bounds = polygon.total_bounds\n",
" #bounds_format = [bounds[0]-15, bounds[2]+15, bounds[1]-15, bounds[3]+15]\n",
"\n",
" #states_provinces = cfeature.NaturalEarthFeature(\n",
" # category = 'cultural',\n",
" # name = 'admin_1_states_provinces_lines',\n",
" # scale='50m',\n",
" # facecolor='none'\n",
" #)\n",
" return polygon"
]
},
{
"cell_type": "code",
"execution_count": 18,
@@ -6338,7 +6306,7 @@
"id": "f01bcd9f-293a-495e-8b5e-81701154695e",
"metadata": {},
"source": [
"## Rasterize vector"
"## Rasterize vector objects"
]
},
{
@@ -7145,20 +7113,6 @@
"Before moving forward, we will take a temporal subset of the full dataset to make it a bit easier to work with. Then, merge the rasterized vector and the dataset containing the velocity data into an xarray dataset:"
]
},
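The subset-and-merge step described above can be sketched with synthetic stand-ins (the names `mid_date`, `vx`, `vy`, and `RGI_int` mirror the notebook, but the grid and values here are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
times = pd.date_range("2015-01-01", "2020-12-31", freq="6MS")

# Synthetic stand-in for the ITS_LIVE velocity datacube
dc = xr.Dataset(
    {
        "vx": (("mid_date", "y", "x"), rng.normal(size=(len(times), 4, 5))),
        "vy": (("mid_date", "y", "x"), rng.normal(size=(len(times), 4, 5))),
    },
    coords={"mid_date": times, "y": np.arange(4), "x": np.arange(5)},
)

# Temporal subset via label-based selection
dc_sub = dc.sel(mid_date=slice("2018-01-01", "2020-12-31"))

# Rasterized vector layer on the same grid (integer glacier IDs);
# in the notebook this comes from make_geocube()
rgi_int = xr.DataArray(
    np.repeat(np.arange(1, 5), 5).reshape(4, 5),
    dims=("y", "x"),
    coords={"y": dc.y, "x": dc.x},
    name="RGI_int",
)

# Merge the rasterized vector and the velocity data into one dataset
merged = xr.merge([dc_sub, rgi_int])
```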
{
"cell_type": "code",
"execution_count": 27,
"id": "8441d13e-0755-4db5-9fd8-4af73696ea65",
"metadata": {},
"outputs": [],
"source": [
"#resample_obj = dc.resample(mid_date = '1Y')\n",
"#dc_resample = resample_obj.mean(dim='mid_date')\n",
"#dc_resample_2d = dc_resample.mean(dim='mid_date')\n",
"#dc_resample['v_mag'] = np.sqrt(dc_resample_2d.vx**2 + dc_resample_2d.vy**2)\n",
"#out_grid['v'] = (dc_resample.v_mag.dims, dc_resample.v_mag.values, dc_resample.v_mag.attrs, dc_resample.v_mag.encoding)\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
@@ -7189,6 +7143,14 @@
"dc_sub_2d['v_mag'] = np.sqrt(dc_sub_2d.vx**2+dc_sub_2d.vy**2)"
]
},
{
"cell_type": "markdown",
"id": "fbd469c0-302e-4937-969a-2fbb8f436279",
"metadata": {},
"source": [
"## Combining data"
]
},
{
"cell_type": "markdown",
"id": "dbb22577-9c3c-4a22-8b1c-7bb11311923b",
@@ -7199,6 +7161,14 @@
"```"
]
},
{
"cell_type": "markdown",
"id": "55b8da67-491a-47ba-9fa8-34002d15a1fe",
"metadata": {},
"source": [
"In the cell below, we are adding a new variable (`v`) to the `out_grid` object that is an Xarray Dataset. You can see that we do this by specifying a tuple that contains the different elements of the data variable we'd like to add: dimensions, values, attributes and additional encoding data. "
]
},
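A minimal sketch of this tuple-assignment pattern, using made-up names and values:

```python
import numpy as np
import xarray as xr

out_grid = xr.Dataset(coords={"y": np.arange(3), "x": np.arange(4)})

v_vals = np.random.default_rng(0).random((3, 4))
v_attrs = {"units": "m/yr", "description": "velocity magnitude"}

# Assign the new variable as a (dims, values, attrs) tuple
out_grid["v"] = (("y", "x"), v_vals, v_attrs)
```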
{
"cell_type": "code",
"execution_count": 31,
@@ -7659,6 +7629,14 @@
"out_grid = out_grid.assign_coords({'RGI_int':out_grid.RGI_int})"
]
},
{
"cell_type": "markdown",
"id": "1e497f11-e6ad-41c5-bec9-2b227f5d0c19",
"metadata": {},
"source": [
"### Grouping by RGI ID"
]
},
{
"cell_type": "markdown",
"id": "95c4df21-0fc9-44a9-ac98-dbe79c9d1e2f",
@@ -7690,7 +7668,9 @@
"id": "4e2ca92b-1dfc-47d3-a169-db766d35686c",
"metadata": {},
"source": [
"and compute summary statistics for a single variable on the grouped object:"
"### Calculating summary statistics\n",
"\n",
"And compute summary statistics for a single variable on the grouped object:"
]
},
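The grouped zonal-statistics pattern can be sketched on a toy grid (the `RGI_int` name mirrors the notebook; the data are synthetic):

```python
import numpy as np
import xarray as xr

# Toy velocity grid with an integer glacier-ID coordinate
ds = xr.Dataset({"v": (("y", "x"), np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))})
ds = ds.assign_coords(RGI_int=(("y", "x"), np.array([[1, 1, 2], [1, 2, 2]])))

# Zonal statistics: group pixels by glacier ID, then reduce each group
means = ds.v.groupby("RGI_int").mean()
```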
{
@@ -7739,6 +7719,14 @@
"print(grid_min_sp.RGI_int.equals(grid_min_sp.RGI_int))"
]
},
{
"cell_type": "markdown",
"id": "9d82ca8b-d95c-4890-afbd-9e5d544772b8",
"metadata": {},
"source": [
"### Transitioning from 'lazy' operations to in-memory "
]
},
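A minimal illustration of the lazy-versus-in-memory distinction, assuming dask is installed alongside xarray:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("y", "x"))

# chunk() makes the array dask-backed: operations now build a lazy
# task graph instead of computing immediately
lazy_mean = da.chunk({"y": 1}).mean()

# compute() executes the graph and pulls the result into memory
result = lazy_mean.compute()
```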
{
"cell_type": "markdown",
"id": "bc331b29-8f7e-47c5-94cf-3be465137344",
38 changes: 31 additions & 7 deletions _sources/ind_glacier_analysis.ipynb
@@ -8,10 +8,19 @@
"\n",
"This notebook will build upon the data access and inspection steps in the earlier notebooks and demonstrate basic data analysis and visualization of surface velocity data at the scale of an individual glacier using xarray. \n",
"\n",
"*Learning goals*: \n",
"- using xarray label-based indexing and selection tools\n",
"- computation and grouped computation\n",
"- visualization"
"## Learning goals\n",
"\n",
"### Conceptual\n",
"- Visualizing statistical distributions\n",
"- Vectorized calculation of summary statistics over large dimensions\n",
"- Working with velocity component vectors\n",
"- Calculating magnitude of displacement from velocity component vectors\n",
"### Techniques\n",
"- Using [`xr.reduce()`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.reduce.html) to apply [scipy statistical functions](https://docs.scipy.org/doc/scipy/reference/stats.html) designed to ingest numpy arrays to Xarray objects\n",
"- Re-organize Xarray objects using [`xr.sortby()`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.sortby.html)\n",
"- Temporal resampling using [`xr.resample()`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.resample.html)\n",
"- Vectorized computation and reductions using [`xr.groupby()`](https://docs.xarray.dev/en/stable/user-guide/groupby.html) and [`xr.map()`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.map.html)\n",
"- Visualization using [`FacetGrid`](https://docs.xarray.dev/en/latest/generated/xarray.plot.FacetGrid.html) objects\n"
]
},
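The `xr.reduce()` + scipy pattern from the techniques list can be sketched like this (synthetic data):

```python
import numpy as np
import xarray as xr
from scipy import stats

da = xr.DataArray(
    np.array([[1.0, 2.0, 3.0, 10.0], [2.0, 3.0, 5.0, 2.0]]),
    dims=("y", "x"),
)

# reduce() hands the underlying numpy array (and the axis matching
# dim='x') to any numpy-aware function, here scipy.stats.skew
skewness = da.reduce(stats.skew, dim="x")
```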
{
@@ -1123,6 +1132,13 @@
"sample_glacier_raster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize and examine distributions of different variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1216,7 +1232,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Calculating magnitude of velocity and inspecting the distribution of velocity data in space and time\n",
"## Calculating magnitude of velocity \n",
"\n",
"We'll first define a function for calculating magnitude of velocity in two ways. Because we want to calculate the magnitude of the displacement vector after we have already reduced the data along a dimension, we write a function that creates two magnitude of velocity variables, one where magnitude is calculated from the means of the vx and vy vectors in space and one where the magnitude is calculated from the medians of the vx and vy vectors in time. We just need to be careful which variable we use. \n",
"\n"
@@ -1248,6 +1264,13 @@
"sample_glacier_raster_mag = calc_velocity_magnitude(sample_glacier_raster)"
]
},
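The reduce-then-magnitude idea described above might look roughly like this (a sketch with synthetic data and illustrative names; the notebook's actual `calc_velocity_magnitude()` may differ):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)
ds = xr.Dataset(
    {
        "vx": (("mid_date", "y", "x"), rng.normal(size=(6, 3, 3))),
        "vy": (("mid_date", "y", "x"), rng.normal(size=(6, 3, 3))),
    }
)

# Reduce the components first, then take the magnitude of the reduced vectors:
# one magnitude per time step (reduced over space) ...
ds["v_mag_space"] = np.sqrt(ds.vx.mean(dim=["x", "y"]) ** 2 + ds.vy.mean(dim=["x", "y"]) ** 2)
# ... and one magnitude per pixel (reduced over time)
ds["v_mag_time"] = np.sqrt(ds.vx.median(dim="mid_date") ** 2 + ds.vy.median(dim="mid_date") ** 2)
```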
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize velocity variability "
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1331,6 +1354,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checking coverage\n",
"Now that we've reduced the dataset, we can look at the coverage of the magnitude variables using xarray methods.\n",
"First, we want to know how many observations (not NaNs) exist along the time dimension of `v_mag_time`. We can use `xr.DataArray.count()`:"
]
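A quick illustration of `count()` on a small array containing NaNs (synthetic data):

```python
import numpy as np
import xarray as xr

v_mag_time = xr.DataArray(
    np.array([[1.0, np.nan, 3.0], [np.nan, np.nan, 6.0]]),
    dims=("mid_date", "x"),
)

# count() returns the number of non-NaN observations along a dimension
n_obs = v_mag_time.count(dim="mid_date")
```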
@@ -1516,7 +1540,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resampling"
"## Temporal resampling"
]
},
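Resampling along a datetime dimension can be sketched as follows (synthetic monthly data):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2018-01-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(12.0), dims="mid_date", coords={"mid_date": times})

# Downsample monthly observations to quarterly means
quarterly = da.resample(mid_date="QS").mean()
```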
{
@@ -2805,7 +2829,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculating velocity anomalies\n",
"## Calculating velocity anomalies\n",
"\n",
"To do this, we will use xarray `.groupby()` and `.map()` \n",
"\n",
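The groupby-plus-climatology anomaly pattern can be sketched as (synthetic monthly data):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2018-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), dims="mid_date", coords={"mid_date": times})

# Climatology: the mean for each month of the year
climatology = da.groupby("mid_date.month").mean()

# Anomaly: subtract the matching monthly mean from every observation
anomaly = da.groupby("mid_date.month") - climatology
```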
44 changes: 28 additions & 16 deletions _sources/ind_glacier_data_inspection.ipynb
@@ -12,24 +12,27 @@
"\n",
"## Learning goals\n",
"\n",
"### Techniques (xarray, python, general)\n",
"### Conceptual:\n",
"\n",
"- Subset large raster dataset to area of interest using vector data\n",
"- Examine different types of data stored within a raster object (in this example, the data is in the format of a [zarr datacube](https://zarr.readthedocs.io/en/stable/getting_started.html)\n",
"- Handling different coordinate reference systems and projections\n",
"- Dataset inspection using: \n",
" * Xarray label- and index-based selection \n",
" * Grouped computations and reductions \n",
" * Visualization tools\n",
"- Dimensional reductions and computations\n",
"- Examining velocity component vectors\n",
"- Calculating the magnitude of the displacement vector from velocity component vectors\n",
" \n",
"### Techniques \n",
"\n",
"- Read in raster data using `xarray`\n",
"- Read in vector data using `geopandas`\n",
"- Manipulate and organize raster data using `xarray` functionality\n",
"- Explore larger-than-memory data using `dask` and `xarray`\n",
"\n",
"\n",
"### High-level science goals:\n",
"\n",
"- Subset large raster dataset to area of interest using vector data\n",
"- Examine different types of data stored within a raster object (in this example, the data is in the format of a [zarr datacube](https://zarr.readthedocs.io/en/stable/getting_started.html)\n",
"- Handling different coordinate reference systems and projections\n",
"- Dataset inspection using:\n",
" - Xarray label- and index-based selection\n",
" - Grouped computations and reductions\n",
" - Visualization tools\n",
" \n",
"- Troubleshooting errors and warnings\n",
"- Visualizing Xarray data using [FacetGrid](https://docs.xarray.dev/en/latest/generated/xarray.plot.FacetGrid.html) objects\n",
"\n",
"\n",
"\n",
@@ -106,7 +109,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading in raster dataset (ITS_LIVE ice velocity data)"
"## Reading in ITS_LIVE ice velocity dataset (raster data)"
]
},
{
@@ -10449,7 +10452,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Incorporating vector data \n",
"## Incorporating glacier outlines (vector data )\n",
"\n",
"Vector data represent discrete features. They contain geometry data as well as attribute data about the features. For a more in-depth description of vector data, read [this](https://datacarpentry.org/organization-geospatial/02-intro-vector-data.html). We will use vector data to focus our analysis on specific glaciers. The dataset we will be usign is called the Randolph Glacier Inventory (RGI). It is a very important dataset for glaciology research; you can read more about it [here](http://www.glims.org/rgi_user_guide/welcome.html).\n",
"\n",
@@ -10468,6 +10471,13 @@
"se_asia = gpd.read_parquet('rgi7_region15_south_asia_east.parquet')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Handling projections"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -11982,7 +11992,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Clip ITS_LIVE dataset to individual glacier extent\n",
"## Clip raster data using vector object \n",
"\n",
"Here we will subset the full ITS_LIVE granule to the extent of an individual glacier.\n",
"\n",
"First, we need to use `rio.write_crs()` to assign a CRS to the itslive object. If we don't do that first the `rio.clip()` command will produce an error\n",
"\n",
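`rio.clip()` clips to the exact outline geometry; as a simplified stand-in, a plain bounding-box subset with xarray `.sel()` illustrates the idea (coordinates and bounds here are made up — in the notebook the bounds would come from the vector geometry, e.g. `gdf.total_bounds`):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(36.0).reshape(6, 6),
    dims=("y", "x"),
    coords={"y": np.arange(60, 0, -10), "x": np.arange(0, 60, 10)},
)

# Illustrative bounding box of a glacier outline
xmin, ymin, xmax, ymax = 10, 20, 40, 50

# y decreases along its axis, so the slice runs from ymax down to ymin
clipped = da.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))
```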