Update documentation
e-marshall committed Feb 15, 2024
1 parent ffd52ac commit 5af30b1
Showing 12 changed files with 1,245 additions and 1,142 deletions.
13 changes: 11 additions & 2 deletions _sources/accessing_s3_data.ipynb
Original file line number Diff line number Diff line change
@@ -8,8 +8,17 @@
"\n",
"This notebook will demonstrate how to access cloud-hosted Inter-mission Time Series of Land Ice Velocity and Elevation ([ITS_LIVE](https://its-live.jpl.nasa.gov/#access)) data from AWS S3 buckets. Here you will find examples of how to successfully access cloud-hosted data as well as some common errors and issues you may run into along the way, what they mean, and how to resolve them. \n",
"\n",
"*Learning goals:*\n",
"- accessing data stored in S3 buckets\n",
"## Learning goals\n",
"\n",
"### Conceptual\n",
"- Query and access cloud-optimized dataset from cloud object storage\n",
"- Create a vector data object representing the footprint of a raster dataset\n",
"- Preliminary visualization of data extent\n",
" \n",
"### Techniques\n",
"- Use [Xarray](https://xarray.dev/) to open [Zarr](https://zarr.readthedocs.io/en/stable/) datacube stored in [AWS S3 bucket](https://aws.amazon.com/s3/)\n",
"- Interactive data visualization with [hvplot](https://hvplot.holoviz.org/)\n",
"- Create [Geopandas](https://geopandas.org/en/stable/) `geodataframe` from Xarray `xr.Dataset` object\n",
"\n",
"```{note}\n",
"This tutorial was updated Fall 2023 to reflect changes to ITS_LIVE data urls and various software libraries\n",
122 changes: 55 additions & 67 deletions _sources/glacier_analysis_grouped.ipynb
@@ -10,11 +10,23 @@
"\n",
"The previous notebook demonstrated using xarray to analyze surface velocity data for an individual glacier. This notebook will show how we can examine spatial variability in surface velocity within a group of glaciers. To do this we will use **xarray** as well as **geopandas**, **geocube**, and **pandas**. We will start by using `.make_geocube()` to rasterize a vector object in the shape of an **ITS_LIVE** velocity raster object. We will then use the rasterized vector to group the **ITS_LIVE** object by individual glaciers and then calculate summary statistics of surface velocity for each glacier. The goal in this work flow is to end up with a **pandas dataframe** where each row is an individual glacier and columns for various surface velocity summary statistics. \n",
"\n",
"*Learning goals:*\n",
"- rasterizing vector data\n",
"- organizing and re-arranging data with xarray\n",
"- `.groupby()` for zonal statistics\n",
"- converting from xarray to pandas"
"## Learning goals\n",
"\n",
"### Concepts\n",
"- Querying + accessing raster data from cloud object storage\n",
"- Accessing + manipulating vector data\n",
"- Handling coordinate reference information\n",
"- Calculating and visualizing summary statistics\n",
"\n",
"### Techniques \n",
"- Access cloud-hosted [Zarr](https://zarr.readthedocs.io/en/stable/) datacubes using [Xarray](https://xarray.dev/)\n",
"- Reading [GeoParquet](https://geoparquet.org/) vector data using [GeoPandas](https://geopandas.org/en/stable/)\n",
"- Rasterize vector objects using [`make_geocube()`](https://corteva.github.io/geocube/html/geocube.html)\n",
"- Spatial joins of vector datasets using [GeoPandas](https://geopandas.org/en/stable/)\n",
"- Using [dask](https://www.dask.org/) to work with out-of-memory data\n",
"- Calculating summary statistics of [Xarray](https://xarray.dev/) and [Pandas](https://pandas.pydata.org/) data objects\n",
"- Data visualization using [Pandas](https://pandas.pydata.org/)\n",
"- Interactive data visualization with [GeoPandas](https://geopandas.org/en/stable/) \n"
]
},
{
@@ -638,7 +650,7 @@
"id": "20a61c98-901e-425d-ad6c-4627b92bf6df",
"metadata": {},
"source": [
"## Accessing ITS_LIVE data"
"## Accessing + reading in raster data (ITS_LIVE velocity data)"
]
},
{
@@ -5509,7 +5521,7 @@
"id": "9100f674",
"metadata": {},
"source": [
"## Vector data "
"## Reading in vector data (glacier outlines)"
]
},
{
@@ -6245,50 +6257,6 @@
"First, get the bbox of the ITS_LIVE data as a vector"
]
},
{
"cell_type": "code",
"execution_count": 17,
"id": "9d1e1e5a-b7d0-4819-bed9-8673e0ebc132",
"metadata": {},
"outputs": [],
"source": [
"#def get_bbox_single(input_xr, epsg = input_xr.projection):\n",
"def get_bbox_single(input_xr, epsg):\n",
"\n",
" '''Takes input xr object (from itslive data cube), plots a quick map of the footprint. \n",
" currently only working for granules in crs epsg 32645'''\n",
"\n",
" xmin = input_xr.coords['x'].data.min()\n",
" xmax = input_xr.coords['x'].data.max()\n",
"\n",
" ymin = input_xr.coords['y'].data.min()\n",
" ymax = input_xr.coords['y'].data.max()\n",
"\n",
" pts_ls = [(xmin, ymin), (xmax, ymin),(xmax, ymax), (xmin, ymax), (xmin, ymin)]\n",
"\n",
" #print(input_xr.mapping.spatial_epsg)\n",
" #print(f\"epsg:{input_xr.mapping.spatial_epsg}\")\n",
" crs = epsg\n",
" #crs = {'init':f'epsg:{input_xr.mapping.spatial_epsg}'}\n",
" #crs = 'epsg:32645'\n",
" #print(crs)\n",
"\n",
" polygon_geom = Polygon(pts_ls)\n",
" polygon = gpd.GeoDataFrame(index=[0], crs=crs, geometry=[polygon_geom]) \n",
" #polygon = polygon.to_crs('epsg:4326')\n",
"\n",
" #bounds = polygon.total_bounds\n",
" #bounds_format = [bounds[0]-15, bounds[2]+15, bounds[1]-15, bounds[3]+15]\n",
"\n",
" #states_provinces = cfeature.NaturalEarthFeature(\n",
" # category = 'cultural',\n",
" # name = 'admin_1_states_provinces_lines',\n",
" # scale='50m',\n",
" # facecolor='none'\n",
" #)\n",
" return polygon"
]
},
{
"cell_type": "code",
"execution_count": 18,
@@ -6338,7 +6306,7 @@
"id": "f01bcd9f-293a-495e-8b5e-81701154695e",
"metadata": {},
"source": [
"## Rasterize vector"
"## Rasterize vector objects"
]
},
{
@@ -7145,20 +7113,6 @@
"Before moving forward, we will take a temporal subset of the full dataset to make it a bit easier to work with. Then, merge the rasterized vector and the dataset containing the velocity data into an xarray dataset:"
]
},
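The subset-and-merge step described above can be sketched with synthetic stand-ins (the names `mid_date`, `vx`, `vy`, and `RGI_int` mirror the notebook, but the grid and values here are made up):

```python
import numpy as np
import pandas as pd
import xarray as xr

rng = np.random.default_rng(0)
times = pd.date_range("2015-01-01", "2020-12-31", freq="6MS")

# Synthetic stand-in for the ITS_LIVE velocity datacube
dc = xr.Dataset(
    {
        "vx": (("mid_date", "y", "x"), rng.normal(size=(len(times), 4, 5))),
        "vy": (("mid_date", "y", "x"), rng.normal(size=(len(times), 4, 5))),
    },
    coords={"mid_date": times, "y": np.arange(4), "x": np.arange(5)},
)

# Temporal subset via label-based selection
dc_sub = dc.sel(mid_date=slice("2018-01-01", "2020-12-31"))

# Rasterized vector layer on the same grid (integer glacier IDs);
# in the notebook this comes from make_geocube()
rgi_int = xr.DataArray(
    np.repeat(np.arange(1, 5), 5).reshape(4, 5),
    dims=("y", "x"),
    coords={"y": dc.y, "x": dc.x},
    name="RGI_int",
)

# Merge the rasterized vector and the velocity data into one dataset
merged = xr.merge([dc_sub, rgi_int])
```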
{
"cell_type": "code",
"execution_count": 27,
"id": "8441d13e-0755-4db5-9fd8-4af73696ea65",
"metadata": {},
"outputs": [],
"source": [
"#resample_obj = dc.resample(mid_date = '1Y')\n",
"#dc_resample = resample_obj.mean(dim='mid_date')\n",
"#dc_resample_2d = dc_resample.mean(dim='mid_date')\n",
"#dc_resample['v_mag'] = np.sqrt(dc_resample_2d.vx**2 + dc_resample_2d.vy**2)\n",
"#out_grid['v'] = (dc_resample.v_mag.dims, dc_resample.v_mag.values, dc_resample.v_mag.attrs, dc_resample.v_mag.encoding)\n"
]
},
{
"cell_type": "code",
"execution_count": 28,
@@ -7189,6 +7143,14 @@
"dc_sub_2d['v_mag'] = np.sqrt(dc_sub_2d.vx**2+dc_sub_2d.vy**2)"
]
},
{
"cell_type": "markdown",
"id": "fbd469c0-302e-4937-969a-2fbb8f436279",
"metadata": {},
"source": [
"## Combining data"
]
},
{
"cell_type": "markdown",
"id": "dbb22577-9c3c-4a22-8b1c-7bb11311923b",
@@ -7199,6 +7161,14 @@
"```"
]
},
{
"cell_type": "markdown",
"id": "55b8da67-491a-47ba-9fa8-34002d15a1fe",
"metadata": {},
"source": [
"In the cell below, we are adding a new variable (`v`) to the `out_grid` object that is an Xarray Dataset. You can see that we do this by specifying a tuple that contains the different elements of the data variable we'd like to add: dimensions, values, attributes and additional encoding data. "
]
},
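A minimal sketch of this tuple-assignment pattern, using made-up names and values:

```python
import numpy as np
import xarray as xr

out_grid = xr.Dataset(coords={"y": np.arange(3), "x": np.arange(4)})

v_vals = np.random.default_rng(0).random((3, 4))
v_attrs = {"units": "m/yr", "description": "velocity magnitude"}

# Assign the new variable as a (dims, values, attrs) tuple
out_grid["v"] = (("y", "x"), v_vals, v_attrs)
```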
{
"cell_type": "code",
"execution_count": 31,
@@ -7659,6 +7629,14 @@
"out_grid = out_grid.assign_coords({'RGI_int':out_grid.RGI_int})"
]
},
{
"cell_type": "markdown",
"id": "1e497f11-e6ad-41c5-bec9-2b227f5d0c19",
"metadata": {},
"source": [
"### Grouping by RGI ID"
]
},
{
"cell_type": "markdown",
"id": "95c4df21-0fc9-44a9-ac98-dbe79c9d1e2f",
@@ -7690,7 +7668,9 @@
"id": "4e2ca92b-1dfc-47d3-a169-db766d35686c",
"metadata": {},
"source": [
"and compute summary statistics for a single variable on the grouped object:"
"### Calculating summary statistics\n",
"\n",
"And compute summary statistics for a single variable on the grouped object:"
]
},
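The grouped zonal-statistics pattern can be sketched on a toy grid (the `RGI_int` name mirrors the notebook; the data are synthetic):

```python
import numpy as np
import xarray as xr

# Toy velocity grid with an integer glacier-ID coordinate
ds = xr.Dataset({"v": (("y", "x"), np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]))})
ds = ds.assign_coords(RGI_int=(("y", "x"), np.array([[1, 1, 2], [1, 2, 2]])))

# Zonal statistics: group pixels by glacier ID, then reduce each group
means = ds.v.groupby("RGI_int").mean()
```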
{
@@ -7739,6 +7719,14 @@
"print(grid_min_sp.RGI_int.equals(grid_min_sp.RGI_int))"
]
},
{
"cell_type": "markdown",
"id": "9d82ca8b-d95c-4890-afbd-9e5d544772b8",
"metadata": {},
"source": [
"### Transitioning from 'lazy' operations to in-memory "
]
},
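A minimal illustration of the lazy-versus-in-memory distinction, assuming dask is installed alongside xarray:

```python
import numpy as np
import xarray as xr

da = xr.DataArray(np.arange(12.0).reshape(3, 4), dims=("y", "x"))

# chunk() makes the array dask-backed: operations now build a lazy
# task graph instead of computing immediately
lazy_mean = da.chunk({"y": 1}).mean()

# compute() executes the graph and pulls the result into memory
result = lazy_mean.compute()
```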
{
"cell_type": "markdown",
"id": "bc331b29-8f7e-47c5-94cf-3be465137344",
38 changes: 31 additions & 7 deletions _sources/ind_glacier_analysis.ipynb
@@ -8,10 +8,19 @@
"\n",
"This notebook will build upon the data access and inspection steps in the earlier notebooks and demonstrate basic data analysis and visualization of surface velocity data at the scale of an individual glacier using xarray. \n",
"\n",
"*Learning goals*: \n",
"- using xarray label-based indexing and selection tools\n",
"- computation and grouped computation\n",
"- visualization"
"## Learning goals\n",
"\n",
"### Conceptual\n",
"- Visualizing statistical distributions\n",
"- Vectorized calculation of summary statistics over large dimensions\n",
"- Working with velocity component vectors\n",
"- Calculating magnitude of displacement from velocity component vectors\n",
"### Techniques\n",
"- Using [`xr.reduce()`](https://docs.xarray.dev/en/stable/generated/xarray.DataArray.reduce.html) to apply [scipy statistical functions](https://docs.scipy.org/doc/scipy/reference/stats.html) designed to ingest numpy arrays to Xarray objects\n",
"- Re-organize Xarray objects using [`xr.sortby()`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.sortby.html)\n",
"- Temporal resampling using [`xr.resample()`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.resample.html)\n",
"- Vectorized computation and reductions using [`xr.groupby()`](https://docs.xarray.dev/en/stable/user-guide/groupby.html) and [`xr.map()`](https://docs.xarray.dev/en/stable/generated/xarray.Dataset.map.html)\n",
"- Visualization using [`FacetGrid`](https://docs.xarray.dev/en/latest/generated/xarray.plot.FacetGrid.html) objects\n"
]
},
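The `xr.reduce()` + scipy pattern from the techniques list can be sketched like this (synthetic data):

```python
import numpy as np
import xarray as xr
from scipy import stats

da = xr.DataArray(
    np.array([[1.0, 2.0, 3.0, 10.0], [2.0, 3.0, 5.0, 2.0]]),
    dims=("y", "x"),
)

# reduce() hands the underlying numpy array (and the axis matching
# dim='x') to any numpy-aware function, here scipy.stats.skew
skewness = da.reduce(stats.skew, dim="x")
```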
{
@@ -1123,6 +1132,13 @@
"sample_glacier_raster"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Visualize and examine distributions of different variables"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1216,7 +1232,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Calculating magnitude of velocity and inspecting the distribution of velocity data in space and time\n",
"## Calculating magnitude of velocity \n",
"\n",
"We'll first define a function for calculating magnitude of velocity in two ways. Because we want to calculate the magnitude of the displacement vector after we have already reduced the data along a dimension, we write a function that creates two magnitude of velocity variables, one where magnitude is calculated from the means of the vx and vy vectors in space and one where the magnitude is calculated from the medians of the vx and vy vectors in time. We just need to be careful which variable we use. \n",
"\n"
@@ -1248,6 +1264,13 @@
"sample_glacier_raster_mag = calc_velocity_magnitude(sample_glacier_raster)"
]
},
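The reduce-then-magnitude idea described above might look roughly like this (a sketch with synthetic data and illustrative names; the notebook's actual `calc_velocity_magnitude()` may differ):

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(42)
ds = xr.Dataset(
    {
        "vx": (("mid_date", "y", "x"), rng.normal(size=(6, 3, 3))),
        "vy": (("mid_date", "y", "x"), rng.normal(size=(6, 3, 3))),
    }
)

# Reduce the components first, then take the magnitude of the reduced vectors:
# one magnitude per time step (reduced over space) ...
ds["v_mag_space"] = np.sqrt(ds.vx.mean(dim=["x", "y"]) ** 2 + ds.vy.mean(dim=["x", "y"]) ** 2)
# ... and one magnitude per pixel (reduced over time)
ds["v_mag_time"] = np.sqrt(ds.vx.median(dim="mid_date") ** 2 + ds.vy.median(dim="mid_date") ** 2)
```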
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Visualize velocity variability "
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -1331,6 +1354,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Checking coverage\n",
"Now that we've reduced the dataset, we can look at the coverage of the magnitude variables using xarray methods.\n",
"First, we want to know how many observations (not NaNs) exist along the time dimension of `v_mag_time`. We can use `xr.DataArray.count()`:"
]
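A quick illustration of `count()` on a small array containing NaNs (synthetic data):

```python
import numpy as np
import xarray as xr

v_mag_time = xr.DataArray(
    np.array([[1.0, np.nan, 3.0], [np.nan, np.nan, 6.0]]),
    dims=("mid_date", "x"),
)

# count() returns the number of non-NaN observations along a dimension
n_obs = v_mag_time.count(dim="mid_date")
```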
@@ -1516,7 +1540,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Resampling"
"## Temporal resampling"
]
},
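Resampling along a datetime dimension can be sketched as follows (synthetic monthly data):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2018-01-01", periods=12, freq="MS")
da = xr.DataArray(np.arange(12.0), dims="mid_date", coords={"mid_date": times})

# Downsample monthly observations to quarterly means
quarterly = da.resample(mid_date="QS").mean()
```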
{
@@ -2805,7 +2829,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Calculating velocity anomalies\n",
"## Calculating velocity anomalies\n",
"\n",
"To do this, we will use xarray `.groupby()` and `.map()` \n",
"\n",
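The groupby-plus-climatology anomaly pattern can be sketched as (synthetic monthly data):

```python
import numpy as np
import pandas as pd
import xarray as xr

times = pd.date_range("2018-01-01", periods=24, freq="MS")
da = xr.DataArray(np.arange(24.0), dims="mid_date", coords={"mid_date": times})

# Climatology: the mean for each month of the year
climatology = da.groupby("mid_date.month").mean()

# Anomaly: subtract the matching monthly mean from every observation
anomaly = da.groupby("mid_date.month") - climatology
```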
44 changes: 28 additions & 16 deletions _sources/ind_glacier_data_inspection.ipynb
@@ -12,24 +12,27 @@
"\n",
"## Learning goals\n",
"\n",
"### Techniques (xarray, python, general)\n",
"### Conceptual:\n",
"\n",
"- Subset large raster dataset to area of interest using vector data\n",
"- Examine different types of data stored within a raster object (in this example, the data is in the format of a [zarr datacube](https://zarr.readthedocs.io/en/stable/getting_started.html)\n",
"- Handling different coordinate reference systems and projections\n",
"- Dataset inspection using: \n",
" * Xarray label- and index-based selection \n",
" * Grouped computations and reductions \n",
" * Visualization tools\n",
"- Dimensional reductions and computations\n",
"- Examining velocity component vectors\n",
"- Calculating the magnitude of the displacement vector from velocity component vectors\n",
" \n",
"### Techniques \n",
"\n",
"- Read in raster data using `xarray`\n",
"- Read in vector data using `geopandas`\n",
"- Manipulate and organize raster data using `xarray` functionality\n",
"- Explore larger-than-memory data using `dask` and `xarray`\n",
"\n",
"\n",
"### High-level science goals:\n",
"\n",
"- Subset large raster dataset to area of interest using vector data\n",
"- Examine different types of data stored within a raster object (in this example, the data is in the format of a [zarr datacube](https://zarr.readthedocs.io/en/stable/getting_started.html)\n",
"- Handling different coordinate reference systems and projections\n",
"- Dataset inspection using:\n",
" - Xarray label- and index-based selection\n",
" - Grouped computations and reductions\n",
" - Visualization tools\n",
" \n",
"- Troubleshooting errors and warnings\n",
"- Visualizing Xarray data using [FacetGrid](https://docs.xarray.dev/en/latest/generated/xarray.plot.FacetGrid.html) objects\n",
"\n",
"\n",
"\n",
@@ -106,7 +109,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Reading in raster dataset (ITS_LIVE ice velocity data)"
"## Reading in ITS_LIVE ice velocity dataset (raster data)"
]
},
{
@@ -10449,7 +10452,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"## Incorporating vector data \n",
"## Incorporating glacier outlines (vector data )\n",
"\n",
"Vector data represent discrete features. They contain geometry data as well as attribute data about the features. For a more in-depth description of vector data, read [this](https://datacarpentry.org/organization-geospatial/02-intro-vector-data.html). We will use vector data to focus our analysis on specific glaciers. The dataset we will be usign is called the Randolph Glacier Inventory (RGI). It is a very important dataset for glaciology research; you can read more about it [here](http://www.glims.org/rgi_user_guide/welcome.html).\n",
"\n",
@@ -10468,6 +10471,13 @@
"se_asia = gpd.read_parquet('rgi7_region15_south_asia_east.parquet')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Handling projections"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -11982,7 +11992,9 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"### Clip ITS_LIVE dataset to individual glacier extent\n",
"## Clip raster data using vector object \n",
"\n",
"Here we will subset the full ITS_LIVE granule to the extent of an individual glacier.\n",
"\n",
"First, we need to use `rio.write_crs()` to assign a CRS to the itslive object. If we don't do that first the `rio.clip()` command will produce an error\n",
"\n",
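`rio.clip()` clips to the exact outline geometry; as a simplified stand-in, a plain bounding-box subset with xarray `.sel()` illustrates the idea (coordinates and bounds here are made up — in the notebook the bounds would come from the vector geometry, e.g. `gdf.total_bounds`):

```python
import numpy as np
import xarray as xr

da = xr.DataArray(
    np.arange(36.0).reshape(6, 6),
    dims=("y", "x"),
    coords={"y": np.arange(60, 0, -10), "x": np.arange(0, 60, 10)},
)

# Illustrative bounding box of a glacier outline
xmin, ymin, xmax, ymax = 10, 20, 40, 50

# y decreases along its axis, so the slice runs from ymax down to ymin
clipped = da.sel(x=slice(xmin, xmax), y=slice(ymax, ymin))
```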