Addressed Jessica's suggestions for QUEST notebook.
zachghiaccio committed Nov 27, 2023
1 parent 283fc04 commit d888127
Showing 1 changed file with 174 additions and 65 deletions.
239 changes: 174 additions & 65 deletions doc/source/example_notebooks/QUEST_argo_data_access.ipynb
@@ -36,18 +36,6 @@
"import icepyx as ipx"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "41bb9895",
"metadata": {},
"outputs": [],
"source": [
"%load_ext autoreload\n",
"import icepyx as ipx\n",
"%autoreload 2\n"
]
},
{
"cell_type": "markdown",
"id": "5c35f5df-b4fb-4a36-8d6f-d20f1552767a",
@@ -184,35 +172,35 @@
},
{
"cell_type": "markdown",
"id": "62afb9ad",
"metadata": {},
"id": "7bade19e-5939-410a-ad54-363636289082",
"metadata": {
"user_expressions": []
},
"source": [
"**ZACH**\n",
"\n",
"Could you add a little bit of text around argo parameters/presRange and the ability to search and download multiple times (outside the quest `search_all` and `download_all` options)? A few highlights that come to mind after recent updates:\n",
"- by default only temperature is gotten, but you can supply a list of the parameters you want to `reg_a.add_argo()`\n",
"- you can also directly, at any time, view or update the `reg_a.datasets['argo'].params` value, which will then be used in your next search or download\n",
"- alternatively, you can directly search/download via `reg_a.datasets['argo'].search_data()` and provide `params` or `presRange` keyword arguments that will replace the existing values of `reg_a.datasets['argo'].params`/`reg_a.datasets['argo'].presRange`\n",
"- when downloading, you can also provide the `keep_existing=True` kwarg to add more profiles, parameters, pressure ranges to your existing dataframe (and have them merged nicely for you)"
"When accessing Argo data, the variables of interest are organized as vertical profiles as a function of pressure. By default, only temperature is queried, but the user can supply a list of desired parameters using the code below."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "dc921ca5",
"metadata": {},
"id": "6739c3aa-1a88-4d8e-9fd8-479528c20e97",
"metadata": {
"tags": []
},
"outputs": [],
"source": []
"source": [
"# Customized variable query\n",
"reg_a.add_argo(params=['temperature'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "435a1243",
"metadata": {},
"outputs": [],
"cell_type": "markdown",
"id": "2d06436c-2271-4229-8196-9f5180975ab1",
"metadata": {
"user_expressions": []
},
"source": [
"# see what argo parameters will be searched for or downloaded\n",
"reg_a.datasets['argo'].params"
"Additionally, a user may view or update the list of Argo parameters at any time through `reg_a.datasets['argo'].params`. If a user submits an invalid parameter (\"temp\" instead of \"temperature\", for example), an `AssertionError` will be raised with a message noting that the parameter is invalid."
]
},
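The validation behavior described above can be pictured with a small stand-alone sketch (a hypothetical illustration, not the actual icepyx implementation; the function name and the contents of the valid set are made up):

```python
# Hypothetical sketch of Argo parameter validation -- the valid-parameter set
# and function name are illustrative, not taken from icepyx.
VALID_PARAMS = {"temperature", "salinity", "pressure", "oxygen"}

def set_params(params):
    """Validate requested Argo parameters, raising AssertionError on bad input."""
    for p in params:
        assert p in VALID_PARAMS, f"{p} is not a valid parameter"
    return list(params)

print(set_params(["temperature", "salinity"]))  # accepted as-is

try:
    set_params(["temp", "salinity"])  # invalid shorthand for "temperature"
except AssertionError as err:
    print(err)
```

Anything outside the known-valid set fails fast with a readable message, which is the same user experience the notebook describes.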
{
@@ -223,29 +211,49 @@
"outputs": [],
"source": [
"# update the list of argo parameters\n",
"reg_a.datasets['argo'].params = ['temperature','salinity']\n",
"\n",
"# if you submit an invalid parameter (such as 'temp' instead of 'temperature') you'll get an \n",
"# AssertionError and message saying the parameter is invalid (example: reg_a.datasets['argo'].params = ['temp','salinity'])"
"reg_a.datasets['argo'].params = ['temperature','salinity']"
]
},
{
"cell_type": "markdown",
"id": "453900c1-cd62-40c9-820c-0615f63f17f5",
"metadata": {
"user_expressions": []
},
"source": [
"Alternatively, the user can search or download Argo data directly with `reg_a.datasets['argo'].search_data()` and `reg_a.datasets['argo'].download()`, passing the desired parameters and pressure ranges through the `params` and `presRange` keyword arguments, respectively. Values supplied this way replace the existing `params` and `presRange` settings."
]
},
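The override pattern can be sketched with a self-contained stand-in (an assumption about the call pattern, not icepyx internals; `stored` and `search_data` here are illustrative names): keyword arguments supplied to the call take precedence over the stored attribute values.

```python
# Sketch of keyword-argument override -- the dict and function are stand-ins
# for reg_a.datasets['argo'] attributes, not real icepyx code.
stored = {"params": ["temperature"], "presRange": None}

def search_data(params=None, presRange=None):
    """Use explicit kwargs when given; otherwise fall back to stored values."""
    effective_params = params if params is not None else stored["params"]
    effective_range = presRange if presRange is not None else stored["presRange"]
    # A real search would query the Argo API here; we just echo the settings.
    return {"params": effective_params, "presRange": effective_range}

print(search_data(params=["temperature", "salinity"], presRange="0,500"))
print(search_data())  # falls back to the stored defaults
```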
{
"cell_type": "markdown",
"id": "3f55be4e-d261-49c1-ac14-e19d8e0ff828",
"metadata": {
"user_expressions": []
},
"source": [
"With our current setup, let's see what Argo parameters we will get."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "c15675df",
"id": "435a1243",
"metadata": {},
"outputs": [],
"source": [
"reg_a.datasets['argo'].search_data()"
"# see what argo parameters will be searched for or downloaded\n",
"reg_a.datasets['argo'].params"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "db56cc33",
"id": "c15675df",
"metadata": {},
"outputs": [],
"source": []
"source": [
"reg_a.datasets['argo'].search_data()"
]
},
{
"cell_type": "markdown",
@@ -271,23 +279,109 @@
"path = '/icepyx/quest/downloaded-data/'\n",
"\n",
"# Access Argo and ICESat-2 data simultaneously\n",
"reg_a.download_all(path)"
"reg_a.download_all()"
]
},
{
"cell_type": "markdown",
"id": "6970f0ad-9364-4732-a5e6-f93cf3fc31a3",
"id": "ad29285e-d161-46ea-8a57-95891fa2b237",
"metadata": {
"tags": [],
"user_expressions": []
},
"source": [
"We now have 19 available Argo profiles, each containing `temperature` and `pressure`, compiled into a Pandas DataFrame. **NOTE: BGC Argo is currently fully implemented** When BGC Argo is fully implemented to QUEST, we could add more variables to this list.\n",
"\n",
"We also have a series of files containing ICESat-2 ATL03 data. Because these data files are very large, we are only going to focus on one of these files for this example.\n",
"We now have 19 available Argo profiles, each containing `temperature` and `pressure`, compiled into a Pandas DataFrame. BGC Argo is also available through QUEST, so we could add more variables to this list.\n",
"\n",
"Let's now load one of the ICESat-2 files and see where it passes relative to the Argo float data.\n",
"If the user wishes to add more profiles, parameters, and/or pressure ranges to a pre-existing DataFrame, they should use `reg_a.download_all(path, keep_existing=True)` to retain previously queried data."
]
},
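The merging behavior of `keep_existing=True` can be sketched as follows (an illustrative stand-in keyed by a made-up profile ID field, not the actual icepyx merge logic):

```python
# Illustrative merge of newly downloaded profiles into an existing collection;
# the "id" key and record layout are hypothetical.
def merge_profiles(existing, new):
    """Keep every existing profile and append only genuinely new ones."""
    merged = {p["id"]: p for p in existing}
    for profile in new:
        merged.setdefault(profile["id"], profile)  # never overwrite existing data
    return list(merged.values())

have = [{"id": "A1", "temperature": 12.3}]
fresh = [{"id": "A1", "temperature": 12.3}, {"id": "B2", "temperature": 9.8}]
print(merge_profiles(have, fresh))  # A1 kept once, B2 added
```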
{
"cell_type": "markdown",
"id": "6970f0ad-9364-4732-a5e6-f93cf3fc31a3",
"metadata": {
"user_expressions": []
},
"source": [
"The download function also retrieved a series of files containing ICESat-2 ATL03 data. Because these files are very large, we will focus on only one of them for this example.\n",
"\n",
"**Zach** would you be open to switching this to use icepyx's read module? We could easily use the `xarray.to_dataframe` to then work with the rest of this notebook!"
"The below workflow uses the icepyx Read module to quickly load ICESat-2 data into an Xarray dataset."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "88f4b1b0-8c58-414c-b6a8-ce1662979943",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"#path_root = '/icepyx/quest-test-data/'\n",
"path_root = '/icepyx/quest-test-data/processed_ATL03_20220419002753_04111506_006_02.h5'\n",
"reader = ipx.Read(path_root)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "665d79a7-7360-4846-99c2-222b34df2a92",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"reader.vars.append(beam_list=['gt2l'], \n",
" var_list=['h_ph', \"lat_ph\", \"lon_ph\", 'signal_conf_ph'])"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "e7158814-50f0-4940-980c-9bb800360982",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"ds = reader.load()\n",
"ds"
]
},
{
"cell_type": "markdown",
"id": "1040438c-d806-4964-b4f0-1247da9f3f1f",
"metadata": {
"user_expressions": []
},
"source": [
"To make the data more easily plottable, let's convert the data into a Pandas DataFrame. Note that this method is memory-intensive for ATL03 data, so users are advised to subset to small spatial domains to prevent the notebook from crashing."
]
},
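One way to keep the domain small is to pre-filter records to a bounding box before building a DataFrame. The sketch below is generic (the coordinates and the `in_bbox` helper are made up for illustration and are not part of the notebook's workflow):

```python
# Generic bounding-box pre-filter -- all values below are illustrative.
def in_bbox(lat, lon, bbox):
    """Return True if (lat, lon) lies inside (lat_min, lat_max, lon_min, lon_max)."""
    lat_min, lat_max, lon_min, lon_max = bbox
    return lat_min <= lat <= lat_max and lon_min <= lon <= lon_max

photons = [(34.40, -152.01), (34.44, -151.99), (40.00, -140.00)]
bbox = (34.3, 34.5, -152.2, -151.8)

# Keep only photons inside the small study box before any DataFrame is built.
subset = [p for p in photons if in_bbox(*p, bbox)]
print(subset)
```

Filtering before conversion keeps memory proportional to the study region rather than the full granule.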
{
"cell_type": "code",
"execution_count": null,
"id": "bc086db7-f5a1-4ba7-ba90-5b19afaf6808",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"is2_pd = ds.to_dataframe()"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "fc67e039-338c-4348-acaf-96f605cf0030",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Rearrange the data to only include \"ocean\" photons\n",
"is2_pd = is2_pd.reset_index(level=[0,1,2])\n",
"is2_pd_ocean = is2_pd[is2_pd.index==1]\n",
"is2_pd_ocean"
]
},
{
@@ -299,14 +393,6 @@
},
"outputs": [],
"source": [
"# Load ICESat-2 latitudes, longitudes, heights, and photon confidence (optional)\n",
"is2_pd = pd.DataFrame()\n",
"with h5py.File(f'{path_root}processed_ATL03_20220419002753_04111506_006_02.h5', 'r') as f:\n",
" is2_pd['lat'] = f['gt2l/heights/lat_ph'][:]\n",
" is2_pd['lon'] = f['gt2l/heights/lon_ph'][:]\n",
" is2_pd['height'] = f['gt2l/heights/h_ph'][:]\n",
" is2_pd['signal_conf'] = f['gt2l/heights/signal_conf_ph'][:,1]\n",
" \n",
"# Set Argo data as its own DataFrame\n",
"argo_df = reg_a.datasets['argo'].argodata"
]
@@ -321,8 +407,8 @@
"outputs": [],
"source": [
"# Convert both DataFrames into GeoDataFrames\n",
"is2_gdf = gpd.GeoDataFrame(is2_pd, \n",
" geometry=gpd.points_from_xy(is2_pd.lon, is2_pd.lat),\n",
"is2_gdf = gpd.GeoDataFrame(is2_pd_ocean, \n",
" geometry=gpd.points_from_xy(is2_pd_ocean['lon_ph'], is2_pd_ocean['lat_ph']),\n",
" crs='EPSG:4326'\n",
")\n",
"argo_gdf = gpd.GeoDataFrame(argo_df, \n",
@@ -338,7 +424,22 @@
"user_expressions": []
},
"source": [
"To view the relative locations of ICESat-2 and Argo, the below cell uses the `explore()` function from GeoPandas. For large datasets like ICESat-2, loading the map might take a while."
"To view the relative locations of ICESat-2 and Argo, the below cell uses the `explore()` function from GeoPandas. The time variables cause errors in the function, so we will drop those variables first. \n",
"\n",
"Note that for large datasets like ICESat-2, loading the map might take a while."
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "7178fecc-6ca1-42a1-98d4-08f57c050daa",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"# Drop time variables that would cause errors in explore() function\n",
"is2_gdf = is2_gdf.drop(['data_start_utc','data_end_utc','delta_time','atlas_sdp_gps_epoch'], axis=1)"
]
},
{
@@ -351,8 +452,8 @@
"outputs": [],
"source": [
"# Plot ICESat-2 track (medium/high confidence photons only) on a map\n",
"m = is2_gdf[is2_gdf['signal_conf']>=3].explore(tiles='Esri.WorldImagery',\n",
" name='ICESat-2')\n",
"m = is2_gdf[is2_gdf['signal_conf_ph']>=3].explore(column='rgt', tiles='Esri.WorldImagery',\n",
" name='ICESat-2')\n",
"\n",
"# Add Argo float locations to map\n",
"argo_gdf.explore(m=m, name='Argo', marker_kwds={\"radius\": 6}, color='red')"
@@ -408,7 +509,7 @@
"outputs": [],
"source": [
"# Only consider ICESat-2 signal photons\n",
"is2_pd_signal = is2_pd[is2_pd['signal_conf']>0]\n",
"is2_pd_signal = is2_pd_ocean[is2_pd_ocean['signal_conf_ph']>=0]\n",
"\n",
"## Multi-panel plot showing ICESat-2 and Argo data\n",
"\n",
@@ -425,7 +526,7 @@
"world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n",
"world.plot(ax=ax1, color='0.8', edgecolor='black')\n",
"argo_df.plot.scatter(ax=ax1, x='lon', y='lat', s=25.0, c='green', zorder=3, alpha=0.3)\n",
"is2_pd.plot.scatter(ax=ax1, x='lon', y='lat', s=10.0, zorder=2, alpha=0.3)\n",
"is2_pd_signal.plot.scatter(ax=ax1, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n",
"ax1.plot(lons, lats, linewidth=1.5, color='orange', zorder=2)\n",
"#df.plot(ax=ax2, x='lon', y='lat', marker='o', color='red', markersize=2.5, zorder=3)\n",
"ax1.set_xlim(-160,-100)\n",
@@ -436,7 +537,7 @@
"\n",
"# Plot Zoomed View of Ground Tracks\n",
"argo_df.plot.scatter(ax=ax2, x='lon', y='lat', s=50.0, c='green', zorder=3, alpha=0.3)\n",
"is2_pd.plot.scatter(ax=ax2, x='lon', y='lat', s=10.0, zorder=2, alpha=0.3)\n",
"is2_pd_signal.plot.scatter(ax=ax2, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n",
"ax2.plot(lons, lats, linewidth=1.5, color='orange', zorder=1)\n",
"ax2.scatter(-151.98956, 34.43885, color='orange', marker='^', s=80, zorder=4)\n",
"ax2.set_xlim(min(lons) - lon_margin, max(lons) + lon_margin)\n",
@@ -446,10 +547,10 @@
"ax2.set_ylabel('Latitude', fontsize=18)\n",
"\n",
"# Plot ICESat-2 along-track vertical profile. A dotted line notes the location of a nearby Argo float\n",
"is2 = ax3.scatter(is2_pd_signal['lat'], is2_pd_signal['height'], s=0.1)\n",
"is2 = ax3.scatter(is2_pd_signal['lat_ph'], is2_pd_signal['h_ph']+13.1, s=0.1)\n",
"ax3.axvline(34.43885, linestyle='--', linewidth=3, color='black')\n",
"ax3.set_xlim([34.3, 34.5])\n",
"ax3.set_ylim([-15, 5])\n",
"ax3.set_ylim([-20, 5])\n",
"ax3.set_xlabel('Latitude', fontsize=18)\n",
"ax3.set_ylabel('Approx. IS-2 Depth [m]', fontsize=16)\n",
"ax3.set_yticklabels(['15', '10', '5', '0', '-5'])\n",
@@ -467,6 +568,14 @@
"# Save figure\n",
"#plt.savefig('/icepyx/quest/figures/is2_argo_figure.png', dpi=500)"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "9b6548e2-0662-4c8b-a251-55ca63aff99b",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
@@ -485,7 +594,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.10"
"version": "3.10.12"
}
},
"nbformat": 4,
