From d888127f22efcaade481c7612eeba144164eb506 Mon Sep 17 00:00:00 2001 From: zachghiaccio Date: Mon, 27 Nov 2023 17:32:39 +0000 Subject: [PATCH] Addressed Jessica's suggestions for QUEST notebook. --- .../QUEST_argo_data_access.ipynb | 239 +++++++++++++----- 1 file changed, 174 insertions(+), 65 deletions(-) diff --git a/doc/source/example_notebooks/QUEST_argo_data_access.ipynb b/doc/source/example_notebooks/QUEST_argo_data_access.ipynb index 21307ebe2..4c6d71ffc 100644 --- a/doc/source/example_notebooks/QUEST_argo_data_access.ipynb +++ b/doc/source/example_notebooks/QUEST_argo_data_access.ipynb @@ -36,18 +36,6 @@ "import icepyx as ipx" ] }, - { - "cell_type": "code", - "execution_count": null, - "id": "41bb9895", - "metadata": {}, - "outputs": [], - "source": [ - "%load_ext autoreload\n", - "import icepyx as ipx\n", - "%autoreload 2\n" - ] - }, { "cell_type": "markdown", "id": "5c35f5df-b4fb-4a36-8d6f-d20f1552767a", "metadata": { "user_expressions": [] }, "source": [ @@ -184,35 +172,35 @@ }, { "cell_type": "markdown", - "id": "62afb9ad", - "metadata": {}, + "id": "7bade19e-5939-410a-ad54-363636289082", + "metadata": { + "user_expressions": [] + }, "source": [ - "**ZACH**\n", - "\n", - "Could you add a little bit of text around argo parameters/presRange and the ability to search and download multiple times (outside the quest `search_all` and `download_all` options)? A few highlights that come to mind after recent updates:\n", - "- by default only temperature is gotten, but you can supply a list of the parameters you want to `reg_a.add_argo()`\n", - "- you can also directly, at any time, view or update the `reg_a.datasets['argo'].params` value, which will then be used in your next search or download\n", - "- alternatively, you can directly search/download via `reg_a.datasets['argo'].search_data()` and provide `params` or `presRange` keyword arguments that will replace the existing values of `reg_a.datasets['argo'].params`/`reg_a.datasets['argo'].presRange`\n", - "- when downloading, you can also provide the `keep_existing=True` kwarg to add more profiles, parameters, pressure ranges to your existing dataframe (and have them merged nicely for you)" + "When accessing Argo data, the variables of interest will be organized as vertical profiles as a function of pressure. By default, only temperature is queried, but the user can supply a list of desired parameters using the code below." ] }, { "cell_type": "code", "execution_count": null, - "id": "dc921ca5", - "metadata": {}, + "id": "6739c3aa-1a88-4d8e-9fd8-479528c20e97", + "metadata": { + "tags": [] + }, "outputs": [], - "source": [] + "source": [ + "# Customized variable query\n", + "reg_a.add_argo(params=['temperature'])" + ] }, { - "cell_type": "code", - "execution_count": null, - "id": "435a1243", - "metadata": {}, - "outputs": [], + "cell_type": "markdown", + "id": "2d06436c-2271-4229-8196-9f5180975ab1", + "metadata": { + "user_expressions": [] + }, "source": [ - "# see what argo parameters will be searched for or downloaded\n", - "reg_a.datasets['argo'].params" + "Additionally, a user may view or update the list of Argo parameters at any time through `reg_a.datasets['argo'].params`. If a user submits an invalid parameter (\"temp\" instead of \"temperature\", for example), an `AssertionError` will be raised."
] }, { @@ -223,29 +211,49 @@ "outputs": [], "source": [ "# update the list of argo parameters\n", - "reg_a.datasets['argo'].params = ['temperature','salinity']\n", - "\n", - "# if you submit an invalid parameter (such as 'temp' instead of 'temperature') you'll get an \n", - "# AssertionError and message saying the parameter is invalid (example: reg_a.datasets['argo'].params = ['temp','salinity'])" + "reg_a.datasets['argo'].params = ['temperature','salinity']" ] }, + { + "cell_type": "markdown", + "id": "453900c1-cd62-40c9-820c-0615f63f17f5", + "metadata": { + "user_expressions": [] + }, + "source": [ + "Alternatively, the user can search or download Argo data directly with `reg_a.datasets['argo'].search_data()` and `reg_a.datasets['argo'].download()`, supplying the `params` and/or `presRange` keyword arguments to replace the existing values of `reg_a.datasets['argo'].params` and `reg_a.datasets['argo'].presRange`." + ] + }, + { + "cell_type": "markdown", + "id": "3f55be4e-d261-49c1-ac14-e19d8e0ff828", + "metadata": { + "user_expressions": [] + }, + "source": [ + "With our current setup, let's see what Argo parameters we will get." + ] + }, { "cell_type": "code", "execution_count": null, "id": "435a1243", "metadata": {}, "outputs": [], "source": [ "# see what argo parameters will be searched for or downloaded\n", "reg_a.datasets['argo'].params" ] }, { "cell_type": "code", "execution_count": null, "id": "c15675df", "metadata": {}, "outputs": [], "source": [ "reg_a.datasets['argo'].search_data()" ] }, { "cell_type": "markdown", @@ -271,23 +279,109 @@ "path = '/icepyx/quest/downloaded-data/'\n", "\n", "# Access Argo and ICESat-2 data simultaneously\n", - "reg_a.download_all(path)" + "reg_a.download_all()" ] }, { "cell_type": "markdown", - "id": "6970f0ad-9364-4732-a5e6-f93cf3fc31a3", + "id": "ad29285e-d161-46ea-8a57-95891fa2b237", "metadata": { + "tags": [], "user_expressions": [] }, "source": [ - "We now have 19 available Argo profiles, each containing `temperature` and `pressure`, compiled into a Pandas DataFrame. **NOTE: BGC Argo is currently fully implemented** When BGC Argo is fully implemented to QUEST, we could add more variables to this list.\n", - "\n", - "We also have a series of files containing ICESat-2 ATL03 data. Because these data files are very large, we are only going to focus on one of these files for this example.\n", + "We now have 19 available Argo profiles, each containing `temperature` and `pressure`, compiled into a Pandas DataFrame. BGC Argo is also available through QUEST, so we could add more variables to this list.\n", "\n", - "Let's now load one of the ICESat-2 files and see where it passes relative to the Argo float data.\n", + "If the user wishes to add more profiles, parameters, and/or pressure ranges to a pre-existing DataFrame, then they should use `reg_a.download_all(path, keep_existing=True)` to retain previously queried data." ] }, { "cell_type": "markdown", "id": "6970f0ad-9364-4732-a5e6-f93cf3fc31a3", "metadata": { "user_expressions": [] }, "source": [ "The download function also provided a series of files containing ICESat-2 ATL03 data. Because these data files are very large, we are only going to focus on one file for this example.\n", - "\n", - "**Zach** would you be open to switching this to use icepyx's read module? We could easily use the `xarray.to_dataframe` to then work with the rest of this notebook!"
+ "The below workflow uses the icepyx Read module to quickly load ICESat-2 data into the XArray format." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "88f4b1b0-8c58-414c-b6a8-ce1662979943", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "#path_root = '/icepyx/quest-test-data/'\n", + "path_root = '/icepyx/quest-test-data/processed_ATL03_20220419002753_04111506_006_02.h5'\n", + "reader = ipx.Read(path_root)" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "665d79a7-7360-4846-99c2-222b34df2a92", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "reader.vars.append(beam_list=['gt2l'], \n", + " var_list=['h_ph', \"lat_ph\", \"lon_ph\", 'signal_conf_ph'])" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "e7158814-50f0-4940-980c-9bb800360982", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "ds = reader.load()\n", + "ds" + ] + }, + { + "cell_type": "markdown", + "id": "1040438c-d806-4964-b4f0-1247da9f3f1f", + "metadata": { + "user_expressions": [] + }, + "source": [ + "To make the data more easily plottable, let's convert the data into a Pandas DataFrame. Note that this method is memory-intensive for ATL03 data, so users are suggested to look at small spatial domains to prevent the notebook from crashing." + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "bc086db7-f5a1-4ba7-ba90-5b19afaf6808", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "is2_pd = ds.to_dataframe()" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "fc67e039-338c-4348-acaf-96f605cf0030", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Rearrange the data to only include \"ocean\" photons\n", + "is2_pd = is2_pd.reset_index(level=[0,1,2])\n", + "is2_pd_ocean = is2_pd[is2_pd.index==1]\n", + "is2_pd_ocean" ] }, { @@ -299,14 +393,6 @@ }, "outputs": [], "source": [ - "# Load ICESat-2 latitudes, longitudes, heights, and photon confidence (optional)\n", - "is2_pd = pd.DataFrame()\n", - "with h5py.File(f'{path_root}processed_ATL03_20220419002753_04111506_006_02.h5', 'r') as f:\n", - " is2_pd['lat'] = f['gt2l/heights/lat_ph'][:]\n", - " is2_pd['lon'] = f['gt2l/heights/lon_ph'][:]\n", - " is2_pd['height'] = f['gt2l/heights/h_ph'][:]\n", - " is2_pd['signal_conf'] = f['gt2l/heights/signal_conf_ph'][:,1]\n", - " \n", "# Set Argo data as its own DataFrame\n", "argo_df = reg_a.datasets['argo'].argodata" ] @@ -321,8 +407,8 @@ "outputs": [], "source": [ "# Convert both DataFrames into GeoDataFrames\n", - "is2_gdf = gpd.GeoDataFrame(is2_pd, \n", - " geometry=gpd.points_from_xy(is2_pd.lon, is2_pd.lat),\n", + "is2_gdf = gpd.GeoDataFrame(is2_pd_ocean, \n", + " geometry=gpd.points_from_xy(is2_pd_ocean['lon_ph'], is2_pd_ocean['lat_ph']),\n", " crs='EPSG:4326'\n", ")\n", "argo_gdf = gpd.GeoDataFrame(argo_df, \n", @@ -338,7 +424,22 @@ "user_expressions": [] }, "source": [ - "To view the relative locations of ICESat-2 and Argo, the below cell uses the `explore()` function from GeoPandas. For large datasets like ICESat-2, loading the map might take a while." + "To view the relative locations of ICESat-2 and Argo, the below cell uses the `explore()` function from GeoPandas. The time variables cause errors in the function, so we will drop those variables first. \n", + "\n", + "Note that for large datasets like ICESat-2, loading the map might take a while." 
+ ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "7178fecc-6ca1-42a1-98d4-08f57c050daa", + "metadata": { + "tags": [] + }, + "outputs": [], + "source": [ + "# Drop time variables that would cause errors in explore() function\n", + "is2_gdf = is2_gdf.drop(['data_start_utc','data_end_utc','delta_time','atlas_sdp_gps_epoch'], axis=1)" ] }, { @@ -351,8 +452,8 @@ "outputs": [], "source": [ "# Plot ICESat-2 track (medium/high confidence photons only) on a map\n", - "m = is2_gdf[is2_gdf['signal_conf']>=3].explore(tiles='Esri.WorldImagery',\n", - " name='ICESat-2')\n", + "m = is2_gdf[is2_gdf['signal_conf_ph']>=3].explore(column='rgt', tiles='Esri.WorldImagery',\n", + " name='ICESat-2')\n", "\n", "# Add Argo float locations to map\n", "argo_gdf.explore(m=m, name='Argo', marker_kwds={\"radius\": 6}, color='red')" @@ -408,7 +509,7 @@ "outputs": [], "source": [ "# Only consider ICESat-2 signal photons\n", - "is2_pd_signal = is2_pd[is2_pd['signal_conf']>0]\n", + "is2_pd_signal = is2_pd_ocean[is2_pd_ocean['signal_conf_ph']>=0]\n", "\n", "## Multi-panel plot showing ICESat-2 and Argo data\n", "\n", @@ -425,7 +526,7 @@ "world = gpd.read_file(gpd.datasets.get_path('naturalearth_lowres'))\n", "world.plot(ax=ax1, color='0.8', edgecolor='black')\n", "argo_df.plot.scatter(ax=ax1, x='lon', y='lat', s=25.0, c='green', zorder=3, alpha=0.3)\n", - "is2_pd.plot.scatter(ax=ax1, x='lon', y='lat', s=10.0, zorder=2, alpha=0.3)\n", + "is2_pd_signal.plot.scatter(ax=ax1, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n", "ax1.plot(lons, lats, linewidth=1.5, color='orange', zorder=2)\n", "#df.plot(ax=ax2, x='lon', y='lat', marker='o', color='red', markersize=2.5, zorder=3)\n", "ax1.set_xlim(-160,-100)\n", @@ -436,7 +537,7 @@ "\n", "# Plot Zoomed View of Ground Tracks\n", "argo_df.plot.scatter(ax=ax2, x='lon', y='lat', s=50.0, c='green', zorder=3, alpha=0.3)\n", - "is2_pd.plot.scatter(ax=ax2, x='lon', y='lat', s=10.0, zorder=2, alpha=0.3)\n", + "is2_pd_signal.plot.scatter(ax=ax2, x='lon_ph', y='lat_ph', s=10.0, zorder=2, alpha=0.3)\n", "ax2.plot(lons, lats, linewidth=1.5, color='orange', zorder=1)\n", "ax2.scatter(-151.98956, 34.43885, color='orange', marker='^', s=80, zorder=4)\n", "ax2.set_xlim(min(lons) - lon_margin, max(lons) + lon_margin)\n", @@ -446,10 +547,10 @@ "ax2.set_ylabel('Latitude', fontsize=18)\n", "\n", "# Plot ICESat-2 along-track vertical profile. A dotted line notes the location of a nearby Argo float\n", - "is2 = ax3.scatter(is2_pd_signal['lat'], is2_pd_signal['height'], s=0.1)\n", + "is2 = ax3.scatter(is2_pd_signal['lat_ph'], is2_pd_signal['h_ph']+13.1, s=0.1)\n", "ax3.axvline(34.43885, linestyle='--', linewidth=3, color='black')\n", "ax3.set_xlim([34.3, 34.5])\n", - "ax3.set_ylim([-15, 5])\n", + "ax3.set_ylim([-20, 5])\n", "ax3.set_xlabel('Latitude', fontsize=18)\n", "ax3.set_ylabel('Approx. IS-2 Depth [m]', fontsize=16)\n", "ax3.set_yticklabels(['15', '10', '5', '0', '-5'])\n", @@ -467,6 +568,14 @@ "# Save figure\n", "#plt.savefig('/icepyx/quest/figures/is2_argo_figure.png', dpi=500)" ] + }, + { + "cell_type": "code", + "execution_count": null, + "id": "9b6548e2-0662-4c8b-a251-55ca63aff99b", + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { @@ -485,7 +594,7 @@ "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", - "version": "3.10.10" + "version": "3.10.12" } }, "nbformat": 4,