Commit

up on blocks
robfatland committed Jul 11, 2024
1 parent 631a3b0 commit 7152f48
Showing 2 changed files with 160 additions and 29 deletions.
187 changes: 159 additions & 28 deletions book/chapters/dataloader.ipynb
@@ -11,14 +11,26 @@
"# Data Loader\n",
"\n",
"\n",
"In the event that the data referenced in this book are not available: This notebook has sections to load data from the cloud.\n",
"In particular we are using the **interactive oceans** zarr file collections via the Python `s3fs` library.\n",
"The data referenced in this Jupyter Book exceed recommended GitHub repository capacity. As a result\n",
"this notebook loads data from the cloud as needed to a directory (folder) external to the repository.\n",
"This directory *appears* to be part of the repository by means of a symbolic link. \n",
"\n",
"\n",
"There are two important preliminary steps necessary for data loading: Creating a landing place for the data\n",
"and enabling the ingest processing. This takes about five minutes.\n",
"The data source is the **interactive oceans** zarr-format collections found via the Python `s3fs` library\n",
"under the name `ooi-data`.\n",
"\n",
"> 1. Landing site for data. The repo data folder resides in the `chapters` folder with a path like\n",
"\n",
"Consequently the data load process can be summarized as 'I created a data folder with a couple Gigabytes\n",
"of available capacity, I set up a symbolic link from within the Jupyter Book **`oceanography`** repository\n",
"(which I created using `git clone`); and then I connected to the cloud using the Python `s3fs` library\n",
"to access data of interest.'\n",
"\n",
"\n",
"The two important preliminary steps (after cloning this repository) are creating that landing space \n",
"for the data and enabling ingest processing in this notebook/chapter. These steps take about five minutes.\n",
"\n",
"\n",
"> 1. **Setting up a landing site for data** The repo data folder resides in the `chapters` folder with a path like\n",
"`data/rca/sensors/osb`. The necessary data volume is about 1GB. Suppose your home directory is `/home/roger`.\n",
"Then the full path to this folder, assuming you have the `oceanography` repo installed in your home\n",
"folder, would be `/home/roger/oceanography/book/chapters/data/rca/sensors/osb`. Notice that this is\n",
@@ -44,17 +56,16 @@
"\n",
"\n",
"With these two steps done (and presuming you have `s3fs` installed in your environment)\n",
"you are ready to run the cells in this notebook that grab the different sensor stream\n",
"datasets. At the moment there are 15 but another 8 are pending. This is all but one of\n",
"the shallow profiler sensors accounted for. The 83-channel \n",
"spectrophotometer is treated separately.\n",
"you are ready to run the cells in this notebook that connect to the sensor stream\n",
"datasets. At the moment there are 15 sensors of interest with more pending. One\n",
"instrument, the 83-channel spectrophotometer, is treated separately.\n",
"\n",
"\n",
"If you run an ingest cell twice you may get an error message: The code is not set up \n",
"to overwrite an existing output file. The charts all follow a reload of saved data; \n",
"so the charting code should really check first to see that the intended local data file\n",
"actually exists. Finally all of this code (across ten separate instruments; so ten versions\n",
"of the same code) could be consolidated into a single version.\n",
"> Pro tips:\n",
"> - Possible issue if running an ingest cell a second time: The code may try and fail to clobber an existing data file\n",
"> - charting is done from locally saved data, not from cloud data\n",
"> - improvement: charting code ought to check for the local file existence first\n",
"> - improvement: the code should be consolidated as monolithic\n",
"\n",
"\n",
"For more on Zarr store use see [Joe Duprey's gist on GitHub](https://gist.github.com/jdduprey/7d5735d6de9c0c46fd16b78ee865f612).\n",
@@ -65,7 +76,7 @@
},
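The landing-site setup described in the markdown cell above (an external data folder reached through a symbolic link inside the repository) can be sketched as follows. This is a minimal illustration, not the notebook's own code: the helper name `link_data_dir` and the folder names are assumptions, and the demonstration runs inside a temporary directory so no real paths are touched.

```python
import tempfile
from pathlib import Path


def link_data_dir(external_dir, link_path):
    """Create external_dir if needed and make link_path a symbolic link to it.

    Hypothetical helper for illustration; not part of the notebook.
    """
    external_dir = Path(external_dir)
    link_path = Path(link_path)
    external_dir.mkdir(parents=True, exist_ok=True)      # landing site outside the repo
    link_path.parent.mkdir(parents=True, exist_ok=True)  # e.g. .../book/chapters
    if not link_path.is_symlink() and not link_path.exists():
        link_path.symlink_to(external_dir, target_is_directory=True)
    return link_path


# Demonstration in a throwaway directory standing in for a home folder.
with tempfile.TemporaryDirectory() as home:
    external = Path(home) / "data"  # assumed external folder
    link = Path(home) / "oceanography" / "book" / "chapters" / "data"
    link_data_dir(external, link)

    # Creating the sensor folder through the link lands it in the external folder.
    (link / "rca" / "sensors" / "osb").mkdir(parents=True, exist_ok=True)
    link_is_symlink = link.is_symlink()
    landed_outside_repo = (external / "rca" / "sensors" / "osb").is_dir()
    print(link_is_symlink, landed_outside_repo)
```

Calling the helper a second time is harmless because it only creates the link when nothing exists at that path.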
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 5,
"id": "e3021b6b-d5b8-4c06-9b2d-05e1ccceb286",
"metadata": {
"tags": []
@@ -75,8 +86,6 @@
"name": "stdout",
"output_type": "stream",
"text": [
"\n",
"Jupyter Notebook running Python 3\n",
"List Oregon Slope Base Profiler streams:\n",
"\n",
"ooi-data/RS01SBPS-SF01A-2A-CTDPFA102-streamed-ctdpf_sbe43_sample\n",
@@ -134,6 +143,9 @@
"import s3fs\n",
"from shallowprofiler import *\n",
"from charts import *\n",
"from sys import exit\n",
"from os import path\n",
"\n",
"\n",
"doIngest = True\n",
"\n",
@@ -240,7 +252,7 @@
},
{
"cell_type": "markdown",
"id": "1b94a1e6-0ae5-4423-9606-1ecfeff4ca94",
"id": "c413eb11-1326-459a-8e98-2a5f302f5fcc",
"metadata": {},
"source": [
"### Go through all 10 osb profiler streams in sequence\n",
@@ -256,10 +268,105 @@
"- nutnr_a_dark_sample\n",
"- nutnr_a_sample\n",
"- velpt\n",
"- pco2w\n",
"- pco2w"
]
},
{
"cell_type": "code",
"execution_count": 11,
"id": "336aa001-e1ef-4a0d-aad9-56d5bfc4619e",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"streamlist_profiler_site_keys = {'osb':'SF01A', 'oof':'SF01B', 'axb':'SF03A'}\n",
"streamlist_platform_site_keys = {'osb':'PC01A', 'oof':'PC01B', 'axb':'PC03A'}\n",
"\n",
"site_key = 'osb'\n",
"\n",
"profiler_instrument_streams = [sname for sname in streamlist if streamlist_profiler_site_keys[site_key] in sname]"
]
},
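The site-key filter in the cell above can be tried in isolation with a short stand-in list. This is a sketch under stated assumptions: the first two stream names are taken from this notebook's printed output, the third is a made-up name included only to show a non-matching site, and the function name `streams_for_site` is hypothetical.

```python
# Site codes copied from the notebook cell above.
streamlist_profiler_site_keys = {'osb': 'SF01A', 'oof': 'SF01B', 'axb': 'SF03A'}

# Stand-in for the list obtained from the `ooi-data` bucket.
# The third entry is hypothetical and exists only for this illustration.
streamlist = [
    'ooi-data/RS01SBPS-SF01A-2A-CTDPFA102-streamed-ctdpf_sbe43_sample',
    'ooi-data/RS01SBPS-SF01A-3C-PARADA101-streamed-parad_sa_sample',
    'ooi-data/HYPOTHETICAL-SF01B-0X-EXAMPLE-streamed-example_record',
]


def streams_for_site(streams, site_key, site_codes=streamlist_profiler_site_keys):
    """Return the stream names that contain the site code for site_key."""
    code = site_codes[site_key]
    return [name for name in streams if code in name]


osb_streams = streams_for_site(streamlist, 'osb')
oof_streams = streams_for_site(streamlist, 'oof')
print(len(osb_streams), len(oof_streams))
```

Because the match is a plain substring test, it relies on the site code (for example `SF01A`) not appearing elsewhere in a stream name.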
{
"cell_type": "code",
"execution_count": 21,
"id": "16f9cf4c-a4ce-4f9d-9e2d-23b88e90502c",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
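"official_instrument_keys = ['ctdpf', 'phsen', 'flort', 'nutnr_a_dark_sample', 'nutnr_a_sample', 'velpt', 'pco2w']\n",
"official_unused_instrument_keys = []  # placeholder: the unused keys are not listed in this commit\n",
"instrument_keys = official_instrument_keys  # alias: the ingest loop refers to instrument_keys\n",
"\n",
"doIngest = True\n",
"do_ingest = [doIngest]*len(instrument_keys)\n",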
"sensor_official_names = [['corrected_dissolved_oxygen','sea_water_density','sea_water_electrical_conductivity','sea_water_practical_salinity','sea_water_temperature'], \\\n",
" ['ph_seawater'], \\\n",
" ['fluorometric_cdom','fluorometric_chlorophyll_a','optical_backscatter'], \\\n",
" ['nitrate_concentration'], \\\n",
" ['salinity_corrected_nitrate'], \\\n",
" ['velpt_d_upward_velocity','velpt_d_northward_velocity','velpt_d_eastward_velocity'], \n",
" ['pco2_seawater']]\n",
"sensor_informal_names = [['do','density','conductivity','salinity','temp'], ['ph'], ['fdom','chlora','backscatter'], ['nitrate_dark'], ['nitrate'], ['up','north','east'], ['pco2']]\n",
"sensor_official_depth = ['sea_water_pressure','int_ctd_pressure','int_ctd_pressure','int_ctd_pressure','int_ctd_pressure','int_ctd_pressure','int_ctd_pressure']\n",
"\n",
"nSensors = [len(sensorlist) for sensorlist in sensor_official_names]\n",
"nSensorsCheck = [len(sensorlist) for sensorlist in sensor_informal_names]\n",
"if not nSensors == nSensorsCheck:\n",
" print(\"Sensor descriptions official/informal do not align\")\n",
" exit()"
]
},
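The alignment check at the end of the cell above compares only the per-instrument counts. A slightly more explicit variant, offered as a sketch rather than the notebook's method, reports which instrument is misaligned and raises an exception instead of calling `exit()`. The function name `check_alignment` and the shortened name lists are assumptions for illustration.

```python
def check_alignment(official, informal):
    """Return the per-instrument sensor counts, or raise ValueError on a mismatch.

    Hypothetical helper; both arguments are lists of lists of sensor names.
    """
    if len(official) != len(informal):
        raise ValueError(
            f"{len(official)} official groups vs {len(informal)} informal groups")
    for i, (official_names, informal_names) in enumerate(zip(official, informal)):
        if len(official_names) != len(informal_names):
            raise ValueError(
                f"instrument {i}: {len(official_names)} official names "
                f"vs {len(informal_names)} informal names")
    return [len(names) for names in official]


# Shortened lists using names that appear in the notebook cell above.
official = [['sea_water_temperature', 'sea_water_practical_salinity'], ['ph_seawater']]
informal = [['temp', 'salinity'], ['ph']]
counts = check_alignment(official, informal)
print(counts)
```

Raising an exception keeps the kernel alive in a notebook and leaves a traceback pointing at the failing instrument.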
{
"cell_type": "code",
"execution_count": 13,
"id": "3508d41a-19f7-4acb-a9fb-b3f4f4280720",
"metadata": {
"tags": []
},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found instrument ctdpf: ooi-data/RS01SBPS-SF01A-2A-CTDPFA102-streamed-ctdpf_sbe43_sample\n",
"Found instrument phsen: ooi-data/RS01SBPS-SF01A-2D-PHSENA101-streamed-phsen_data_record\n",
"Found instrument flort: ooi-data/RS01SBPS-SF01A-3A-FLORTD101-streamed-flort_d_data_record\n",
"Found instrument nutnr_a_dark_sample: ooi-data/RS01SBPS-SF01A-4A-NUTNRA101-streamed-nutnr_a_dark_sample\n",
"Found instrument nutnr_a_sample: ooi-data/RS01SBPS-SF01A-4A-NUTNRA101-streamed-nutnr_a_sample\n",
"Found instrument velpt: ooi-data/RS01SBPS-SF01A-4B-VELPTD102-streamed-velpt_velocity_data\n",
"Found instrument pco2w: ooi-data/RS01SBPS-SF01A-4F-PCO2WA101-streamed-pco2w_a_sami_data_record\n"
]
}
],
"source": [
"for ik in instrument_keys: # going in order CTD ... pCO2, ik is a string like 'ctdpf'\n",
" idx = instrument_keys.index(ik) # idx is an instrument index\n",
" for s in profiler_instrument_streams: # profiler_instrument_streams is a list of 10 long stream names\n",
" if ik in s: # match up the one we want \n",
" key = profiler_instrument_streams.index(s) # key is the index of the stream of interest\n",
" print('Found instrument ' + ik + ': ' + s)\n",
"            if do_ingest[idx]:\n",
"                ds = loadData(s)  # lazy load of the matched stream\n",
" for sensor in sensor_official_names[idx]:\n",
" sensor_index = sensor_official_names[idx].index(sensor)\n",
" \n",
" else:\n",
" print(\"Skipping ingest on this instrument.\")\n",
" "
]
},
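The key-to-stream matching in the loop above can be separated from the ingest step. The sketch below, with the hypothetical name `match_streams`, builds a dictionary from instrument key to the first stream name containing it. The stream names are copied from the output printed above; nothing is loaded from the cloud.

```python
def match_streams(instrument_keys, stream_names):
    """Map each instrument key to the first stream name that contains it.

    Keys with no matching stream are left out of the result.
    """
    matches = {}
    for key in instrument_keys:
        for name in stream_names:
            if key in name:
                matches[key] = name
                break
    return matches


streams = [
    'ooi-data/RS01SBPS-SF01A-2A-CTDPFA102-streamed-ctdpf_sbe43_sample',
    'ooi-data/RS01SBPS-SF01A-4A-NUTNRA101-streamed-nutnr_a_dark_sample',
    'ooi-data/RS01SBPS-SF01A-4A-NUTNRA101-streamed-nutnr_a_sample',
    'ooi-data/RS01SBPS-SF01A-4B-VELPTD102-streamed-velpt_velocity_data',
]
keys = ['ctdpf', 'nutnr_a_dark_sample', 'nutnr_a_sample', 'pco2w']
found = match_streams(keys, streams)
for key, name in found.items():
    print('Found instrument ' + key + ': ' + name)
```

Note that `nutnr_a_sample` does not match the dark-sample stream, since the substring `nutnr_a_sample` does not occur in `nutnr_a_dark_sample`; the two nitrate streams therefore stay distinct.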
{
"cell_type": "markdown",
"id": "d10e5137-aa7c-4da7-bcf6-369f4c66695f",
"metadata": {},
"source": [
"This is the development code\n",
"\n",
"#### 1 of 10: **ctdpf** i.e. CTD"
"#### 1 of 10: **ctdpf** i.e. CTD\n",
"\n"
]
},
{
@@ -745,21 +852,42 @@
},
{
"cell_type": "code",
"execution_count": null,
"id": "7e641072-2737-4d98-bc84-9394cf9081c1",
"execution_count": 23,
"id": "116d035e-c844-44a3-b021-49ec801c0d88",
"metadata": {
"tags": []
},
"outputs": [],
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Found this instrument stream: ooi-data/RS01SBPS-SF01A-3C-PARADA101-streamed-parad_sa_sample\n",
"Compare: ooi-data/RS01SBPS-SF01A-3C-PARADA101-streamed-parad_sa_sample\n"
]
}
],
"source": [
"if False:\n",
"if True:\n",
" instrument_key = 'parad'\n",
" for s in osb_profiler_streams: \n",
" if instrument_key in s: \n",
" print('Found this instrument stream:', s)\n",
" print('Compare: ooi-data/RS01SBPS-SF01A-3C-PARADA101-streamed-parad_sa_sample') \n",
" instrument_stream = s\n",
" break\n",
"\n",
" break"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "62581c86-5bd2-4adf-b70e-383b50a44c9a",
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"if True:\n",
" ds = loadData(instrument_stream) # lazy load\n",
"    t0, t1 = '2022-01-01T00', '2022-12-31T23'            # calendar year 2022\n",
"    ds = ds.sel(time=slice(t0, t1))                      # Subset the full time range to the t0..t1 window\n",
@@ -772,7 +900,10 @@
"id": "a85cfc79-50dd-42a0-a7e6-315bb4c4f123",
"metadata": {},
"source": [
"***seems to fail: kernel restart (timeout?)***"
"***seems to fail: kernel restart (timeout?)***\n",
"\n",
"\n",
"Compare: Joe says this stream is ok: `RS01SBPS-SF01A-3C-PARADA101-streamed-parad_sa_sample`"
]
},
{
@@ -1803,7 +1934,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
"version": "3.11.4"
}
},
"nbformat": 4,
2 changes: 1 addition & 1 deletion book/chapters/epipelargosy.ipynb
@@ -820,7 +820,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
"version": "3.11.4"
}
},
"nbformat": 4,