Feature/surface heights and thermistor depths #278

BaptisteVandecrux · 2024-07-08T07:03:30Z

This PR builds on top of #271 as it adds functionality to L2toL3.py.

Surface height processing

The idea behind the surface height processing is that automated adjustments can be made in many cases. For example, if z_pt_cor is working fine, then the two SR50s can be adjusted to the ice surface height at the end of the ablation period. Or, if z_pt_cor is missing but z_stake is available, then z_stake can be used instead of the pressure transducer to describe the ablation of ice. combineSurfaceHeight runs many tests corresponding to recurring situations in which continuity in at least one of the three surface-sensing instruments makes it possible to adjust the other two.

However, there are still many situations where manual adjustments are needed: for example, when the station, the stake assembly or the pressure transducer is maintained, the shift this introduces has to be removed manually. It is therefore an iterative process between what can be done automatically and what needs to be done manually. A clear tutorial on how to make those adjustments should follow once the PR is implemented.

The result is z_surf_combined, which ensures continuity from year to year and combines information from all three surface-sensing instruments. For the ablation stations, the minimum of z_surf_combined over the past year (in other words, the "only-decreasing" version of z_surf_combined) represents z_ice_surf. The difference between z_surf_combined and z_ice_surf (never negative) is the snow_height.
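The relation between z_surf_combined, z_ice_surf and snow_height described above can be sketched with a trailing-window minimum in pandas. This is a minimal illustration, not the code in L2toL3.py: the helper name `ice_surface_and_snow_height`, the `"365D"` window and the sample values are assumptions made for the example.

```python
import numpy as np
import pandas as pd


def ice_surface_and_snow_height(z_surf_combined, window="365D"):
    """Hypothetical helper: derive z_ice_surf and snow_height from z_surf_combined.

    Assumes `z_surf_combined` is a pandas Series with a sorted DatetimeIndex.
    """
    # Minimum of the combined surface height over the trailing window;
    # min_periods=1 lets the first timestamps use whatever data exist so far.
    z_ice_surf = z_surf_combined.rolling(window, min_periods=1).min()
    # Snow height is whatever sits above that ice surface (zero when bare ice).
    snow_height = z_surf_combined - z_ice_surf
    return z_ice_surf, snow_height


# Made-up daily surface heights (m), relative to the surface at installation.
time = pd.date_range("2023-01-01", periods=6, freq="D")
z = pd.Series([0.0, 0.5, 0.2, -0.3, -0.1, 0.4], index=time)
z_ice, snow = ice_surface_and_snow_height(z)
print(pd.DataFrame({"z_surf_combined": z, "z_ice_surf": z_ice, "snow_height": snow}))
```

With a time-based window the result is only "only-decreasing" within each one-year span, which matches the "past one year" wording above.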

The surface height processing does the following:

It calculates z_surf_1 from z_boom_u, z_surf_2 from either z_stake or z_boom_l, and z_ice_surf from z_pt_cor.
After z_surf_1, z_surf_2 and z_ice_surf are created, they are partially adjusted using the QC adjustData function.
The processing differs depending on whether the station is defined as site_type = 'accumulation' or site_type = 'ablation' in the updated config files on aws-l0.
For the accumulation sites, z_surf_combined and snow_height are both defined as the average of z_surf_1 and z_surf_2 (after the jumps due to maintenance have been removed).
For the ablation sites, combineSurfaceHeight runs a list of tests and adjustments based on recurrent situations at the station. It first determines the duration of the ablation period for each year (first from z_pt_cor; if that is not available, from z_stake; and if that is not available either, from the month). Once those periods are estimated, the adjustment loop can begin. It is basically a long list of tests like "if the pressure transducer was working all summer, then adjust the surface height derived from the two SR50s to the ice surface height on the last ablation day".
Eventually z_surf_combined is taken (to put it simply) as the combination of z_pt_cor and z_stake in the ablation period and as the combination of z_boom_u and z_stake outside the ablation period. It should be continuous from year to year and should take its reference "zero" height as the surface height at installation. It only includes surface processes and therefore does not fully describe the elevation change at that site. For that, the height change due to ice dynamics should be added to z_surf_combined.
For each timestamp, the minimum of z_surf_combined over the past year (in other words, the "only-decreasing" version of z_surf_combined) represents z_ice_surf. The difference between z_surf_combined and z_ice_surf (never negative) is the snow_height.
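One of the adjustments listed above, aligning an SR50-derived surface height with the ice surface height on the last ablation day, could look roughly like the sketch below. It is only an illustration of the idea: `align_to_reference` is a hypothetical helper, the values are made up, and applying the offset from that day onward (rather than to the whole series) is an assumption.

```python
import pandas as pd


def align_to_reference(series, reference, when):
    """Hypothetical helper: shift `series` from `when` onward so that it
    equals `reference` at `when`. Both inputs share a DatetimeIndex."""
    # Offset needed for the two series to agree on the chosen day.
    offset = reference.loc[when] - series.loc[when]
    adjusted = series.copy()
    # Apply the offset to that day and everything after it.
    adjusted.loc[when:] = adjusted.loc[when:] + offset
    return adjusted


time = pd.date_range("2023-08-29", periods=5, freq="D")
# Ice surface height from the pressure transducer (made-up values, m).
z_ice_surf = pd.Series([-1.8, -1.9, -2.0, -2.0, -2.0], index=time)
# Surface height derived from one SR50 (made-up values, m).
z_surf_1 = pd.Series([-1.2, -1.3, -1.4, -1.3, -1.1], index=time)

last_ablation_day = pd.Timestamp("2023-08-31")
z_surf_1_adj = align_to_reference(z_surf_1, z_ice_surf, last_ablation_day)
print(z_surf_1_adj)
```

After the shift, the SR50-derived series starts the accumulation season from the same height as the ice surface, so snow accumulating afterwards is measured relative to the ice.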

Thermistor depth calculation

Besides a continuous surface height, the thermistor depth calculation needs a list of thermistor re-installations and, if available, the non-standard depths at which the thermistors were installed. This information was digitized from the maintenance sheets and added to the station config files in AWS-L0/metadata/station_configurations.

The depth of each thermistor, d_t_*, is calculated from the surface height change (burial when the surface height increases, and the reverse when it decreases), and the depths are reset each time there is a maintenance. These depths then indicate when thermistors are likely melting out due to ablation, and can therefore be used to further clean the t_i_* variables.
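The depth bookkeeping described above can be sketched as follows. This is a simplified illustration under stated assumptions, not the function in L2toL3.py: `thermistor_depth` is a hypothetical helper, it handles a single thermistor, it assumes the thermistor is put back at the same `install_depth` at every maintenance, and it masks negative depths (melted-out thermistor) as NaN.

```python
import numpy as np
import pandas as pd


def thermistor_depth(z_surf, install_depth, maintenance_dates):
    """Hypothetical helper: depth of one thermistor below the surface (m).

    z_surf            : surface height Series with a sorted DatetimeIndex
    install_depth     : depth at (re-)installation, assumed identical each time
    maintenance_dates : dates at which the thermistor was re-installed
    """
    depth = pd.Series(np.nan, index=z_surf.index)
    # The start of the record counts as the first installation.
    resets = [z_surf.index[0]] + sorted(pd.Timestamp(d) for d in maintenance_dates)
    for start in resets:
        after = z_surf.index >= start
        if not after.any():
            continue
        # Surface height at the first timestamp on or after the reset.
        z_ref = z_surf.loc[after].iloc[0]
        # A rising surface buries the thermistor, a falling surface exposes it.
        depth.loc[after] = install_depth + (z_surf.loc[after] - z_ref)
    # A negative depth means the thermistor has melted out.
    return depth.where(depth >= 0)


time = pd.date_range("2023-07-01", periods=6, freq="D")
z_surf = pd.Series([0.0, 0.2, -0.5, -1.2, -1.0, -0.8], index=time)
d_t = thermistor_depth(z_surf, install_depth=1.0, maintenance_dates=["2023-07-05"])
print(d_t)
```

Timestamps where the returned depth is NaN are the ones where the matching temperature reading would be a candidate for removal.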

Once those depths are calculated, an interpolation function calculates, for each timestamp, the 10 m subsurface temperature t_i_10m (if there are thermistors close enough to that depth). Note that, in the accumulation area, we assume that snow and firn compaction does not affect the spacing between the thermistors.
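For a single timestamp, the interpolation to 10 m could be sketched as below. The helper name `temperature_at_depth`, the requirement that thermistors bracket the target depth, and the `max_distance` threshold standing in for "close enough" are all assumptions made for this example. The criteria used in the PR itself may differ.

```python
import numpy as np


def temperature_at_depth(depths, temps, target=10.0, max_distance=1.0):
    """Hypothetical helper: linearly interpolate temperature at `target` depth (m).

    Returns NaN unless valid thermistors sit on both sides of `target` and
    the nearest one is within `max_distance` metres of it.
    """
    depths = np.asarray(depths, dtype=float)
    temps = np.asarray(temps, dtype=float)
    # Drop thermistors with a missing depth (e.g. melted out) or temperature.
    valid = np.isfinite(depths) & np.isfinite(temps)
    depths, temps = depths[valid], temps[valid]
    if depths.size < 2:
        return np.nan
    # np.interp needs increasing x values.
    order = np.argsort(depths)
    depths, temps = depths[order], temps[order]
    # No extrapolation: the target depth has to be bracketed.
    if depths[0] > target or depths[-1] < target:
        return np.nan
    if np.min(np.abs(depths - target)) > max_distance:
        return np.nan
    return float(np.interp(target, depths, temps))


t_10m = temperature_at_depth([1.0, 4.0, 9.5, 11.5], [-5.0, -8.0, -10.0, -12.0])
too_shallow = temperature_at_depth([1.0, 2.0, 3.0], [-5.0, -6.0, -7.0])
print(t_10m, too_shallow)
```

Applied row by row to the depth and temperature columns, this gives one t_i_10m value per timestamp, with NaN wherever the string does not reach or bracket 10 m.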

https://www.mermaidchart.com/raw/68e721f2-f270-41e4-b1e6-bb8591b3d4a1?theme=light&version=v0.1&format=svg

and removed unnecessary packages

also removed newly derived variables from the list of vars to drop in the historical data: "z_surf_1", "z_surf_2", "z_surf_combined", "depth_t_i_1", "depth_t_i_2", "depth_t_i_3", "depth_t_i_4", "depth_t_i_5", "depth_t_i_6", "depth_t_i_7", "depth_t_i_8", "depth_t_i_9", "depth_t_i_10", "t_i_10m". Also realized bedrock stations should be considered as accumulation stations: no ice surface height to derive


…metadata.csv from L3 files (#277)
* added make_metadata_csv.py, made it a CLI
* File paths specified rather than inferred (#279)
* fixed EOL in file attributes
* added project and stations as columns in metadata CSV
* update make_metadata_csv.py after review, store location_type attribute from config file into L3 dataset attribute
---------
Co-authored-by: Penny How <[email protected]>


* Update .gitignore * L2 split from L3 CLI processing * unit tests moved to separate module * file writing functions moved to separate module * Loading functions moved to separate module * Handling and reformating functions moved * resampling functions moved * aws module updated with structure changes * get_l2 and l2_to_l3 process test added * data prep and write function moved out of AWS class * stations for testing changed * creating folder before writing files, writing hourly daily monthly files out in L2toL3, trying not to re-write sorted tx file if already sorted * update get_l3 to add historical data * resampling frequency specified * renamed join_levels to join_l2 because join_l3 will have different merging function, attribute management and use site_id and list_station_id * skipping resample after join_l2, fixed setup.py for join_l2 * fixing test * fixed function names * update get_l3 to add historical data * update get_l3 to add historical data * Create get_l3_new.py * further work on join_l3, varible_aliases in ressource folder * cleaning up debug code in join_l3 * small fix in join_l3 * working verion * delete encoding info after reading netcdf, debug of getColNames * delete get_l3.py * removing variables and output files metadata * new variable description files * added back ressource files, use level attributes for output definition * make list of sites from station_config, switched print to logger.info * removing get_l3, remove inst. 
values from averaged files, fixes on logging, attributes and tests, * Updates to numpy dependency version and pandas deprecation warnings (#258) * numpy dependency <2.0 * resample rules updated (deprecation warning) * fillna replaced with ffill (deprecation warning) * get_l3 called directly rather than from file * process action restructured * small changes following review, restored variable.csv history renamed new variable.csv moved old variable.csv renamed new variables.csv recreate variables.csv * buiding a station list instead of a station_dict * renamed l3m to l3_merged, reintroduced getVars and getMeta * moving gcnet_postprocessing as part of readNead * sorting out the station_list in reverse chronological order * using tilde notation in setup.py * better initialisation of station_attributes attribute * moved addMeta, addVars, roundValues, reformatTime, reformatLon to write.py * Inline comment describing enocding attribute removal when reading a netcdf * loading toml file as dictionary within join_l3 instead of just reading the stid to join * ressources renamed to resources (#261) * using project attribute of a station locate AWS file and specify whether it's a Nead file * update test after moving addVars and addMeta * fixed logger message in resample * better definition of monthly sample rates in addMeta * dummy datasaet built in unit test now has 'level' attribute * not storing timestamp_max for each station but pulling the info directly from the dataset when sorting * removing unecessary import of addMeta, roundValues * make CLI scripts usable within python * return result in join_l2 and join_l3 * removing args from join_l2 function * proper removal of encoding info when reading netcdf * Refactored and Organized Test Modules - Moved test modules and data from the package directory to the root-level tests directory. - Updated directory structure to ensure clear separation of source code and tests. 
- Updated import statements in test modules to reflect new paths. - Restructured the tests module: - Renamed original automatic tests to `e2e` as they primarily test the main CLI scripts. - Added `unit` directory for unit tests. - Created `data` directory for shared test data files. This comprehensive refactoring improves project organization by clearly separating test code from application code. It facilitates easier test discovery and enhances maintainability by following common best practices. * Limited the ci tests to only run e2e * naming conventions changed * Feature/smoothing and extrapolating gps coordinates (#268) * implemented gps postprocessing on top of the #262 This update: - clears up the SHF LHF calculation - reads dates of station relocations (when station coordinates are discontinuous) from the `aws-l0/metadata/station_configurations` - for each interval between station relocations, a linear function is fitted to the GPS observations of latitude longitude and altitude and is used to interpolate and extrapolate the gps observations - these new smoothed and gap-free coordinates are the variables `lat, lon, alt` - for bedrock stations (like KAN_B) static coordinates are used to build `lat, lon, alt` - eventually `lat_avg`, `lon_avg` `alt_avg` are calculated from `lat, lon, alt` and added as attributes to the netcdf files. Several minor fixes were also brought like: - better encoding info removal when reading netcdf - printing to files variables full of NaNs at `L2` and `L3/stations` but not printing them in the `L3/sites files`. - recalculate dirWindSpd if needed for historical data - due to xarray version, new columns need to be added manually before a concatenation of different datasets in join_l3 * Updated persistence.py to use explicit variable thresholds Avoided applying the persistence filter on averaged pressure variables (`p_u` and `p_l`) due to their 0 decimal precision often leading to incorrect filtering. 
* Fixed bug in persistence QC where initial repetitions were ignored * Relocated unit persistence tests * Added explicit test for `get_duration_consecutive_true` * Renamed `duration_consecutive_true` to `get_duration_consecutive_true` for imperative clarity * Updated python version in unittest * Fixed bug in get_bufr Configuration variables were to strictly validated. * Made bufr_integration_test explicit * Added __all__ to get_bufr.py * Applied black code formatting * Made bufr_to_csv as cli script in setup.py * Updated read_bufr_file to use wmo_id as index * Added script to recreate bufr files * Added corresponding unit tests * Added flag to raise exceptions on errors * Added create_bufr_files.py to setup * Updated tests parameters Updated station config: * Added sonic_ranger_from_gps * Changed height_of_gps_from_station_ground from 0 to 1 * Added test for missing data in get_bufr - Ensure get_bufr_variables raises AttributeError when station dimensions are missing * Updated get_bufr to support static GPS heights. * Bedrock stations shouldn’t depend on the noisy GPS signal for elevation. * Added station dimension values for WEG_B * Added corresponding unittest * Updated github/workflow to run unittests Added eccodes installation * Updated get_bufr to support station config files in folder * Removed station_configurations.toml from repository * Updated bufr_utilities.set_station to validate wmo id * Implemented StationConfig io tests * Extracted StationConfiguration utils from get_bufr * Added support for loading multiple station configuration files Other * Made ArgumentParser instantiation inline * Updated BUFRVariables with scales and descriptions * Added detailed descriptions with references to the attributes in BUFRVariables * Change the attribute order to align with the exported schema * Changed variable roundings to align with the scales defined in the BUFR schemas: * Latitude and longitude is set to 5. 
Was 6 * heightOfStationGroundAboveMeanSeaLevel is set to 1. Was 2 * heightOfBarometerAboveMeanSeaLevel is set to to 1. Was 2 * pressure is set to -1. Was 1. Note: The BUFRVariable unit is Pa and not hPA * airTemperature is set to 2. Was 1. * heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformTempRH is set to 2. Was 4 * heightOfSensorAboveLocalGroundOrDeckOfMarinePlatformWSPD is set to 2. Was 4 * Added unit tests to test the roundings * Updated existing unit tests to align with corrected precision * Increased the real_time_utilities rounding precisions * Updated get_bufr to separate station position from bufr * The station position determination (AWS_latest_locations) is separated from the bufr file export * Updated the unit tests Corrected minimum data check to allow p_i or t_i to be nan Renamed process_station parameters for readability * Rename now_timestamp -> target_timestamp * Rename time_limit -> linear_regression_time_limit Applied black * Minor cleanup * Updated StationConfiguration IO to handle unknown attributes from input * Updated docstring in create_bufr_files.py * Renamed e2e unittest methods Added missing "test" prefix required by the unittest framework. 
* Feature/surface heights and thermistor depths (#278) * processes surface heights variables: `z_surf_combined`, `z_ice_surf`, `snow_height`, and thermistors' depths: `d_t_i_1-11` * `variable.csv` was updated accordingly * some clean-up of turbulent fluxes calculation, including renaming functions * handling empty station configuration files and making errors understandable * updated join_l3 so that surface height and thermistor depths in historical data are no longer ignored and to adjust the surface height between the merged datasets * calculated either from `gps_lat, gps_lon, gps_alt` or `lat, lon, alt`, static values called `latitude`, `longitude` and `altitude` are saved as attributes along with `latitude_origin`, `longitude_origin` and `altitude_origin` which states whether they come from gappy observations `gps_lat, gps_lon, gps_alt` or from gap-filled postprocess `lat, lon, alt` * changed "H" to "h" in pandas and added ".iloc" when necessary to remove deprecation warnings * made `make_metadata_csv.py` to update latest location in `aws-l3/AWS_station_metadata.csv` and `aws-l3/AWS_sites_metadata.csv` --------- Co-authored-by: Penny How <[email protected]> * L2toL3 test added (#282) * 3.8 and 3.9 tests removed, tests only for 3.10 and 3.11 * echo syntax changed * updated input file paths --------- * better adjustment of surface height in join_l3, also adjusting z_ice_surf (#289) * different decoding of GPS data if "L" is in GPS string (#288) * Updated pressure field for BUFR output files * Updated get_l2 to use aws.vars and aws.meta get_l2 were previously also loading vars and meta in addition to AWS. AWS is populating meta with source information during instantiation. 
* Removed static processing level attribute from file_attributes * Run black on write.py * Implemented alternative helper functions for reading variables and metadata files * Refactor getVar getMeta * Use pypromice.resources instaed of pkg_resources * Select format from multiple L0 input files The format string was previously selected from the last l0 file. * Updated attribute metadata * Added test case for output meta data * Added json formatted source string to attributes * Added title string to attributes * Updated ID string to include level * Added utility function for fetching git commit id * Updated test_process with full pipeline test * Added test station configuration * Cleanup test data files * Removed station configuration generation * Renamed folder name in temporaty test directory * Added data issues repository path as an explicit parameter to AWS * Added data issues path to process_test.yml * Applied black on join_l3 * Updated join_l3 to generate source attribute for sites Validate attribute keys in e2e test * job name changed * Bugfix/passing adj dir to l3 processing plus attribute fix (#292) * passing adjustment_dir to L2toL3.py * fixing attributes in join_l3 - station_attribute containing info from merged dataset was lost when concatenating the datasets - The key "source" is not present in the attributes of the old GC-Net files so `station_source = json.loads(station_attributes["source"])` was throwing an error * give data_issues_path to get_l2tol3 in test_process * using data_adjustments_dir as input in AWS.getL3 * adding path to dummy data_issues folder to process_test * making sure data_issues_path is Path in get_l2tol3 --------- Co-authored-by: PennyHow <[email protected]> Co-authored-by: Mads Christian Lund <[email protected]>

* processes surface heights variables: `z_surf_combined`, `z_ice_surf`, `snow_height`, and thermistors' depths: `d_t_i_1-11`
* `variable.csv` was updated accordingly
* some clean-up of turbulent fluxes calculation, including renaming functions
* handling empty station configuration files and making errors understandable
* updated join_l3 so that surface height and thermistor depths in historical data are no longer ignored and to adjust the surface height between the merged datasets
* calculated either from `gps_lat, gps_lon, gps_alt` or `lat, lon, alt`, static values called `latitude`, `longitude` and `altitude` are saved as attributes along with `latitude_origin`, `longitude_origin` and `altitude_origin` which states whether they come from gappy observations `gps_lat, gps_lon, gps_alt` or from gap-filled postprocess `lat, lon, alt`
* changed "H" to "h" in pandas and added ".iloc" when necessary to remove deprecation warnings
* made `make_metadata_csv.py` to update latest location in `aws-l3/AWS_station_metadata.csv` and `aws-l3/AWS_sites_metadata.csv`
---------
Co-authored-by: Penny How <[email protected]>
* L2toL3 test added (#282)
* 3.8 and 3.9 tests removed, tests only for 3.10 and 3.11
* echo syntax changed
* updated input file paths
---------

BaptisteVandecrux and others added 2 commits July 5, 2024 16:04

bring changes on top of #268

eb341f5

better handle of empty config and changing length of string

94ff617

and removed unnecessary packages

BaptisteVandecrux force-pushed the feature/surface-heights-and-thermistor-depths branch from 6b003b4 to 94ff617 Compare July 8, 2024 07:07

PennyHow reviewed Jul 9, 2024

src/pypromice/resources/variables.csv (outdated)

src/pypromice/process/L2toL3.py (outdated)

src/pypromice/process/L2toL3.py (outdated)

BaptisteVandecrux and others added 25 commits July 9, 2024 19:03

changed order of variables in variable.csv

839a182

catching cases with missing variables, ensuring lon<0, switch logger …

dbaa231

…info to debug

rename functions

bb85f47

define latitude/longitude/altitude attribute instead of lat_avg/lon_a…

afdcb6a

…vg/alt_avg + save origin as separate attribute

removing z_surf_1 and z_surf_2 from output variable list

83caa59

Better description of coordinate variables

d88bc9a

small adjustments following latest assessment

fced84a

few height adjustment

d73c7b1

switch order of code to get desired output, NaN for negative thermist…

6589796

…or depth

some gapfilling of ice surface height, recalculation of hs_winter aft…

f0d7ec9

…er a shift

bring changes on top of #268

c0aa0a0

better handle of empty config and changing length of string

7490aee

and removed unnecessary packages

changed order of variables in variable.csv

26a2d34

catching cases with missing variables, ensuring lon<0, switch logger …

7190158

…info to debug

rename functions

8116408

define latitude/longitude/altitude attribute instead of lat_avg/lon_a…

2586c3d

…vg/alt_avg + save origin as separate attribute

removing z_surf_1 and z_surf_2 from output variable list

ea7f7a7

Better description of coordinate variables

9ca3bee

small adjustments following latest assessment

5262a50

few height adjustment

6ad64cb

switch order of code to get desired output, NaN for negative thermist…

886551d

…or depth

some gapfilling of ice surface height, recalculation of hs_winter aft…

d47200e

…er a shift

L2toL3 test added (#282)

49ca5b5

* L2toL3 test added * 3.8 and 3.9 tests removed * tests only for 3.10 and 3.11 * troubleshooting * echo syntax changed * updated input file paths

ladsmund force-pushed the feature/surface-heights-and-thermistor-depths branch from b4becc1 to 49ca5b5 Compare August 8, 2024 13:49

BaptisteVandecrux and others added 9 commits August 14, 2024 07:24

Merge branch 'feature/surface-heights-and-thermistor-depths' of https…

b156dde

…://github.com/GEUS-Glaciology-and-Climate/pypromice into feature/surface-heights-and-thermistor-depths

catching error when metadata csv is empty

cd144ea

Merge branch 'feature/surface-heights-and-thermistor-depths' of https…

8098629

…://github.com/GEUS-Glaciology-and-Climate/pypromice into feature/surface-heights-and-thermistor-depths

making surface height calc failsafe

bd1a8ce

+ caught exception when there's insufficient data for resample + set default values of station_config when no config file is found + better adjustment of height in join_l3 when there's overlap between old and new data

fixing error in persistence

975f19f

adjusting order calculation before joining in join_l3

3897140

datasets are ordered based on their first timestamp and in reverse chronological order

more instructive error messages + some deprecation fixes

2818bcb

Merge branch 'develop' into feature/surface-heights-and-thermistor-de…

885fae3

…pths

fixing tests

03b7546

Reimplementing edits from Mads that have been removed during the merge of `develop` into this branch some frequency and iloc updated to remove deprecation warning

BaptisteVandecrux force-pushed the feature/surface-heights-and-thermistor-depths branch from 52c2653 to 03b7546 Compare August 15, 2024 19:57

BaptisteVandecrux merged commit 777c5e8 into develop Aug 15, 2024
3 checks passed

BaptisteVandecrux deleted the feature/surface-heights-and-thermistor-depths branch August 15, 2024 20:32
