Feature/smoothing and extrapolating gps coordinates #268
Changes from all commits
@@ -6,6 +6,11 @@
 import numpy as np
 import pandas as pd
 import xarray as xr
+logging.basicConfig(
+    format="%(asctime)s; %(levelname)s; %(name)s; %(message)s",
+    level=logging.INFO,
+    stream=sys.stdout,
+)
 logger = logging.getLogger(__name__)

 def parse_arguments_joinl3(debug_args=None):
@@ -100,8 +105,20 @@ def readNead(infile):
 # combining thermocouple and CS100 temperatures
 ds['TA1'] = ds['TA1'].combine_first(ds['TA3'])
 ds['TA2'] = ds['TA2'].combine_first(ds['TA4'])

 ds=ds.rename(var_name)

+standard_vars_to_drop = ["NR", "TA3", "TA4", "TA5", "NR_cor",
+                         "z_surf_1", "z_surf_2", "z_surf_combined",
+                         "TA2m", "RH2m", "VW10m", "SZA", "SAA",
+                         "depth_t_i_1", "depth_t_i_2", "depth_t_i_3", "depth_t_i_4", "depth_t_i_5",
+                         "depth_t_i_6", "depth_t_i_7", "depth_t_i_8", "depth_t_i_9", "depth_t_i_10", "t_i_10m"
+                         ]
+standard_vars_to_drop = standard_vars_to_drop + [v for v in list(ds.keys()) if v.endswith("_adj_flag")]
Review comment: Are these variables related to the NEAD format or specifically for the files you are reading? 🤔

Review comment: Would it make sense to add them to …

Reply: Technically, they are related to the files we are reading (here historical GC-Net). Some of these variables will shortly be derived for the GEUS stations as well, and therefore won't need to be skipped anymore in the historical files; in the longer term, we could also calculate the remaining ones and thereafter remove them from the list of variables to skip in the historical files. Once again, keeping this list in one single place (rather than in multiple config files) makes every update easier.

Reply: I think we will look at making a smart solution for this in the future, but I think it works fine for now.
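The reply above argues for keeping the skip-list in a single place. A minimal sketch of that pattern, with hypothetical constant and helper names (not the actual pypromice API):

import xarray as xr

# Single source of truth for variables to drop from historical files.
STANDARD_VARS_TO_DROP = ["NR", "TA3", "TA4", "TA5", "NR_cor"]  # shortened for the example

def drop_standard_vars(ds: xr.Dataset) -> xr.Dataset:
    """Drop the standard variables plus any *_adj_flag variables, if present."""
    to_drop = STANDARD_VARS_TO_DROP + [v for v in ds.data_vars if str(v).endswith("_adj_flag")]
    return ds.drop_vars([v for v in to_drop if v in ds])

Updating the skip-list then touches one constant instead of several config files.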
+# Drop the variables if they are present in the dataset
+ds = ds.drop_vars([var for var in standard_vars_to_drop if var in ds])

 ds=ds.rename({'timestamp':'time'})
 return ds
@@ -116,7 +133,8 @@ def loadArr(infile, isNead):
 ds = xr.Dataset.from_dataframe(df)

 elif infile.split('.')[-1].lower() in 'nc':
-    ds = xr.open_dataset(infile)
+    with xr.open_dataset(infile) as ds:
+        ds.load()
 # Remove encoding attributes from NetCDF
 for varname in ds.variables:
     if ds[varname].encoding!={}:
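Presumably the motivation for the change above: the context manager reads the dataset fully into memory and closes the file handle, so the source NetCDF can later be overwritten without colliding with an open read handle. A minimal self-contained sketch with a hypothetical file name:

import xarray as xr

# Read everything into memory, then release the file handle.
with xr.open_dataset("AWS_hour.nc") as ds:  # hypothetical path
    ds.load()  # pull all values into memory
# The handle is closed here; writing back to the same path is now safe.
ds.to_netcdf("AWS_hour.nc")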
@@ -202,16 +220,23 @@ def join_l3(config_folder, site, folder_l3, folder_gcnet, outpath, variables, metadata):

 filepath = os.path.join(folder_l3, stid, stid+'_hour.nc')
 isNead = False
-if station_info["project"].lower() in ["historical gc-net", "glaciobasis"]:
+if station_info["project"].lower() in ["historical gc-net"]:
     filepath = os.path.join(folder_gcnet, stid+'.csv')
     isNead = True
-if not os.path.isfile(filepath):
-    logger.info(stid+' is from an project '+folder_l3+' or '+folder_gcnet)
+if not os.path.isfile(filepath):
+    logger.info(stid+' was listed as station but could not be found in '+folder_l3+' nor '+folder_gcnet)
     continue

Review comment: I anticipated that this would trigger an error and raise an exception. If the station list specifies data files, I would have expected those files to exist.

Reply: My intention was that if a historical station is listed but cannot be found, then the latest data (which can most likely be found, because it was produced by pypromice) should still be loaded and written to an l3/sites file. That way, people fetching the latest data are not affected if there is a mess-up with the historical files' path.
 l3, _ = loadArr(filepath, isNead)

+# removing specific variable from a given file
+specific_vars_to_drop = station_info.get("skipped_variables",[])
+if len(specific_vars_to_drop)>0:
+    logger.info("Skipping %s from %s"%(specific_vars_to_drop, stid))
+    l3 = l3.drop_vars([var for var in specific_vars_to_drop if var in l3])

 list_station_data.append((l3, station_info))

 # Sort the list in reverse chronological order so that we start with the latest data
 sorted_list_station_data = sorted(list_station_data, key=lambda x: x[0].time.max(), reverse=True)
 sorted_stids = [info["stid"] for _, info in sorted_list_station_data]
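For illustration, a hypothetical station_info entry showing how the per-station skipped_variables list feeds the drop logic above (the station id and variable names are made up):

station_info = {
    "stid": "XYZ_U",                         # hypothetical station id
    "project": "historical gc-net",
    "skipped_variables": ["p_u", "wspd_u"],  # drop these for this file only
}

specific_vars_to_drop = station_info.get("skipped_variables", [])
# -> ["p_u", "wspd_u"]; an entry without the key falls back to []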
@@ -246,19 +271,10 @@ def join_l3(config_folder, site, folder_l3, folder_gcnet, outpath, variables, metadata):

 for v in l3_merged.data_vars:
     if v not in l3.data_vars:
         l3[v] = l3.t_u*np.nan

-# if l3 (older data) has variables that does not have l3_merged (newer data)
-# then they are removed from l3
-list_dropped = []
 for v in l3.data_vars:
-    if v not in l3_merged.data_vars:
-        if v != 'z_stake':
-            list_dropped.append(v)
-            l3 = l3.drop(v)
-        else:
-            l3_merged[v] = ('time', l3_merged.t_u.data*np.nan)
-logger.info('Unused variables in older dataset: '+' '.join(list_dropped))
+    if v not in l3_merged.data_vars:
+        l3_merged[v] = l3_merged.t_u*np.nan

 # saving attributes of station under an attribute called $stid
 st_attrs = l3_merged.attrs.get('stations_attributes', {})
 st_attrs[stid] = l3.attrs.copy()
@@ -280,6 +296,8 @@ def join_l3(config_folder, site, folder_l3, folder_gcnet, outpath, variables, metadata):

 # Assign site id
+if not l3_merged:
+    logger.error('No level 2 data file found for '+site)
 l3_merged.attrs['site_id'] = site
 l3_merged.attrs['stations'] = ' '.join(sorted_stids)
 l3_merged.attrs['level'] = 'L3'
Review comment: Is there a specific reason why the order of the L1 datasets is reversed for combine_first?

Reply: This is because the logger/transmission files are loaded from older to newer. combine_first starts with the first element of the list and adds info (variables, but also attributes) from the other elements as needed, which means that the attributes of the oldest element prevail over those of the newer elements. For sites like EGP and KAN_U that switched from one boom to two booms, we need the newest data to be the first element, so that the latest value of attributes such as "number_of_booms" is used instead of older values; we therefore reverse the list.
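A toy, self-contained illustration of this precedence (values and attributes are made up): in combine_first, the left-hand dataset wins wherever it has data, and its attributes are the ones kept.

import numpy as np
import pandas as pd
import xarray as xr

time = pd.date_range("2024-01-01", periods=3)
newest = xr.Dataset({"t_u": ("time", [1.0, np.nan, 3.0])},
                    coords={"time": time}, attrs={"number_of_booms": 2})
oldest = xr.Dataset({"t_u": ("time", [9.0, 2.0, 9.0])},
                    coords={"time": time}, attrs={"number_of_booms": 1})

merged = newest.combine_first(oldest)
print(merged["t_u"].values)  # [1. 2. 3.] -> gaps filled from the older data
print(merged.attrs)          # {'number_of_booms': 2} -> newest attributes prevail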