Inconsistent directory hierarchy #11
Comments
I agree with Chus. It will be helpful for us to document our outputs in our community paper.
The protocol has been made consistent and all simulation paths should include the version folder.
Should all the paths contain the same version, or can it be institution-dependent?
This is institution-dependent, as it is a representative date for a given simulation output: the day it was finished, the day you started uploading the data to the server, or any other meaningful date. It is important not to mix dates if the data come in the same batch. E.g. do not assign a different version to a given file just because it was uploaded one day after the rest. Decide on a date for a given batch, and use it consistently.
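A minimal sketch of the "one date per batch" rule, assuming the version folder follows the `vYYYYMMDD` pattern of the `v20241121` example in this thread. The function names, variable folders and file names are hypothetical and only illustrate the idea:

```python
from datetime import date
from pathlib import PurePosixPath


def version_tag(batch_date):
    """Format a date as a version folder name (assumed pattern: vYYYYMMDD)."""
    return batch_date.strftime("v%Y%m%d")


def batch_paths(dirs_and_files, batch_date):
    """Place every file of a batch under the same version folder."""
    tag = version_tag(batch_date)
    return [
        str(PurePosixPath(variable_dir) / tag / filename)
        for variable_dir, filename in dirs_and_files
    ]


# Hypothetical batch: one representative date is used for all files,
# even if some of them were uploaded a day later.
batch = [("tas", "tas_example.nc"), ("pr", "pr_example.nc")]
paths = batch_paths(batch, date(2024, 11, 21))
print(paths)
```

The date is chosen once and passed in, so no file in the batch can end up with a different version by accident.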
Thanks, much clearer now!
One needs a PhD and several postdocs on the topic to get it nearly right... I managed to get at least the paths close to right (in the fixed-drs-paths branch), rebuilding the directory structure and linking the files from the faulty one. With this, I could plot all simulations together. The resulting figure is devastating, as the matrix should be packed with blue squares. It makes naming problems easy to identify. If your model contributes to making the matrix sparse, you very likely have problems with your data.
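For illustration, rebuilding a directory tree by linking could look roughly like the sketch below. This is not the script in the fixed-drs-paths branch. It assumes the only fix needed is a missing `vYYYYMMDD` folder directly above each `.nc` file, and the function names and the `version` argument are hypothetical (the real version has to be decided by each institution):

```python
import os
import re
from pathlib import Path

VERSION_DIR = re.compile(r"v\d{8}")  # assumed pattern, e.g. v20241121


def corrected_relpath(relpath, version):
    """Insert a version folder above the file unless one is already there."""
    relpath = Path(relpath)
    if VERSION_DIR.fullmatch(relpath.parent.name):
        return relpath
    return relpath.parent / version / relpath.name


def link_tree(src_root, dst_root, version):
    """Mirror src_root under dst_root with symlinks, adding missing version folders."""
    src_root, dst_root = Path(src_root), Path(dst_root)
    linked = []
    for src in sorted(src_root.rglob("*.nc")):
        dst = dst_root / corrected_relpath(src.relative_to(src_root), version)
        dst.parent.mkdir(parents=True, exist_ok=True)
        if not (dst.is_symlink() or dst.exists()):
            os.symlink(src.resolve(), dst)  # the original data is left untouched
        linked.append(dst)
    return linked
```

Linking rather than copying keeps the faulty tree as it is, so the corrected layout can be thrown away and rebuilt at any time.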
So, to understand the figure: the more blue, the better the postprocessing?
Also, maybe this should be another thread, but in order to recognize whether a variable is instantaneous or statistically derived, does one need to open the file and retrieve this information from the attributes, cell_methods, etc.? This cannot be directly inferred from the name of the file.
All the experiments provide a table with the list of variables required by the experiment (CORDEX, FPSCONV, FPS-URBAN, etc.). In the FPS-URBAN table, the form in which each variable is required is explicitly indicated (column ag). Sometimes the model provides instantaneous values, but the accumulated form is required. In WRF, an example of this is the surface fluxes: hfss is calculated from HFX as [HFX(time1)+HFX(time2)]/2. I think the example you are showing is for a GCM, where file naming is a bit different. I believe that in Amon, A refers to the atmospheric part of a GCM and mon is the monthly frequency.
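As a small worked example of the [HFX(time1)+HFX(time2)]/2 formula, the sketch below averages each pair of consecutive instantaneous values. The function name and the numbers are made up, and it says nothing about how the resulting time axis or cell_methods should be written:

```python
def interval_means(instantaneous):
    """Average each pair of consecutive instantaneous values."""
    return [
        (first + second) / 2
        for first, second in zip(instantaneous, instantaneous[1:])
    ]


hfx = [10.0, 20.0, 40.0]    # made-up instantaneous HFX values at three output times
hfss = interval_means(hfx)  # one value per interval between outputs
print(hfss)
```

Note that the result has one value fewer than the input, since each value represents an interval between two output times.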
Yes, this would lead to a cleaner table. I will do it (#19). However, this will hide not only extra variables/frequencies, but also wrong names which we still need to fix.
The figure is not that advanced. Colored cells only indicate that a particular variable and frequency (x axis) is available for a particular model and realization (y axis). It only checks the filenames. Many other things can go wrong inside the file. Still, if you look at the x axis, you will see many variables which were not requested and also some with wrong spelling. In the parts of the plot where your model is the only blue row (i.e. no other model is providing this variable), there's likely a problem. Some problems do not appear, because I already fixed some mismatches in my script (not in the data!) and reported the problems as issues here (e.g. #15 #16 #17 #18).
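The "only blue row" check can be sketched as follows. This is not the script behind the figure: it assumes the filenames have already been parsed into (model, realization, variable, frequency) records, and all names and records are hypothetical:

```python
from collections import defaultdict


def availability(records):
    """Map each (variable, frequency) column to the set of rows providing it."""
    columns = defaultdict(set)
    for model, realization, variable, frequency in records:
        columns[(variable, frequency)].add((model, realization))
    return dict(columns)


def single_provider_columns(columns):
    """Columns filled by exactly one row: candidates for naming problems."""
    return sorted(col for col, rows in columns.items() if len(rows) == 1)


records = [
    ("modelA", "r1", "tas", "1hr"),
    ("modelB", "r1", "tas", "1hr"),
    ("modelA", "r1", "hfss", "1hr"),
    ("modelC", "r1", "hfss", "1hr"),
    ("modelB", "r1", "hffs", "1hr"),  # made-up misspelling of hfss
]
suspects = single_provider_columns(availability(records))
print(suspects)
```

A column with a single provider is only a hint, not proof of an error, and this check sees nothing of what is inside the files.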
The directory hierarchy is inconsistent at the moment, with half of the simulations providing a version directory (e.g. v20241121) between the variable folder and the actual files:
And the other half not including it:
The protocol says that this version folder should be present (but there are inconsistent examples). In any case, we need to make a decision and change the file hierarchy and/or the protocol accordingly.
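To see which simulations are affected, a quick check along these lines could help. It is only a sketch: it assumes the version folder matches `vYYYYMMDD` (as in `v20241121`) and sits directly above the file, and the paths and function names are hypothetical:

```python
import re
from pathlib import PurePosixPath

VERSION_DIR = re.compile(r"v\d{8}")  # assumed pattern, e.g. v20241121


def has_version_dir(path):
    """True if the folder directly above the file looks like a version folder."""
    return VERSION_DIR.fullmatch(PurePosixPath(path).parent.name) is not None


def split_by_version_dir(paths):
    """Separate paths that include a version folder from those that do not."""
    with_version = [p for p in paths if has_version_dir(p)]
    without_version = [p for p in paths if not has_version_dir(p)]
    return with_version, without_version


# Hypothetical paths, one for each of the two layouts described above.
example_paths = [
    "sim1/tas/v20241121/tas_example.nc",
    "sim2/tas/tas_example.nc",
]
with_version, without_version = split_by_version_dir(example_paths)
print(without_version)
```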
Apart from that, there are many other inconsistencies regarding the variable frequencies, version_realization strings, source_ids including the institution, ... (check here for your institution). All this needs to be fixed ASAP to go on with the analyses.