
Post processing of UM output

dangrosvenor edited this page Apr 28, 2023 · 5 revisions

Instructions for Dan Grosvenor's scripts

https://groups.io/g/AtmosModellingLeeds/wiki/29650

https://github.com/dangrosvenor/CEMAC_UM_TIPs/wiki/Instructions-for-using-Dan-Grosvenor's-post-processing-scripts

Meeting where we discussed how different people deal with post-processing

On 23rd Jan, 2018 we had a cloud group meeting and discussed how we each approach post-processing of UM output (generated on Monsoon), what our bottlenecks are, and what the best approach might be going forward - with the idea of standardising the approach as much as possible so that we have shared tools. Hopefully this will save duplication of effort and make mistakes in calculations, scripts, etc. less likely. Plus, when we have new people working on the UM they will be able to follow these procedures, hopefully making things easier for them and for their supervisors.

Useful script examples

• Jesus developed some tools in Python that are likely to form the basis of a standard approach. See :-

○ Jesus's GitHub :-

§ https://github.com/Numlet/UKCA_postproc

○ Slack channel on Jesus's scripts :-

§ https://icas-glomap.slack.com/messages/C3N5SP396/

• Annette Miltenberger had some Fortran scripts, etc. that may be useful and could be incorporated into Jesus's scripts (e.g. by calling the Fortran from Python) for speed.

• CIS tools (command line tools for regridding model output, sampling datasets at the same times, etc.) might also be useful, see :-

CIS tools

Approaches that people use :-

Phil

• Extracts from MASS for individual variables.

• Uses C to read in pp files.

• The first read gets all the headers and saves them to a file, so subsequent reads are quicker.

○ pp2nc also does this.
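The header-caching idea behind Phil's C tool and pp2nc can be sketched in plain Python. This is a hypothetical illustration of the pattern only, not the actual tool: the toy record layout (a 4-byte big-endian length followed by the payload) and the function names are invented for the example.

```python
import os
import pickle
import struct

def build_header_index(pp_path, cache_path):
    """Scan a (toy) record-based file once, noting the offset and length of
    each data record, and save the index so later opens can seek directly
    to any record instead of re-scanning headers spread through the file."""
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f)          # fast path: reuse the saved index
    index = []
    with open(pp_path, "rb") as f:
        while True:
            header = f.read(4)             # toy header: 4-byte record length
            if len(header) < 4:
                break
            (length,) = struct.unpack(">i", header)
            index.append((f.tell(), length))
            f.seek(length, 1)              # skip the data payload
    with open(cache_path, "wb") as f:
        pickle.dump(index, f)              # slow first pass, cached thereafter
    return index
```

The first call pays the full scan cost; every later call just unpickles the saved (offset, length) list, which is the same trade Phil's tool makes.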

Annette

• Speed is an issue since she produces output from ensemble runs.

• IRIS is quite slow, especially for reading pp files, but also for NetCDF (although NetCDF files are a bit quicker).

• Archives using shell scripts, which also convert to NetCDF using pp2nc tool.

• Retrieves from MASS onto JASMIN - the Met Office does not like large files being transferred (or at least this used to be the case).

• Fortran to process NetCDF files to calculate variables (e.g. radar reflectivity, LWP - although she tries to do this online in the UM if possible).

Hamish

• Using Python on Jasmin - transferring via MASS?

• Wants to run some things on Monsoon (e.g. 3D to 2D calculations) before transferring to Jasmin in order to reduce transfer bottleneck.

Dan

• Processes pp files directly from the output on Monsoon (within postproc) using Python-IRIS - e.g. converts from 3D to 2D and saves as NetCDF.

• Transfer NetCDFs elsewhere for further processing, plotting, etc.

• Fast, but perhaps not sustainable since might require long term storage of files on Monsoon.

• Transfer to Jasmin is possible, but is a bottleneck.
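The 3D-to-2D reductions mentioned above (e.g. computing LWP before transfer) are essentially vertical integrals. A minimal plain-Python sketch, assuming hypothetical per-level inputs of air density rho (kg m^-3), cloud liquid mixing ratio qcl (kg kg^-1) and layer thickness dz (m) - the real scripts would use Iris cubes and numpy, but the arithmetic is the same:

```python
def liquid_water_path(rho, qcl, dz):
    """LWP (kg m^-2) for one column: sum over levels of rho * qcl * dz."""
    return sum(r * q * z for r, q, z in zip(rho, qcl, dz))

def lwp_map(rho3d, qcl3d, dz):
    """Collapse toy 3D fields indexed [level][y][x] to a 2D LWP map [y][x]."""
    nlev = len(qcl3d)
    ny, nx = len(qcl3d[0]), len(qcl3d[0][0])
    return [[liquid_water_path([rho3d[k][j][i] for k in range(nlev)],
                               [qcl3d[k][j][i] for k in range(nlev)],
                               dz)
             for i in range(nx)] for j in range(ny)]
```

Doing this reduction on Monsoon before transfer is attractive precisely because the 2D output is a factor of n_levels smaller than the 3D field.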

Other notes

• Reading of pp files is very slow since the headers are spread throughout the file. Once converted to NetCDF this is no longer the case and access should be much quicker. Alternatively, Phil has a C tool that writes out the headers on the first read.

• Can use the queues on xcs and Jasmin for post-processing.

• For submitting jobs to queues on JASMIN, see here :-

Submitting to queues on Jasmin

• Allows more memory to be requested and is better for resource management, etc.
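At the time of these notes JASMIN's LOTUS cluster used the LSF scheduler (it has since moved to SLURM), so a post-processing submission might have looked roughly like the fragment below. The queue name, limits and script path are invented placeholders - check the current JASMIN documentation before use:

```shell
#!/bin/bash
#BSUB -q short-serial        # queue name (placeholder - check current docs)
#BSUB -W 02:00               # wall-clock limit (hh:mm)
#BSUB -M 16000               # memory request - more than an interactive session allows
#BSUB -o postproc.%J.out     # stdout log, %J = job ID
#BSUB -e postproc.%J.err     # stderr log

# Run the post-processing script (hypothetical path)
python process_um_output.py
```

Submitted with `bsub < job.sh`; requesting memory explicitly like this is what makes the queues better for big Iris jobs than the login nodes.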

• Transferring files to JASMIN - the official method is to use MASS

• People also do this from the xcs if needed (ssh keys need to be set up, though - i.e. follow the process used to log into Jasmin, but from the xcs: copy the ssh key to the xcs, run the ssh-setup script and ssh-add).

○ But not for very large files (Annette got told off for this on the old machine).

○ Definitely don't do using postproc.
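The ssh-key steps described above might look roughly like the following when run from the xcs. The hostnames, key filename and target path are placeholders (and the site ssh-setup script varies), so treat this as a sketch of the sequence rather than exact commands:

```shell
# From your local machine: copy your JASMIN ssh key onto the xcs (placeholder paths)
scp ~/.ssh/id_rsa_jasmin user@xcs:~/.ssh/

# On the xcs: start an agent, add the key, then transfer files directly
eval $(ssh-agent -s)
ssh-add ~/.ssh/id_rsa_jasmin
scp output.nc user@jasmin-xfer-host:/path/to/group_workspace/
```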

• Panoply (Hamish)

• Better version of xconv.

• Ask Joanne from IT (Rachel).

• Ryan Neely uses it too.

• Programming classes (C or Python) might be useful.

• MASS job issues - retrieval time depends on how busy it is - might need to increase the queue time.

• Use of cdo (the Climate Data Operators command-line NetCDF tool) might prove useful to e.g. add together files, do simple arithmetic, etc. - it writes the output to another file.

• Can also be used for more complicated calculations, e.g. air density.
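As an illustration of the kind of derived-variable arithmetic meant here, air density follows from pressure and temperature via the ideal gas law. A plain-Python sketch for reference (cdo's expr operator, or Iris, could apply the same formula across whole NetCDF files):

```python
R_D = 287.05  # specific gas constant for dry air, J kg^-1 K^-1

def air_density(pressure_pa, temperature_k):
    """Dry-air density (kg m^-3) from the ideal gas law: rho = p / (R_d * T)."""
    return pressure_pa / (R_D * temperature_k)
```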

• Call C/Fortran from Python for speed? Or is it easier to call the Fortran directly? Although that requires a separate Fortran stage and writing the output to file.
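Calling compiled code from Python avoids that separate write-to-file stage. A minimal ctypes sketch, using the system C math library as a stand-in for a user-compiled routine (a real Fortran routine would more commonly be wrapped with f2py, or built as a shared library, e.g. `gfortran -shared`, and loaded the same way):

```python
from ctypes import CDLL, c_double
from ctypes.util import find_library

# Load a shared library - the system math library stands in here for a
# compiled post-processing routine.
libm = CDLL(find_library("m") or "libm.so.6")

# Declare the C signature so ctypes converts the floats correctly.
libm.sqrt.restype = c_double
libm.sqrt.argtypes = [c_double]

print(libm.sqrt(2.0))  # C sqrt, called directly from Python - no file I/O stage
```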

Way forward?

• Convert pp to NetCDF using the special pp2nc convsh script.

○ Or could also use Jesus's scripts - but these are likely slower and use a lot of memory - would require use of the queues on xcs or Jasmin.

• Hamish will try to convert Jesus's scripts for L0 to L1 processing (i.e. 3D to 2D at each time, e.g. LWP, etc.) into Fortran.

• Although, for adding new variables that are a bit more complicated - e.g. screening for high cloud (CF > 80%, etc., using 3D fields) - Fortran might be trickier, since debugging could be hard.
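The screening step mentioned here is simple enough to prototype in Python before committing it to Fortran. A plain-Python sketch with invented inputs: `cf3d` is cloud fraction indexed [level][point], `high_levels` lists the model levels counted as "high cloud", and screened points are returned as None:

```python
def screen_high_cloud(values, cf3d, high_levels, threshold=0.8):
    """Return a copy of values with None wherever the maximum cloud
    fraction over the designated high levels exceeds the threshold."""
    screened = []
    for i, v in enumerate(values):
        high_cf = max(cf3d[k][i] for k in high_levels)
        screened.append(None if high_cf > threshold else v)
    return screened
```

Having a reference implementation like this makes the debugging concern less severe: the Fortran version can be checked column-by-column against the Python one.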

• Test and streamline Jesus's scripts for large UM output situations (nested UKCA, CASIM, etc.)

• Standardize which variables go in which p[a-z] files?

• Or, develop tools that go through each file to find the variables needed (likely quick once converted to NetCDF).