Skip to content

Latest commit

 

History

History
452 lines (420 loc) · 16.7 KB

README.org

File metadata and controls

452 lines (420 loc) · 16.7 KB

pyshemkf: Python environment for handling SHEMAT-Suite output of EnKF simulations

pyshemkf (py thon environment for handling shem at-suite output of en kf simulations) is a python environment for post-processing SHEMAT-Suite (Rath 2006) output.

pyshemkf can generate graphs and 2D-figures of variables and parameters provided by typical output of SHEMAT-Suite.

Additionally, pyshemkf contains scripts and functions for the evaluation and visualization of RMSE distributions from Ensemble Kalman Filter (EnKF, Evensen 2003) synthetic experiments computed with SHEMAT-Suite.

For Python scripts used in the WRR paper Keller 2018 written by the author, see Manuscript scripts. The version of the Python scripts, which corresponds to the time of the work on the manuscript, has been saved under:

https://zenodo.org/badge/DOI/10.5281/zenodo.1344337.svg

The data used by the Python scripts is saved under:

https://zenodo.org/badge/DOI/10.5281/zenodo.1343571.svg


Table of Contents

Prerequisites

The scripts are written for python3.7.3. They make use of the following Python modules:

  • matplotlib, matplotlib.pyplot, matplotlib.cm, matplotlib.colors, matplotlib.mlab are used for plotting
  • numpy, scipy, scipy.stats provide numerical and matrix functions
  • shutil, os, subprocess, shlex are used to run commands on the operating system
  • string provides string manipulation utilities
  • vtk and h5py provide functions to read vtk and hdf files.

The location of the included module pskf has to be added to the environment variable PYTHONPATH:

export PYTHONPATH=${PYTHONPATH}:<path_to>/pyshemkf/site-packages

Another important prerequisite of pyshemkf is the proper SHEMAT-Suite output directory structure (see SHEMAT-Suite output directory).

Tutorial

Getting started with pyshemkf. This tutorial explains the usage of a typical script of pyshemkf.

Checking Python modules

A good point to start is the script runplot.py in the forward directory:

$HOME/pyshemkf/scripts/forward/runplot.py

This script is made up of different parts. We will discuss the first three parts of the script.

  1. Loading Python modules
  2. Setting switches
  3. Setting output specifications

At first, Python modules are loaded

import numpy as np
import matplotlib.pyplot as plt

from pskf.tools.plot import specs as sc
import pskf.scripts.forward.arrays as fa
import pskf.scripts.forward.plot as fp
import pskf.scripts.forward.read as fr
import pskf.tools.run.pythonmodule as pm
import pskf.tools.plot.plotfunctions as pf

The first lines load general modules, the later lines load modules from the included modules pskf (see Module =pskf=).

The next part of the script contains switches.

# Switches
is_read = 0
is_plot = 0
is_save = 0
is_show = 0
is_backup = 0

These switches determine, whether following parts of the script are executed.

After the switches, output specifications are set. These specifications always include the model name, the date of the output directory and the letter of the output directory. Additionally, tag-specific specifications can be set. For the tag forward, the name of the variable to be included in the figure is saved under varname.

# Specs
model_name = 'model'
dat = '2010_01_30'

let = 'a'
varname = 'uindex'

At this point, we have discussed the first part of a typical script.

Now please run the IPython script

ipython $HOME/pyshemkf/scripts/forward/runplot.ipy

Since all switches are set to zero, this execution should have produced no output. Executing the script was rather to test whether all Python modules are loaded. If you receive an ImportError, you should do one of the following two things:

  1. ImportError for numpy or matplotlib: Check your general Python setup.
  2. ImportError for pskf: Most probably, PYTHONPATH is not set to include pyshemkf (see Prerequisites)

If no error message is shown, you can move forward to Reading SHEMAT-Suite output.

Reading SHEMAT-Suite output

If all Python modules are loaded correctly, the is_read switch can be set to 1.

# Switches
is_read = 1
is_plot = 0
is_save = 0
is_show = 0
is_backup = 0

If you re-run the script now, the execution will yield the following error:

IOError: [Errno 2] No such file or directory:
'$HOME/shematOutputDir/model_output/2010_01_30/2010_01_30_a/samples_output/MODEL_EO_time_out_0.vtk'

The directory in the error message is generated from the specifications under # Specs. The error message tells us which output file the script is looking for and how the SHEMAT-Suite output directories should be structured.

To find the right file we need to:

  1. Set up the correct SHEMAT-Suite output directory
  2. Put the right specifications under # Specs.

If the output files are still not found, the function fr.read offers the possibility to explicitly specify directory and name of a given vtk-output using the function parameters fname and fdir.

# Read
if is_read:
    numpy_array, numpy_array_name = fr.read(
        model_name,
        dat,
        let,
        varname=varname,
        fname='example_output.vtk'
        fdir='example_dir/to/output'
    )

    np.save(numpy_array_name, numpy_array)
    print('Saved as ' + numpy_array_name)

After successfully saving the python array, its name is printed.

Saved as $HOME/pyshemkf/output/forward/npy/<varname>_<model_name>_<dat>_<let>_1.npy

If this message is displayed after executing runplot.py, you can move on to Plotting.

Plotting

If the numpy array is saved, the switches of runplot.py can be changed as follows:

# Switches
is_read = 0
is_plot = 1
is_save = 0
is_show = 1
is_backup = 0

The output specifications can (and should) be left the same as for reading the output files.

While calling the plotting routine, the appearance of the plot can be influenced through input parameters.

# Plot
if is_plot:

    # Figure
    fig = plt.figure(1, figsize=[15, 10])

    # Run plot function
    ax, pic_name = fp.plot(
        fig.add_subplot(1, 1, 1),
        model_name,
        dat,
        let,
        varname=varname,
    )

    # Monitoring points
    ax = pf.scatter(
        ax,
        model_name,
        dat,
        let,
    )

    # Colorbar
    cb_ax = pf.cb(
        fig.add_subplot(1, 2, 1),
        ax,
        varname=varname,
    )

    # Save
    if is_save:
        plt.savefig(pic_name)
        print('Saved as ' + pic_name)

    # Show
    if is_show:
        plt.show()
    else:
        plt.clf()

Via the switch is_save, the figure can be saved, via the switch is_backup, a backup of runplot.ipy is generated in the subdirectory backup/.

In the case of forward, monitoring points are included as well as a colorbar. If these function calls cause any problems (for example, when there are no monitoring points in the given SHEMAT-Suite output), they can be removed.

General information

First, the directory structure of pyshemkf is explained. Then, a naming convention for directories of SHEMAT-Suite output is introduced. This naming convention is required for compatibility with pyshemkf.

pyshemkf structure

There are three directories in the root directory of pyshemkf: One for output, one for IPython-scripts and one for the Python module pskf.

output/

Directory for all output. output/ has one subdirectory for each tag (see Tags). Each of these tag-subdirectories contains subdirectories, whose names correspond to file endings: npy/, png/, pdf/ and eps/. The scripts of pyshemkf write output of a format to the directory with the corresponding name. Example:

$HOME/pyshemkf/output/pdf/example_output.pdf

The directories dists (output/dists/) and specs (output/specs/) contain only numpy arrays in the subdirectory npy/. dists contains RMSE distributions, specs contains specifications of the simulated model (for example the discretization).

/scripts

IPython scripts for reading and plotting SHEMAT-Suite output sorted by tags (see Tags, Scripts).

/site-packages/pskf

Module containing functions used by the IPython scripts of pyshemkf. Some functions (for reading and plotting) are meant to be used by specific IPython scripts in /scripts, others are general functions used throughout pyshemkf (see Module =pskf=).

For the module pskf to be loaded by Python, its path has to be added to the environment variable PYTHONPATH (see Prerequisites).

SHEMAT-Suite output directory

pyshemkf needs a specific naming convention of SHEMAT-Suite output directories. A single output directory should be named as follows:

$HOME/shematOutputDir/<model_name>_output/<dat>/<dat>_<let>

An example with <model_name>=wavereal, <dat>=2010_01_30, <let>=a:

$HOME/shematOutputDir/wavereal_output/2010_01_30/2010_01_30_a

Inside the SHEMAT-Suite output directories, input files are saved alongside output directories.

  • Input files
    • general input file <MODEL_NAME> (WAVEREAL)
    • true input file <MODEL_NAME>_TRUE (WAVEREAL_TRUE)
    • EnKF input file <MODEL_NAME>.enkf (WAVEREAL.enkf)
    • SGSim input files sgsim_k_<modelname>_true.par (sgsim_k_wavereal_true), sgsim_k_<modelname>.par (sgsim_k_wavereal)
  • Output directories
    • samples_output/: forward output
    • enkf_output/: EnKF output
    • single_cell_output/: output at single cells

Tags

Tags are used to organize different groups of read and plot routines. They determine the output-path, the script-path and the path of to the function definitions of pskf.

There are two groups of tags in pyshemkf, corresponding roughly to the following functionalities: analysis, forward, monitor and singlecell are scripts reading general SHEMAT-Suite. errorplot, gaussianity and numcomp provide visualization of RMSE distributions of large numbers of EnKF synthetic experiments.

analysis

2D-Images of ensemble mean variable/parameter fields or single realization variable/parameter fields from EnKF-simulations in SHEMAT-Suite.

errorplot

Figures showing RMSE means of different EnKF-methods.

forward

2D-Images of variable/parameter fields in a single forward run of SHEMAT-Suite.

gaussianity

RMSE distributions from a large number of EnKF synthetic experiments with SHEMAT-Suite.

monitor

Visualizing monitoring point output from SHEMAT-Suite.

numcomp

Matrix plots visualizing RMSE statistics from a large number of EnKF synthetic experiments with SHEMAT-Suite.

singlecell

Visualizing single cell output from SHEMAT-Suite.

Scripts

endresread.ipy

The script endresread.ipy (scripts/endresread.ipy) is not part of one of the scripting tags. It has the preliminary task of reading RMSE distributions from SHEMAT-Suite output.

tag-scripts

For each tag, there is a runplot.ipy general script that calls the read and plot functions from pskf (see Module =pskf=). If wanted, numpy arrays and figures are saved, figures are shown and a backup of the script is generated in the corresponding backup directory.

templates

A /scripts/templates directory will not be part of the git-repository and can for example be used for new scripts, before they are ready to be committed to the repository.

Manuscript scripts

The following scripts generate the figures of the WRR manuscript associated with this repository:

Module pskf

scripts

The functions in the scripts directory (/site-packages/pskf/scripts/) are tag-specific, i.e. they are meant to be used by the runplot.ipy scripts under a certain tag (for example analysis). Three typical file types exist in one tag directory:

tools

The tools directory (/site-packages/pskf/tools/) contains general functions (opposed to the tag-specific functions in scripts).

plot

General variables and functions related to plotting.

plotarrays

Important collection of dates, letters, number of runs and number of observations for different EnKF runs. According to this information, specifiers for the different output are defined and standardized.

plotfunctions

Plotting functions for handling vtk-input, grid properties, colormaps, colorbars, scatterplots, hdf (not yet fully tested).

specs

Utility functions for reading grid properties from SHEMAT-Suite output files in SHEMAT-Suite output directories. Important functions defining the specifiers used to standardize output of the IPython scripts.

run

pythonmodule

Python-related directory variables

  • python_dir
  • python_scripts_dir
  • python_output_dir

Python-related functions for generating specific directories, filenames for saving and backups.

runmodule

General utility functions for replacing strings, make temporal files, handling letter endings of specifiers, running shell scripts, reading and manipulating SHEMAT-Suite input files, compiling SHEMAT, running matlab, generating lists of SHEMAT-Suite specific files and directories. Some of these functions are used in scripts to run SHEMAT-Suite that are not part of the pyshemkf repository.

References

Rath 2006

Rath, V., Wolf, A., & Bücker, H. M., Joint three-dimensional inversion of coupled groundwater flow and heat transfer based on automatic differentiation: sensitivity calculation, verification, and synthetic examples, Geophysical Journal International, 167(1), 453–466 (2006). http://dx.doi.org/10.1111/j.1365-246x.2006.03074.x

Evensen 2003

Evensen, G., The ensemble kalman filter: theoretical formulation and practical implementation, Ocean Dynamics, 53(4), 343–367 (2003). http://dx.doi.org/10.1007/s10236-003-0036-9

Keller 2018

Keller, J., Hendricks Franssen, H.-J., & Marquart, G. (2018). Comparing seven variants of the ensemble Kalman filter: How many synthetic experiments are needed? Water Resources Research, 54. https://doi.org/10.1029/2018WR023374