Implement BoutOptionsFile #94

TomNicholas · 2019-12-15T12:51:27Z

We want to fully integrate a class structure storing all input file options.

We can't just use boutdata.BoutOptionsFile though, because it uses boututils.DataFile in places, so it will need at least a partial rewrite to work with xBOUT.

I actually tried to see if there was a way to rewrite BoutOptionsFile completely (aiming for a much shorter, less home-grown implementation), but the unique structure of BOUT.inp files foiled me. BOUT.inp files are similar to .ini files and YAML files, but have distinguishing properties which make handling them with variations of standard python library config classes difficult:

Missing initial section header. This is annoying but solvable - you can still read and write .inp files with configparser.ConfigParser but you have to special-case the first line.
Nested sections. This rules out easily subclassing configparser.ConfigParser - I thought maybe you could override ConfigParser.sections() with a recursive lookup method but as ConfigParser inherits its structure from collections.MutableMapping I couldn't see any easy way to make this work.
Specifying nesting in header names. The [section:subsection] syntax isn't shared by any commonly-used config file structure I could find. I think this rules out using configobj.ConfigObj, which is a shame because that one supports nesting (and nested substitution).

While trying out these things I did write some tests though, so we could use them (boutdata.BoutOptionsFile currently has no direct tests).

Therefore I suggest what we should do is:

Write some tests defining the expected behaviour of the options file class (I have some already, which I will submit later),
Copy over the basic parts of the boutdata.BoutOptionsFile code to xBOUT,
Reimplement the methods which rely on boututils.DataFile.

This should hopefully allow us to improve the boutdata.BoutOptionsFile code if needed but without breaking backwards compatibility.

The text was updated successfully, but these errors were encountered:

TomNicholas · 2019-12-16T14:34:37Z

Alternatively, if we are okay with breaking backwards compatibility, there are lots of ways in which BoutOptionsFile could be improved...

(By breaking backwards compatibility I don't mean that the BOUT.inp structure would have to be changed, I just mean you wouldn't be able to just drop in xbout.BoutOptionsFile for boutdata.BoutOptionsFile and not have to change the syntax of any calls.)

Here's a list of things which I think an updated version of BoutOptionsFile could feature, or at least should do differently from the current boutdata.BoutOptionsFile:

I/O

Should have the option to read grid size data from a xarray.Dataset object provided by xBOUT.
Should use pathlib.Path() under the hood. (pathlib is so much better than os, it even gives automatic compatibility with Windows-style paths)
Make casting to lowercase optional. BOUT in general should not do this in my opinion, in case one of the values in the input file is a path.
Ability to read from and write to a nested dictionary format. Potentially useful for making tests which don't have to read or write files, and testing eval features.
__str__ could have an option to print out as a nested dictionary (potentially by using prettyprint.PrettyPrinter). This would then be a serializable representation of the object.

Evaluation and substitution

A .get() method with optional arguments for evaluation. The default should be to evaluate the function in the file, then opt-out with .get(key, eval=False).
Also store in-line comments, and then optionally strip them when fetching with .get(key, comment=False) (defaulting to True). That would allow near-roundtripping: writing the BoutOptionsFile back out to a new .inp file would then preserve the original inline comments. It would be super-helpful when finding an old dataset for it to still have the comments in the input file. (You might even consider a scheme to keep the lone comments as well as the inline ones...)
configparser has classes for this, which could be used as inspiration (they call it "interpolation").

Indexing

The .keys() method returns a counterintuitive result: it returns a list where the first half is sections and the second half is keys. It should just return the keys for that section.
__iter__ is similar, it first yields the sections, then yields the keys. BoutOptions fundamentally has a dictionary structure: __iter__ should yield (key, section) pairs.
There should be an .items(section=None) method, which returns key, value pairs in a section.
The getSection() method should not also be a setter - currently if it doesn't find a section in the input file it will create one. This should instead throw a KeyError, and require a new set_section() method. Also getSection() -> get_section() for PEP8 compliance.
Better error handling when indexing: configparser has lots of more specific errors which could be imported and thrown when appropriate.

Code structure

It needs tests, including of evaluation of different types.
It should have a __repr__ method, which I think should just return BoutOptionsFile('/path/to/BOUT.inp')
.path() is a potentially-confusing method name for an object which also has an attribute storing the path to a file. (Could call it .route() or .branch()?)
boutdata.BoutOptions implements nesting by storing itself, but it would be clearer to have a top-level BoutOptionsParser class, which could read from dicts, and is subclassed as BoutOptionsFile. This parser would contain a nested dictionary of instances of a Section class.

I'm not saying we should do all this, or do any of it now, but I wanted to write down all the ways I think it could be improved in one place, in case they are helpful in future.

(xref boutproject/BOUT-dev#304 and boutproject/BOUT-dev#1066)

TomNicholas · 2019-12-19T12:57:39Z

I've just read BOUT's documentation on the input file in detail, and apparently BOUT++ does several weird things when parsing that I was not aware of. These all will have to also be in any xbout.OptionsFile parser otherwise your analysis might not be consistent with your data.

Detect escaped arithmetic symbols (+-*/^), dot (.), or brackets ((){}[]) in option names
Detect escaping through backquotes in option names (`2ndvalue`)
Evaluate numletter as num*letter (and numbracket as num*bracket)
Substitute unicode representation of pi
Ensure option names don't contain : or =
Ensure all expressions are rounded to nearest integer. I'm very suprised by this, I did not think dx = Lx/nx evaluated to an integer internally... Does BOUT actually do this @johnomotani ??

johnomotani · 2019-12-19T13:00:55Z

Ensure all expressions are rounded to nearest integer. I'm very suprised by this, I did not think dx = Lx/nx evaluated to an integer internally... Does BOUT actually do this

I didn't think so... is it only in options that are being read as ints in the C++ code?

TomNicholas · 2019-12-19T13:37:57Z

I'm not sure I'm understanding the documentation correctly, what it says here is

All expressions are calculated in floating point and then converted to an integer when read inside BOUT++. The conversion is done by rounding to the nearest integer, but throws an error if the floating point value is not within (1e-3) of an integer. This is to minimise unexpected behaviour. If you want to round any result to an integer, use the round function

johnomotani · 2019-12-19T13:45:02Z

I think the documentation needs re-phrasing. I believe it means:

all expressions are calculated as floating point
if an expression is read into an int, it is automatically rounded, as long as the value is within 1e-3 of an integer. If the value is more than 1.e-3 from an integer, an error is thrown.
- but, if you actually want to put in something like nxpe=nx/42 for an integer parameter nxpe where the expression might not give an integer result, you can round it using nxpe=round(nx/42) to avoid the error.

dschwoerer · 2019-12-19T18:31:49Z

Seems it might be easier to use boutcore :-)

bendudson · 2019-12-19T18:58:27Z

I think this depends on what level you want to be able to reproduce the inputs (and at what point an input become a calculated result). The BOUT.inp is programmable, with a pretty complete parser. This functionality is really useful when doing MMS testing or complex inputs.
The file can be read as a pretty standard INI file (with subsections not being standardised). The inputs however would then just be stored as strings if they are expressions. To fully duplicate the parsing and evaluation which BOUT++ applies to its input might be quite a bit of work. I think @dschwoerer is probably right that at that point it becomes easier to just use 'boutcore'.

johnomotani · 2019-12-19T19:05:16Z

It is useful to be able to get at least some settings evaluated from the input file, for things like normalisations. One could argue that the PhysicsModel code should just save everything necessary to the dump files, and I wouldn't argue with that, but it would be nice to have less code to write... would it make sense to save at least all the scalar options to the output files (which would then give us access to them in evaluated form)?

Edit: I guess the problem would be that the Options objects should not need to know about a Datafile dump global object, or PhysicsModel-specific object ideally, but maybe PhysicsModel could set a dump file for the Options during initialisation?

TomNicholas · 2019-12-20T13:09:32Z

at that point it becomes easier to just use 'boutcore'.

But you would have to import BOUT's c++ code for that then right? So xBOUT would no longer be a pure python package.

The file can be read as a pretty standard INI file (with subsections not being standardised).

I don't know if it will ever be worth it to change the format (BOUT v5 maybe?), but I think YAML would have been the best choice. It's clear and human-readable, supports nesting, and so parsers could be used out of the box in both python (pyyaml) and C++ (yaml-cpp). You can do nested sections with indentation like this:

mesh:
  - nx: 484 
  - dx: Lx/(nx-4)
  ddx:
    - 'first': 'C2'

there is a nice example of a .yml file used in a similar way here.

To fully duplicate the parsing and evaluation which BOUT++ applies to its input might be quite a bit of work.

I'm not sure it's that bad - string manipulation in python is pretty comprehensive and it's a task very amenable to unit testing.

It is useful to be able to get at least some settings evaluated from the input file, for things like normalisations.

Yes, even just my mesh: dx = Lx / nx example justifies doing this in my opinion.

dschwoerer · 2019-12-20T13:59:05Z

On 12/20/19 2:09 PM, Tom Nicholas wrote: at that point it becomes easier to just use 'boutcore'. But you would have to import BOUT's c++ code for that then right? So xBOUT would no longer be a pure python package.

`boutcore` needs to link against the BOUT++ library. And a C++ compiler is needed to compile the cython generated boutcore code. But I assume a C/C++ compiler is anyway needed, as the netcdf python package needs also a compiler. If you would rely on boutcore as external package, xBOUT would still be as pure python as it is today. It will make however installation more difficult, as you not only need a netcdf library (among others) but also the bout++ library for boutcore.

TomNicholas · 2019-12-20T14:37:52Z

Thanks for the clarification David.

I assume a
C/C++ compiler is anyway needed, as the netcdf python package needs also
a compiler.

Well the user doesn't have to worry about this if they have installed netcdf-python via conda or some other way.

It will make however installation more difficult, as you not only need a
netcdf library (among others) but also the bout++ library for boutcore.

This was what I meant - we don't want people to have to download the bout++ library or install anything beyond just pip install xbout, in order to simply open up their data.

If you would rely on boutcore as external package

It would be nice to include boutcore in xbout, but it should be as an optional dependency.

bendudson · 2019-12-20T16:06:42Z

Almost all the things that are hard about reading a BOUT.inp file are to do with evaluating expressions, which is the point where the escaping and special handling is needed. Reading the INI file is pretty trivial, and I think easier for both humans and machines to read and write than YAML: what I've read about YAML is not generally very positive.

johnomotani · 2020-04-27T10:12:53Z

hypnotoad has some different issues with an Options class boutproject/hypnotoad#27 - maybe a solution could cover both use-cases? Possibly a base 'generic options' class that could be used directly by hypnotoad and a subclass with extra more BOUT-specific stuff like evaluating expressions?

TomNicholas · 2020-04-27T11:42:11Z

I imagine #97 could be made general enough to cover both. That PR is on the backburner for me now though. If someone wants to pick it up the comments and tests should make that easier.

johnomotani · 2020-04-27T12:07:45Z

On thinking a bit more, a shared options class between hypnotoad (boutproject/hypnotoad#27) and xBOUT may not be such a good idea:

hypnotoad wants immutable options (i.e. a given list, defined in the class definition), with functionality to set defaults and (possibly) check allowed values
xBOUT wants to be able to read and make available to the user whatever's in the input file, and doesn't need default values (if anything needed is not present, then it should be an error)

TomNicholas added the enhancement New feature or request label Dec 16, 2019

TomNicholas self-assigned this Dec 17, 2019

TomNicholas mentioned this issue Dec 19, 2019

Replacement Options File Parser #97

Open

johnomotani mentioned this issue Jan 27, 2020

XZInterpolation rename, and passing Options to XZInterpolation objects boutproject/BOUT-dev#1896

Merged

johnomotani mentioned this issue Apr 27, 2020

Options class replacement boutproject/hypnotoad#27

Closed

johnomotani mentioned this issue May 22, 2020

save and reload #124

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement BoutOptionsFile #94

Implement BoutOptionsFile #94

TomNicholas commented Dec 15, 2019 •

edited

Loading

TomNicholas commented Dec 16, 2019 •

edited

Loading

TomNicholas commented Dec 19, 2019

johnomotani commented Dec 19, 2019

TomNicholas commented Dec 19, 2019

johnomotani commented Dec 19, 2019

dschwoerer commented Dec 19, 2019

bendudson commented Dec 19, 2019

johnomotani commented Dec 19, 2019 •

edited

Loading

TomNicholas commented Dec 20, 2019

dschwoerer commented Dec 20, 2019 via email

TomNicholas commented Dec 20, 2019

bendudson commented Dec 20, 2019

johnomotani commented Apr 27, 2020

TomNicholas commented Apr 27, 2020

johnomotani commented Apr 27, 2020

Implement BoutOptionsFile #94

Implement BoutOptionsFile #94

Comments

TomNicholas commented Dec 15, 2019 • edited Loading

TomNicholas commented Dec 16, 2019 • edited Loading

TomNicholas commented Dec 19, 2019

johnomotani commented Dec 19, 2019

TomNicholas commented Dec 19, 2019

johnomotani commented Dec 19, 2019

dschwoerer commented Dec 19, 2019

bendudson commented Dec 19, 2019

johnomotani commented Dec 19, 2019 • edited Loading

TomNicholas commented Dec 20, 2019

dschwoerer commented Dec 20, 2019 via email

TomNicholas commented Dec 20, 2019

bendudson commented Dec 20, 2019

johnomotani commented Apr 27, 2020

TomNicholas commented Apr 27, 2020

johnomotani commented Apr 27, 2020

TomNicholas commented Dec 15, 2019 •

edited

Loading

TomNicholas commented Dec 16, 2019 •

edited

Loading

johnomotani commented Dec 19, 2019 •

edited

Loading