Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement BoutOptionsFile #94

Open
TomNicholas opened this issue Dec 15, 2019 · 15 comments
Open

Implement BoutOptionsFile #94

TomNicholas opened this issue Dec 15, 2019 · 15 comments
Assignees
Labels
enhancement New feature or request

Comments

@TomNicholas
Copy link
Collaborator

TomNicholas commented Dec 15, 2019

We want to fully integrate a class structure storing all input file options.

We can't just use boutdata.BoutOptionsFile though, because it uses boututils.DataFile in places, so it will need at least a partial rewrite to work with xBOUT.

I actually tried to see if there was a way to rewrite BoutOptionsFile completely (aiming for a much shorter, less home-grown implementation), but the unique structure of BOUT.inp files foiled me. BOUT.inp files are similar to .ini files and YAML files, but have distinguishing properties which make handling them with variations of standard python library config classes difficult:

  1. Missing initial section header. This is annoying but solvable - you can still read and write .inp files with configparser.ConfigParser but you have to special-case the first line.

  2. Nested sections. This rules out easily subclassing configparser.ConfigParser - I thought maybe you could override ConfigParser.sections() with a recursive lookup method but as ConfigParser inherits its structure from collections.MutableMapping I couldn't see any easy way to make this work.

  3. Specifying nesting in header names. The [section:subsection] syntax isn't shared by any commonly-used config file structure I could find. I think this rules out using configobj.ConfigObj, which is a shame because that one supports nesting (and nested substitution).

While trying out these things I did write some tests though, so we could use them (boutdata.BoutOptionsFile currently has no direct tests).

Therefore I suggest what we should do is:

  1. Write some tests defining the expected behaviour of the options file class (I have some already, which I will submit later),

  2. Copy over the basic parts of the boutdata.BoutOptionsFile code to xBOUT,

  3. Reimplement the methods which rely on boututils.DataFile.

This should hopefully allow us to improve the boutdata.BoutOptionsFile code if needed but without breaking backwards compatibility.

@TomNicholas TomNicholas added the enhancement New feature or request label Dec 16, 2019
@TomNicholas
Copy link
Collaborator Author

TomNicholas commented Dec 16, 2019

Alternatively, if we are okay with breaking backwards compatibility, there are lots of ways in which BoutOptionsFile could be improved...

(By breaking backwards compatibility I don't mean that the BOUT.inp structure would have to be changed, I just mean you wouldn't be able to just drop in xbout.BoutOptionsFile for boutdata.BoutOptionsFile and not have to change the syntax of any calls.)

Here's a list of things which I think an updated version of BoutOptionsFile could feature, or at least should do differently from the current boutdata.BoutOptionsFile:

I/O

  • Should have the option to read grid size data from a xarray.Dataset object provided by xBOUT.

  • Should use pathlib.Path() under the hood. (pathlib is so much better than os, it even gives automatic compatibility with Windows-style paths)

  • Make casting to lowercase optional. BOUT in general should not do this in my opinion, in case one of the values in the input file is a path.

  • Ability to read from and write to a nested dictionary format. Potentially useful for making tests which don't have to read or write files, and testing eval features.

  • __str__ could have an option to print out as a nested dictionary (potentially by using prettyprint.PrettyPrinter). This would then be a serializable representation of the object.

Evaluation and substitution

  • A .get() method with optional arguments for evaluation. The default should be to evaluate the function in the file, then opt-out with .get(key, eval=False).

  • Also store in-line comments, and then optionally strip them when fetching with .get(key, comment=False) (defaulting to True). That would allow near-roundtripping: writing the BoutOptionsFile back out to a new .inp file would then preserve the original inline comments. It would be super-helpful when finding an old dataset for it to still have the comments in the input file. (You might even consider a scheme to keep the lone comments as well as the inline ones...)

  • configparser has classes for this, which could be used as inspiration (they call it "interpolation").

Indexing

  • The .keys() method returns a counterintuitive result: it returns a list where the first half is sections and the second half is keys. It should just return the keys for that section.

  • __iter__ is similar, it first yields the sections, then yields the keys. BoutOptions fundamentally has a dictionary structure: __iter__ should yield (key, section) pairs.

  • There should be an .items(section=None) method, which returns key, value pairs in a section.

  • The getSection() method should not also be a setter - currently if it doesn't find a section in the input file it will create one. This should instead throw a KeyError, and require a new set_section() method. Also getSection() -> get_section() for PEP8 compliance.

  • Better error handling when indexing: configparser has lots of more specific errors which could be imported and thrown when appropriate.

Code structure

  • It needs tests, including of evaluation of different types.

  • It should have a __repr__ method, which I think should just return BoutOptionsFile('/path/to/BOUT.inp')

  • .path() is a potentially-confusing method name for an object which also has an attribute storing the path to a file. (Could call it .route() or .branch()?)

  • boutdata.BoutOptions implements nesting by storing itself, but it would be clearer to have a top-level BoutOptionsParser class, which could read from dicts, and is subclassed as BoutOptionsFile. This parser would contain a nested dictionary of instances of a Section class.

I'm not saying we should do all this, or do any of it now, but I wanted to write down all the ways I think it could be improved in one place, in case they are helpful in future.

(xref boutproject/BOUT-dev#304 and boutproject/BOUT-dev#1066)

@TomNicholas TomNicholas self-assigned this Dec 17, 2019
@TomNicholas
Copy link
Collaborator Author

I've just read BOUT's documentation on the input file in detail, and apparently BOUT++ does several weird things when parsing that I was not aware of. These all will have to also be in any xbout.OptionsFile parser otherwise your analysis might not be consistent with your data.

  • Detect escaped arithmetic symbols (+-*/^), dot (.), or brackets ((){}[]) in option names
  • Detect escaping through backquotes in option names (`2ndvalue`)
  • Evaluate numletter as num*letter (and numbracket as num*bracket)
  • Substitute unicode representation of pi
  • Ensure option names don't contain : or =
  • Ensure all expressions are rounded to nearest integer. I'm very suprised by this, I did not think dx = Lx/nx evaluated to an integer internally... Does BOUT actually do this @johnomotani ??

@johnomotani
Copy link
Collaborator

Ensure all expressions are rounded to nearest integer. I'm very suprised by this, I did not think dx = Lx/nx evaluated to an integer internally... Does BOUT actually do this

I didn't think so... is it only in options that are being read as ints in the C++ code?

@TomNicholas
Copy link
Collaborator Author

I'm not sure I'm understanding the documentation correctly, what it says here is

All expressions are calculated in floating point and then converted to an integer when read inside BOUT++. The conversion is done by rounding to the nearest integer, but throws an error if the floating point value is not within (1e-3) of an integer. This is to minimise unexpected behaviour. If you want to round any result to an integer, use the round function

@johnomotani
Copy link
Collaborator

I think the documentation needs re-phrasing. I believe it means:

  • all expressions are calculated as floating point
  • if an expression is read into an int, it is automatically rounded, as long as the value is within 1e-3 of an integer. If the value is more than 1.e-3 from an integer, an error is thrown.
    • but, if you actually want to put in something like nxpe=nx/42 for an integer parameter nxpe where the expression might not give an integer result, you can round it using nxpe=round(nx/42) to avoid the error.

@dschwoerer
Copy link
Contributor

Seems it might be easier to use boutcore :-)

@bendudson
Copy link
Contributor

I think this depends on what level you want to be able to reproduce the inputs (and at what point an input become a calculated result). The BOUT.inp is programmable, with a pretty complete parser. This functionality is really useful when doing MMS testing or complex inputs.
The file can be read as a pretty standard INI file (with subsections not being standardised). The inputs however would then just be stored as strings if they are expressions. To fully duplicate the parsing and evaluation which BOUT++ applies to its input might be quite a bit of work. I think @dschwoerer is probably right that at that point it becomes easier to just use 'boutcore'.

@johnomotani
Copy link
Collaborator

johnomotani commented Dec 19, 2019

It is useful to be able to get at least some settings evaluated from the input file, for things like normalisations. One could argue that the PhysicsModel code should just save everything necessary to the dump files, and I wouldn't argue with that, but it would be nice to have less code to write... would it make sense to save at least all the scalar options to the output files (which would then give us access to them in evaluated form)?

Edit: I guess the problem would be that the Options objects should not need to know about a Datafile dump global object, or PhysicsModel-specific object ideally, but maybe PhysicsModel could set a dump file for the Options during initialisation?

@TomNicholas
Copy link
Collaborator Author

at that point it becomes easier to just use 'boutcore'.

But you would have to import BOUT's c++ code for that then right? So xBOUT would no longer be a pure python package.

The file can be read as a pretty standard INI file (with subsections not being standardised).

I don't know if it will ever be worth it to change the format (BOUT v5 maybe?), but I think YAML would have been the best choice. It's clear and human-readable, supports nesting, and so parsers could be used out of the box in both python (pyyaml) and C++ (yaml-cpp). You can do nested sections with indentation like this:

mesh:
  - nx: 484 
  - dx: Lx/(nx-4)
  ddx:
    - 'first': 'C2'

there is a nice example of a .yml file used in a similar way here.

To fully duplicate the parsing and evaluation which BOUT++ applies to its input might be quite a bit of work.

I'm not sure it's that bad - string manipulation in python is pretty comprehensive and it's a task very amenable to unit testing.

It is useful to be able to get at least some settings evaluated from the input file, for things like normalisations.

Yes, even just my mesh: dx = Lx / nx example justifies doing this in my opinion.

@dschwoerer
Copy link
Contributor

dschwoerer commented Dec 20, 2019 via email

@TomNicholas
Copy link
Collaborator Author

Thanks for the clarification David.

I assume a
C/C++ compiler is anyway needed, as the netcdf python package needs also
a compiler.

Well the user doesn't have to worry about this if they have installed netcdf-python via conda or some other way.

It will make however installation more difficult, as you not only need a
netcdf library (among others) but also the bout++ library for boutcore.

This was what I meant - we don't want people to have to download the bout++ library or install anything beyond just pip install xbout, in order to simply open up their data.

If you would rely on boutcore as external package

It would be nice to include boutcore in xbout, but it should be as an optional dependency.

@bendudson
Copy link
Contributor

Almost all the things that are hard about reading a BOUT.inp file are to do with evaluating expressions, which is the point where the escaping and special handling is needed. Reading the INI file is pretty trivial, and I think easier for both humans and machines to read and write than YAML: what I've read about YAML is not generally very positive.

@johnomotani
Copy link
Collaborator

hypnotoad has some different issues with an Options class boutproject/hypnotoad#27 - maybe a solution could cover both use-cases? Possibly a base 'generic options' class that could be used directly by hypnotoad and a subclass with extra more BOUT-specific stuff like evaluating expressions?

@TomNicholas
Copy link
Collaborator Author

I imagine #97 could be made general enough to cover both. That PR is on the backburner for me now though. If someone wants to pick it up the comments and tests should make that easier.

@johnomotani
Copy link
Collaborator

On thinking a bit more, a shared options class between hypnotoad (boutproject/hypnotoad#27) and xBOUT may not be such a good idea:

  • hypnotoad wants immutable options (i.e. a given list, defined in the class definition), with functionality to set defaults and (possibly) check allowed values
  • xBOUT wants to be able to read and make available to the user whatever's in the input file, and doesn't need default values (if anything needed is not present, then it should be an error)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

4 participants