(re-)Implement GMX RTP parser using lark-parser #359

pckroon · 2021-04-08T10:07:13Z

This PR (re-)implements the RTP parser using the lark-parser parsing framework, where the file grammar is separated from the interpretation.
If we like this style we'll use this to parse FF files as well, after we redo that file format.

This PR is a draft because its main purpose is to foster some discussion. It also requires more extensive testing. On top of that, there is one TODO left regarding the generation of 1-4 pairs between hydrogens, and docstrings are missing.

@fgrunewald what's your opinion/take on this?
The main advantages/disadvantages that I see are the following:
+ The grammar and syntax are explicitly defined, which helps documentation
+ It's faster than the SectionLineParser (but I haven't quantified this)
+ When we implement the FF format as well we can start inheriting/including grammar from the other files, which makes the syntax more uniform
+ It'll allow us to write more complex things for e.g. the JSON parts of the FF format, relaxing the need for quotes and such. See #175
- We need to redo the parsers. Again.
- The interpreter logic is not always as clearly depicted as with the SectionLineParser, but that may also be because I'm not good at working with abstract syntax trees.

fgrunewald · 2021-04-20T07:57:55Z

@pckroon

Thanks for sharing this Lark version of the RTP parser. In general it seems more complicated to use than the SectionLineParser, but I also find it clearer as a language. However, I think the deciding factor will be whether we can handle the GROMACS-specific syntax with Lark as well. That remains to be seen, because RTP is, ironically, the easiest GROMACS file format to parse in my opinion.

Having said that, could you run one or two atomistic FFs on lysozyme using PDB2GMX and compare the results to what you get with martinize using this Lark parser? I think it would be good to get a feeling for how much is missing.

pckroon · 2021-04-20T08:05:19Z

> because RTP is, ironically, the easiest GROMACS file format to parse in my opinion

I don't think I agree. The semantics of the format are super unclear due to missing documentation. It's also the one format that we currently can't handle with the SectionLineParser (SLP).

> Having said that, could you run one or two atomistic FFs on lysozyme using PDB2GMX and compare the results to what you get with martinize using this Lark parser? I think it would be good to get a feeling for how much is missing.

Good idea, I'll see if I can get around to that this week.

(re-)Implement GMX RTP parser using lark-parser

5eb12e8

pckroon requested a review from fgrunewald April 8, 2021 10:07

fgrunewald force-pushed the master branch from 99a936f to 1fba29e on April 25, 2023 13:56
