New style adjacency list and multiplicity label #217
Closed
bbuesser wants to merge 52 commits into ReactionMechanismGenerator:master from bbuesser:new-style-adjacency-list
- remove multiplicity from GroupAtom
- add multiplicity to Group
- update __gainRadical() and __loseRadical()
- update equivalent() and isSpecificCaseOf()
saveSpeciesDictionary()
- remove multiplicity from Atom
- add multiplicity to Molecule
- instead of triplet assumption
- add condition that the given multiplicity must be possible for the given number of unpaired electrons (radicals); throws SpeciesError if not
- include multiplicity in isIsomorphic()
- add multiplicity attribute to class Species
- add multiplicity to QM thermo database
- update makeNewSpecies()
allowed reactions
- __generateProductStructures uses the angular momentum addition theorem to find spin-allowed reactions; further improvement and validation might be required
And set printMultiplicity in toAdjacencyList() to False to not print multiplicity as part of the adjacency list
Since multiplicity was introduced into the adjacency lists of the kinetics libraries, we have to split the multiplicity line (multiplicity n) into its two components; the second component ([1]) is the actual multiplicity number.
This makes it easier to understand the difference to other saveEntry() functions
toAdjacencyList():
- actually pass the printMultiplicity argument to toAdjacencyList() from adjlist.py
fromAdjacencyList():
- remove newStyleAdjacencyListMatcher(); all adjacency lists are now new style
- remove the maxMultiplicity argument; multiplicity now has to be explicitly defined
- multiplicity now has to be explicitly defined
- multiplicity has to be defined for each entry, which makes maxMultiplicity unnecessary
- this also updates all the functions based on fromRDKitMol(), like fromSMILES()
- all these molecule input functions are based on the assumption that the maximum multiplicity is the most stable multiplicity (multiplicity = radical count + 1)
for molecules, a missing lone pair card (L) leads to the assumption that there are no lone pairs, therefore L0
for groups, a missing L leads to the assumption that it was not defined, therefore None
multiplicity to class representation string
Atoms and molecule classes
and add multiplicity to pickling functions
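The spin-allowed check mentioned in the commit notes above relies on the angular momentum addition rule: two species with spins S1 and S2 can couple to total spins |S1 - S2|, |S1 - S2| + 1, ..., S1 + S2. A minimal sketch of that rule, written in terms of multiplicities (2S + 1), could look like the following. The function names are hypothetical, and this is not the code of __generateProductStructures; it only illustrates the rule.

```python
def coupled_multiplicities(m1, m2):
    """Total multiplicities reachable by coupling multiplicities m1 and m2.

    With S = (m - 1) / 2, the total spin runs from |S1 - S2| to S1 + S2 in
    steps of 1, so 2S + 1 runs from |m1 - m2| + 1 to m1 + m2 - 1 in steps of 2.
    """
    return set(range(abs(m1 - m2) + 1, m1 + m2, 2))


def is_spin_allowed(reactants, products):
    """True if both sides share at least one total multiplicity.

    reactants and products are non-empty lists of species multiplicities.
    """
    def total(multiplicities):
        # Couple the species one after another, keeping every reachable total.
        totals = {multiplicities[0]}
        for m in multiplicities[1:]:
            totals = {t for prev in totals for t in coupled_multiplicities(prev, m)}
        return totals

    return bool(total(reactants) & total(products))
```

For example, two doublets can couple to a singlet or a triplet, so `is_spin_allowed([2, 2], [1])` is true, while two singlets cannot form a single triplet product, so `is_spin_allowed([1, 1], [3])` is false.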
!!! Do not merge yet !!!
These are the necessary changes to RMG-Py and its unit tests to handle the new adjacency list format. At the bottom I have copied the same explanation of the new adjacency list format that I used for the RMG database pull request #37. Before that, I am adding some comments on how RMG-Py handles the new adjacency list as of this pull request; further improvements are expected.
I would like to mention the following functions.
toAdjacencyList(): always prints the complete adjacency list with U, L and the (currently calculated) E. It accepts a new argument "printMultiplicity=False/True", which defines whether the multiplicity should be printed as part of the adjacency list.
fromAdjacencyList(): always requires the U label. It can read the L label; if L is not defined, it assumes L0 for molecules and None (not defined and not to be compared) for groups. E is currently read but not used for anything: the formal charges are currently calculated from the U label and the number of bonds of each atom. Reading the E label was easy to implement, but since I have not yet had test cases from which to learn how important the E label is, I thought it safer for this pull request to satisfy U and the bonds and create a neutral species. Therefore E is currently overwritten assuming a neutral species. fromAdjacencyList() accepts the wildcards Ux, Lx and Ex, each representing any possible number.
fromRDKitMol(): all functions based on fromRDKitMol(), such as fromSMILES(), assume the maximum multiplicity, i.e. multiplicity = 2*spin + 1 = number of unpaired electrons + 1.
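The defaulting rule for a missing L label can be sketched as follows. This is only an illustration of the behaviour described above, not the code in adjlist.py: the helper name `read_lone_pairs` and the `ANY` sentinel used for the Lx wildcard are assumptions made for this example.

```python
import re

ANY = "x"  # hypothetical sentinel for the Lx wildcard; not RMG-Py's representation


def read_lone_pairs(tokens, is_group):
    """Return the lone-pair count found in the tokens of one atom line."""
    for token in tokens:
        match = re.fullmatch(r"L(\d+|x)", token)
        if match:
            value = match.group(1)
            return ANY if value == "x" else int(value)
    # No L token present: molecules default to L0, groups stay undefined (None).
    return None if is_group else 0
```

With this sketch, `read_lone_pairs("1 C U0 {2,S}".split(), is_group=False)` gives 0, while the same line read as a group gives None, meaning the lone pairs are not compared.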
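The maximum-multiplicity assumption, and the related check that a given multiplicity is possible for a given number of unpaired electrons, can be sketched as below. The function names are hypothetical, ValueError stands in for the SpeciesError mentioned in the commit notes, and "possible" is assumed here to mean n + 1, n - 1, ... for n unpaired electrons.

```python
def max_multiplicity(unpaired_electrons):
    """Highest multiplicity: all unpaired electrons have parallel spin."""
    return unpaired_electrons + 1


def possible_multiplicities(unpaired_electrons):
    """Multiplicities n + 1, n - 1, ... down to 1 (even n) or 2 (odd n)."""
    return list(range(unpaired_electrons + 1, 0, -2))


def check_multiplicity(multiplicity, unpaired_electrons):
    """Raise ValueError (a stand-in for SpeciesError) if the value is impossible."""
    if multiplicity not in possible_multiplicities(unpaired_electrons):
        raise ValueError(
            "multiplicity %d is not possible for %d unpaired electrons"
            % (multiplicity, unpaired_electrons)
        )
    return multiplicity
```

A biradical (two unpaired electrons) is therefore read as a triplet by the SMILES-style input functions, while an adjacency list could also declare it as a singlet.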
open topics for future development:
Explanation of new adjacency list style:
This is the first version of the RMG database with the new adjacency list format and multiplicity as a species/molecule property. Kinetics libraries store the multiplicity as part of the adjacency list, whereas everywhere else it is a separate argument. I think multiplicity should not be part of the adjacency list because it does not depend on its details: there can be many adjacency lists for the same species (resonance isomers), all necessarily having the same multiplicity. As soon as we continue our efforts toward a separate structure library for kinetics rules, this difference of storing multiplicity in the adjacency list for kinetics will fall away.
The new adjacency list format is (e.g. nitromethane, CH3NO2):
1 C U0 L0 E0 {2,S} {3,S} {4,S} {5,S}
2 N U0 L0 E+1 {1,S} {6,D} {7,S}
3 H U0 L0 E0 {1,S}
4 H U0 L0 E0 {1,S}
5 H U0 L0 E0 {1,S}
6 O U0 L2 E0 {2,D}
7 O U0 L3 E-1 {2,S}
where
U: the flag for unpaired electrons (formerly radicals). There are two reasons to abandon the R for radicals. First, we are using R to represent unspecified groups in the elements column. Second, it would be confusing in a future publication to use "radical" both for the species with unpaired electrons and for the unpaired electron itself.
L: the flag for the number of lone electron pairs. P was not used as the flag because a possible future introduction of phosphorus would bring in P as an element.
E: the flag for formal charges. The sum of all E is equal to the total charge of the species. Currently only neutral species are reactive in RMG, therefore sum(E)=0 is required for reactivity. E was chosen as the capital letter of e, representing an electronic charge. E+1 means one electron less, E-2 means two additional electrons on that atom. C was not used as the flag for formal charge because it is used to represent carbon.
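To make the three flags concrete, here is a minimal sketch that parses one atom line of the format above and recomputes the formal charge as valence electrons - U - 2*L - (sum of bond orders). It is not the RMG-Py parser: the regular expression, the function names and the small element and bond tables are assumptions for this example, and wildcards or other bond types are not handled.

```python
import re

# One atom line: index, element, U, L, E, then zero or more {index,bond} entries.
ATOM_LINE = re.compile(
    r"^\s*(\d+)\s+(\w+)\s+U(\d+)\s+L(\d+)\s+E([+-]?\d+)((?:\s+\{\d+,\w\})*)\s*$"
)
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6}  # only what the example needs
BOND_ORDER = {"S": 1, "D": 2, "T": 3}


def parse_atom(line):
    """Parse one atom line into a dict with keys index, element, U, L, E, bonds."""
    match = ATOM_LINE.match(line)
    if match is None:
        raise ValueError("not a new-style atom line: %r" % line)
    index, element, u, l, e, bonds = match.groups()
    return {
        "index": int(index),
        "element": element,
        "U": int(u),
        "L": int(l),
        "E": int(e),
        "bonds": {int(i): order for i, order in re.findall(r"\{(\d+),(\w)\}", bonds)},
    }


def formal_charge(atom):
    """Formal charge from valence electrons, U, L and the bond orders."""
    bond_sum = sum(BOND_ORDER[order] for order in atom["bonds"].values())
    return VALENCE_ELECTRONS[atom["element"]] - atom["U"] - 2 * atom["L"] - bond_sum
```

Applied to the nitromethane example, the recomputed charge of atom 2 (N) is +1 and that of atom 7 (O) is -1, matching the E flags, and the charges of all seven atoms sum to zero.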
RMG no longer accepts 2T, 2S or any other such combination to represent multiplicity.
Adjacency lists in kinetics libraries or rules look like the following:
HCO
multiplicity 2
1 C U1 L0 E0 {2,D} {3,S}
2 O U0 L2 E0 {1,D}
3 H U0 L0 E0 {1,S}
As before, the first line can contain a label. The second line contains the label "multiplicity" followed by a space and a number representing the multiplicity of that species.
For groups, the multiplicity label is always a separate argument (thermo and kinetics) and is defined as a list containing all accepted multiplicities for which that group is applicable.
Groups only require the U flag for unpaired electrons; L is optional and will be compared if defined, while E is not read at the moment.