New style adjacency list and multiplicity label #217

bbuesser

!!! Do not merge yet !!!

These are the changes to RMG-Py and its unit tests needed to handle the new adjacency list format. At the bottom I have copied the explanation of the new adjacency list format that I used for the RMG database pull request #37. Before that, I add some comments on how RMG-Py handles the new adjacency list as of this pull request; further improvements are expected.

I would like to mention the following functions.

toAdjacencyList(): always prints the complete adjacency list with U, L and (currently) calculated E. It accepts a new argument "printMultiplicity=False/True", which determines whether the multiplicity is printed as part of the adjacency list.

fromAdjacencyList(): always requires the U label. It can read the L label; if L is not defined, it assumes L0 for molecules and None (not defined and not to be compared) for groups. E is currently read but not used for anything. Instead, the formal charges are calculated from the U label and the number of bonds of each atom, so E is currently overwritten assuming a neutral species. The code for reading the E label was easy to implement, but I have not yet had test cases from which to learn how important the E label is, so for this pull request it seemed safer to satisfy U and the bonds and create a neutral species. fromAdjacencyList() accepts the wildcards Ux, Lx and Ex, each representing any possible number.

fromRDKitMol(): all functions based on fromRDKitMol(), such as fromSMILES(), assume that the maximum multiplicity is given; therefore multiplicity = 2*spin + 1 = number of unpaired electrons + 1.
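The relation between unpaired electrons and multiplicity can be illustrated with a minimal sketch. The function names below are hypothetical and not part of RMG-Py. The list of lower multiplicities follows the usual spin-pairing rule (n+1, n-1, ...), which is an assumption about what "possible for the given number of unpaired electrons" means, not a copy of the actual check.

```python
from __future__ import annotations


def max_multiplicity(unpaired_electrons: int) -> int:
    """Highest-spin case: all n unpaired electrons parallel, S = n/2, 2S+1 = n+1."""
    if unpaired_electrons < 0:
        raise ValueError("number of unpaired electrons must be non-negative")
    return unpaired_electrons + 1


def possible_multiplicities(unpaired_electrons: int) -> list[int]:
    """Multiplicities reachable by coupling spins pairwise: n+1, n-1, ..., 2 or 1."""
    return list(range(max_multiplicity(unpaired_electrons), 0, -2))


# A species with two unpaired electrons can be a triplet or a singlet.
biradical = possible_multiplicities(2)
```

A check of a user-supplied multiplicity could then be as simple as `multiplicity in possible_multiplicities(n)`.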

Open topics for future development:

  • Currently multiplicity is stored in both the Species class and the Molecule class, for reasons rooted in the RMG development history. In the future it might be better to store it in one place, preferably the Species class. Furthermore, the Conformer class already had a multiplicity label to calculate accurate thermochemistry; this might as well be combined with the species multiplicity.
  • Currently the TransitionState class does not have a multiplicity label, although it should have one too.

Explanation of new adjacency list style:

This is the first version of the RMG database with the new adjacency list format and multiplicity as a species/molecule property. Kinetics libraries store the multiplicity as part of the adjacency list, whereas everywhere else it is a separate argument. I think multiplicity should not be part of the adjacency list because it does not depend on the list's details: there can be many adjacency lists for the same species (resonance isomers), all necessarily having the same multiplicity. As soon as we continue our efforts to build a separate structure library for kinetics rules, this special case of storing multiplicity in the adjacency list for kinetics will fall away.

The new adjacency list format is (e.g. nitromethane, CH3NO2):

1 C U0 L0 E0 {2,S} {3,S} {4,S} {5,S}
2 N U0 L0 E+1 {1,S} {6,D} {7,S}
3 H U0 L0 E0 {1,S}
4 H U0 L0 E0 {1,S}
5 H U0 L0 E0 {1,S}
6 O U0 L2 E0 {2,D}
7 O U0 L3 E-1 {2,S}

where

U: the flag for unpaired electrons (formerly radicals). There are two reasons to abandon R for radicals. First, we are using R to represent unspecified groups in the elements column. Second, it would be confusing in a future publication to use "radical" both for the species with unpaired electrons and for the unpaired electron itself.

L: the flag for the number of lone electron pairs. P was rejected as the flag because a possible future introduction of phosphorus would bring in P as an element.

E: the flag for formal charges. The sum of all E is equal to the total charge of the species. Currently only neutral species are reactive in RMG, therefore sum(E)=0 is required for reactivity. E was chosen as the capital letter of e, representing an electronic charge. E+1 means one electron fewer and E-2 means two additional electrons on that atom. C was not used as the flag for formal charge because it is used to represent carbon.
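As a consistency check on the nitromethane example, the E values can be reproduced from U, L and the bonds with the standard formal-charge bookkeeping (valence electrons minus unpaired electrons, minus two per lone pair, minus the sum of bond orders). This is a sketch of the textbook formula, not the calculation RMG-Py performs; the table and function names are hypothetical.

```python
# Valence electron counts for the elements in the example only.
VALENCE_ELECTRONS = {"H": 1, "C": 4, "N": 5, "O": 6}
# Bond orders for the bond labels used in the example (S = single, D = double).
BOND_ORDER = {"S": 1, "D": 2, "T": 3}


def formal_charge(element, unpaired, lone_pairs, bond_types):
    """Formal charge = valence electrons - U - 2*L - sum of bond orders."""
    bonding = sum(BOND_ORDER[b] for b in bond_types)
    return VALENCE_ELECTRONS[element] - unpaired - 2 * lone_pairs - bonding


# (element, U, L, bond labels) for atoms 1..7 of the nitromethane list.
nitromethane = [
    ("C", 0, 0, "SSSS"),
    ("N", 0, 0, "SDS"),
    ("H", 0, 0, "S"),
    ("H", 0, 0, "S"),
    ("H", 0, 0, "S"),
    ("O", 0, 2, "D"),
    ("O", 0, 3, "S"),
]

charges = [formal_charge(*atom) for atom in nitromethane]
total_charge = sum(charges)  # a neutral species sums to zero
```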

RMG no longer accepts 2T, 2S or any other such combination to represent multiplicity.
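To make the line format concrete, here is a minimal, hypothetical parser for a single atom line. It is a sketch based only on the description above, not the code in adjlist.py. The regular expression, the function name and the returned dictionary layout are assumptions; the defaults for a missing L (0 for molecules, None for groups) and the lowercase x wildcard follow the behaviour described for fromAdjacencyList().

```python
import re

# index, element, required U, optional L and E, then zero or more {index,order} bonds
ATOM_LINE = re.compile(
    r"^(?P<index>\d+)\s+(?P<element>\S+)\s+U(?P<U>\d+|x)"
    r"(?:\s+L(?P<L>\d+|x))?(?:\s+E(?P<E>[+-]?\d+|x))?"
    r"(?P<bonds>(?:\s+\{\d+,[A-Za-z]+\})*)\s*$"
)
BOND = re.compile(r"\{(\d+),([A-Za-z]+)\}")


def _value(text):
    """Convert a label value to int, keep the wildcard 'x', pass None through."""
    if text is None or text == "x":
        return text
    return int(text)


def parse_atom_line(line, is_group=False):
    match = ATOM_LINE.match(line.strip())
    if match is None:
        raise ValueError("not a new-style adjacency list line: %r" % line)
    lone_pairs = _value(match["L"])
    if lone_pairs is None and not is_group:
        lone_pairs = 0  # molecules: a missing L is read as L0
    return {
        "index": int(match["index"]),
        "element": match["element"],
        "U": _value(match["U"]),
        "L": lone_pairs,
        "E": _value(match["E"]),
        "bonds": {int(i): order for i, order in BOND.findall(match["bonds"])},
    }


nitromethane = """\
1 C U0 L0 E0 {2,S} {3,S} {4,S} {5,S}
2 N U0 L0 E+1 {1,S} {6,D} {7,S}
3 H U0 L0 E0 {1,S}
4 H U0 L0 E0 {1,S}
5 H U0 L0 E0 {1,S}
6 O U0 L2 E0 {2,D}
7 O U0 L3 E-1 {2,S}
"""
atoms = [parse_atom_line(line) for line in nitromethane.splitlines()]
total_charge = sum(atom["E"] for atom in atoms)  # 0 for a neutral species
```

A line without the required U label, including any old-style line, fails to match and raises ValueError.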

Adjacency lists in kinetics libraries or rules look like the following:

HCO
multiplicity 2
1 C U1 L0 E0 {2,D} {3,S}
2 O U0 L2 E0 {1,D}
3 H U0 L0 E0 {1,S}

As always, the first line can contain a label. The second line contains the keyword "multiplicity" followed by a space and a number giving the multiplicity of that species.
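Reading such an entry amounts to peeling off the optional label and the multiplicity line before the atom lines. The helper below is a hypothetical sketch of that step, not the RMG-Py implementation. It assumes a label never starts with a digit and is never the word "multiplicity" itself.

```python
def split_kinetics_adjlist(text):
    """Return (label, multiplicity, atom_lines) for a kinetics-style adjacency list."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    label = None
    multiplicity = None
    # Optional label: a first line that is neither an atom line nor the multiplicity line.
    if lines and not lines[0][0].isdigit() and lines[0].split()[0] != "multiplicity":
        label = lines.pop(0)
    # "multiplicity n": split into two components, the second is the number.
    if lines and lines[0].split()[0] == "multiplicity":
        multiplicity = int(lines.pop(0).split()[1])
    return label, multiplicity, lines


hco = """\
HCO
multiplicity 2
1 C U1 L0 E0 {2,D} {3,S}
2 O U0 L2 E0 {1,D}
3 H U0 L0 E0 {1,S}
"""
label, multiplicity, atom_lines = split_kinetics_adjlist(hco)
```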

For groups, the multiplicity is always a separate argument (thermo and kinetics), defined as a list of all accepted multiplicities for which that group is applicable.

Groups only require the U flag for unpaired electrons; L is optional and is compared if defined, while E is not read at the moment.
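The group rules above can be sketched as two small predicates. These are hypothetical helpers operating on plain dictionaries, not the Group or GroupAtom classes of RMG-Py. They assume that U and L hold either an integer or the wildcard "x", and that an undefined L on a group is stored as None.

```python
def group_applicable(group_multiplicities, molecule_multiplicity):
    """A group applies only if the molecule's multiplicity is in the group's list."""
    return molecule_multiplicity in group_multiplicities


def labels_compatible(group_atom, mol_atom):
    """Compare U always, L only if the group defines it; E is ignored."""
    if group_atom["U"] != "x" and group_atom["U"] != mol_atom["U"]:
        return False
    group_lone_pairs = group_atom.get("L")
    if group_lone_pairs not in (None, "x") and group_lone_pairs != mol_atom["L"]:
        return False
    return True


# Carbon atom of HCO from the kinetics example: one unpaired electron, no lone pairs.
hco_carbon = {"U": 1, "L": 0, "E": 0}
radical_group_atom = {"U": 1}             # L not defined, so it is not compared
closed_shell_group_atom = {"U": 0, "L": 0}

matches_radical = labels_compatible(radical_group_atom, hco_carbon)
matches_closed_shell = labels_compatible(closed_shell_group_atom, hco_carbon)
doublet_allowed = group_applicable([2, 4], 2)
```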

bbuesser added 30 commits May 19, 2014 11:58
- remove multiplicity from GroupAtom
- add multiplicity to Group
- update __gainRadical() and __loseRadical()
- update equivalent() and isSpecificCaseOf()
- remove multiplicity from Atom
- add multiplicity to Molecule
- add condition that given multiplicity must be possible for given
number of unpaired electrons (radicals), throws SpeciesError if not
- include multiplicity in isIsomorphic()
- add multiplicity attribute to class Species
- add multiplicity to QM thermo database
- update makeNewSpecies()
allowed reactions

- __generateProductStructures uses the angular momentum addition theorem to
find spin-allowed reactions; further improvement and validation might be
required
And set printMultiplicity in toAdjacencyList() to False to not print
multiplicity as part of the adjacency list
bbuesser added 22 commits May 19, 2014 12:10
Since the introduction of the multiplicity inside adjacency lists for
the kinetics libraries we have to split the multiplicity line
(multiplicity n) into its two components where the second component
([1]) is the actual multiplicity number.
This makes it easier to understand the difference to other saveEntry()
functions
toAdjacencyList():
-pass printMultiplicity argument actually to toAdjacencyList() from
adjlist.py

fromAdjacencyList():
- remove newStyleAdjacencyListMatcher(), all adjacency lists are now new
style
- remove maxMultiplicity argument, multiplicity has now to be
explicitly defined
- multiplicity has now to be explicitly defined
- multiplicity has to be defined for each entry which makes
maxMultiplicity unnecessary
- this also updates all the functions based on fromRDKitMol() like
fromSMILES()
- all these molecule input functions are based on the assumption that
maximum multiplicity is the most stable multiplicity
(multiplicity=radical count + 1)
for molecules a missing lone pair card (L) leads to the assumption that
there are no lone pairs, therefore L0

for groups a missing L leads to the assumption that it was not defined
therefore None
@rwest rwest mentioned this pull request May 20, 2014

rwest commented May 20, 2014

Closed in favor of the replacement #218 (so that other authors besides @bbuesser can contribute to it before it is merged)

@rwest rwest closed this May 20, 2014
2 participants