Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output labeled adjacency lists for each reaction #72

Open
rwest opened this issue Apr 1, 2019 · 17 comments
Open

Output labeled adjacency lists for each reaction #72

rwest opened this issue Apr 1, 2019 · 17 comments
Assignees

Comments

@rwest
Copy link
Collaborator

rwest commented Apr 1, 2019

Eric Hermes @ehermes would like, in order to test Sella, would like labeled adjacency lists for each (important) reaction.

@rwest
Copy link
Collaborator Author

rwest commented Apr 8, 2019

This was mentioned again on the April 8 call.
Eric needs to know how each reaction will be described.

@rwest rwest self-assigned this Apr 8, 2019
@rwest
Copy link
Collaborator Author

rwest commented Apr 9, 2019

@mazeau please could you try and tackle this? Ask @nateharms for help as he knows the inner working of reaction center atom labels. What we want is: when a reaction has been created (but before the labels have been wiped) we should save (somehow) a representation of the adjacency lists of the reaction, with the relevant atom labels. We could perhaps immediately output these to a separate text (or yaml) file for each reaction. Or could save the atom labels and create the output later.
To do the latter, we may need to hack it a bit like cell [16] in this notebook.

I'd be happy to work on this, but Sandia are getting impatient and it seems I don't have much spare time to actually do it myself.

@mazeau
Copy link

mazeau commented Apr 9, 2019

Do you want this to be an option in the input file? Similar to generateOutputHTML?

@rwest
Copy link
Collaborator Author

rwest commented Apr 9, 2019

Ooh, Having it something we can turn on and off using the input file is a good idea. But having it just on all the time is a useful start so we can give Eric something

@nateharms
Copy link

nateharms commented Apr 9, 2019

Are you looking for something like this?

Reaction(reactants=[Molecule(SMILES="CC"), Molecule(SMILES="[O]O")], products=[Molecule(SMILES="C[CH2]"), Molecule(SMILES="OO")]) matches H_Abstraction reaction family. 

Molecule(SMILES="CC.[O]O")
multiplicity 2
1     O u0 p2 c0 {2,S} {11,S}
2  *3 O u1 p2 c0 {1,S}
3  *1 C u0 p0 c0 {4,S} {5,S} {6,S} {7,S}
4     C u0 p0 c0 {3,S} {8,S} {9,S} {10,S}
5  *2 H u0 p0 c0 {3,S}
6     H u0 p0 c0 {3,S}
7     H u0 p0 c0 {3,S}
8     H u0 p0 c0 {4,S}
9     H u0 p0 c0 {4,S}
10    H u0 p0 c0 {4,S}
11    H u0 p0 c0 {1,S}

Molecule(SMILES="OO.[CH2]C")
multiplicity 2
1     O u0 p2 c0 {2,S} {10,S}
2  *1 O u0 p2 c0 {1,S} {11,S}
3     C u0 p0 c0 {4,S} {5,S} {6,S} {7,S}
4  *3 C u1 p0 c0 {3,S} {8,S} {9,S}
5     H u0 p0 c0 {3,S}
6     H u0 p0 c0 {3,S}
7     H u0 p0 c0 {3,S}
8     H u0 p0 c0 {4,S}
9     H u0 p0 c0 {4,S}
10    H u0 p0 c0 {1,S}
11 *2 H u0 p0 c0 {2,S}

@rwest
Copy link
Collaborator Author

rwest commented Apr 9, 2019

Yes! Something like that

@nateharms
Copy link

nateharms commented Apr 9, 2019

Okay, this chunk of code should do what you need. It'll take in a list of reactions and write the reactant and product complexes to adjacency lists in an adjacency_lists.py file. Let me know if you need any explanation of the code.

import rmgpy
from rmgpy.molecule import Molecule
from rmgpy.species import Species
from rmgpy.reaction import Reaction
from rmgpy.data.rmg import RMGDatabase

########## Change this section for the reaction and families of interest ##########
rxn1 = Reaction(
    reactants = [Molecule(SMILES="CC"), Molecule(SMILES="[O]O")],
    products = [Molecule(SMILES="[CH2]C"), Molecule(SMILES="OO")]
)

rxn2 = Reaction(
    reactants = [Molecule(SMILES="C"), Molecule(SMILES="[O]O")],
    products = [Molecule(SMILES="[CH2]C"), Molecule(SMILES="OO")]
)

rxn3 = Reaction(
    reactants = [
        Species(molecule=[Molecule(SMILES="CC")]), 
        Species(molecule=[Molecule(SMILES="[O]O")])
    ],
    products = [
        Species(molecule=[Molecule(SMILES="[CH2]C")]), 
        Species(molecule=[Molecule(SMILES="OO")])
    ]
)
reaction_list = [rxn1, rxn2, rxn3]

families_of_interest = [
    "H_Abstraction"
]

###################################################################################

rmg_database = RMGDatabase()
database_path = rmgpy.settings['database.directory']
rmg_database.load(
    database_path,
    kineticsFamilies=families_of_interest,
    transportLibraries=[],
    reactionLibraries=[],
    seedMechanisms=[],
    thermoLibraries=[
        'primaryThermoLibrary',
        'thermo_DFT_CCSDTF12_BAC',
        'CBS_QB3_1dHR'],
    solvation=False,
)

families = rmg_database.kinetics.families
write_string = ""
for reaction in reaction_list:
    match = False
    rmg_reactants = []
    rmg_products = []
    for react in reaction.reactants:
        if isinstance(react, Species):
            rmg_reactants.append(react.molecule)
        elif isinstance(react, Molecule):
            rmg_reactants.append([react])

    for prod in reaction.products:
        if isinstance(prod, Species):
            rmg_products.append(prod.molecule)
        elif isinstance(prod, Molecule):
            rmg_products.append([prod])

    test_reactants = []
    test_products = []
    if len(rmg_reactants) == 1:
        test_reactants = rmg_reactants
    elif len(rmg_reactants) == 2:
        l1, l2 = rmg_reactants
        for i in l1:
            for j in l2:
                test_reactants.append([i, j])
    elif len(rmg_products) == 3:
        l1, l2, l3 = rmg_reactants
        for i in l1:
            for j in l2:
                for k in l3:
                    test_reactants.append([i,j,k])

    if len(rmg_products) == 1:
        test_reactants = rmg_products
    elif len(rmg_products) == 2:
        l1, l2 = rmg_products
        for i in l1:
            for j in l2:
                test_products.append([i, j])
    elif len(rmg_products) == 3:
        l1, l2, l3 = rmg_products
        for i in l1:
            for j in l2:
                for k in l3:
                    test_products.append([i,j,k])
    for name, family in families.items():
        if match:
            continue
            
        labeled_reactants, labeled_products = (None, None)
        for test_reactant in test_reactants:
            for test_products in test_products:
                if (labeled_reactants and labeled_products):
                    continue
                labeled_reactants, labeled_products = family.getLabeledReactantsAndProducts(
                    reactants = test_reactant,
                    products = test_products
                )

                
                    
        if (labeled_reactants) and (labeled_products):
            print "{} matches {} reaction family.".format(reaction.__repr__(), name)
            print
            reactant_complex = Molecule()
            for reactant in labeled_reactants:
                reactant_complex = reactant_complex.merge(reactant)
            reactant_complex.updateMultiplicity()
            write_string += reactant_complex.__repr__()
            write_string += "\n"
            write_string += reactant_complex.toAdjacencyList()
            write_string += "\n"
            
            product_complex = Molecule()
            for product in labeled_products:
                product_complex = product_complex.merge(product)
            product_complex.updateMultiplicity()
            write_string += product_complex.__repr__()
            write_string += "\n"
            write_string += product_complex.toAdjacencyList()
            write_string += "\n"
            match = True
    if not match:
        print "WARNING: Could not match {} to any provided reaction family...".format(reaction.__repr__())
        print

f = open("adjacency_lists.py", "w")
f.write(write_string)
f.close()

@ehermes
Copy link

ehermes commented Apr 9, 2019

Right now I don't have the ability to run RMG-cat, so I can't test the above code. I just wanted an example of what the reaction specification would look like, so I can write a parser that will turn it into 3D structures.

Remember that the purpose of this exercise is to begin automating the translation of RMG-cat reactions into 3D structures of the saddle point. With this in mind, I have a couple of comments regarding the example you provided:

I see that your atom indices are moving around -- e.g. the C atom participating in the reaction moves from index 3 to index 4, and the H atom moves from index 5 to index 11. The labels *1 *2 *3 will help me identify which atoms are participating in the reaction, but if the other atom indices are also being changed from reactants to products, it will be much more difficult for me to find a 3D structure for the saddle point. Would it be possible to reorder the atom indices so they correspond 1:1 in the reactants and products? I feel this would be easier to accomplish with molecular graphs than 3D structures, and the only work I've done with molecular graphs is to translate them into 3D structures...

Related to this first point, what about more complicated reactions where multiple bonds are being made and broken? Based on the example you gave, I understand that the bond between *1 and *2 in the reactants (*3 and *2 in the products) is being broken and a bond between *1 and *2 in the products (*3 and *2 in the reactants) is being made. It is not clear to me how this would extend to more complex reactions. I think it is important that we settle on a notation that is general enough to extend to more complicated reactions.

The reaction that I am initially going to be testing is a simple catalytic dehydrogenation reaction: CH3* + * -> CH2* + H*. Given the notation you provided, would this be an appropriate way to specify this reaction?

CH3
multiplicity ??? # <- I don't think this is meaningful for catalytic systems
1  *1 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
2     H u0 p0 c0 {1,S}
3     H u0 p0 c0 {1,S}
4  *2 H u0 p0 c0 {1,S}
5     X u0 p0 c0 {1,S}
6  *3 X u1 p0 c0

CH2+H
multiplicity ???
1  *3 C u1 p0 c0 {2,S} {3,S} {5,S}
2     H u0 p0 c0 {1,S}
3     H u0 p0 c0 {1,S}
4  *2 H u0 p0 c0 {6,S}
5     X u0 p0 c0 {1,S}
6  *1 X u0 p0 c0 {4,S}

Thanks for the work on this issue, and apologies for the long comment.

Edit: Corrected the label numbers in my paragraph about more complex reaction types.

@mazeau
Copy link

mazeau commented Apr 9, 2019

Note to self:
in data/kinetics/family.py at the end of applyRecipe adding these two lines

        print reactantStructure.toAdjacencyList(label='reactant')
        print productStructure.toAdjacencyList(label='product')
        return productStructures

prints what we need. (It just prints it far too often and for everything, without us knowing which is which. But it's a start)

@mazeau
Copy link

mazeau commented Apr 11, 2019

Now it will print reactant adj lists and prod and reactant adj lists (or just the combined reactant adj list if it is the same as the product one), as well as some will print the reaction itself. It's just far too large to keep all of these until the end of model generation.

label.txt

@rwest
Copy link
Collaborator Author

rwest commented Apr 11, 2019

These changes on a branch somewhere?

mazeau added a commit to mazeau/RMG-Py that referenced this issue Apr 11, 2019
@mazeau
Copy link

mazeau commented Apr 11, 2019

@rwest
Copy link
Collaborator Author

rwest commented Apr 17, 2019

Progress from a different approach.
I think what's currently on my catlabel branch: https://github.com/rwest/rmg-py/tree/catlabel (currently commit 86e9f63)
manages to save the labeled adjacency lists for the reactants complex and products complex, without reordering, onto every TemplateReaction object.

What's needed next is to find the most appropriate place to print them out.

Could be in an i/o thing that's done for all reactions each iteration, like how we print the chemkin and html files. Or it could be done just once per reaction when a reaction gets added to the core (or even edge). eg. makeNewReaction (rmg/model.py line 486)

@mazeau
Copy link

mazeau commented Apr 17, 2019

I merged your changes and now have all the reactions print out to a text file

mazeau@fb0d7bf

@rwest
Copy link
Collaborator Author

rwest commented Apr 18, 2019

Thanks. I took your (relevant) changes to the catlabel branch.
Looks like it's working.

Though I just noticed one small problem...
the reactant and product adjacancy lists are always the same!

============================================================================
28
vacantX(3) + O=C[Ni](35) <=> HX(5) + OCX(14)
Surface_Dissociation
reactant
multiplicity -187
1 *2 H u0 p0 c0 {4,S}
2 *4 X u0 p0 c0
3    O u0 p2 c0 {4,D}
4 *1 C u0 p0 c0 {1,S} {3,D} {5,S}
5 *3 X u0 p0 c0 {4,S}

product
multiplicity -187
1 *2 H u0 p0 c0 {4,S}
2 *4 X u0 p0 c0
3    O u0 p2 c0 {4,D}
4 *1 C u0 p0 c0 {1,S} {3,D} {5,S}
5 *3 X u0 p0 c0 {4,S}

============================================================================
29
O=C[Ni](35) + CHX(17) <=> OCX(14) + CH2X(16)
Surface_Abstraction
reactant
multiplicity -187
1 *3 C u0 p0 c0 {3,S} {4,T}
2 *4 H u0 p0 c0 {6,S}
3    H u0 p0 c0 {1,S}
4 *5 X u0 p0 c0 {1,T}
5    O u0 p2 c0 {6,D}
6 *1 C u0 p0 c0 {2,S} {5,D} {7,S}
7 *2 X u0 p0 c0 {6,S}

product
multiplicity -187
1 *3 C u0 p0 c0 {3,S} {4,T}
2 *4 H u0 p0 c0 {6,S}
3    H u0 p0 c0 {1,S}
4 *5 X u0 p0 c0 {1,T}
5    O u0 p2 c0 {6,D}
6 *1 C u0 p0 c0 {2,S} {5,D} {7,S}
7 *2 X u0 p0 c0 {6,S}

============================================================================

@rwest
Copy link
Collaborator Author

rwest commented Apr 18, 2019

Think this is now funcitoning.
@ehermes does this look like it should be useful format?

Any tweaks to make it easier on downstream processing? should be easy enough to apply a template to make it yaml or something.

25
vacantX(3) + HOCOX(37) <=> HOX(8) + OCX(14)
Surface_Dissociation
reactant
multiplicity -187
1 *2 O u0 p2 c0 {2,S} {3,S}
2    H u0 p0 c0 {1,S}
3 *4 X u0 p0 c0 {1,S}
4    O u0 p2 c0 {5,D}
5 *1 C u0 p0 c0 {4,D} {6,D}
6 *3 X u0 p0 c0 {5,D}

product
multiplicity -187
1 *2 O u0 p2 c0 {2,S} {5,S}
2    H u0 p0 c0 {1,S}
3 *4 X u0 p0 c0
4    O u0 p2 c0 {5,D}
5 *1 C u0 p0 c0 {1,S} {4,D} {6,S}
6 *3 X u0 p0 c0 {5,S}

============================================================================
26
CH2X(16) + O=C[Ni](35) <=> CH3X(7) + OCX(14)
Surface_Abstraction
reactant
multiplicity -187
1 *3 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
2 *4 H u0 p0 c0 {1,S}
3    H u0 p0 c0 {1,S}
4    H u0 p0 c0 {1,S}
5 *5 X u0 p0 c0 {1,S}
6    O u0 p2 c0 {7,D}
7 *1 C u0 p0 c0 {6,D} {8,D}
8 *2 X u0 p0 c0 {7,D}

product
multiplicity -187
1 *3 C u0 p0 c0 {3,S} {4,S} {5,D}
2 *4 H u0 p0 c0 {7,S}
3    H u0 p0 c0 {1,S}
4    H u0 p0 c0 {1,S}
5 *5 X u0 p0 c0 {1,D}
6    O u0 p2 c0 {7,D}
7 *1 C u0 p0 c0 {2,S} {6,D} {8,S}
8 *2 X u0 p0 c0 {7,S}

============================================================================

Each reaction family (eg Surface_Abstraction) has a recipe that defines the *1, *2, etc. labels in terms of bonds being maken, broken, etc, eg lines 21-26 of this example

@rwest
Copy link
Collaborator Author

rwest commented Apr 18, 2019

OK, it's now yaml.

  - index: 6
    reaction: HOX(8) + CH2X(16) <=> OX(6) + CH3X(7)
    reaction_family: Surface_Abstraction
    reactant: |
        multiplicity -187
        1 *3 C u0 p0 c0 {2,S} {3,S} {4,S} {5,S}
        2 *4 H u0 p0 c0 {1,S}
        3    H u0 p0 c0 {1,S}
        4    H u0 p0 c0 {1,S}
        5 *5 X u0 p0 c0 {1,S}
        6 *1 O u0 p2 c0 {7,D}
        7 *2 X u0 p0 c0 {6,D}
    product: |
        multiplicity -187
        1 *3 C u0 p0 c0 {3,S} {4,S} {5,D}
        2 *4 H u0 p0 c0 {6,S}
        3    H u0 p0 c0 {1,S}
        4    H u0 p0 c0 {1,S}
        5 *5 X u0 p0 c0 {1,D}
        6 *1 O u0 p2 c0 {2,S} {7,S}
        7 *2 X u0 p0 c0 {6,S}

adjacency_lists.txt

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants