
Add GRIT model #777

Open · pweigel wants to merge 16 commits into main
Conversation

@pweigel (Collaborator) commented Dec 17, 2024

GRIT: "Graph Inductive Biases in Transformers without Message Passing"

This PR adds a new model based on the GRIT transformer. It uses novel methods for encoding graph information in sparse multi-head attention blocks, together with a learned position encoding based on random-walk probabilities that enhances the model's expressivity.

PMLR: https://proceedings.mlr.press/v202/ma23c.html
Paper pre-print: https://arxiv.org/abs/2305.17589
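For intuition, here is a minimal sketch of the random-walk position encoding described above, assuming a small dense adjacency matrix for simplicity (my own illustration; the actual implementation in this PR is adapted from the original repository and differs in detail):

```python
import torch

def rrwp_encoding(adj: torch.Tensor, num_steps: int = 8) -> torch.Tensor:
    """Stack of k-step random-walk probabilities for a dense adjacency matrix.

    Entry [i, j, k] is the probability that a k-step random walk
    starting at node i ends at node j.
    """
    num_nodes = adj.size(0)
    deg = adj.sum(dim=1).clamp(min=1.0)      # guard against isolated nodes
    rw = adj / deg.unsqueeze(1)              # one-step transition matrix D^-1 A
    powers = [torch.eye(num_nodes)]          # k = 0: identity
    for _ in range(num_steps - 1):
        powers.append(powers[-1] @ rw)       # k-step probabilities
    return torch.stack(powers, dim=-1)       # (num_nodes, num_nodes, num_steps)

adj = torch.tensor([[0., 1., 1.],
                    [1., 0., 0.],
                    [1., 0., 0.]])
pe = rrwp_encoding(adj, num_steps=4)
node_pe = torch.diagonal(pe, dim1=0, dim2=1).T  # per-node encoding, shape (3, 4)
```

The diagonal gives a per-node encoding, while the full tensor provides pair-wise features that can be used to bias attention scores.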


Many layers/functions are adapted from the original repository: https://github.com/LiamMa/GRIT/tree/main. The original code uses GraphGym to set up most of its modules, so I refactored some things to fit into graphnet, and many of the arguments have been renamed to be more self-explanatory. In principle, other graph attention mechanisms could be used by replacing the GRIT MHA block.
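To make that swap-in point concrete, here is a simplified stand-in for such a block (a hypothetical sketch, not the GRIT MHA implementation in this PR): standard multi-head attention whose scores receive an additive, per-head bias learned from the pair-wise position encoding. The real GRIT block also updates the pair representations themselves, which this sketch omits.

```python
import torch
import torch.nn as nn

class PairBiasedAttention(nn.Module):
    """Multi-head attention with a learned bias from pair-wise encodings.

    A simplified illustration of the kind of block that could replace
    the GRIT MHA block; hypothetical, not the code added in this PR.
    """

    def __init__(self, dim: int, num_heads: int, pe_dim: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = dim // num_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.pe_bias = nn.Linear(pe_dim, num_heads)  # per-head additive bias
        self.out = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor, pair_pe: torch.Tensor) -> torch.Tensor:
        # x: (num_nodes, dim); pair_pe: (num_nodes, num_nodes, pe_dim)
        n = x.size(0)
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        k = k.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        v = v.view(n, self.num_heads, self.head_dim).transpose(0, 1)
        scores = q @ k.transpose(-2, -1) / self.head_dim ** 0.5  # (heads, n, n)
        scores = scores + self.pe_bias(pair_pe).permute(2, 0, 1)
        attn = scores.softmax(dim=-1)
        out = (attn @ v).transpose(0, 1).reshape(n, -1)
        return self.out(out)
```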

Since there are a lot of changes, I will quickly summarize the significant new additions and modifications to existing files:

This model has many hyperparameters, but the defaults should provide a good starting point. Note that the GPU memory required to train this model is quite high due to the use of global attention.
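As a rough illustration of the memory point (a back-of-envelope estimate, not a measurement from this PR): with global attention, each layer materializes a per-head N × N score matrix, so a single graph with N = 5,000 nodes and 8 heads already needs about 8 × 5,000² × 4 bytes ≈ 0.8 GB of fp32 attention scores per layer, before counting the activations kept for backprop.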
