GPT-GNN

Key ideas

  • Training GNNs usually requires task-specific labeled data, which is arduous to obtain.
  • Idea: pre-train an expressive GNN on unlabeled data with self-supervision, then transfer the learned model to downstream tasks with only a few labels.
  • GPT-GNN factorizes the likelihood of graph generation into two components: attribute generation and edge generation.

Introduction

  • GNNs are used for semi-supervised node classification, recommender systems, and knowledge-graph inference.
  • Input: a graph with attributes; convolutional filters generate node-level representations layer by layer.
  • As with pre-trained language models like BERT, you can train on an unlabeled corpus and then transfer the model to downstream tasks with few labels.
  • Existing neural graph-generation techniques don't suit GNN pre-training: they generate graph structure without attributes, and they only scale to small graphs.

[Figure: overview of the GPT-GNN generative pre-training framework, factorized into attribute generation and edge generation]

  • Jointly optimizing attribute generation and edge generation is equivalent to maximizing the likelihood of the whole attributed graph (see the sketch below).
  • Successfully applied to OAG, the Open Academic Graph of 179M nodes & 2B edges.
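
A sketch of what this objective looks like, using notation unpacked in the sections below ($\pi$ is a node permutation; $E_{i,o}$ / $E_{i,\neg o}$ are node $i$'s observed / masked edges; splitting the two log-terms inside the expectation is stated informally here):

```latex
% Sketch: the pre-training objective splits into an attribute term
% and an edge term, jointly maximizing the attributed-graph likelihood
\theta^{*} = \arg\max_{\theta}\,
  \mathbb{E}_{\pi} \Big[ \sum_{i}
    \underbrace{\log p_{\theta}\big(X_i \mid E_{i,o},\, X_{<i},\, E_{<i}\big)}_{\text{attribute generation}}
  + \underbrace{\log p_{\theta}\big(E_{i,\neg o} \mid E_{i,o},\, X_{\le i},\, E_{<i}\big)}_{\text{edge generation}}
  \Big]
```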

Preliminaries and related work

  • Let H_t^{(l)} be the representation of node t at the l-th GNN layer.
  • N_t is the set of source nodes of node t; E(s,t) is the set of all edges from s to t.

$$H_t^{(l)} \;=\; \text{Aggregate}\Big(\big\{\,\text{Extract}\big(H_s^{(l-1)};\, H_t^{(l-1)},\, e\big) \;:\; s \in N_t,\; e \in E(s,t)\,\big\}\Big)$$

  • Aggregate: aggregates the extracted neighborhood information (e.g., mean, max, or sum pooling); see the Python sketch after this list.
  • Extract: the neighborhood information extractor; it uses the target node's representation H_t^{(l-1)} and the edge e as a query to extract information from the source node's H_s^{(l-1)}.
  • Related self-supervised approaches: Variational Graph Auto-Encoders reconstruct the graph structure, while Deep Graph Infomax (Veličković et al.) maximizes mutual information between node-level and graph-level representations.
  • InfoGraph maximizes mutual information between graph-level representations from GNNs and the representations of substructures.
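
A minimal PyTorch sketch of one such layer, as referenced above (illustrative assumptions: `Extract` is a linear projection over the concatenated source state, target state, and edge feature, and `Aggregate` is a mean; the paper's base GNN uses more elaborate attention-based operators):

```python
import torch
import torch.nn as nn

class MessagePassingLayer(nn.Module):
    """One GNN layer: H_t^(l) = Aggregate({Extract(H_s^(l-1); H_t^(l-1), e)})."""

    def __init__(self, dim: int):
        super().__init__()
        # Extract: build a message from source state, target state, and edge feature
        self.extract = nn.Linear(3 * dim, dim)

    def forward(self, h: torch.Tensor, edge_index: torch.Tensor,
                edge_attr: torch.Tensor) -> torch.Tensor:
        # h: [num_nodes, dim]; edge_index: [2, num_edges] as (source, target) rows;
        # edge_attr: [num_edges, dim]
        src, dst = edge_index
        msg = self.extract(torch.cat([h[src], h[dst], edge_attr], dim=-1))
        # Aggregate: mean over each target node's incoming messages
        out = torch.zeros_like(h)
        deg = torch.zeros(h.size(0), 1, device=h.device)
        out.index_add_(0, dst, msg)
        deg.index_add_(0, dst, torch.ones(dst.size(0), 1, device=h.device))
        return out / deg.clamp(min=1.0)
```

Stacking L such layers conditions each node's representation on its L-hop neighborhood.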

Generative pre-training of GNNs

  • Input to the GNN: G = (V, E, X) with node set V, edge set E, and node-feature matrix X.
  • GPT framework: the problem is how to design an unsupervised learning task over the graph for pre-training the GNN model.
  • Goal: find θ^* = argmax_θ p(G; θ), the parameters under which the observed graph is most likely.
  • Most existing graph-generation methods factorize the probability objective in an auto-regressive manner:
    • i.e., nodes arrive in some order, and edges are generated by connecting each newly arriving node to existing nodes.
    • Given a permutation vector π, the target graph distribution is equivalent to the expected likelihood over all permutations:
    • $p(X, E) = \mathbb{E}_{\pi}\big[\, p(X^{\pi}, E^{\pi}) \,\big]$
    • Given a permuted order, we can factorize the log-likelihood autoregressively, generating one node per iteration:
    • $\log p_{\theta}(X, E) = \sum_{i=1}^{|V|} \log p_{\theta}\big(X_i, E_i \mid X_{<i}, E_{<i}\big)$
  • The per-node likelihood is further decomposed into attribute generation and edge generation by splitting node i's edges into observed edges $E_{i,o}$ and masked edges $E_{i,\neg o}$: $p(X_i, E_i \mid X_{<i}, E_{<i}) = \mathbb{E}_{o}\big[\, p(X_i \mid E_{i,o}, X_{<i}, E_{<i}) \cdot p(E_{i,\neg o} \mid E_{i,o}, X_{\le i}, E_{<i}) \,\big]$ (a loss sketch follows below).
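
A hedged sketch of how this factorization turns into a pre-training loss for one sampled permutation (the helpers `attr_decoder` and `edge_score`, the MSE attribute loss, and the negative-sampling scheme are illustrative stand-ins, not the paper's exact components):

```python
import torch
import torch.nn.functional as F

def gptgnn_style_loss(h, x, pos_edges, neg_edges, attr_decoder, edge_score):
    """Attribute-generation + edge-generation loss (sketch).

    h:         [N, d] node embeddings from the GNN, computed on the observed
               subgraph only (so the generation targets are not leaked)
    x:         [N, f] ground-truth node attributes
    pos_edges: [M, 2] held-out (masked) edges the model must predict
    neg_edges: [M, 2] sampled non-edges used as negatives
    """
    # Attribute generation: reconstruct node attributes from embeddings.
    attr_loss = F.mse_loss(attr_decoder(h), x)

    # Edge generation: score masked edges against negative samples.
    pos_logits = edge_score(h[pos_edges[:, 0]], h[pos_edges[:, 1]])
    neg_logits = edge_score(h[neg_edges[:, 0]], h[neg_edges[:, 1]])
    edge_loss = F.binary_cross_entropy_with_logits(
        torch.cat([pos_logits, neg_logits]),
        torch.cat([torch.ones_like(pos_logits),
                   torch.zeros_like(neg_logits)]))

    # Minimizing the sum maximizes both log-likelihood terms above.
    return attr_loss + edge_loss
```

`edge_score` could be as simple as a dot product, `lambda u, v: (u * v).sum(-1)`. The paper itself masks input attributes via separate generation nodes and treats edge generation contrastively, but the shape of the objective is the same.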