deep_graph_1.html

<!DOCTYPE html>
<html>
  <head>
    <title>Deep Learning on Graphs  [Marc Lelarge]</title>
   <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
   <link rel="stylesheet" href="./assets/katex.min.css">  
   <link rel="stylesheet" type="text/css" href="./assets/slides.css">
   <link rel="stylesheet" type="text/css" href="./assets/grid.css">    
  </head>
  <body>

<textarea id="source">

class: center, middle, title-slide
count: false

# Deep Learning on Graphs<br/>
# (1/3)


<br/><br/>
.bold[Marc Lelarge]

.bold[[www.dataflowr.com](https://www.dataflowr.com)]

---

# (1) Node embedding

## Language model

### one fixed graph, no signal. Ex: community detection


# .gray[(2) Signal processing on graphs]

## .gray[Fourier analysis on graphs]

### .gray[one fixed graph, various signals. Ex: classification of signals]

# .gray[(3) Graph embedding]

## .gray[Graph Neural Networks]

### .gray[various graphs. Ex: classification of graphs]

---

# Node embedding

## Language model

### one fixed graph, no signal. Ex: community detection

- [DeepWalk](https://arxiv.org/abs/1403.6652)

 - hierarchical softmax

- [node2vec](https://snap.stanford.edu/node2vec/)

 - negative sampling

---

# Language model

- goal: to estimate the likelihood of a specific sequence of words appearing in a corpus.

- formally: to maximize $p(w\_{n} \| w\_{1}, ... , w\_{n-1})$ where $w\_i$ are in the vocabulary $V$.

--
count: false

## skip-gram model

.center.width-50[![](images/graphs/skipgram.png)]

- from the text, construct $D$ the multiset of all word and context pairs: $(w,c)$.

---
# skip-gram model

.center.width-50[![](images/graphs/skipgram.png)]

- from the text, construct $D$ the multiset of all word and context pairs: $(w,c)$.

- goal: maximize $\prod\_{(w,c)\in D} p(c\| w ; \theta)$ in the parameter $\theta$.

- parametrization of the conditional probability thanks to a softmax:

$$
p(c\|w; \theta) = \frac{e^{u\_c \cdot u\_w}}{\sum\_{c'\in C}e^{u\_{c'} \cdot u\_w}},
$$
where $u\_c$ and $u\_w$ are vector representations for $c$ and $w$ and $C$ is the set of all available contexts.


---
# Hierarchical softmax

.center.width-30[![](images/graphs/hierarchical.png)]


- Assign the contexts to the leaves of a binary tree and use the parametrization:

$$
p(c\|w; \theta) = \prod\_{b\in \pi(c)} \sigma(h\_b \cdot u\_w),
$$
where $\pi(c)$ is the path from the root to the leaf $c$ and $\sigma$ is the sigmoid function:

$$
\sigma(x) = \frac{1}{1+e^{-x}}.
$$

- reduce computational complexity from $O(|C|)$ to $O(\log|C|)$.

---

# DeepWalk

- a sentence = a random walk on the graph
- Hierarchical softmax speed up: assign shorter paths to frequent contexts using Huffman coding.

--
count: false

<img align="left" width="500" src="images/graphs/deep_walk1.png"><img align="right" width="500" src="images/graphs/deep_walk2.png">


---

# Negative sampling

.center.width-50[![](images/graphs/skipgram.png)]

- introduce $B = \mathbf{1}((w,c)\in D)$ with the parameterization

$$
p(b=1|w,c,\theta) = \sigma(u\_w \cdot u\_c)=\frac{1}{1+e^{-u\_w \cdot u\_c}} .
$$

- instead of maximizing $\prod\_{(w,c)\in D} p(c\| w ; \theta)$, we now maximize $\prod\_{(w,c)\in D} p(b=1 |c, w ; \theta)$

---
count: false

# Negative sampling

- introduce $B = \mathbf{1}((w,c)\in D)$ with the parameterization

$$
p(b=1|w,c,\theta) = \sigma(u\_w \cdot u\_c)=\frac{1}{1+e^{-u\_w \cdot u\_c}} .
$$

- instead of maximizing $\prod\_{(w,c)\in D} p(c\| w ; \theta)$, we now maximize $\prod\_{(w,c)\in D} p(b=1 |c, w ; \theta)$

.red[Problem, this has a trivial solution! We only have positive examples!]

--
count: false

- create negative examples $D'$ and maximize in $\theta$:

$$
\sum\_{(w,c)\in D}\log \sigma(u\_w \cdot u\_c) + \sum_{(w,c)\in D'}\log \sigma(-u\_w \cdot u\_c)
$$

- in [Mikolov et al.](http://papers.nips.cc/paper/5021-distributed-representations-of-words-andphrases) $D'$ is $k$ times larger than $D$, and for each $(w,c)\in D$, we construct $k$ samples $(w,c\_1) ... (w,c\_k)$ where $c\_i$ is drawn according to its empirical distribution raised to the $3/4$ power.

---

# node2vec

- parameterization of the skip-gram model approximated thanks to negative sampling

- notion of context obtained thanks to biased random walk.

.center.width-50[![](images/graphs/node2vec_rw.png)]

---

# some results

.center.width-40[![](images/graphs/node2vec_miserables.png)]

---
# some results

In the multi-label classification setting, every node is assigned one or more labels from a finite set. During training phase, a certain fraction of nodes with all their labels is observed. The task is to predict the labels for the remaining nodes.

.center.width-50[![](images/graphs/node2vec_res.png)]

--
count: false

Guess from which paper these results are taken from?

.center.width-10[![](images/graphs/smiley.png)]

---

# (1) Node embedding

## Language model

### one fixed graph, no signal. Ex: community detection


# .gray[(2) Signal processing on graphs]

## .gray[Fourier analysis on graphs]

### .gray[one fixed graph, various signals. Ex: classification of signals]

# .gray[(3) Graph embedding]

## .gray[Graph Neural Networks]

### .gray[various graphs. Ex: classification of graphs]

---
class: end-slide, center
count: false

The end.

.bold[[www.dataflowr.com](https://www.dataflowr.com)]

</textarea> 
  <script src="./assets/remark-latest.min.js"></script>
  <script src="./assets/auto-render.min.js"></script>
  <script src="./assets/katex.min.js"></script>
  <script type="text/javascript">
    function getParameterByName(name, url) {
          if (!url) url = window.location.href;
          name = name.replace(/[\[\]]/g, "\\$&");
          var regex = new RegExp("[?&]" + name + "(=([^&#]*)|&|#|$)"),
              results = regex.exec(url);
          if (!results) return null;
          if (!results[2]) return '';
          return decodeURIComponent(results[2].replace(/\+/g, " "));
      }

      // var options = {sourceUrl: getParameterByName("p"),
      //                highlightLanguage: "python",
      //                // highlightStyle: "tomorrow",
      //                // highlightStyle: "default",
      //                highlightStyle: "github",
      //                // highlightStyle: "googlecode",
      //                // highlightStyle: "zenburn",
      //                highlightSpans: true,
      //                highlightLines: true,
      //                ratio: "16:9"};
      var options = {sourceUrl: getParameterByName("p"),
                     highlightLanguage: "python",
                     highlightStyle: "github",
                     highlightSpans: true,
                     highlightLines: true,
                     ratio: "16:9",
                     slideNumberFormat: (current, total) => `
        <div class="progress-bar-container">${current}/${total}&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br/><br/>
          <div class="progress-bar" style="width: ${current/total*100}%"></div>
        </div>
      `};

       var renderMath = function() {
          renderMathInElement(document.body, {delimiters: [ // mind the order of delimiters(!?)
              {left: "$$", right: "$$", display: true},
              {left: "$", right: "$", display: false},
              {left: "\\[", right: "\\]", display: true},
              {left: "\\(", right: "\\)", display: false},
          ]});
      }
    var slideshow = remark.create(options, renderMath);
  </script>
  </body>
</html>