It's not quite clear how the vectors from the primary capsules are squashed. As far as I can see from the code, each primary capsule outputs a vector of size 32 * 6 * 6. These vectors are then stacked, and, with the batch dimension included, we get a tensor of shape (batch_size, 32 * 6 * 6, n_caps), where n_caps is the number of primary capsules.
Finally, these vectors are normalized, i.e. their magnitudes are squashed into the range from 0 to 1. If I understand correctly, the paper is referring to the magnitude of the (32 * 6 * 6)-dimensional vectors. So if we want to ensure that the length of these vectors lies in [0; 1], we would have to divide each of the (32 * 6 * 6) coordinates by the square root of the sum of squares of those coordinates. Right? In fact, the implementation divides each coordinate by the magnitude of a vector made up of the coordinates at the same position across all capsule outputs (dim is set to -1 when computing squared_norm, i.e. it sums over the same coordinate/feature taken from different capsules).
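For reference, the squash non-linearity from the paper ("Dynamic Routing Between Capsules", Eq. 1) is defined per capsule vector $s_j$:

$$v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \cdot \frac{s_j}{\lVert s_j \rVert}$$

so both the scaling factor and the unit vector are computed from the norm of one individual capsule vector.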
Please consider the following example:
```python
import torch
import numpy as np

def squash(input_tensor):
    '''Squashes an input Tensor so it has a magnitude between 0-1.
       param input_tensor: a stack of capsule inputs, s_j
       return: a stack of normalized, capsule output vectors, v_j
    '''
    squared_norm = (input_tensor ** 2).sum(dim=-1, keepdim=True)
    scale = squared_norm / (1 + squared_norm)  # normalization coeff
    output_tensor = scale * input_tensor / torch.sqrt(squared_norm)
    return output_tensor

np.random.seed(1)
torch.manual_seed(1)

batch_size = 15
dim = 13
n_caps = 7
u = [torch.tensor(np.random.rand(batch_size, dim, 1)) for i in range(n_caps)]
# print(u)
u = torch.cat(u, dim=-1)
print("u:", u)

u_squash = squash(u)
print("u_squash:", u_squash)

mag = torch.sqrt((u_squash ** 2).sum(dim=-2))
print("mag: ", mag)
```
Here I create a randomly filled tensor of shape (batch_size, dim, n_caps), i.e. shaped like the output of the primary capsules, just smaller for illustration. The tensor is squashed by the same squash function. It can be seen from the output that the magnitudes of the vectors exceed the range [0; 1].
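Rather than pasting the full printout here, the same observation can be checked directly on `mag` from the snippet above (this extra print is my addition, not part of the original code):

```python
# Because squash() normalizes along dim=-1 (across capsules), the
# per-capsule magnitudes taken along dim=-2 are not bounded by 1.
print("vectors with magnitude > 1:", (mag > 1).sum().item())  # non-zero here
```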
What the function actually enforces is that the vectors formed by taking one particular coordinate from each capsule's output have magnitudes in that range. But is that what was intended?
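If the intent is indeed to bound the length of each (32 * 6 * 6)-dimensional capsule vector, a minimal sketch of the change I have in mind would be to sum along the coordinate dimension instead (the name `squash_per_vector` is mine, and this is a sketch, not a claim about how the repository should be patched):

```python
import torch

def squash_per_vector(input_tensor):
    '''Sketch: squash along the coordinate axis of each capsule vector
    (dim=-2 in the (batch_size, dim, n_caps) layout), so that it is the
    per-capsule magnitudes that end up in [0, 1).'''
    squared_norm = (input_tensor ** 2).sum(dim=-2, keepdim=True)
    scale = squared_norm / (1 + squared_norm)
    return scale * input_tensor / torch.sqrt(squared_norm)

u = torch.rand(15, 13, 7)                # (batch_size, dim, n_caps)
v = squash_per_vector(u)
mag = torch.sqrt((v ** 2).sum(dim=-2))   # magnitude of each capsule vector
print(mag.max().item())                  # always strictly below 1
```

With this variant each capsule vector of length $\lVert s_j \rVert$ is rescaled to length $\lVert s_j \rVert^2 / (1 + \lVert s_j \rVert^2)$, which is strictly below 1, matching my reading of the paper.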