Squash function #32

Open
puhach opened this issue Oct 29, 2019 · 1 comment

Comments

puhach commented Oct 29, 2019

It's not quite clear how vectors from primary capsules are squashed. As far as I can see from the code, each primary capsule outputs a vector of size 32 * 6 * 6. Then these vectors are stacked and, considering the batch dimension, we get a tensor of the shape

(batch_size, num_nodes_in_capsule = 32 * 6 * 6, num_capsules = 8)
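
A minimal sketch of the stacking step described above. It assumes each of the 8 primary capsules produces a (batch_size, 32, 6, 6) feature map that is flattened to (batch_size, 32 * 6 * 6, 1) before concatenation. Random tensors stand in for the convolution outputs, and `stack_primary_capsules` is a hypothetical helper, not a function from the repository:

```python
import torch

def stack_primary_capsules(batch_size: int = 4, num_capsules: int = 8) -> torch.Tensor:
    """Hypothetical helper: mimics stacking the per-capsule outputs on the last axis."""
    # Random stand-ins for each primary capsule's conv output: (batch, 32, 6, 6)
    outputs = [torch.rand(batch_size, 32, 6, 6) for _ in range(num_capsules)]
    # Flatten each to (batch, 32 * 6 * 6, 1) so they can be concatenated on the last axis
    flat = [o.view(batch_size, 32 * 6 * 6, 1) for o in outputs]
    # Result: (batch_size, 1152, num_capsules)
    return torch.cat(flat, dim=-1)

u = stack_primary_capsules()
print(u.shape)  # torch.Size([4, 1152, 8])
```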

Finally, these vectors are normalized, i.e. their magnitudes are squashed into the range from 0 to 1. If I understand correctly, the paper refers to the magnitude of the (32 * 6 * 6)-dimensional vectors. So, to ensure that the length of these vectors is in the range [0; 1], we would have to divide each of the (32 * 6 * 6) coordinates by the square root of the sum of squares of those coordinates. Right?

In fact, the implementation divides each coordinate by the magnitude of a vector made up of the coordinates at the same position in all capsule vectors: dim is set to -1 when calculating squared_norm, so it sums the same coordinate/feature across different capsules.
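
To make the two readings concrete, here is a minimal sketch that exposes the normalization axis as a parameter. The squash itself follows the CapsNet formula v = (|s|^2 / (1 + |s|^2)) * s / |s|; the extra `dim` argument and the `length` helper are hypothetical additions, not part of the repository's `squash`, and random data stands in for real capsule outputs:

```python
import torch

def squash(s: torch.Tensor, dim: int = -1) -> torch.Tensor:
    """Squash `s` along `dim`: v = (|s|^2 / (1 + |s|^2)) * s / |s|."""
    squared_norm = (s ** 2).sum(dim=dim, keepdim=True)
    scale = squared_norm / (1 + squared_norm)
    return scale * s / torch.sqrt(squared_norm)

def length(x: torch.Tensor, dim: int) -> torch.Tensor:
    """Euclidean length of the vectors lying along `dim`."""
    return torch.sqrt((x ** 2).sum(dim=dim))

torch.manual_seed(0)
u = torch.rand(4, 32 * 6 * 6, 8)  # (batch, 1152, num_capsules), random stand-in data

# dim=-1 (what the repository's squash does): each 8-element vector across capsules
v_caps = squash(u, dim=-1)
print(length(v_caps, dim=-1).max())   # strictly below 1

# dim=-2 (the reading proposed above): each 1152-element capsule output
v_nodes = squash(u, dim=-2)
print(length(v_nodes, dim=-2).max())  # strictly below 1
```

Only the axis passed to `sum` changes. Each call bounds the length along the axis it normalized, so the question is which axis holds a single capsule's coordinates.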

Please consider the following example:

```python
import torch
import numpy as np

def squash(input_tensor):
    '''Squashes an input Tensor so it has a magnitude between 0-1.
        param input_tensor: a stack of capsule inputs, s_j
        return: a stack of normalized, capsule output vectors, v_j
    '''
    # Squared length along the last axis (dim=-1), i.e. across capsules
    squared_norm = (input_tensor ** 2).sum(dim=-1, keepdim=True)
    scale = squared_norm / (1 + squared_norm)  # normalization coeff
    output_tensor = scale * input_tensor / torch.sqrt(squared_norm)
    return output_tensor

np.random.seed(1)
torch.manual_seed(1)
batch_size = 15
dim = 13
n_caps = 7
# n_caps tensors of shape (batch_size, dim, 1), stacked into (batch_size, dim, n_caps)
u = [torch.tensor(np.random.rand(batch_size, dim, 1)) for i in range(n_caps)]
u = torch.cat(u, dim=-1)
print("u:", u)

u_squash = squash(u)
print("u_squash:", u_squash)

# Magnitudes along the feature axis (dim=-2), one per capsule
mag = torch.sqrt((u_squash ** 2).sum(dim=-2))
print("mag: ", mag)
```

Here I create a randomly filled tensor of shape (batch_size, dim, n_caps), similar to the one produced by the primary capsules but smaller for illustration. The tensor is squashed by the same squash function. The output shows that several of the magnitudes, computed along the dim axis (dim=-2), exceed the range [0; 1]:

```
mag:  tensor([[0.6629, 1.0954, 0.9715, 0.7817, 1.0211, 0.7117, 0.8847],
        [1.0202, 0.9313, 0.8816, 0.8383, 1.0355, 0.9926, 1.0803],
        [0.8864, 1.0694, 0.7617, 0.9194, 0.8355, 0.9432, 1.0051],
        [0.9630, 0.9198, 0.9078, 1.0516, 0.8845, 0.7888, 0.9238],
        [0.6996, 1.0998, 1.1319, 0.6556, 0.8243, 0.9571, 0.9614],
        [0.9705, 0.9879, 0.8915, 0.8308, 1.0063, 1.0607, 0.9306],
        [1.0569, 1.0294, 0.9268, 1.0508, 0.9768, 0.9505, 0.8103],
        [0.9545, 0.9655, 0.9052, 1.0720, 0.7246, 0.9666, 0.9669],
        [1.1237, 0.9768, 0.9749, 0.8128, 0.8935, 0.9216, 0.7607],
        [0.8785, 0.7155, 0.8306, 0.8913, 0.9764, 0.9692, 1.0892],
        [0.9691, 0.8658, 1.0399, 0.9774, 0.9309, 0.8950, 0.8872],
        [0.7124, 1.1386, 0.8535, 1.0913, 0.8478, 0.8779, 0.9850],
        [0.8909, 0.9851, 0.9247, 1.0239, 0.7927, 0.9618, 0.7925],
        [0.8764, 0.9524, 0.9294, 0.8517, 0.8385, 0.9380, 1.0824],
        [1.0076, 0.8668, 1.0051, 0.9030, 1.0067, 0.8850, 0.9519]],
       dtype=torch.float64)
```

What the function actually constrains to that range is the magnitude of each vector formed by one particular coordinate taken from every capsule output. Is that what was intended?
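
A quick check of that observation, reusing the example's seed, shapes and squash function: the length across capsules (last axis) is what squash bounds, whereas the `mag` printed above is taken along the feature axis. The names `mag_caps` and `mag_feat` are illustrative only:

```python
import numpy as np
import torch

def squash(input_tensor):
    # Same as the squash in the example: normalizes along dim=-1 (across capsules)
    squared_norm = (input_tensor ** 2).sum(dim=-1, keepdim=True)
    scale = squared_norm / (1 + squared_norm)
    return scale * input_tensor / torch.sqrt(squared_norm)

np.random.seed(1)
batch_size, dim, n_caps = 15, 13, 7
u = torch.cat([torch.tensor(np.random.rand(batch_size, dim, 1)) for _ in range(n_caps)], dim=-1)
u_squash = squash(u)

# Lengths across capsules (last axis): the quantity squash actually bounds
mag_caps = torch.sqrt((u_squash ** 2).sum(dim=-1))  # shape (15, 13)
# Lengths along the feature axis: the quantity printed as `mag` in the example
mag_feat = torch.sqrt((u_squash ** 2).sum(dim=-2))  # shape (15, 7)

print(bool((mag_caps < 1).all()))  # True
print(bool((mag_feat < 1).all()))
```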

@guotong1988

Same question.
