It's not quite clear how the vectors from the primary capsules are squashed. As far as I can see from the code, each primary capsule outputs a vector of size 32 * 6 * 6. These vectors are then stacked, and, with the batch dimension included, we get a tensor of shape (batch_size, 32 * 6 * 6, n_caps), where n_caps is the number of primary capsules.
Finally, these vectors are normalized, i.e. their magnitudes are squashed into the range from 0 to 1. If I understand correctly, the paper is referring to the magnitude of the (32 * 6 * 6)-dimensional vectors. So if we want to ensure that the length of these vectors lies in [0; 1], we would have to divide each of the (32 * 6 * 6) coordinates by the square root of the sum of squares of those coordinates. Right? In fact, the implementation divides each coordinate by the magnitude of a vector made up of the coordinates at the same position across all capsule outputs (dim is set to -1 when computing squared_norm, i.e. it sums over the same coordinate/feature taken from different capsules).
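For reference, the squash non-linearity from the paper ("Dynamic Routing Between Capsules", Eq. 1) is defined per capsule vector $s_j$:

$$v_j = \frac{\lVert s_j \rVert^2}{1 + \lVert s_j \rVert^2} \cdot \frac{s_j}{\lVert s_j \rVert}$$

so both the scaling factor and the unit vector are computed from the norm of one individual capsule vector.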
Please consider the following example:
```python
import torch
import numpy as np

def squash(input_tensor):
    '''Squashes an input Tensor so it has a magnitude between 0-1.
       param input_tensor: a stack of capsule inputs, s_j
       return: a stack of normalized, capsule output vectors, v_j
    '''
    squared_norm = (input_tensor ** 2).sum(dim=-1, keepdim=True)
    scale = squared_norm / (1 + squared_norm)  # normalization coeff
    output_tensor = scale * input_tensor / torch.sqrt(squared_norm)
    return output_tensor

np.random.seed(1)
torch.manual_seed(1)

batch_size = 15
dim = 13
n_caps = 7
u = [torch.tensor(np.random.rand(batch_size, dim, 1)) for i in range(n_caps)]
# print(u)
u = torch.cat(u, dim=-1)
print("u:", u)

u_squash = squash(u)
print("u_squash:", u_squash)

mag = torch.sqrt((u_squash ** 2).sum(dim=-2))
print("mag: ", mag)
```
Here I create a randomly filled tensor of shape (batch_size, dim, n_caps), i.e. shaped like the output of the primary capsules, just smaller for illustration. The tensor is squashed by the same squash function. It can be seen from the output that the magnitudes of the vectors exceed the range [0; 1].
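Rather than pasting the full printout here, the same observation can be checked directly on `mag` from the snippet above (this extra print is my addition, not part of the original code):

```python
# Because squash() normalizes along dim=-1 (across capsules), the
# per-capsule magnitudes taken along dim=-2 are not bounded by 1.
print("vectors with magnitude > 1:", (mag > 1).sum().item())  # non-zero here
```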
What the function actually enforces is that the vectors formed by taking one particular coordinate from each capsule's output have magnitudes in that range. But is that what was intended?
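If the intent is indeed to bound the length of each (32 * 6 * 6)-dimensional capsule vector, a minimal sketch of the change I have in mind would be to sum along the coordinate dimension instead (the name `squash_per_vector` is mine, and this is a sketch, not a claim about how the repository should be patched):

```python
import torch

def squash_per_vector(input_tensor):
    '''Sketch: squash along the coordinate axis of each capsule vector
    (dim=-2 in the (batch_size, dim, n_caps) layout), so that it is the
    per-capsule magnitudes that end up in [0, 1).'''
    squared_norm = (input_tensor ** 2).sum(dim=-2, keepdim=True)
    scale = squared_norm / (1 + squared_norm)
    return scale * input_tensor / torch.sqrt(squared_norm)

u = torch.rand(15, 13, 7)                # (batch_size, dim, n_caps)
v = squash_per_vector(u)
mag = torch.sqrt((v ** 2).sum(dim=-2))   # magnitude of each capsule vector
print(mag.max().item())                  # always strictly below 1
```

With this variant each capsule vector of length $\lVert s_j \rVert$ is rescaled to length $\lVert s_j \rVert^2 / (1 + \lVert s_j \rVert^2)$, which is strictly below 1, matching my reading of the paper.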