
Is there any error about the softmax operation? #10

Open
donnyyou opened this issue Jan 2, 2018 · 5 comments

donnyyou commented Jan 2, 2018

The paper says "the coupling coefficients between capsule i and all the capsules in the layer above sum to 1", so the softmax should be computed along the output-capsule dimension, but here it is computed along the route-node dimension.
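To make the dimension question concrete, here is a minimal sketch of which axis each choice normalizes over. The logit shape (num_capsules=10, batch=1, num_route_nodes=1152, 1, 1) is my assumption about how the routing logits are laid out, not taken verbatim from the repo:

```python
import torch
import torch.nn.functional as F

# Assumed shape of the routing logits: (num_capsules=10, batch=1, num_route_nodes=1152, 1, 1)
logits = torch.zeros(10, 1, 1152, 1, 1)

# Softmax along the route-node axis ("along the channel of route nodes"):
# the 1152 coefficients feeding ONE output capsule sum to 1.
c_route = F.softmax(logits, dim=2)
print(c_route[0, 0, :, 0, 0].sum())  # tensor(1.)

# Softmax along the output-capsule axis (what the quoted sentence describes):
# the 10 coefficients attached to ONE lower-level capsule i sum to 1.
c_caps = F.softmax(logits, dim=0)
print(c_caps[:, 0, 0, 0, 0].sum())   # tensor(1.)
```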

@BoPang1996

I think there is a problem here too. In my view, the "dim" parameter of the softmax should be 0.


janericlenssen commented Apr 3, 2018

I second that. I think it should be dim=0. However, it does not train successfully if I change it.


zzzz94 commented May 7, 2018

I agree with @mrjel. But when I set dim=0, changed line 55 to
self.route_weights = nn.Parameter(0.01 * torch.randn(num_capsules, num_route_nodes, in_channels, out_channels))
and removed line 108 (maybe not necessary), I got 99.27% accuracy on the test set (epoch 5). See the sketch below.
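For reference, a sketch of the two changes as I understand them. The line numbers refer to this repo; the sizes (10 digit capsules, 1152 primary capsules, 8-D inputs, 16-D outputs) and the logit shape are the usual MNIST CapsNet values and are assumptions on my side:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Assumed MNIST CapsNet sizes: 10 digit capsules, 6*6*32 = 1152 primary
# capsules, 8-D inputs, 16-D outputs.
num_capsules, num_route_nodes, in_channels, out_channels = 10, 1152, 8, 16

# Change 1 (line 55): initialize the routing weights with a 0.01 scale.
route_weights = nn.Parameter(
    0.01 * torch.randn(num_capsules, num_route_nodes, in_channels, out_channels))

# Change 2: in the routing loop, normalize the logits over the
# output-capsule axis (dim=0) instead of the route-node axis.
logits = torch.zeros(num_capsules, 1, num_route_nodes, 1, 1)
probs = F.softmax(logits, dim=0)  # each lower capsule's 10 coefficients sum to 1
```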

@tengteng95

@zzzz94 Hi, thanks for your nice suggestions. I wonder why we need to set route_weights to relatively lower values by multiplying by 0.01?
When I set dim=0 and remove line 108, the net behaves like a random guess and the accuracy is ~10%. However, it works well with the lower initial values for route_weights.

Looking forward to your response!

@CoderHHX

@zzzz94 Thanks for your solution, first of all. @h982639009 Notice that before the change the softmax normalizes c_ij over a dimension of size 1152 (the route nodes), whereas with dim = 0 it normalizes over a dimension of size 10 (the capsules), so each coupling coefficient becomes much larger. The 0.01 factor is a small trick that keeps the routed inputs at a similar magnitude. I think this problem could also be solved by enlarging the learning rate at the start.
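A quick way to see the magnitude argument (sizes assumed as above): the logits start at zero, so the softmax is uniform, and each coefficient jumps from about 1/1152 to 1/10 when the normalization axis changes, roughly a 100x increase unless the weights (or learning rate) compensate:

```python
import torch
import torch.nn.functional as F

# The initial routing logits are all zeros, so the softmax is uniform.
print(F.softmax(torch.zeros(1152), dim=0)[0].item())  # ~0.00087 (normalizing over 1152 route nodes)
print(F.softmax(torch.zeros(10), dim=0)[0].item())    # 0.1      (normalizing over the 10 capsules)
```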
