
Question about methods Cosine and Capsule in the paper #26

Open
yypurpose opened this issue Dec 13, 2020 · 3 comments

Comments

@yypurpose

Hi @KaihuaTang, thank you for doing such an inspiring job and open-sourcing the code. Would you mind sharing some details about the Cosine and Capsule methods in Table 2 of the paper?

  1. I am a bit confused because I don't know whether the cosine similarity is used during training or only at test time. I saw no performance improvement when using it during training on both CIFAR-100-LT and ImageNet-LT.

  2. For the Capsule classifier, I also don't know how it is implemented.

Thanks a lot!

@KaihuaTang
Owner

As far as I know, the Cosine Classifier was first used by https://github.com/zhmiao/OpenLongTailRecognition-OLTR in the long-tailed classification field. If your cosine classifier didn't improve performance, it's probably because you didn't add a scale parameter to the logits (16.0 in OLTR and 16.0 / Num_Head in our project). The scale parameter changes the distribution after the softmax activation and thus accelerates gradient descent.
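
For reference, a minimal sketch of such a scaled cosine classifier in PyTorch could look like the code below. The class name, feature dimension, and initialization here are illustrative rather than this repo's actual code; only the idea of L2-normalizing both features and class weights and multiplying the cosine logits by a scale (e.g. 16.0) follows the comment above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CosineClassifier(nn.Module):
    """Sketch of a cosine classifier with a logit scale (hypothetical names/dims)."""
    def __init__(self, feat_dim, num_classes, scale=16.0):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(num_classes, feat_dim) * 0.01)
        self.scale = scale

    def forward(self, x):
        # L2-normalize features and class weights, then scale the cosine logits.
        x = F.normalize(x, dim=1)
        w = F.normalize(self.weight, dim=1)
        return self.scale * F.linear(x, w)  # shape: (batch, num_classes)
```

Without the scale, the logits are bounded in [-1, 1], which flattens the softmax output and slows training; multiplying by a scale restores a sharper distribution.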

@KaihuaTang
Owner

KaihuaTang commented Dec 13, 2020

As to the Capsule Classifier, it just replaces the cosine normalization with the squashing normalization introduced in https://arxiv.org/abs/1710.09829.
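
For reference, the squashing function from that paper can be sketched as below; the formula is from Sabour et al. (2017), while exactly where it is plugged into the classifier is an assumption, not a description of this repo's code.

```python
import torch

def squash(v, dim=-1, eps=1e-8):
    # Squashing non-linearity from https://arxiv.org/abs/1710.09829:
    # short vectors shrink toward zero, long vectors approach (but never
    # exceed) unit length, and the direction is preserved.
    sq_norm = (v * v).sum(dim=dim, keepdim=True)
    return (sq_norm / (1.0 + sq_norm)) * v / torch.sqrt(sq_norm + eps)
```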

@yypurpose
Author

> As far as I know, the Cosine Classifier was first used by https://github.com/zhmiao/OpenLongTailRecognition-OLTR in the long-tailed classification field. If your cosine classifier didn't improve performance, it's probably because you didn't add a scale parameter to the logits (16.0 in OLTR and 16.0 / Num_Head in our project). The scale parameter changes the distribution after the softmax activation and thus accelerates gradient descent.

Thank you for your prompt reply! I actually added a scale, but it may not have been a good one. Anyway, I'll try again.
Thanks a lot!
