[feat] Implementing SmoothAP loss #721
base: dev
Are there any plans to merge this, or would you like me to make some changes?
# Implementation is based on the original repository:
# https://github.com/Andrew-Brown1/Smooth_AP/blob/master/src/Smooth_AP_loss.py#L87
def compute_loss(self, embeddings, labels, iices_tuple, ref_emb, ref_labels):
Typo (iices_tuple should be indices_tuple). Also, is there any possibility of supporting mining or reference embeddings (ref_emb)?
I'm fairly sure I can't support reference embeddings, because the loss is computed from in-batch similarities. I've added mining support in the same way as the FastAP loss.
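As an illustration of what FastAP-style mining support can look like (an assumption about the approach, not the code in this PR), mined index pairs can be turned into dense query-by-candidate masks that replace the label-derived ones. The helpers pairs_to_masks and all_pairs_masks are hypothetical, and indices_tuple is assumed to be in the (a1, p, a2, n) pair format.

```python
import torch


def pairs_to_masks(a1, p, a2, n, batch_size):
    """Hypothetical helper: dense (B, B) masks from mined index pairs.

    Row = query (anchor), column = candidate. pos_mask[a, j] is True when
    (a, j) was mined as a positive pair; neg_mask likewise for negative pairs.
    """
    pos_mask = torch.zeros(batch_size, batch_size, dtype=torch.bool)
    neg_mask = torch.zeros(batch_size, batch_size, dtype=torch.bool)
    pos_mask[a1, p] = True
    neg_mask[a2, n] = True
    return pos_mask, neg_mask


def all_pairs_masks(labels):
    """Hypothetical no-miner default: all same-label pairs (minus self) and all different-label pairs."""
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(labels.size(0), dtype=torch.bool, device=labels.device)
    return same & ~eye, ~same


# Mined subset: only (0, 1) as a positive pair and (0, 3) as a negative pair
pos_mask, neg_mask = pairs_to_masks(
    torch.tensor([0]), torch.tensor([1]), torch.tensor([0]), torch.tensor([3]), batch_size=4
)
```

Under this sketch, a query's retrieval set would be pos_mask | neg_mask rather than every other item in the batch, and queries with no mined positives would be skipped.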
mask = 1.0 - torch.eye(batch_size)
mask = mask.unsqueeze(dim=0).repeat(batch_size, 1, 1) | ||
sims = F.cosine_similarity(
It would be nice to use self.distance here and loosen the restriction that self.distance must be CosineSimilarity.
It is possible to use Euclidean distance; however, I have not seen a "by the book" approach for it. Since the original repository only provides an implementation using cosine similarity, I have two ideas that look like they could work:
- Multiplying the distances by -1 (prior to the first step in the figure)
- Subtracting the predicted ranks from the highest rank value (after the final step in the figure)
Do you prefer either of these ideas, or do you have a third one?
EDIT: I've put some more thought into this, and I don't think either of these will work. The first option seems problematic because H(x) is replaced by the sigmoid function, which means all the values (except one) will be negative, resulting in very small output values. The second option has the opposite problem: the values going into the sigmoid will all be positive, producing outputs close to 1. Cosine similarity works well here because it yields both positive and negative values, making the most of the sigmoid's range. I would rather not reinvent the wheel.
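For readers following the cosine-similarity argument, here is a minimal sketch of the Smooth-AP computation as described in the paper, not the code in this PR: every batch item is a query, the Heaviside step H(D) is replaced by sigmoid(D / temperature), and the loss is 1 minus the mean smoothed AP. The function name smooth_ap_loss and the temperature default of 0.01 are assumptions for illustration, and details such as excluding the query from its own ranking may differ from the original repository. The normalized dot product is equivalent to F.cosine_similarity for non-zero vectors.

```python
import torch
import torch.nn.functional as F


def smooth_ap_loss(embeddings, labels, temperature=0.01):
    """Sketch of the Smooth-AP loss (1 - mean smoothed AP) over one batch.

    embeddings: (B, D) float tensor, labels: (B,) integer tensor.
    Every item acts as a query; all other items form its retrieval set.
    """
    batch_size = embeddings.size(0)
    device = embeddings.device

    # Cosine similarity between every pair of items: sims[q, j]
    emb = F.normalize(embeddings, p=2, dim=1)
    sims = emb @ emb.t()

    eye = torch.eye(batch_size, dtype=torch.bool, device=device)
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    pos = (same & ~eye).float()  # pos[q, j]: j is a positive for query q
    others = (~eye).float()      # others[q, j]: j is in the retrieval set of q

    # diff[q, i, j] = sims[q, j] - sims[q, i]; positive means j outranks i
    diff = sims.unsqueeze(1) - sims.unsqueeze(2)
    # A sigmoid with a small temperature stands in for the step function H(diff)
    soft_step = torch.sigmoid(diff / temperature)

    # Same role as the PR's mask (1 - eye repeated over the batch): skip j == i
    not_self = (~eye).float().unsqueeze(0)  # indexed as [*, i, j]

    # Smoothed rank of i among all retrieval items, and among positives only
    rank_all = 1.0 + (soft_step * not_self * others.unsqueeze(1)).sum(dim=-1)
    rank_pos = 1.0 + (soft_step * not_self * pos.unsqueeze(1)).sum(dim=-1)

    # Smoothed AP per query: mean of rank_pos / rank_all over its positives
    num_pos = pos.sum(dim=1)
    ap = (rank_pos / rank_all * pos).sum(dim=1) / num_pos.clamp(min=1.0)

    has_pos = num_pos > 0
    if not has_pos.any():
        # No query has a positive: return a zero that keeps the autograd graph
        return embeddings.sum() * 0.0
    return 1.0 - ap[has_pos].mean()


# Two well-separated classes: every query's positive ranks first, so the loss is near 0
emb = torch.tensor([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]], requires_grad=True)
labels = torch.tensor([0, 0, 1, 1])
loss = smooth_ap_loss(emb, labels)
loss.backward()
```

Because cosine similarity is bounded in [-1, 1], the differences fed to the sigmoid lie in [-2, 2] and take both signs. Note that diff has shape (B, B, B), so memory grows cubically with the batch size.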
"loss": {
"losses": loss,
"indices": None,
"reduction_type": "already_reduced",
Is there any possibility of returning an unreduced loss?
Added.
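For the unreduced option, one way per-query losses could be packaged in the same dictionary format as the "already_reduced" snippet earlier in this thread is sketched below. The helper smooth_ap_loss_dict is hypothetical, and using the "element" reduction type (one loss per batch element, with indices naming those elements) is an assumption about how pytorch-metric-learning reducers would consume it, not a description of the PR's code.

```python
import torch


def smooth_ap_loss_dict(per_query_ap, has_pos, reduced=True):
    """Hypothetical helper: wrap per-query smoothed AP values in a loss dict.

    per_query_ap: (B,) smoothed AP per query; has_pos: (B,) bool mask of
    queries that have at least one positive in the batch.
    """
    losses = 1.0 - per_query_ap
    if reduced:
        # Single scalar, matching the "already_reduced" dict in the diff
        return {"loss": {"losses": losses[has_pos].mean(), "indices": None,
                         "reduction_type": "already_reduced"}}
    # One loss per query that has positives; indices say which batch elements they belong to
    indices = torch.nonzero(has_pos, as_tuple=True)[0]
    return {"loss": {"losses": losses[has_pos], "indices": indices,
                     "reduction_type": "element"}}


ap = torch.tensor([0.4, 1.0, 0.5])
has_pos = torch.tensor([True, True, False])
unreduced = smooth_ap_loss_dict(ap, has_pos, reduced=False)
```

Dropping queries without positives, rather than assigning them a zero loss, keeps them from diluting a mean reduction.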
I think the changes resolve your concerns. Let me know if I missed anything.
Hi,
This PR implements the loss presented in the paper Smooth-AP: Smoothing the Path Towards Large-Scale Image Retrieval. The default hyperparameters are set to values that make the sigmoid closely resemble a Heaviside step function, as suggested by the authors.
Aside from the implementation, I also added a section to the docs and a test case that uses the original implementation to calculate the loss.
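To see why a small temperature makes the sigmoid behave like a Heaviside step, the sketch below measures the largest gap between sigmoid(x / temperature) and H(x) over cosine-similarity differences in [-2, 2], ignoring a small band around x = 0 where any smooth surrogate must differ from the step. The helper max_step_error and the margin of 0.1 are hypothetical; 0.01 is, to our understanding, the temperature reported in the Smooth-AP paper, but whether it equals this PR's default is an assumption.

```python
import torch


def max_step_error(temperature, margin=0.1):
    """Largest |sigmoid(x / temperature) - H(x)| for x in [-2, 2] with |x| >= margin."""
    x = torch.linspace(-2.0, 2.0, steps=401)
    step = torch.heaviside(x, torch.tensor(0.5))  # H(0) = 0.5, like sigmoid(0)
    smooth = torch.sigmoid(x / temperature)
    far = x.abs() >= margin
    return (smooth - step).abs()[far].max().item()


for tau in (1.0, 0.1, 0.01):
    print(f"temperature={tau}: max error for |x| >= 0.1 is {max_step_error(tau):.2e}")
```

Smaller temperatures approximate the step more closely, but they also shrink the gradient for pairs whose similarity difference is far from 0, which is the trade-off the temperature controls.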