Thank you for providing this reproduction!
I have a question about the grouped convolution: in this line you use a grouped convolution to solve the mini-batch training problem.
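For reference, here is a minimal, self-contained sketch of the grouped-convolution trick as I understand it (the shapes, names, and hyper-parameters are made up for illustration, not taken from the linked line):

```python
import torch
import torch.nn.functional as F

B, C_in, C_out, K, H, W, k = 4, 8, 16, 4, 32, 32, 3

x = torch.randn(B, C_in, H, W)
weight = torch.randn(K, C_out, C_in, k, k)            # K candidate kernels shared by all samples
attention = torch.softmax(torch.randn(B, K), dim=1)   # per-sample attention over the K kernels

# aggregate a different kernel for every sample: (B, C_out, C_in, k, k)
agg_weight = torch.einsum('bk,koihw->boihw', attention, weight)

# grouped-convolution trick: fold the batch into the channel dimension and set
# groups=B so each sample is convolved with its own aggregated kernel
out = F.conv2d(x.reshape(1, B * C_in, H, W),
               agg_weight.reshape(B * C_out, C_in, k, k),
               padding=1, groups=B)
out = out.reshape(B, C_out, H, W)                      # back to (B, C_out, H, W)
```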
Could we use `torch.Tensor.expand` to replace the grouped convolution? In this way, we might aggregate the attention weights and the convolution weights together. However, this may cause another problem: if the batch size ($\mathcal{B}$) is larger than 1, the attention weights form a $\mathcal{B} \times K$ matrix. I think we could then use `torch.mean(attention_weight, dim=0)` or `torch.max(attention_weight, dim=0)`, roughly as sketched below, since the weights are computed within the same batch and their ranges are very close.

I am not sure whether this calculation is equivalent to that line :)
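A rough sketch of the alternative I have in mind, with the attention averaged over the batch so a single ordinary convolution can be used (again, all names and shapes are only illustrative):

```python
import torch
import torch.nn.functional as F

B, C_in, C_out, K, H, W, k = 4, 8, 16, 4, 32, 32, 3

x = torch.randn(B, C_in, H, W)
weight = torch.randn(K, C_out, C_in, k, k)            # K candidate kernels
attention = torch.softmax(torch.randn(B, K), dim=1)   # per-sample attention, shape (B, K)

# collapse the per-sample attention into one weighting shared by the whole batch
shared_attention = attention.mean(dim=0)               # (K,); torch.max(attention, dim=0).values also works
shared_weight = (shared_attention.view(K, 1, 1, 1, 1) * weight).sum(dim=0)  # (C_out, C_in, k, k)

# a single plain convolution for the whole batch, no groups needed
out = F.conv2d(x, shared_weight, padding=1)            # (B, C_out, H, W)
```

As far as I can tell, this only matches the grouped-convolution version exactly when the attention rows are identical across the batch (or when $\mathcal{B} = 1$), which is why I am unsure about the equivalence.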