
Non-Equivariant BatchNorm #16

Open
oriondollar opened this issue Feb 10, 2023 · 2 comments

@oriondollar

Hi there,

I've been playing with your codebase to see how equivariant features propagate through the different layer types, and I think there might be an error in your code. The `AttentionInteractionBlockVN` normalizes the vector representation with a standard `nn.LayerNorm` layer, which breaks the equivariance of the vector representations inside the encoder. Was this intended? I'm not sure how much of an effect it has on the rest of the model, since the ligand and pocket are jointly encoded. Similarly, a standard `nn.Linear` layer is used to embed the initial atomic vector representation, which also breaks the equivariance between the atomic coordinates and the learned embeddings at the input.
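
For reference, here is a minimal sketch of the kind of check that shows the issue (the sizes, the random rotation, and the perturbed affine parameters are placeholders standing in for a trained layer, not the repo's actual setup):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Placeholder sizes; the real model takes its channel counts from hidden_channels.
num_atoms, vec_channels = 8, 16
x = torch.randn(num_atoms, vec_channels, 3)      # per-atom vector features

# Build a random proper rotation.
rot, _ = torch.linalg.qr(torch.randn(3, 3))
if torch.det(rot) < 0:
    rot = -rot                                   # flip to determinant +1

layernorm_vec = nn.LayerNorm([vec_channels, 3])  # default: elementwise_affine=True
with torch.no_grad():                            # move gamma/beta away from their identity
    layernorm_vec.weight.uniform_(0.5, 1.5)      # initialization, as a trained layer would
    layernorm_vec.bias.normal_(0.0, 0.1)

rotate_then_norm = layernorm_vec(x @ rot.T)      # f(R x)
norm_then_rotate = layernorm_vec(x) @ rot.T      # R f(x)

# For a rotation-equivariant layer these two would agree; here they do not.
print((rotate_then_norm - norm_then_rotate).abs().max())
```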

@pengxingang (Owner)

Hi, thanks for your interest in our work.

First, the layer normalization is defined as
$$y=\frac{x - \mathbf{E}[x]}{\sqrt{\mathbf{Var}[x] + \epsilon}} * \gamma + \beta.$$
The layer norm for the vector features was defined as `self.layernorm_vec = LayerNorm([hidden_channels[1], 3])`, so the normalization operates over the feature dimensions `[hidden_channels[1], 3]`. The operation
$$\frac{x - \mathbf{E}[x]}{\sqrt{\mathbf{Var}[x] + \epsilon}}$$
does not violate equivariance, because $\mathbf{Var}[R\circ x]=\mathbf{Var}[x]$ and $R\circ(x-\mathbf{E}[x]) = R\circ x-\mathbf{E}[R\circ x]$, where $R$ is an operation in the E(3) group. So if $\gamma$ and $\beta$ are scalar values, this layer normalization keeps equivariance. Unfortunately, the default settings of `torch.nn.LayerNorm` define $\gamma$ and $\beta$ as tensors with the same shape as the feature dimensions, so the original code does violate equivariance. This can be fixed by setting `elementwise_affine=False` in `torch.nn.LayerNorm`.
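
Concretely, the change would look something like this (the `hidden_channels` values below are placeholders, not the actual config):

```python
import torch.nn as nn

hidden_channels = (256, 64)   # placeholder values; the real ones come from the model config

# Original definition (default elementwise_affine=True learns per-element gamma/beta):
# layernorm_vec = nn.LayerNorm([hidden_channels[1], 3])

# Suggested fix: drop the learnable per-element affine parameters.
layernorm_vec = nn.LayerNorm([hidden_channels[1], 3], elementwise_affine=False)
```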

Actually, the layernorm layers were added to the model to stabilize the training process; we did not notice that their $\gamma$ and $\beta$ were not scalars. The impact on the equivariance of the whole model can possibly be gauged from the values of $\gamma$ and $\beta$ in the trained model.
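
For example, one could roughly check this from a released checkpoint like so (the file name and state-dict layout are assumptions; only the `layernorm_vec` attribute name comes from the code):

```python
import torch

# Hypothetical checkpoint path; the file name and state-dict layout depend on the repo.
ckpt = torch.load("pretrained_model.pt", map_location="cpu")
state_dict = ckpt["model"] if "model" in ckpt else ckpt   # some checkpoints nest the weights

# gamma far from 1 or beta far from 0 would mean the affine part of layernorm_vec
# noticeably distorts the vector features (and hence the equivariance).
for name, tensor in state_dict.items():
    if "layernorm_vec" in name and name.endswith("weight"):
        print(name, "mean |gamma - 1| =", (tensor - 1).abs().mean().item())
    elif "layernorm_vec" in name and name.endswith("bias"):
        print(name, "mean |beta| =", tensor.abs().mean().item())
```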

Second, the linear layer for the initial vector embedding may not be equivariant to the translation operation. This is easy to fix by translating the coordinates so that the center of mass of the pocket sits at the origin (a sketch of this centering step is given below). Besides, we speculate that this non-equivariant layer may have only a minor effect, because the subtraction of two vectors remains unaffected and equivariant; the model has a chance to learn from the equivariant parts, for example by adapting its weights to focus on the difference of two vector features.
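
A minimal sketch of that centering step (function and argument names are illustrative, not the actual API of this repo):

```python
import torch

def center_on_pocket(pocket_pos: torch.Tensor, ligand_pos: torch.Tensor):
    """Translate both coordinate sets so the pocket's center of mass sits at the origin.

    Both inputs are assumed to have shape (num_atoms, 3); a mass-weighted mean could be
    substituted for the plain mean if atomic masses are available.
    """
    com = pocket_pos.mean(dim=0, keepdim=True)
    return pocket_pos - com, ligand_pos - com
```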

Overall, these two layers do affect the equivariance of the model. The fixes are easy, as discussed above, and the influence on model performance is probably not large. Thanks again for your careful reading and for pointing out this problem.

@yangxiufengsia commented May 15, 2023

Hi, nice paper. Regarding your E(3) model, I think it might not be translation-equivariant. I have similar questions to @oriondollar's.
(1) In your model, absolute coordinates are used directly as input vector features, so I am wondering whether the model can work on other test data with different absolute coordinates. As you mentioned in the solution above, we could translate the test coordinates to the center of mass of the training data (absolute coordinates), but in real practice this is often impossible.
(2) You also mentioned that the embedding layer has only a minor effect on translational equivariance and that the model still learns it, since the subtraction of vectors remains unchanged. But then what is the purpose of the embedding layer? From my tests, it does not work when I use protein–ligand data outside the coordinate space of the training data.

Do you have any ideas to solve this issue? Looking forward to your reply. Thank you.
