Parameter Efficiency of ConvE #64

Open
liu-jc opened this issue Jun 20, 2020 · 3 comments
@liu-jc

liu-jc commented Jun 20, 2020

In your paper, you claim that ConvE uses fewer parameters than DistMult. But as far as I can tell, DistMult in your code only uses O(num_entities*embedding_dim + num_rels*embedding_dim) parameters, and ConvE uses more. I am a bit confused about this claim and am afraid I missed something. Can you point out how to verify it? Thanks!

@TimDettmers
Owner

It is a bit surprising that DistMult ends up so large even though it scales only linearly. The issue is that knowledge graphs can be large, and DistMult's parameter count scales with the size of the knowledge graph, while the convolution and the projection matrix in ConvE scale independently of the knowledge graph.

If you run the models, the parameter count is printed, but let me recalculate it by hand for some numbers in the paper to convince you of this claim. In the paper, I claim that an embedding size of 128 for DistMult and 96 for ConvE is roughly equivalent in parameters for FB15k-237 (14541 entities and 237 relationships):
DistMult: (14541+237)*128 = 1891584 ~ 1.89M
ConvE: (14541+237)*96 + 3*3*8 + 4224*96 = 1824264 ~ 1.82M

For ConvE I did not include the bias terms and I used a 2D embedding of size 12x8 which is stacked to 12x16 via [e1;rel]. 3*3*8 are the convolution parameters and 4224*96 the output projection parameters. Note that the output matrix is just the transpose of the entity embedding matrix and does not add any new parameters.
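
The hand calculation above can be reproduced with a short script. This is only a sketch of the arithmetic quoted in this comment, not the parameter printout from the repository: the function names are made up for illustration, and the values `3*3*8` and `4224` are taken as given from the comment rather than derived from the model code. Bias terms are left out, as stated above.

```python
# Dataset sizes quoted in the comment above (FB15k-237).
NUM_ENTITIES = 14541
NUM_RELATIONS = 237


def distmult_params(embedding_dim):
    """Hypothetical helper: entity and relation embedding tables only."""
    return (NUM_ENTITIES + NUM_RELATIONS) * embedding_dim


def conve_params(embedding_dim, conv_weights=3 * 3 * 8, flat_features=4224):
    """Hypothetical helper: embeddings + convolution weights + projection.

    conv_weights and flat_features are the figures quoted in the comment;
    bias terms are omitted, as in the comment.
    """
    embeddings = (NUM_ENTITIES + NUM_RELATIONS) * embedding_dim
    projection = flat_features * embedding_dim
    return embeddings + conv_weights + projection


print(distmult_params(128))  # 1891584
print(conve_params(96))      # 1824264

# Part of ConvE that does not depend on the size of the knowledge graph.
print(3 * 3 * 8 + 4224 * 96)  # 405576
```

Only the first term of each sum grows with the number of entities and relations; the last printed value stays fixed no matter how large the graph is.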

@liu-jc
Author

liu-jc commented Jun 20, 2020

Really appreciate your reply! I got it. So the point is that ConvE can achieve performance similar to DistMult with fewer parameters, right?
I still have a question about the 2D convolution. When a 3x3 filter is applied to the left part of the 12x16 matrix, like [:,:3], it cannot model the interaction between e_s and r_r, because the data there contains only information from e_s. So the 3x3 filter only models interactions near the boundary where the two embeddings are concatenated. Is that correct? Can you provide some insights about this? Thanks!

@TimDettmers
Owner

Yes, that is correct! I also tried an alternating checkerboard pattern between both embeddings, which takes the idea to the extreme, but this did not help more than just concatenating. My intuition is that sometimes you just want to model an entity or relationship on its own, meaning you want to model information that is relationship/entity independent. Having a separate region for this could help with modeling this kind of information.
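
To make the point about filter positions concrete, here is a small counting sketch. It assumes a stride of 1 and no padding, and it assumes the two 12x8 embeddings are placed side by side so that columns 0-7 hold the entity and columns 8-15 hold the relation, which is how the [:,:3] example in the question reads. The layout functions and `mixed_windows` are hypothetical names for illustration and are not taken from the repository.

```python
HEIGHT, WIDTH, KERNEL = 12, 16, 3  # stacked input size and filter size from the thread


def source_concat(row, col):
    """Assumed layout: left half is the entity, right half is the relation."""
    return "e" if col < WIDTH // 2 else "r"


def source_checker(row, col):
    """Assumed checkerboard layout: cells alternate between entity and relation."""
    return "e" if (row + col) % 2 == 0 else "r"


def mixed_windows(source):
    """Count 3x3 windows (stride 1, no padding) that cover both embeddings."""
    mixed = 0
    total = 0
    for top in range(HEIGHT - KERNEL + 1):
        for left in range(WIDTH - KERNEL + 1):
            kinds = {
                source(top + dr, left + dc)
                for dr in range(KERNEL)
                for dc in range(KERNEL)
            }
            total += 1
            if len(kinds) == 2:
                mixed += 1
    return mixed, total


print(mixed_windows(source_concat))   # (20, 140)
print(mixed_windows(source_checker))  # (140, 140)
```

Under these assumptions, only the windows that straddle the boundary columns see both embeddings in the concatenated layout, while every window does in the checkerboard layout. This counts first-layer windows only and says nothing about which layout performs better.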
