Parameter Efficiency of ConvE #64

liu-jc · 2020-06-20T07:07:56Z

In your paper, you claim that ConvE uses fewer parameters than DistMult. But in your code, DistMult seems to use only O(num_entities * embedding_dim + num_rels * embedding_dim) parameters, while ConvE uses more. I am a bit confused by this claim and suspect I missed something. Can you point out how to verify it? Thanks!

TimDettmers · 2020-06-20T13:21:44Z

It is a bit surprising that DistMult is so large even though it scales only linearly. The issue is that knowledge graphs can be large and DistMult's parameter count scales with the size of the knowledge graph, while the convolution and the projection matrix in ConvE do not depend on the size of the knowledge graph.

If you run the models, the parameter count is printed, but let me recalculate it by hand for some numbers from the paper to convince you of this claim. In the paper, I claim that an embedding size of 128 for DistMult and 96 for ConvE is roughly equivalent in parameters on FB15k-237 (14541 entities and 237 relationships):
DistMult: (14541+237)*128 = 1891584 ~ 1.89M
ConvE: (14541+237)*96 + 3*3*8 + 4224*96 = 1824264 ~ 1.82M

For ConvE, I did not include the bias terms, and I used a 2D embedding of size 12x8, which is stacked to 12x16 via [e1;rel]. 3*3*8 are the convolution parameters and 4224*96 are the output projection parameters. Note that the output matrix is just the transpose of the entity embedding matrix and does not add any new parameters.
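For anyone who wants to re-check the arithmetic above, here is a minimal sketch in plain Python. The function and variable names are made up for illustration and are not taken from the repository. It only reproduces the hand calculation, with bias terms left out as in the comment above.

```python
# Hypothetical helper names; this mirrors the hand calculation only,
# it does not inspect the actual model code.

def distmult_params(num_entities, num_relations, dim):
    # DistMult only has entity and relation embeddings.
    return (num_entities + num_relations) * dim


def conve_params(num_entities, num_relations, dim, conv_weights, hidden_size):
    # Embeddings grow with the knowledge graph ...
    embeddings = (num_entities + num_relations) * dim
    # ... while the convolution filters and the projection matrix do not.
    projection = hidden_size * dim
    return embeddings + conv_weights + projection


# FB15k-237 sizes as quoted in the thread.
NUM_ENTITIES = 14541
NUM_RELATIONS = 237

distmult_total = distmult_params(NUM_ENTITIES, NUM_RELATIONS, dim=128)
conve_total = conve_params(
    NUM_ENTITIES, NUM_RELATIONS, dim=96,
    conv_weights=3 * 3 * 8,   # convolution parameters from the thread
    hidden_size=4224,         # projection input size from the thread
)

print(f"DistMult: {distmult_total}")  # 1891584
print(f"ConvE:    {conve_total}")     # 1824264
```

Only the embedding term depends on the number of entities and relations, so the fixed ConvE overhead (72 + 405504 weights here) matters less as the graph grows.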

liu-jc · 2020-06-20T14:03:56Z

Really appreciate your reply! I got it. So the point is that ConvE can achieve performance similar to DistMult with fewer parameters, right?
I still have a question about the 2D convolution. When a 3x3 filter is applied to the left part of the 12x16 matrix, e.g. [:,:3], it cannot model the interaction between e_s and r_r, since the data there only contains information from e_s. So the 3x3 filter only models the interaction around the boundary where the two embeddings are concatenated. Is that correct? Can you provide some insight into this? Thanks!

TimDettmers · 2020-07-01T20:27:51Z

Yes, that is correct! I also tried an alternating, checkerboard-like pattern between both embeddings, which takes the idea to the extreme, but this did not help more than just concatenating. My intuition is that sometimes you just want to model an entity or relationship on its own, meaning you want to model information that is relationship/entity independent. Having a separate region for this could help with modeling this kind of information.
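To make the point about the filter positions concrete, here is a rough sketch that counts how many 3x3 windows on the 12x16 input touch both embeddings. It assumes stride 1, no padding, and that the two 12x8 embeddings sit side by side along the width, as the [:,:3] example in the question implies. These are assumptions for illustration; the actual model code may use different settings. The function name is hypothetical.

```python
def count_windows(height=12, emb_width=8, kernel=3):
    """Count filter positions on [e1 | rel] laid out side by side.

    Assumes stride 1 and no padding (illustrative assumptions only).
    Returns (total_positions, positions_covering_both_embeddings).
    """
    total_width = 2 * emb_width
    out_rows = height - kernel + 1
    out_cols = total_width - kernel + 1

    # A window starting at column c covers columns c .. c + kernel - 1.
    # It mixes both embeddings if it starts inside e1 (c < emb_width)
    # and ends inside rel (c + kernel - 1 >= emb_width).
    mixed_cols = [
        c for c in range(out_cols)
        if c < emb_width and c + kernel - 1 >= emb_width
    ]
    return out_rows * out_cols, out_rows * len(mixed_cols)


total, mixed = count_windows()
print(total, mixed)  # 140 20
```

Under these assumptions, 20 of the 140 filter positions see both e1 and rel in the first convolution layer; the remaining positions see only one of the two, which matches the "separate region" intuition above.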

TimDettmers added the question label Jun 20, 2020

TimDettmers mentioned this issue Sep 10, 2020

Best Hyperparameter Settings #66

Closed
