Graph4KG is a flexible framework for learning embeddings of entities and relations in knowledge graphs (KGs), with support for training on massive KGs. Its features are as follows:
- Batch Pre-loading. Loading the batch data for the next step overlaps with the GPU computation of the current step.
- Storage and Computation Separation. Entity embeddings are stored on disk and loaded in mmap mode, while computation runs on GPUs.
- Asynchronous Gradient Update. Gradient updates also overlap with computation, so an update may be delayed by at most four steps. Because KGs are typically sparse, this asynchrony is not expected to hurt performance.
In addition, it provides the 1st place solution in KDD Cup 2021.
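The batch pre-loading feature listed above can be illustrated with a minimal sketch. It is not Graph4KG's actual loader: `prefetch` and `train_step` are hypothetical names, and a background thread with a bounded queue stands in for whatever mechanism the framework uses. The idea is only that the next batch is prepared while the current one is being consumed.

```python
import queue
import threading


def prefetch(batches, buffer_size=2):
    """Yield items from `batches`, loading ahead in a background thread."""
    buf = queue.Queue(maxsize=buffer_size)
    done = object()  # sentinel marking the end of the stream

    def producer():
        for batch in batches:
            buf.put(batch)  # blocks while the buffer is full
        buf.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        batch = buf.get()
        if batch is done:
            return
        yield batch


def train_step(batch):
    # Hypothetical stand-in for the GPU computation of one step.
    return sum(batch)


# While train_step runs on one batch, the producer thread loads the next.
losses = [train_step(b) for b in prefetch([[1, 2], [3, 4], [5, 6]])]
```

The bounded queue caps how far loading can run ahead of computation, which keeps memory use predictable. Error handling in the producer thread is omitted for brevity.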
- paddlepaddle-gpu>=2.3rc
- pgl
- ogb==1.3.1 (optional for wikikg2 and WikiKG90M)
We recommend using the latest develop version of paddle.
- TransE
- DistMult
- ComplEx
- RotatE
- OTE
You can implement your own score function in `models/score_func.py`. Besides shallow methods, CNN-based and GNN-based methods are coming soon.
Negative Sampling: Negative samples are constructed by randomly replacing head or tail entities with other entities. Here we implement three uniform sampling strategies.
- `full`: Randomly sample entities from all entities in the KG.
- `batch`: Randomly sample entities from those that appear in the same batch.
- `chunk`: Randomly sample entities from all entities in the KG. In addition, the triplets in a batch are divided into K chunks, and all triplets in a chunk share the same set of negative samples.
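The three strategies can be sketched as follows. This is an illustration under stated assumptions, not the framework's sampler: `sample_negatives` and its parameters are hypothetical, entities are integer ids, sampling is with replacement, and no filtering of true triplets is performed.

```python
import random


def sample_negatives(batch, num_entities, num_negs, mode="full",
                     num_chunks=1, rng=random):
    """Return one list of negative entity ids per triplet in `batch`.

    batch: list of (head, relation, tail) integer-id triplets.
    """
    if mode == "full":
        # Independent draws from all entities for every triplet.
        return [[rng.randrange(num_entities) for _ in range(num_negs)]
                for _ in batch]
    if mode == "batch":
        # Candidates are restricted to entities that occur in this batch.
        pool = sorted({e for h, _, t in batch for e in (h, t)})
        return [[rng.choice(pool) for _ in range(num_negs)] for _ in batch]
    if mode == "chunk":
        # Split the batch into chunks; each chunk shares one negative set.
        chunk_size = max(1, -(-len(batch) // num_chunks))  # ceiling division
        negatives = []
        for start in range(0, len(batch), chunk_size):
            shared = [rng.randrange(num_entities) for _ in range(num_negs)]
            negatives.extend([shared] * len(batch[start:start + chunk_size]))
        return negatives
    raise ValueError(f"unknown mode: {mode}")


batch = [(0, 0, 1), (2, 1, 3), (4, 0, 5), (6, 1, 7)]
negs = sample_negatives(batch, num_entities=100, num_negs=3,
                        mode="chunk", num_chunks=2)
```

With `mode="chunk"` and two chunks, the first two triplets share one negative set and the last two share another, so negative embeddings only need to be gathered once per chunk rather than once per triplet.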
Dimension: `embed_dim` in `config.py` denotes the dimension of real embeddings. Graph4KG sets the entity embedding dimension to `embed_dim * 2` for complex-valued methods such as RotatE and ComplEx, and to `embed_dim * 4` for quaternion methods such as QuatE.
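As a small worked example of this rule, the helper below maps a model name to the stored entity dimension. The function name `entity_dim` and the two sets are hypothetical, and the fallback of leaving `embed_dim` unchanged for all other models is an assumption, not something stated above.

```python
# Model groupings taken from the description above.
COMPLEX_MODELS = {"RotatE", "ComplEx"}
QUATERNION_MODELS = {"QuatE"}


def entity_dim(model_name, embed_dim):
    """Return the stored entity embedding dimension for a model."""
    if model_name in COMPLEX_MODELS:
        return embed_dim * 2  # real and imaginary parts
    if model_name in QUATERNION_MODELS:
        return embed_dim * 4  # four quaternion components
    return embed_dim  # assumption: other models use embed_dim as-is


dims = {name: entity_dim(name, 200) for name in ("TransE", "RotatE", "QuatE")}
```

With `embed_dim = 200`, this gives 200 for TransE, 400 for RotatE, and 800 for QuatE.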
- FB15k
- FB15k-237
- WN18
- WN18RR
- ogbl-wikikg2
- WikiKG90M
Furthermore, other datasets are also supported, and you can add a new one in `dataset/reader.py`, as long as each line is formatted as follows:
HEAD_ENTITY\tRELATION\tTAIL_ENTITY\n
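A file in this format could be parsed along the following lines. This is only a sketch: `read_triplets` is a hypothetical helper, not the interface of `dataset/reader.py`, and assigning integer ids in order of first appearance is an assumption.

```python
def read_triplets(lines):
    """Parse tab-separated triplets and map names to integer ids.

    lines: any iterable of strings, such as an open text file.
    """
    entity2id, relation2id, triplets = {}, {}, []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        head, relation, tail = line.split("\t")
        # Ids are assigned in order of first appearance.
        h = entity2id.setdefault(head, len(entity2id))
        r = relation2id.setdefault(relation, len(relation2id))
        t = entity2id.setdefault(tail, len(entity2id))
        triplets.append((h, r, t))
    return triplets, entity2id, relation2id


sample = ["a\tr1\tb\n", "b\tr2\tc\n"]
triplets, entity2id, relation2id = read_triplets(sample)
```

For the two sample lines, `triplets` is `[(0, 0, 1), (1, 1, 2)]`. A line that does not contain exactly three tab-separated fields raises a `ValueError`, which surfaces malformed input early.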
Scripts for different training settings are provided, including:
- single-GPU
- mix-CPU-GPU + async-update
# download datasets
sh examples/download.sh
# FB15k
sh examples/fb15k.sh
# FB15k-237
sh examples/fb15k237.sh
# WN18
sh examples/wn18.sh
# WN18RR
sh examples/wn18rr.sh
# WikiKG90M
sh examples/wikikg90m.sh
Model | FB15k | FB15k-237 | WN18 | WN18RR |
---|---|---|---|---|
TransE | 0.655 | 0.316 | 0.571 | 0.189 |
DistMult | 0.746 | 0.322 | 0.823 | 0.441 |
ComplEx | 0.808 | 0.324 | 0.922 | 0.464 |
RotatE | 0.736 | 0.225 | 0.947 | 0.469 |
OTE | 0.617 | 0.299 | 0.812 | 0.466 |
Model | FB15k | FB15k-237 | WN18 | WN18RR |
---|---|---|---|---|
TransE | 0.648 | 0.315 | 0.568 | 0.187 |
DistMult | 0.744 | 0.305 | 0.822 | 0.441 |
ComplEx | 0.789 | 0.312 | 0.925 | 0.464 |
RotatE | 0.589 | 0.286 | 0.943 | 0.463 |
OTE | 0.512 | 0.297 | 0.656 | 0.302 |
Model | MRR |
---|---|
TransE | 0.85 |
RotatE | 0.88 |
OTE | 0.89 |
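For reference, MRR (mean reciprocal rank), the metric named in the last table, can be computed as sketched below. The function names are hypothetical, and breaking ties optimistically (counting only strictly higher scores) is an assumption. Evaluation scripts differ on tie handling and on whether known true triplets are filtered out.

```python
def rank_of_true(scores, true_index):
    """1-based rank of the true candidate; higher score is better."""
    true_score = scores[true_index]
    # Assumption: ties are broken in favor of the true candidate.
    return 1 + sum(1 for s in scores if s > true_score)


def mean_reciprocal_rank(ranks):
    """Average of 1/rank over all test queries."""
    return sum(1.0 / r for r in ranks) / len(ranks)


# Three toy queries whose true candidates rank 1st, 2nd, and 4th.
ranks = [
    rank_of_true([0.9, 0.1, 0.3, 0.2], true_index=0),
    rank_of_true([0.9, 0.1, 0.3, 0.2], true_index=2),
    rank_of_true([0.9, 0.1, 0.3, 0.2], true_index=1),
]
mrr = mean_reciprocal_rank(ranks)
```

Here `ranks` is `[1, 2, 4]`, so the MRR is (1 + 1/2 + 1/4) / 3, roughly 0.583.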