01: Extract triples from the full RDF dump of Linked Papers with Code, delete auxiliary classes, and map all URIs to integers.
The pre-processing steps yield a dataset with 1,454,103 triples, 527,817 entities (from 11 different classes) and 15
relations.
02: Train TransE entity embeddings and save the embeddings for the entities and relations in CSV files.
03: Train DistMult entity embeddings.
04: Train ComplEx entity embeddings.
05: Train RotatE entity embeddings.
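The URI-to-integer mapping in step 01 can be sketched as follows. This is a minimal stdlib sketch: the actual dump parsing and the removal of auxiliary classes are omitted, and the triple format is an assumption.

```python
def map_triples_to_ids(triples):
    """Map (head_uri, relation_uri, tail_uri) triples to integer IDs.

    Entities and relations get separate, contiguous ID spaces,
    assigned in order of first appearance.
    """
    entity2id, relation2id = {}, {}
    mapped = []
    for head, rel, tail in triples:
        h = entity2id.setdefault(head, len(entity2id))
        r = relation2id.setdefault(rel, len(relation2id))
        t = entity2id.setdefault(tail, len(entity2id))
        mapped.append((h, r, t))
    return mapped, entity2id, relation2id

# Toy example with made-up URIs (not from the actual LPWC dump):
triples = [
    ("lpwc:paper/1", "lpwc:hasTask", "lpwc:task/ner"),
    ("lpwc:paper/2", "lpwc:hasTask", "lpwc:task/ner"),
]
mapped, e2i, r2i = map_triples_to_ids(triples)
```

The integer triples can then be split into training, validation and test sets, and the two dictionaries inverted to recover URIs from the saved embedding files.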
We split the triples in the dataset into a training set (80%), a validation set (10%) and a test set (10%). We trained for a maximum of 900 epochs, with early stopping based on the mean rank on the validation set, evaluated every 300 epochs. The validation mean ranks can be found in the validation-early-stopping folder. For DistMult and RotatE, training stopped after 300 epochs; TransE and ComplEx trained for the full 900 epochs. The Adam optimizer was used for TransE, DistMult and RotatE, and Adagrad for ComplEx. The final hyperparameters used for training are listed in the table below. For DistMult and ComplEx, a weight decay of 1e-6 was additionally used.
Hyperparameter | Value |
---|---|
Embedding Size | 256 |
Learning rate | 0.001 |
Batch size | 2000 |
Negative sampling size | 2000 |
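For reference, the TransE objective these runs optimize scores a triple by the distance ||h + r − t|| and trains with a margin ranking loss against corrupted (negatively sampled) triples. A minimal pure-Python sketch; the L1 norm and the margin value of 1.0 are assumptions, as neither appears in the table above:

```python
def transe_score(h, r, t):
    """TransE plausibility score: L1 distance ||h + r - t||. Lower is better."""
    return sum(abs(hi + ri - ti) for hi, ri, ti in zip(h, r, t))

def margin_loss(pos_score, neg_score, margin=1.0):
    """Margin ranking loss: push positive triples to score lower
    than corrupted ones by at least `margin`."""
    return max(0.0, margin + pos_score - neg_score)

# Toy 3-dimensional embeddings (the actual runs use embedding size 256):
h, r, t = [0.1, 0.2, 0.0], [0.4, -0.1, 0.3], [0.5, 0.1, 0.3]
t_corrupt = [2.0, 2.0, 2.0]  # corrupted tail from negative sampling
pos = transe_score(h, r, t)          # h + r == t, so the score is 0
neg = transe_score(h, r, t_corrupt)  # corrupted triple scores far worse
loss = margin_loss(pos, neg)
```

In the actual runs, each positive triple is contrasted against 2000 such corrupted triples per the negative sampling size above.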
Final evaluation results. The best value for each metric, mean rank (MR) and Hits@N, is marked in bold.
Metric | TransE | DistMult | ComplEx | RotatE |
---|---|---|---|---|
MR | **2239.26** | 9448.88 | 25624.13 | 8830.03 |
Hits@1 | **0.2395** | 0.1931 | 0.1655 | 0.1146 |
Hits@3 | **0.3852** | 0.3204 | 0.2814 | 0.1921 |
Hits@10 | **0.5425** | 0.4856 | 0.4390 | 0.3133 |
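The metrics above follow the standard link-prediction definitions: for each test triple, the model ranks the true entity among all candidate entities; MR is the average of these ranks, and Hits@N is the fraction of ranks at most N. A small sketch of how both are computed from a list of ranks:

```python
def mean_rank(ranks):
    """Mean rank (MR): average rank of the true entity. Lower is better."""
    return sum(ranks) / len(ranks)

def hits_at(ranks, n):
    """Hits@N: fraction of test triples whose true entity ranks in the top N."""
    return sum(1 for r in ranks if r <= n) / len(ranks)

# Toy ranks for five test triples:
ranks = [1, 3, 7, 2, 12]
mr = mean_rank(ranks)    # 5.0
h1 = hits_at(ranks, 1)   # 0.2
h3 = hits_at(ranks, 3)   # 0.6
h10 = hits_at(ranks, 10) # 0.8
```

Note that MR depends on the number of candidate entities (527,817 here), so it is not comparable across datasets, whereas Hits@N is always in [0, 1].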
We trained TransE embeddings. Training was stopped after 900 epochs. We used the same hyperparameters as for LPWC version v1.
Metric | TransE |
---|---|
MR | 2412.52 |
Hits@1 | 0.2336 |
Hits@3 | 0.3717 |
Hits@10 | 0.5212 |
All computational tasks were carried out on the bwUniCluster 2.0 infrastructure using a node equipped with an NVIDIA A100 80GB GPU. All scripts for embedding generation were run in an isolated virtual environment with Python 3.9.7, torch 2.0, torch-geometric 2.4 and CUDA 12.0.