Multi-task Recommendation in PyTorch

Introduction

MTReclib provides a PyTorch implementation of multi-task recommendation models and common datasets. Currently, we have implemented 7 multi-task recommendation models to enable fair comparison and boost the development of multi-task recommendation algorithms. The currently supported algorithms include:

SingleTask: Trains a separate model for each task.
Shared-Bottom: A traditional multi-task model with a shared bottom and multiple towers.
OMoE: Adaptive Mixtures of Local Experts (Neural Computation 1991)
MMoE: Modeling Task Relationships in Multi-task Learning with Multi-Gate Mixture-of-Experts (KDD 2018)
PLE: Progressive Layered Extraction (PLE): A Novel Multi-Task Learning (MTL) Model for Personalized Recommendations (RecSys 2020 best paper)
AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising (KDD 2021)
MetaHeac: Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising (KDD 2021)
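
To make the Shared-Bottom entry in the list above concrete, here is a minimal sketch of that architecture in PyTorch. It is not the code in models/sharedbottom.py: the class name SharedBottomSketch, the layer sizes, and the use of a dense input vector (instead of embedded categorical features) are assumptions made for illustration.

```python
import torch
from torch import nn


class SharedBottomSketch(nn.Module):
    """Illustrative shared-bottom model: one shared MLP, one tower per task."""

    def __init__(self, input_dim: int, bottom_dim: int = 64,
                 tower_dim: int = 32, task_num: int = 2):
        super().__init__()
        # The bottom network is shared by every task.
        self.bottom = nn.Sequential(nn.Linear(input_dim, bottom_dim), nn.ReLU())
        # Each task gets its own small tower ending in a single logit.
        self.towers = nn.ModuleList([
            nn.Sequential(
                nn.Linear(bottom_dim, tower_dim),
                nn.ReLU(),
                nn.Linear(tower_dim, 1),
            )
            for _ in range(task_num)
        ])

    def forward(self, x: torch.Tensor) -> list:
        shared = self.bottom(x)
        # One probability tensor of shape (batch,) per task.
        return [torch.sigmoid(tower(shared).squeeze(1)) for tower in self.towers]


if __name__ == "__main__":
    model = SharedBottomSketch(input_dim=16)
    first_task_prob, second_task_prob = model(torch.randn(4, 16))
    print(first_task_prob.shape, second_task_prob.shape)
```

The expert-based models in the list replace the single shared bottom with several expert networks and gating, but keep the idea of one output tower per task.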

Datasets

AliExpressDataset: A dataset gathered from real-world traffic logs of the search system in AliExpress. It is collected from 5 countries (Russia, Spain, France, the Netherlands, and America) and can be used as 5 multi-task datasets. Downloads: Original_dataset, Processed_dataset (Google Drive), Processed_dataset (Baidu Netdisk).

For the processed dataset, put the archive directly in './data/' and unpack it. For the original dataset, put it in './data/' and run 'python preprocess.py --dataset_name NL'.

Requirements

Python 3.6
PyTorch > 1.10
pandas
numpy
tqdm

Run

Parameter Configuration:

dataset_name: one of ['AliExpress_NL', 'AliExpress_FR', 'AliExpress_ES', 'AliExpress_US', 'Synthetic'], default: AliExpress_NL
dataset_path: the folder containing the datasets, default: ./data
model_name: one of ['singletask', 'sharedbottom', 'omoe', 'mmoe', 'ple', 'aitm', 'metaheac'], default: metaheac
epoch: the number of training epochs, default: 50
task_num: the number of tasks, default: 2 (CTR & CVR)
expert_num: the number of experts for ['omoe', 'mmoe', 'ple', 'metaheac'], default: 8
learning_rate: default: 0.001
batch_size: default: 2048
weight_decay: default: 1e-6
device: the device to run the code on, default: cuda:0
save_dir: the folder in which to save model parameters, default: chkpt
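
The options above map naturally onto an argparse parser. The sketch below is an assumption about how such a parser could look, built only from the names and defaults listed here. The real main.py may define them differently, and build_parser is a hypothetical helper name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(description="Multi-task recommendation (sketch)")
    parser.add_argument("--dataset_name", default="AliExpress_NL",
                        choices=["AliExpress_NL", "AliExpress_FR", "AliExpress_ES",
                                 "AliExpress_US", "Synthetic"])
    parser.add_argument("--dataset_path", default="./data")
    parser.add_argument("--model_name", default="metaheac",
                        choices=["singletask", "sharedbottom", "omoe", "mmoe",
                                 "ple", "aitm", "metaheac"])
    parser.add_argument("--epoch", type=int, default=50)
    parser.add_argument("--task_num", type=int, default=2)
    parser.add_argument("--expert_num", type=int, default=8)
    parser.add_argument("--learning_rate", type=float, default=0.001)
    parser.add_argument("--batch_size", type=int, default=2048)
    parser.add_argument("--weight_decay", type=float, default=1e-6)
    parser.add_argument("--device", default="cuda:0")
    parser.add_argument("--save_dir", default="chkpt")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the example does not depend on sys.argv.
    args = build_parser().parse_args(["--model_name", "mmoe"])
    print(args.model_name, args.expert_num, args.batch_size)
```

Using choices makes a misspelled dataset or model name fail at parse time rather than partway through training.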

You can run a model with:

python main.py --model_name metaheac --expert_num 8 --dataset_name AliExpress_NL
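
To compare several models on the same dataset, one option is to generate one command per model. This sketch assumes main.py accepts the model_name and dataset_name options from the parameter configuration as command-line flags. build_commands is a hypothetical helper, and nothing is executed unless you pass a command to subprocess.run.

```python
import sys

# Model names as listed in the parameter configuration.
MODELS = ["singletask", "sharedbottom", "omoe", "mmoe", "ple", "aitm", "metaheac"]


def build_commands(dataset_name: str = "AliExpress_NL") -> list:
    """Return one argument list per model, suitable for subprocess.run."""
    return [
        [sys.executable, "main.py", "--model_name", model, "--dataset_name", dataset_name]
        for model in MODELS
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        # Only print the command; call subprocess.run(cmd, check=True) to launch it.
        print(" ".join(cmd))
```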

Results

For a fair comparison, all models use a learning rate of 0.001, an embedding dimension of 128, and a mini-batch size of 2048. We report the mean AUC and Logloss over five random runs. Best results are in boldface.

AliExpress (Netherlands, NL) and AliExpress (Spain, ES):
Methods	NL CTR AUC	NL CTR Logloss	NL CTCVR AUC	NL CTCVR Logloss	ES CTR AUC	ES CTR Logloss	ES CTCVR AUC	ES CTCVR Logloss
SingleTask	0.7222	0.1085	0.8590	0.00609	0.7266	0.1207	0.8855	0.00456
Shared-Bottom	0.7228	0.1083	0.8511	0.00620	0.7287	0.1204	0.8866	0.00452
OMoE	0.7254	0.1081	0.8611	0.00614	0.7253	0.1209	0.8859	0.00452
MMoE	0.7234	0.1080	0.8606	0.00607	0.7285	0.1205	0.8898	0.00450
PLE	0.7292	0.1088	0.8591	0.00631	0.7273	0.1223	0.8913	0.00461
AITM	0.7240	0.1078	0.8577	0.00611	0.7290	0.1203	0.8885	0.00451
MetaHeac	0.7263	0.1077	0.8615	0.00606	0.7299	0.1203	0.8883	0.00450

AliExpress (France, FR) and AliExpress (America, US):
Methods	FR CTR AUC	FR CTR Logloss	FR CTCVR AUC	FR CTCVR Logloss	US CTR AUC	US CTR Logloss	US CTCVR AUC	US CTCVR Logloss
SingleTask	0.7259	0.1002	0.8737	0.00435	0.7061	0.1004	0.8637	0.00381
Shared-Bottom	0.7245	0.1004	0.8700	0.00439	0.7029	0.1008	0.8698	0.00381
OMoE	0.7257	0.1006	0.8781	0.00432	0.7049	0.1007	0.8701	0.00381
MMoE	0.7216	0.1010	0.8811	0.00431	0.7043	0.1006	0.8758	0.00377
PLE	0.7276	0.1014	0.8805	0.00451	0.7138	0.0992	0.8675	0.00403
AITM	0.7236	0.1005	0.8763	0.00431	0.7048	0.1004	0.8730	0.00377
MetaHeac	0.7249	0.1005	0.8813	0.00429	0.7089	0.1001	0.8743	0.00378
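
For readers unfamiliar with the two metrics in these tables, the sketch below computes AUC and Logloss for one task from binary labels and predicted probabilities using only the standard library. It illustrates the standard definitions and is not the evaluation code of this repository; the function names auc and log_loss are chosen here for illustration.

```python
import math


def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def auc(labels, scores):
    """ROC AUC via the rank-sum (Mann-Whitney U) formula with tie-averaged ranks."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative label")
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        # Group equal scores and give each the average of their 1-based ranks.
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


if __name__ == "__main__":
    toy_labels = [0, 0, 1, 1]           # made-up labels for illustration
    toy_probs = [0.1, 0.4, 0.35, 0.8]   # made-up predictions for illustration
    print(auc(toy_labels, toy_probs))   # 0.75
    print(log_loss(toy_labels, toy_probs))
```

Higher AUC is better and lower Logloss is better, which is how the columns above should be read.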

File Structure

.
├── main.py
├── README.md
├── models
│   ├── layers.py
│   ├── aitm.py
│   ├── omoe.py
│   ├── mmoe.py
│   ├── metaheac.py
│   ├── ple.py
│   ├── singletask.py
│   └── sharedbottom.py
└── data
    ├── preprocess.py         # Preprocess the original data
    ├── AliExpress_NL         # AliExpressDataset from the Netherlands
    │   ├── train.csv
    │   └── test.csv
    ├── AliExpress_ES         # AliExpressDataset from Spain
    ├── AliExpress_FR         # AliExpressDataset from France
    └── AliExpress_US         # AliExpressDataset from America

Contact

If you have any problems with this library, please open an issue or email us at:

[email protected]

Reference

If you use this repository, please cite the following papers:

@inproceedings{zhu2021learning,
  title={Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising},
  author={Zhu, Yongchun and Liu, Yudan and Xie, Ruobing and Zhuang, Fuzhen and Hao, Xiaobo and Ge, Kaikai and Zhang, Xu and Lin, Leyu and Cao, Juan},
  booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages={4005--4013},
  year={2021}
}

@inproceedings{xi2021modeling,
  title={Modeling the sequential dependence among audience multi-step conversions with multi-task learning in targeted display advertising},
  author={Xi, Dongbo and Chen, Zhen and Yan, Peng and Zhang, Yinger and Zhu, Yongchun and Zhuang, Fuzhen and Chen, Yu},
  booktitle={Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery \& Data Mining},
  pages={3745--3755},
  year={2021}
}

Some model implementations and utility functions refer to these nice repositories:

pytorch-fm: This package provides a PyTorch implementation of factorization machine models and common datasets in CTR prediction.
MetaHeac: This is an official implementation for Learning to Expand Audience via Meta Hybrid Experts and Critics for Recommendation and Advertising.
