Computational platform: PyTorch 1.4.0, NVIDIA GeForce RTX 3090 (GPU), Intel Core i9-10900X (CPU), CUDA Toolkit 10.0
Development language: Python 3.6/C++
The required libraries are listed below and can be installed via the command pip install -r requirements.txt:
numpy, scipy, tqdm, scikit-learn, sentencepiece==0.1.91, transformers, tensorboardX, nltk, cudatoolkit=10.0, pytorch==1.4.0
The modules os, sys, collections, itertools, argparse, subprocess, and pickle are part of the Python standard library and need no installation.
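Before running anything, it can help to confirm that the third-party packages listed above are importable. The following is a minimal sketch, not part of the repository; the helper name missing_packages is hypothetical. Note that some pip names differ from import names (scikit-learn is imported as sklearn, pytorch as torch). The sketch only checks presence, not versions.

```python
import importlib.util

# pip package name -> module name used in `import ...`
REQUIRED = {
    "numpy": "numpy",
    "scipy": "scipy",
    "tqdm": "tqdm",
    "scikit-learn": "sklearn",
    "sentencepiece": "sentencepiece",
    "transformers": "transformers",
    "tensorboardX": "tensorboardX",
    "nltk": "nltk",
    "pytorch": "torch",
}


def missing_packages(required=REQUIRED):
    """Return, sorted, the pip names whose top-level module cannot be found."""
    return sorted(
        pip_name
        for pip_name, module_name in required.items()
        # find_spec returns None when a top-level module is not installed
        if importlib.util.find_spec(module_name) is None
    )
```

For example, print(missing_packages()) prints an empty list when every listed package is installed. Versions (e.g. pytorch==1.4.0) still need to be checked separately.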
We provide all three data sets (profession, hobby, and 20News) in the folder data/datasets/.
Profession data set (obtained from the authors of [2])
attribute values: 71; user utterances: 5747
used by previous work: CHARM, DSCGN
Hobby data set (obtained from the authors of [2])
attribute values: 149; user utterances: 5787
used by previous work: CHARM, DSCGN
Note that we follow the same task setting as previous personal attribute prediction work [2-4]: attribute values are NOT explicitly mentioned in the utterances, and the given candidate attribute values are ranked according to the underlying semantics of the utterances.
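To make the ranking setting concrete, here is a minimal sketch. Each candidate attribute value receives a score for a user's utterances (how PEARL produces those scores is not shown here), the candidates are sorted by score, and the position of the true value is summarized with reciprocal rank. Mean reciprocal rank is used here only as a common illustrative metric for this kind of ranking; the function names and toy scores are hypothetical, and the metrics reported for PEARL may differ.

```python
from typing import Dict, Iterable, List, Tuple


def rank_candidates(scores: Dict[str, float]) -> List[str]:
    """Sort candidate attribute values by score, highest first (ties broken by name)."""
    return sorted(scores, key=lambda value: (-scores[value], value))


def reciprocal_rank(ranking: List[str], gold: str) -> float:
    """1 / (1-based position of the gold value), or 0.0 if it is not ranked."""
    for position, value in enumerate(ranking, start=1):
        if value == gold:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(examples: Iterable[Tuple[Dict[str, float], str]]) -> float:
    """Average reciprocal rank over (candidate_scores, gold_value) pairs."""
    rrs = [reciprocal_rank(rank_candidates(scores), gold) for scores, gold in examples]
    return sum(rrs) / len(rrs) if rrs else 0.0
```

For instance, with toy scores {"doctor": 0.2, "nurse": 0.7, "teacher": 0.1} and true value "doctor", the ranking is nurse, doctor, teacher, so the reciprocal rank is 1/2.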
20News data set (obtained from [1])
classes: 5; documents: 17871
used by previous work: X-Class
Note that PEARL is also evaluated on weakly supervised text classification to verify its generality, flexibility, and effectiveness.
To preprocess the profession data set, run:
CUDA_VISIBLE_DEVICES=[gpu_id] python --dataset_name profession
Similarly, the hobby (resp. 20News) data set can be preprocessed by replacing "profession" with "hobby" (resp. "20News").
To run PEARL on the profession data set, run:
CUDA_VISIBLE_DEVICES=[gpu_id] python --dataset_name profession
Similarly, PEARL can run on the hobby (resp. 20News) data set via the command "python" (resp. "python").
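To process or run all three data sets in one go, a small Python driver such as the sketch below may help. It is not part of the repository. The helper names are hypothetical, and since the entry-script names are not given here, the script is passed in as an argument rather than guessed. In a shell, the variable assignment must be written without spaces (CUDA_VISIBLE_DEVICES=[gpu_id]); the sketch sets it through the child process environment instead.

```python
import os
import subprocess
import sys
from typing import Dict, List

DATASETS = ["profession", "hobby", "20News"]


def build_command(script: str, dataset_name: str) -> List[str]:
    """Argument vector equivalent to `python <script> --dataset_name <name>`.

    sys.executable is used so the child runs under the same interpreter.
    """
    return [sys.executable, script, "--dataset_name", dataset_name]


def gpu_env(gpu_id: int) -> Dict[str, str]:
    """Copy of the current environment with only GPU `gpu_id` visible.

    Equivalent to the shell prefix CUDA_VISIBLE_DEVICES=<gpu_id>. With spaces
    around '=', the shell would instead try to run a command named
    CUDA_VISIBLE_DEVICES.
    """
    env = dict(os.environ)
    env["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    return env


def run_all(script: str, gpu_id: int = 0) -> None:
    """Run `script` on every data set in turn, stopping at the first failure."""
    for name in DATASETS:
        subprocess.run(build_command(script, name), env=gpu_env(gpu_id), check=True)
```

For example, run_all("<entry_script>.py", gpu_id=0) runs the given script on profession, hobby, and 20News in sequence with only GPU 0 visible. Here <entry_script>.py stands for whichever script the step needs.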
[1] Lang K. NewsWeeder: Learning to filter netnews. Machine Learning Proceedings 1995, 331-339.
[2] Tigunova A, Yates A, Mirza P, et al. CHARM: Inferring personal attributes from conversations. EMNLP'20, 5391-5404.
[3] Liu Y, Chen H, Shen W. Personal attribute prediction from conversations. WWW'22, 223-227.
[4] Tigunova A, Yates A, Mirza P, et al. Listening between the lines: Learning personal attributes from conversations. WWW'19, 1818-1828.