The paper has been accepted at NLPCC 2023 (long paper).
CCPC is the first Chinese PCL dataset. It contains over 11k hierarchically annotated samples targeting vulnerable groups, collected from the Weibo and Zhihu platforms.
PCL (Patronizing and Condescending Language) is a form of implicitly toxic speech aimed at vulnerable groups that can cause them long-term harm. Please note that PCL comments may be offensive and cause discomfort!
We have released the final CCPC1.0 dataset. Please do not use it for purposes other than academic research.
The dataset for binary classification is now available.
If you use this dataset, please cite the following paper.
@inproceedings{wang2023ccpc,
  title={CCPC: A Hierarchical Chinese Corpus for Patronizing and Condescending Language Detection},
  author={Wang, Hongbo and Li, Mingda and Lu, Junyu and Yang, Liang and Xia, Hebin and Lin, Hongfei},
  booktitle={CCF International Conference on Natural Language Processing and Chinese Computing},
  pages={640--652},
  year={2023},
  organization={Springer}
}