CCPC is the first high-level dataset for the Chinese language, with over 11k hierarchically annotated samples targeting vulnerable groups on the Weibo and Zhihu platforms.

dut-laowang/CCPC

CCPC: A Hierarchical Chinese Corpus for Patronizing and Condescending Language Detection

The paper has been accepted at NLPCC 2023 (long paper).

📌Introduction

CCPC is the first PCL dataset for Chinese, with over 11k hierarchically annotated samples targeting vulnerable groups on the Weibo and Zhihu platforms.

PCL (Patronizing and Condescending Language) is a form of implicitly toxic speech aimed at vulnerable groups that can cause them long-term harm. Please note that PCL comments may be offensive and cause discomfort!

📌Updates

We have released the final CCPC1.0 dataset. Please do not use it for purposes other than academic research.

📌Full Dataset

The dataset for binary classification is now available.
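The file layout of the binary-classification release is not described here, so the following is only a minimal sketch under stated assumptions: it assumes a UTF-8 CSV with a header row and two hypothetical columns, `text` and `label` (assumed 1 = PCL, 0 = non-PCL). Neither the column names nor the label encoding come from the CCPC release; check the actual files before relying on them. The sketch loads (text, label) pairs and counts the labels, a quick sanity check before training a classifier.

```python
import csv
import io
from collections import Counter

# Hypothetical schema: "text" and "label" are assumed column names,
# not the documented CCPC format. Adjust them to match the released files.
TEXT_COL = "text"
LABEL_COL = "label"  # assumed encoding: 1 = PCL, 0 = non-PCL


def load_binary_split(fileobj):
    """Read (text, label) pairs from a CSV file object that has a header row."""
    reader = csv.DictReader(fileobj)
    samples = []
    for row in reader:
        samples.append((row[TEXT_COL], int(row[LABEL_COL])))
    return samples


def label_distribution(samples):
    """Count how many samples carry each binary label."""
    return Counter(label for _, label in samples)


if __name__ == "__main__":
    # Tiny in-memory stand-in for a real file (placeholder rows, not CCPC data).
    demo = io.StringIO("text,label\nexample a,1\nexample b,0\nexample c,0\n")
    data = load_binary_split(demo)
    print(label_distribution(data))  # Counter({0: 2, 1: 1})
```

For a real file on disk, open it with `open(path, encoding="utf-8", newline="")` so that Chinese text and quoted fields are read correctly by the `csv` module.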

📌Cite

If you use this dataset, please cite the following paper.

@inproceedings{wang2023ccpc,
  title={CCPC: A Hierarchical Chinese Corpus for Patronizing and Condescending Language Detection},
  author={Wang, Hongbo and Li, Mingda and Lu, Junyu and Yang, Liang and Xia, Hebin and Lin, Hongfei},
  booktitle={CCF International Conference on Natural Language Processing and Chinese Computing},
  pages={640--652},
  year={2023},
  organization={Springer}
}
