We collected 60,000 Stack Overflow questions from 2016 to 2020 and classified them into three categories:
- HQ: High-quality posts without a single edit.
- LQ_EDIT: Low-quality posts with a negative score and multiple community edits that nevertheless remain open after those edits.
- LQ_CLOSE: Low-quality posts that were closed by the community without a single edit.
Notes:
- Questions are sorted by Question Id.
- Question body is in HTML format.
- All dates are in UTC.
- The dataset is also accessible at https://www.kaggle.com/imoore/60k-stack-overflow-questions-with-quality-rate
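
Because question bodies are stored as HTML and dates are in UTC, some light preprocessing is usually needed before the text can be used. The following is a minimal sketch using only the Python standard library. The field names (`Body`, `CreationDate`, `Y`), the sample values, and the `YYYY-MM-DD HH:MM:SS` timestamp layout are assumptions made for illustration and are not specified above; check them against the actual files.

```python
from datetime import datetime, timezone
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML fragment, dropping the tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(body_html):
    """Return the text of an HTML question body with whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(body_html)
    parser.close()  # flush any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


def parse_utc(timestamp):
    """Parse a timestamp string (assumed layout) and mark it as UTC."""
    naive = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


# Hypothetical row: field names and values are illustrative only.
row = {
    "Body": "<p>How do I reverse a <code>list</code> in Python?</p>",
    "CreationDate": "2016-01-01 00:21:59",
    "Y": "HQ",
}
text = html_to_text(row["Body"])
created = parse_utc(row["CreationDate"])
```

Note that this sketch keeps the contents of `<code>` elements as ordinary text; whether to keep, drop, or mark code snippets depends on the downstream task.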
This is an original dataset, published under the MIT License. If you use the dataset, please cite it as follows:
@article{annamoradnejad2022multiview,
  title = {Multi-View Approach to Suggest Moderation Actions in Community Question Answering Sites},
  author = {Annamoradnejad, Issa and Habibi, Jafar and Fazli, Mohammadamin},
  journal = {Information Sciences},
  volume = {600},
  pages = {144--154},
  year = {2022},
  issn = {0020-0255},
  doi = {10.1016/j.ins.2022.03.085},
  url = {https://www.sciencedirect.com/science/article/pii/S0020025522003127}
}