Text classification algorithms are at the heart of a variety of software systems that process text data at scale. Email software uses text classification to determine whether incoming mail is sent to the inbox or filtered into the spam folder. Discussion forums use text classification to determine whether comments should be flagged as inappropriate.
These are two examples of topic classification, categorizing a text document into one of a predefined set of topics. In many topic classification problems, this categorization is based primarily on keywords in the text.
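As a rough illustration of keyword-driven topic classification, here is a minimal sketch that scores each topic by how often its keywords appear in a document. The topic names, the keyword lists, and the `tokenize` and `classify` helpers are all hypothetical, chosen only for this example. A practical system, such as the n-gram and SVM models benchmarked below, would learn its features and weights from labeled training data instead of relying on hand-written lists.

```python
import re
from collections import Counter

# Hypothetical keyword lists (illustration only); a real classifier
# would learn which words signal each topic from labeled documents.
TOPIC_KEYWORDS = {
    "sports": {"match", "goal", "team", "coach"},
    "business": {"market", "stock", "bank", "profit"},
}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def classify(text, topic_keywords=TOPIC_KEYWORDS):
    """Return the topic whose keywords occur most often, or None if no keyword matches."""
    counts = Counter(tokenize(text))
    # Score each topic by the total number of keyword occurrences.
    scores = {
        topic: sum(counts[word] for word in keywords)
        for topic, keywords in topic_keywords.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


if __name__ == "__main__":
    print(classify("The team scored a late goal in the match"))
    print(classify("Bank profit rose as the stock market rallied"))
```

Returning `None` when no keyword matches keeps the sketch from forcing an arbitrary label onto documents that fall outside the predefined topics.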
A Large-scale Vietnamese News Text Classification Corpus
Level 1: 10 topics, 33,759 documents for training and 50,373 documents for testing
Model | Score | Paper / Source | Code |
---|---|---|---|
NGRAM | 97.1 | Vu et al. RIVF'07 | |
SVM Multi | 93.4 | Vu et al. RIVF'07 | |
Level 2: 27 topics, 14,375 documents for training and 12,076 documents for testing
Model | Score | Paper / Source | Code |
---|---|---|---|
SVM Multi | 96.21 | Vu et al. RIVF'07 | |
📜 Papers
- Le et al. NICS'18. A Comparative Study of Neural Network Models for Sentence Classification
- Zhu et al. CCC'15, Nguyet et al. KSE'15, Vu et al. FDSE'15
💫 Services
📁 Open sources