An insincere question is defined as a question intended to make a statement rather than look for helpful answers. Some characteristics that can signify that a question is insincere:
- Has a non-neutral tone
- Has an exaggerated tone to underscore a point about a group of people
- Is rhetorical and meant to imply a statement about a group of people
- Is disparaging or inflammatory
- Suggests a discriminatory idea against a protected class of people, or seeks confirmation of a stereotype
- Makes disparaging attacks/insults against a specific person or group of people
- Based on an outlandish premise about a group of people
- Disparages a characteristic that is not fixable and not measurable
- Isn't grounded in reality
- Based on false information, or contains absurd assumptions
- Uses sexual content (incest, bestiality, pedophilia) for shock value, and not to seek genuine answers

The training data includes the question that was asked, and whether it was identified as insincere (target = 1). The ground-truth labels contain some amount of noise: they are not guaranteed to be perfect.
- qid - unique question identifier
- question_text - Quora question text
- target - a question labeled "insincere" has a value of 1, otherwise 0
The dataset contains around 1.3 million (13 lakh) questions, of which about 93% are sincere and about 7% are insincere:
- 0=>sincere
- 1=>insincere
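
To make the field layout and the class imbalance concrete, here is a minimal sketch that parses rows with the three columns described above and reports the share of each label. The sample rows are made up for illustration and the helper name `label_distribution` is hypothetical; a real run would read the actual training file instead of the inline string.

```python
import csv
import io
from collections import Counter

# Made-up rows for illustration only; they are not taken from the dataset.
SAMPLE_CSV = """qid,question_text,target
q1,How do I learn linear algebra quickly?,0
q2,What is the best way to cook rice?,0
q3,Why are all people from group X so stupid?,1
q4,How does a transformer model work?,0
"""


def label_distribution(csv_file):
    """Return {label: fraction of rows} computed from the `target` column."""
    reader = csv.DictReader(csv_file)
    counts = Counter(int(row["target"]) for row in reader)
    total = sum(counts.values())
    return {label: count / total for label, count in sorted(counts.items())}


distribution = label_distribution(io.StringIO(SAMPLE_CSV))
names = {0: "sincere", 1: "insincere"}
for label, fraction in distribution.items():
    print(f"{label} ({names[label]}): {fraction:.0%}")
```

With a split as skewed as the one reported above, accuracy alone can be misleading, so it is worth checking this distribution before choosing an evaluation metric.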
Our project builds a classifier on top of BERT, a transformer-based model that encodes text bidirectionally, so each token's representation reflects both its left and right context. BERT's output is passed through fully connected Artificial Neural Network (ANN) layers with dropout regularization, which helps prevent overfitting.
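
The classification head described above can be sketched without any framework. This is only an illustration of dense layers plus dropout applied to a sentence embedding: the toy 4-dimensional embedding stands in for BERT's output, and the layer sizes, ReLU activation, weights, and dropout rate are all assumptions rather than the project's actual configuration. A real implementation would use a deep-learning library.

```python
import math
import random


def dropout(values, rate, training, rng):
    """Inverted dropout: during training, zero each value with probability
    `rate` and rescale the survivors; at inference, pass values through."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


def dense(values, weights, biases):
    """Fully connected layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def classify(embedding, params, rate=0.1, training=False, rng=None):
    """Map a sentence embedding to P(insincere) via dense -> ReLU -> dropout -> dense."""
    rng = rng or random.Random(0)
    hidden = relu(dense(embedding, params["w1"], params["b1"]))
    hidden = dropout(hidden, rate, training, rng)
    logit = dense(hidden, params["w2"], params["b2"])[0]
    return sigmoid(logit)


# Toy embedding standing in for the encoder output (assumed values).
embedding = [0.5, -1.0, 0.25, 2.0]
# Hypothetical weights; in practice these are learned during training.
params = {
    "w1": [[0.1, 0.2, -0.3, 0.4], [-0.5, 0.1, 0.2, 0.0]],
    "b1": [0.0, 0.1],
    "w2": [[0.7, -0.6]],
    "b2": [-0.2],
}

probability = classify(embedding, params)
print(f"P(insincere) = {probability:.3f}")
```

Dropout is active only when `training=True`; at prediction time the head is deterministic, which is why the call above always yields the same probability.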
We also rely on transfer learning and fine-tuning. Transfer learning reuses the knowledge captured by a pre-trained model such as BERT for our specific task, which shortens training and can improve performance. Fine-tuning goes a step further: it continues training the parameters of the pre-trained BERT model on our data so that the encoder itself adapts to the task.
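
The difference between reusing a frozen pre-trained encoder and fine-tuning it comes down to which parameters an optimizer step may change. The toy sketch below illustrates that idea only: the `Param` class, the parameter names, the gradients, and the learning rates are hypothetical and do not come from the project or from any library.

```python
from dataclasses import dataclass


@dataclass
class Param:
    name: str
    value: float
    grad: float
    trainable: bool = True


def set_encoder_trainable(params, trainable):
    """Freeze or unfreeze every parameter that belongs to the encoder."""
    for p in params:
        if p.name.startswith("encoder."):
            p.trainable = trainable


def sgd_step(params, lr):
    """Plain gradient-descent update applied to trainable parameters only."""
    for p in params:
        if p.trainable:
            p.value -= lr * p.grad


def make_params():
    # One pre-trained encoder weight and one freshly initialized head weight.
    return [
        Param("encoder.layer0.weight", value=1.0, grad=0.5),
        Param("head.dense.weight", value=0.0, grad=-2.0),
    ]


# Frozen encoder: only the new head learns.
frozen = make_params()
set_encoder_trainable(frozen, False)
sgd_step(frozen, lr=0.1)

# Fine-tuning: the encoder is updated too, here with a smaller learning rate.
tuned = make_params()
set_encoder_trainable(tuned, True)
sgd_step(tuned, lr=0.01)

for label, params in (("frozen", frozen), ("fine-tuned", tuned)):
    print(label, {p.name: round(p.value, 4) for p in params})
```

Using a smaller learning rate when the encoder is unfrozen is a common choice, since large updates can overwrite what the pre-trained weights already encode.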