Classification of sincere and insincere Quora questions using the pretrained BERT model, with fine-tuning and transfer learning.


Finetuning-BERT-For-classification-of-text

Dataset: classification of Quora questions as sincere or insincere

An insincere question is defined as a question intended to make a statement rather than to look for helpful answers. Characteristics that can signal that a question is insincere:

  • Has a non-neutral tone
  • Has an exaggerated tone to underscore a point about a group of people
  • Is rhetorical and meant to imply a statement about a group of people
  • Is disparaging or inflammatory
  • Suggests a discriminatory idea against a protected class of people, or seeks confirmation of a stereotype
  • Makes disparaging attacks/insults against a specific person or group of people
  • Is based on an outlandish premise about a group of people
  • Disparages a characteristic that is not fixable and not measurable
  • Isn't grounded in reality
  • Is based on false information, or contains absurd assumptions
  • Uses sexual content (incest, bestiality, pedophilia) for shock value, and not to seek genuine answers

The training data includes the question that was asked, and whether it was identified as insincere (target = 1). The ground-truth labels contain some amount of noise: they are not guaranteed to be perfect.

Data fields

  • qid - unique question identifier
  • question_text - Quora question text
  • target - a question labeled "insincere" has a value of 1, otherwise 0
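The fields above can be loaded and inspected with pandas. The rows below are illustrative examples in the dataset's format, not actual records from the training file:

```python
import pandas as pd

# Toy frame mirroring the dataset's schema: qid, question_text, target.
df = pd.DataFrame(
    {
        "qid": ["q001", "q002", "q003"],
        "question_text": [
            "How do plants convert sunlight into energy?",
            "Why are all members of group X so dishonest?",
            "What is the capital of Australia?",
        ],
        "target": [0, 1, 0],  # target = 1 marks an insincere question
    }
)

# Inspect the class balance (heavily skewed toward sincere in the real data).
print(df["target"].value_counts(normalize=True))
```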

Size of the dataset

The dataset contains around 1.3 million (13 lakh) questions, of which about 93% are sincere and 7% insincere.

  • 0=>sincere
  • 1=>insincere
(figure: class distribution of sincere vs. insincere questions)

Sample of the dataset

(figure: sample rows from the dataset)

Model

The model is built on BERT, a transformer-based encoder known for its ability to encode text bidirectionally and capture intricate contextual relationships in language. On top of BERT we add artificial neural network (ANN) layers with dropout regularization to improve predictive performance and prevent overfitting.
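The architecture described above can be sketched as follows. This is an illustrative sketch, not the repository's exact code: the hidden sizes, dropout rate, and layer names are assumptions, and the pretrained encoder is passed in (e.g. a `BertModel` from Hugging Face `transformers`).

```python
import torch
import torch.nn as nn

class BertClassifier(nn.Module):
    """BERT encoder followed by dropout-regularized ANN layers (illustrative)."""

    def __init__(self, bert, hidden_size=768, dropout=0.1):
        super().__init__()
        self.bert = bert                     # pretrained encoder (assumption: HF-style output)
        self.dropout = nn.Dropout(dropout)   # regularization to curb overfitting
        self.fc1 = nn.Linear(hidden_size, 256)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(256, 2)         # two classes: sincere / insincere

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = out.last_hidden_state[:, 0]  # [CLS] token representation
        x = self.dropout(pooled)
        x = self.relu(self.fc1(x))
        return self.fc2(self.dropout(x))      # logits over the two classes
```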

Furthermore, we employ transfer learning and fine-tuning. Transfer learning reuses the knowledge captured in the pretrained BERT weights for our specific task, accelerating training and potentially improving performance. Fine-tuning then updates the pretrained BERT parameters on our data, adapting the model to the classification task and optimizing its performance further.
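In code, the difference between the two regimes usually comes down to whether the encoder's parameters receive gradients. A minimal sketch, assuming a model with a `bert` attribute as above (not the repository's exact code):

```python
def freeze_encoder(model):
    # Transfer learning: keep BERT's pretrained weights fixed and
    # train only the new classification head.
    for param in model.bert.parameters():
        param.requires_grad = False

def unfreeze_encoder(model):
    # Fine-tuning: let gradients update BERT's own weights as well,
    # typically with a small learning rate.
    for param in model.bert.parameters():
        param.requires_grad = True
```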

Model architecture

(figure: model architecture diagram)

Demo of the application

(screenshots: demo of the application)
