Spam Classifier Built with an SVM Model

Dataset: https://www.kaggle.com/uciml/sms-spam-collection-dataset/data

Preprocessing steps:

  • Stemming with the Porter stemmer
  • Removed all stop words
  • Used regular expressions to replace every email address in an SMS with the string 'email', every web address with 'httpadr', and every number with 'number'
  • Removed every SMS whose string length equals one
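
The regex replacement and stop-word steps above can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the patterns, the tiny `STOP_WORDS` set, and the function names `normalize` and `drop_length_one` are all assumptions. Stemming is left out to keep the sketch dependency-free; NLTK's `PorterStemmer` would be one way to add it.

```python
import re

# Hypothetical patterns; the README does not show the actual expressions.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NUMBER_RE = re.compile(r"\d+(?:\.\d+)?")

# Tiny illustrative stop-word list (assumption); a real run would use a full list.
STOP_WORDS = {"a", "an", "the", "to", "or", "and", "is", "at", "for", "of", "in", "on"}


def normalize(sms):
    """Lower-case an SMS, substitute placeholders, and drop stop words."""
    text = sms.lower()
    # Emails go first so the URL and number patterns cannot split them.
    text = EMAIL_RE.sub(" email ", text)
    text = URL_RE.sub(" httpadr ", text)
    text = NUMBER_RE.sub(" number ", text)
    tokens = re.findall(r"[a-z]+", text)
    return [t for t in tokens if t not in STOP_WORDS]


def drop_length_one(messages):
    """Drop messages that are a single character long (one reading of the last step)."""
    return [m for m in messages if len(m.strip()) != 1]


tokens = normalize("Call 08001234567 or visit http://win.example.com now, mail a.b@example.org")
print(tokens)
```

Replacing emails before URLs and numbers is a deliberate ordering choice in this sketch, since an address such as `a.b2@example.org` would otherwise be broken up by the number pattern.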

Test split: 0.33 of the data held out for testing (the confusion matrix below covers 1839 messages); the rest is used for training
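
A 0.33 split can be sketched as below, reading the figure as the fraction of messages held out for testing (an assumption, though it is consistent with the 1839 messages in the confusion matrix). The function name `split`, the fixed seed, and rounding the test size up are illustrative choices, not taken from the repository.

```python
import math
import random


def split(items, test_fraction=0.33, seed=0):
    """Shuffle items and hold out a fraction of them as the test set."""
    items = list(items)
    random.Random(seed).shuffle(items)  # seeded so the split is repeatable
    n_test = math.ceil(test_fraction * len(items))
    return items[n_test:], items[:n_test]  # (train, test)


train, test = split(range(100))
print(len(train), len(test))
```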

Total number of SMS: 5573

Model used: SVM (Support Vector Machine) with a Gaussian (RBF) kernel
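
For readers unfamiliar with the Gaussian kernel, it scores the similarity of two feature vectors as exp(-gamma * ||x - y||^2). The sketch below only illustrates that formula; the `gamma` value is hypothetical, since the README reports the penalty parameter C but no kernel width.

```python
import math


def gaussian_kernel(x, y, gamma=0.1):
    """Return exp(-gamma * squared Euclidean distance between x and y)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


# Identical vectors give 1.0; the value decays toward 0 as vectors move apart.
print(gaussian_kernel([1.0, 0.0], [1.0, 0.0]))
print(gaussian_kernel([0.0, 0.0], [1.0, 1.0], gamma=0.5))
```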

Best Model:

  • C = 600
  • Train Accuracy: 0.997321
  • Test Accuracy: 0.985318
  • Test Recall: 0.900794
  • Test Precision: 0.991266

Confusion Matrix:

|          | Predicted 0 | Predicted 1 |
|----------|-------------|-------------|
| Actual 0 | 1584        | 3           |
| Actual 1 | 24          | 228         |
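
As a check on how the test metrics relate to the matrix, the sketch below recomputes them from the four counts, assuming class 1 is spam (the positive class). With the counts as printed, accuracy reproduces the reported 0.985318, while recall and precision come out at roughly 0.9048 and 0.9870, slightly different from the figures listed for the best model, which may stem from a different run. The function name `metrics` is illustrative.

```python
def metrics(tn, fp, fn, tp):
    """Compute accuracy, recall, and precision from confusion-matrix counts."""
    total = tn + fp + fn + tp
    return {
        "accuracy": (tp + tn) / total,
        "recall": tp / (tp + fn),       # share of actual spam that was caught
        "precision": tp / (tp + fp),    # share of predicted spam that was spam
    }


# Counts from the matrix above: rows are actual, columns are predicted.
m = metrics(tn=1584, fp=3, fn=24, tp=228)
for name, value in m.items():
    print(f"{name}: {value:.6f}")
```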