This Python project implements Naive Bayes and Logistic Regression for text/email classification into two categories: spam and ham. It assumes two directories, spam and ham: every file in the spam folder is a spam message and every file in the ham folder is a legitimate (non-spam) message.
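A rough sketch of how that directory layout can be turned into labelled examples (the folder paths and helper name below are illustrative, not taken from spam_ham.py):

```python
# Illustrative only: read every file in a folder and attach a class label.
import os

def load_folder(folder, label):
    examples = []
    for name in os.listdir(folder):
        # errors="ignore" is an assumption; real messages may need a specific encoding
        with open(os.path.join(folder, name), errors="ignore") as f:
            examples.append((f.read(), label))
    return examples

# e.g. training set = load_folder("train/spam", "spam") + load_folder("train/ham", "ham")
```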
-
For Naive Bayes, add-one (Laplace) smoothing is used so that no word probability comes out as zero.
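A minimal sketch of what add-one smoothing looks like for the per-class word probabilities (variable names below are illustrative, not the script's own):

```python
# Add-one (Laplace) smoothing: every vocabulary word gets count + 1,
# so even unseen words receive a small non-zero probability.
import math
from collections import Counter

def word_log_probs(class_counts: Counter, vocabulary: set) -> dict:
    # P(word | class) = (count(word, class) + 1) / (total words in class + |V|)
    denom = sum(class_counts.values()) + len(vocabulary)
    return {w: math.log((class_counts[w] + 1) / denom) for w in vocabulary}

# "meeting" never appears in the spam counts, yet its probability is non-zero.
spam_counts = Counter({"free": 3, "winner": 2})
print(word_log_probs(spam_counts, {"free", "winner", "meeting"}))
```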
-
For logistic regression, L2 regularisation is used and the model is trained with different values of lambda. Gradient ascent is used to learn the weights, with a hard limit on the number of iterations so that training stops in a bounded amount of time.
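A minimal sketch of one possible gradient-ascent update with an L2 penalty and a capped iteration count (the learning rate, lambda value, and array names are placeholders, not the script's own):

```python
# Gradient ascent on the L2-regularised conditional log-likelihood:
# maximise sum log P(y_i | x_i, w) - (lam / 2) * ||w||^2.
import numpy as np

def train_logistic(X, y, lam=0.1, eta=0.01, max_iters=100):
    w = np.zeros(X.shape[1])
    for _ in range(max_iters):               # hard limit on the number of iterations
        p = 1.0 / (1.0 + np.exp(-X @ w))     # predicted P(spam | x)
        gradient = X.T @ (y - p) - lam * w   # log-likelihood gradient plus L2 penalty term
        w += eta * gradient                  # ascent step
    return w
```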
-
Next, both Naive Bayes and Logistic Regression are run again with stop words (a, an, the, is, are, etc.) removed, and the accuracies are recalculated.
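A small illustration of the stop-word filtering step (the stop-word list shown here is a placeholder; the script may use a different one):

```python
# Drop common function words before counting features.
STOP_WORDS = {"a", "an", "the", "is", "are", "and", "of", "to", "in"}

def remove_stop_words(tokens):
    return [t for t in tokens if t.lower() not in STOP_WORDS]

print(remove_stop_words("the offer is a winner".split()))  # ['offer', 'winner']
```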
To run the file: python3 spam_ham.py <spam_train folder's path> <ham_train folder's path> <spam_test folder's path> <ham_test folder's path> <num_iterations>
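For example, assuming training and test folders named train/spam, train/ham, test/spam and test/ham (names here are illustrative) and 100 iterations: python3 spam_ham.py train/spam train/ham test/spam test/ham 100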