Email Spam Filter using scikit-learn, IPython and a Naive Bayes Classifier

The tool can do the following:

Outputting the most frequently occurring words and their frequencies, according to user preference
Extracting features by processing and simplifying the raw data
Training the naive Bayes classifier
Outputting a ham/spam confusion matrix
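
The first two capabilities can be illustrated with a minimal sketch. It assumes each email is already loaded as a plain string; the function names `most_common_words` and `extract_features`, the tokenization rule, and the sample emails are hypothetical and are not taken from the notebook.

```python
from collections import Counter


def most_common_words(emails, top_n):
    """Return the top_n most frequent words across all emails as (word, count) pairs."""
    counts = Counter()
    for text in emails:
        # Assumption: keep only alphabetic tokens longer than one character.
        counts.update(
            w for w in text.lower().split() if w.isalpha() and len(w) > 1
        )
    return counts.most_common(top_n)


def extract_features(emails, vocabulary):
    """Build one word-count row per email, with one column per vocabulary word."""
    index = {word: col for col, word in enumerate(vocabulary)}
    matrix = []
    for text in emails:
        row = [0] * len(vocabulary)
        for w in text.lower().split():
            if w in index:
                row[index[w]] += 1
        matrix.append(row)
    return matrix


# Hypothetical sample data, for illustration only.
emails = ["win money now", "meeting at noon", "win a free prize now", "win big"]
top = most_common_words(emails, top_n=2)
vocabulary = [word for word, _ in top]
features = extract_features(emails, vocabulary)
```

Here `top_n` stands in for the user preference mentioned above, and `features` is a plain list of rows that could be converted to an array before training.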

Due to the limited capabilities of the machine used, the training and test sets are a small portion of a much larger dataset. The filter was optimized for the smaller dataset, but it can also run on the larger one, provided that the number of files in each label vector is set correctly and the spam emails in that vector are identified.
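
As a rough illustration of what adjusting the label vector involves, the sketch below builds one label per email file from the file counts. It assumes the files are ordered with all ham emails first and all spam emails after them, and that 0 marks ham and 1 marks spam; the helper name `build_labels` and this ordering are hypothetical and may differ from the notebook.

```python
def build_labels(n_ham, n_spam):
    """Return a label list: 0 for each ham file, then 1 for each spam file.

    Assumption: the email files are read in the same order, ham first.
    """
    return [0] * n_ham + [1] * n_spam


# Hypothetical counts; when switching datasets, these must match the files on disk.
train_labels = build_labels(n_ham=3, n_spam=2)
```

When moving to the larger dataset, the two counts would need to match the actual number of ham and spam files so that each label lines up with its row in the feature matrix.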

Link to the smaller Test-Train dataset used

Link to the [whole 50MB dataset]

Main Parts

Part 1 - Most Common Words Extraction
Part 2 - Feature Extraction
Part 3 - Combining the Labeled Feature Vectors of All Training Emails into a Single Two-Dimensional Matrix
Part 4 - Defining and Training Naive Bayes Classifier
Part 5 - Testing the Trained Model on the Defined Test Set
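
Parts 4 and 5 can be sketched with scikit-learn as follows. This is a toy example under stated assumptions, not the notebook's code: the word-count matrices are made up, the label convention (0 for ham, 1 for spam) is assumed, and `MultinomialNB` is chosen here as one common naive Bayes variant for word counts, since the variant used in the notebook is not stated.

```python
import numpy as np
from sklearn.metrics import confusion_matrix
from sklearn.naive_bayes import MultinomialNB

# Toy word-count matrix; hypothetical columns: ["win", "free", "meeting", "report"].
X_train = np.array([
    [2, 1, 0, 0],  # spam
    [1, 2, 0, 0],  # spam
    [0, 0, 2, 1],  # ham
    [0, 0, 1, 2],  # ham
])
y_train = np.array([1, 1, 0, 0])  # assumed convention: 1 = spam, 0 = ham

# Part 4: define and train the classifier.
model = MultinomialNB()
model.fit(X_train, y_train)

# Part 5: predict on a held-out test set and summarize the results.
X_test = np.array([
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])
y_test = np.array([1, 0])
predictions = model.predict(X_test)

# Rows are true labels, columns are predicted labels, in the order [ham, spam].
matrix = confusion_matrix(y_test, predictions, labels=[0, 1])
print(matrix)
```

The diagonal of the confusion matrix counts correctly classified ham and spam emails; off-diagonal entries count misclassifications.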

HossamElghamry/Email-Spam-Filter

Folders and files

Latest commit

History

Repository files navigation

Email Spam Filter using Sci-kit, IPython and Naive Bayes Classifier

Main Parts

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages