HossamElghamry/Email-Spam-Filter

Email Spam Filter using scikit-learn, IPython and a Naive Bayes Classifier

The tool can do the following:

  • Output the most frequently occurring words with their frequencies, with the number of words chosen by the user
  • Extract features by processing and simplifying the raw data
  • Train the Naive Bayes classifier
  • Output the ham/spam confusion matrix
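The first two capabilities can be illustrated with a minimal sketch. It is not the repository's code: the function names (`tokenize`, `most_common_words`, `extract_features`), the tokenization rule, and the sample emails are all assumptions made for illustration, and only the standard library is used.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens longer than one character."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 1]


def most_common_words(emails, top_n):
    """Return the top_n (word, count) pairs across all email bodies."""
    counts = Counter()
    for body in emails:
        counts.update(tokenize(body))
    return counts.most_common(top_n)


def extract_features(body, vocabulary):
    """Map one email to a list of word counts, one slot per vocabulary word."""
    counts = Counter(tokenize(body))
    return [counts[word] for word in vocabulary]


# Hypothetical sample emails, not taken from the dataset.
emails = [
    "Win money now, win big money",
    "Meeting notes attached for the project",
    "Claim your free money prize now",
]

top = most_common_words(emails, top_n=3)  # top_n plays the role of the user preference
vocabulary = [word for word, _ in top]
features = [extract_features(body, vocabulary) for body in emails]

print(top)
print(features)
```

Each email becomes a fixed-length count vector over the most common words, which is the form a Naive Bayes classifier expects as input.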

Because the machine used for development had limited resources, the training and test sets were a small subset of a much larger dataset. The filter was tuned for the smaller dataset, but it can also run on the larger one, provided that each label vector is sized to the correct number of files and the spam emails in it are marked accordingly.
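A label vector of the kind described here could be built as follows. This is a sketch under stated assumptions: the helper `build_labels` is hypothetical, the encoding (0 for ham, 1 for spam) is assumed, and the files are assumed to be ordered with all ham emails before all spam emails. The repository may order and encode them differently.

```python
def build_labels(n_ham, n_spam):
    """Return one label per file: 0 for ham, 1 for spam (assumes ham files come first)."""
    return [0] * n_ham + [1] * n_spam


# Hypothetical file counts; switching datasets means changing these two numbers.
labels = build_labels(n_ham=3, n_spam=2)
print(labels)
```

With this layout, moving from the smaller dataset to the larger one only requires passing the new file counts, as long as the file ordering assumption still holds.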

Link to the smaller Test-Train dataset used

Link to the whole 50 MB dataset

Main Parts

  • Part 1 - Extracting the Most Common Words
  • Part 2 - Feature Extraction
  • Part 3 - Combining the Labeled Feature Vectors of All Training Emails into a Single Two-Dimensional Matrix
  • Part 4 - Defining and Training the Naive Bayes Classifier
  • Part 5 - Testing the Trained Model on the Defined Test Set
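Parts 3 to 5 can be sketched end to end as shown below. The project itself uses scikit-learn; to keep this example dependency-free, a small hand-written multinomial Naive Bayes with Laplace smoothing stands in for the library's classifier, and which Naive Bayes variant the repository actually uses is an assumption. The function names, the feature columns, and the tiny matrices are hypothetical.

```python
import math


def train_nb(X, y, alpha=1.0):
    """Fit a multinomial Naive Bayes model with Laplace smoothing (alpha)."""
    n_features = len(X[0])
    model = {}
    for c in sorted(set(y)):
        rows = [row for row, label in zip(X, y) if label == c]
        totals = [sum(col) for col in zip(*rows)]  # per-word counts within class c
        denom = sum(totals) + alpha * n_features
        model[c] = {
            "log_prior": math.log(len(rows) / len(X)),
            "log_likelihood": [math.log((t + alpha) / denom) for t in totals],
        }
    return model


def predict(model, row):
    """Return the class with the highest log-posterior score for one feature vector."""
    scores = {
        c: p["log_prior"] + sum(n * ll for n, ll in zip(row, p["log_likelihood"]))
        for c, p in model.items()
    }
    return max(scores, key=scores.get)


def confusion_matrix(y_true, y_pred):
    """Rows are true labels, columns are predicted labels, in the order [ham, spam]."""
    matrix = [[0, 0], [0, 0]]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


# Part 3: one row per training email, one column per vocabulary word.
# Hypothetical columns: counts of "money", "win", "meeting".
X_train = [[3, 2, 0], [2, 1, 0], [0, 0, 2], [0, 0, 1]]
y_train = [1, 1, 0, 0]  # 1 = spam, 0 = ham (assumed encoding)

# Part 4: train the classifier.
model = train_nb(X_train, y_train)

# Part 5: evaluate on a held-out test set.
X_test = [[1, 1, 0], [0, 0, 3]]
y_test = [1, 0]
y_pred = [predict(model, row) for row in X_test]
cm = confusion_matrix(y_test, y_pred)

print(y_pred)
print(cm)
```

In the confusion matrix, the diagonal holds correctly classified ham and spam emails, and the off-diagonal entries hold the misclassifications.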
