Email_spam_classifier for issue #237 #238
Merged
Email Spam Classifier
This Python script uses machine learning to classify emails as either spam or legitimate (ham). The classifier is a logistic regression model that takes TF-IDF (Term Frequency-Inverse Document Frequency) features as input. Here's a brief overview of the workflow:
Data Preparation: The script loads email data from a CSV file and preprocesses it by handling missing values and converting the categorical labels (spam, ham) to binary values.
Feature Extraction: TF-IDF vectorization transforms the text into numerical features that weight each word by how often it appears in a message and how rare it is across all messages.
Model Training: A logistic regression model is trained on the extracted features to learn the patterns that distinguish spam from ham emails.
Evaluation: The model's performance is measured by its accuracy on both the training and the testing datasets.
Prediction: Users can input an email message, and the model predicts whether it's spam or legitimate.
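The steps above can be sketched roughly as follows. This is a minimal illustration, not the script from this PR: it assumes scikit-learn, replaces the CSV file with a small hypothetical in-memory dataset (`TOY_ROWS`), and picks the encoding spam = 1, ham = 0, which the description does not specify. The function names `train_spam_classifier` and `predict_email` are made up for this example.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical toy data standing in for the CSV file: (label, message) pairs.
TOY_ROWS = [
    ("ham", "Are we still meeting for lunch tomorrow?"),
    ("ham", "Please review the attached project report"),
    ("ham", "Can you call me when you get home"),
    ("ham", "The meeting is moved to three in the afternoon"),
    ("ham", "Thanks for sending the notes from class"),
    ("ham", None),  # a row with a missing message
    ("spam", "WIN a FREE prize now, click this link to claim"),
    ("spam", "Congratulations, you have won a cash prize, claim now"),
    ("spam", "Free entry to win tickets, text WIN to claim"),
    ("spam", "Urgent: your account won a free bonus, click now"),
    ("spam", "Claim your free gift card now, limited offer"),
    ("spam", "You won! Click here for your free cash reward"),
]


def train_spam_classifier(rows, test_size=0.25, random_state=42):
    """Prepare data, extract TF-IDF features, train and evaluate a model."""
    # Data preparation: replace missing messages with an empty string and
    # map labels to binary values (assumed encoding: spam = 1, ham = 0).
    messages = [text if text is not None else "" for _, text in rows]
    labels = [1 if label == "spam" else 0 for label, _ in rows]

    # Hold out part of the data for testing, keeping the class balance.
    x_train, x_test, y_train, y_test = train_test_split(
        messages, labels, test_size=test_size,
        random_state=random_state, stratify=labels,
    )

    # Feature extraction: fit the vocabulary on the training text only,
    # then reuse it to transform the test text.
    vectorizer = TfidfVectorizer(lowercase=True, stop_words="english")
    train_features = vectorizer.fit_transform(x_train)
    test_features = vectorizer.transform(x_test)

    # Model training.
    model = LogisticRegression()
    model.fit(train_features, y_train)

    # Evaluation: accuracy on both the training and the testing split.
    train_accuracy = accuracy_score(y_train, model.predict(train_features))
    test_accuracy = accuracy_score(y_test, model.predict(test_features))
    return vectorizer, model, train_accuracy, test_accuracy


def predict_email(vectorizer, model, message):
    """Classify a single message as 'spam' or 'ham'."""
    features = vectorizer.transform([message])
    return "spam" if model.predict(features)[0] == 1 else "ham"


if __name__ == "__main__":
    vec, clf, train_acc, test_acc = train_spam_classifier(TOY_ROWS)
    print(f"Training accuracy: {train_acc:.2f}")
    print(f"Testing accuracy: {test_acc:.2f}")
    print(predict_email(vec, clf, "Click now to claim your free prize"))
```

Fitting the vectorizer on the training split only keeps test messages from influencing the vocabulary and IDF weights, so the test accuracy stays an honest estimate. With a dataset this small the reported numbers are not meaningful; the sketch only shows how the pieces connect.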
The classifier flags likely spam automatically, which helps keep the inbox organized and reduces exposure to unwanted messages.
Kindly assign the GSSoC label to this PR and mark it as a contribution under that program.
Addresses issue #237. @sanjay-kv