Skip to content

Latest commit

 

History

History
51 lines (34 loc) · 2.49 KB

README.md

File metadata and controls

51 lines (34 loc) · 2.49 KB

GMail Organizer

MultiOutput Classification of Gmails

A menu based command-line application that automatically applies labels in the user's Gmail using classification.

Description

It uses Gmail API for reading and modifying gmail messages. Support Vector Classifier is used to perform the classification (with OvR), although some other classifiers like RandomForestClassifier also work fine. For Natural Language Processing of email text, I have used term frequency - inverse document frequency (TF-IDF) Vectorizer for vectorizing the corpus. It provides the advantage that

a high weight of the tf-idf calculation is reached when we have a high term frequency(tf) in the given document and a low document frequency of the term in the whole collection.

The goal of developing this application was to let users quickly sort their emails, thus saving them time. The model evaluation and other details are described in detail in this notebook.

Usage

The application can be set up very quickly, with the only tricky part being the Gmail API key generation. For usage, please see the user guide

Challenges faced

I tried to use KMeans Clustering to cluster the email, but the performance was poor. Due to this reason, I also discarded my plan to use semi-supervised ML.

All the work I did to attempt to use different ML techniques is described in the jupyter notebooks.

Also, as the size of the training dataset was small, I avoided using complex NLP and classification techniques as that wouldn't yield a huge improvement in performance.

Limitations

The models were trained on a small dataset of ~1200 emails I labeled manually. I did not label every mail category, but only a few categories that seemed important to me. As a result, the classifier also doesn't label many emails (as most of them are similar to those which were not important to me)

The whole application has limited features because of time constraints.

Data Privacy

Your emails or personal data, whenever used, is always stored on your local storage only.
Never upload sensitive files, especially credentials.json, token.json, and your Gmail API key.

References

GMail API reference
Rich library for colorful display
TFIDF vectorizer