anshuman-goel/Prediction-of-Communication-between-Employees-using-Supervised-Learning-Methods

ALDA_Project

ALDA team project

Authors: Rohit Nambisan, Tyler Cannon, Anshuman Goel

The main objective of this project was to analyze the Enron Email Dataset and to predict communication between Enron employees by feeding the email data into prediction algorithms. The Naive Bayes classification algorithm served as the baseline and was compared with Social Network Analysis.

File Descriptions:

(1) names.py

This Python script generates a hashmap that maps each employee name to the alternative names and email addresses used for that employee in the dataset. It generates the alter_names.csv file.

(2) mail_list.py

This Python script performs the final processing of the dataset. It removes all records that contain multiple addresses or names on either the sender side or the recipient side. It also generalizes the employee names in order to make the prediction algorithms more efficient. It reads the alter_names.csv file to build the hashmap and generates the email_processed.csv file.
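The filtering and name generalization described above can be sketched roughly as follows. This is an illustrative sketch, not the code in mail_list.py: the comma/semicolon separators, the `(to, from, date)` tuple layout, and the helper names `is_single` and `clean_records` are all assumptions.

```python
import re

# Assumed separators between multiple addresses in one field (hypothetical).
MULTI_SEP = re.compile(r"[,;]")


def is_single(field):
    """True when the field is non-empty and holds exactly one address or name."""
    return bool(field.strip()) and not MULTI_SEP.search(field)


def clean_records(records, lookup):
    """Drop multi-address records and map each side to a canonical name."""
    cleaned = []
    for to_addr, from_addr, date in records:
        if not (is_single(to_addr) and is_single(from_addr)):
            continue  # skip records with several senders or recipients
        to_name = lookup.get(to_addr.strip().lower(), to_addr.strip())
        from_name = lookup.get(from_addr.strip().lower(), from_addr.strip())
        cleaned.append((to_name, from_name, date))
    return cleaned


# Made-up example data; real aliases would come from alter_names.csv.
lookup = {"jdoe@example.com": "jane doe", "r.roe@example.com": "richard roe"}
records = [
    ("jdoe@example.com", "r.roe@example.com", "2001-05-14"),
    ("jdoe@example.com, x@example.com", "r.roe@example.com", "2001-05-15"),
]
cleaned = clean_records(records, lookup)
```

Addresses missing from the lookup are passed through unchanged here; dropping them instead would be an equally plausible choice.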

(3) alter_names.csv

This file contains the hashmap-like structure. The data is stored in the form 'employee name':'alternative_name_1|alternative_name_2|...|alternative_name_n'
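As a rough illustration of how records in this form could be turned into a reverse lookup (alternative name to employee name), consider the sketch below. It assumes one record per line, single quotes around both sides, and no colon inside the employee name; the function names and the sample record are hypothetical.

```python
def parse_alias_line(line):
    """Split one 'name':'alt_1|alt_2|...' record into (name, [alternatives])."""
    name, _, alts = line.strip().partition(":")
    name = name.strip().strip("'")
    alternatives = [a.strip() for a in alts.strip().strip("'").split("|") if a.strip()]
    return name, alternatives


def build_reverse_lookup(lines):
    """Map every alternative (lower-cased) back to its employee name."""
    lookup = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        name, alternatives = parse_alias_line(line)
        lookup[name.lower()] = name
        for alt in alternatives:
            lookup[alt.lower()] = name
    return lookup


# Made-up record in the documented form.
sample = ["'jane doe':'doe, jane|jane.doe@example.com|jdoe@example.com'"]
lookup = build_reverse_lookup(sample)
```

Lower-casing the keys is a design choice made for this sketch so that lookups are case-insensitive.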

(4) email_processed.csv

This is the final processed dataset. The data is stored in the form 'to_address'|'from_address'|Date
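A file in this form could be loaded and summarized along the following lines. The sketch assumes there is no header row, that single quotes are used as the quote character, and that the date can be kept as a plain string; the sample rows are invented, and `read_processed` and `edge_weights` are hypothetical helper names.

```python
import csv
import io
from collections import Counter


def read_processed(handle):
    """Read 'to'|'from'|date rows, keeping only well-formed three-field rows."""
    reader = csv.reader(handle, delimiter="|", quotechar="'")
    return [tuple(field.strip() for field in row) for row in reader if len(row) == 3]


def edge_weights(rows):
    """Count emails per (sender, recipient) pair; note the file lists 'to' first."""
    return Counter((sender, recipient) for recipient, sender, _date in rows)


# In-memory stand-in for email_processed.csv with made-up rows.
sample = io.StringIO(
    "'jane doe'|'richard roe'|2001-05-14\n"
    "'jane doe'|'richard roe'|2001-05-15\n"
    "'richard roe'|'jane doe'|2001-05-16\n"
)
rows = read_processed(sample)
weights = edge_weights(rows)
```

The resulting counts can serve as directed edge weights when building a communication graph.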

(5) SocialNetwork.R

Builds the Naive Bayes baseline as well as the social network analysis. Can be run as a whole to do both, or split at the comments to run each part separately. Requires email_processed.csv
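For readers unfamiliar with the baseline, one possible framing of a Naive Bayes predictor is sketched below in Python. It is not the code in SocialNetwork.R: it assumes the task is to predict the recipient from the sender alone, uses Laplace smoothing with a hypothetical `alpha` parameter, and runs on invented data.

```python
import math
from collections import Counter, defaultdict


def train(pairs):
    """Collect counts from (sender, recipient) pairs; the recipient is the class."""
    class_counts = Counter()
    feature_counts = defaultdict(Counter)
    senders = set()
    for sender, recipient in pairs:
        class_counts[recipient] += 1
        feature_counts[recipient][sender] += 1
        senders.add(sender)
    return class_counts, feature_counts, senders


def predict(model, sender, alpha=1.0):
    """Return the recipient maximizing log P(recipient) + log P(sender | recipient)."""
    class_counts, feature_counts, senders = model
    total = sum(class_counts.values())
    n_values = len(senders)
    best, best_score = None, -math.inf
    for recipient, count in class_counts.items():
        log_prior = math.log(count / total)
        # Laplace-smoothed likelihood of this sender given the recipient.
        log_lik = math.log(
            (feature_counts[recipient][sender] + alpha) / (count + alpha * n_values)
        )
        score = log_prior + log_lik
        if score > best_score:
            best, best_score = recipient, score
    return best


# Made-up training pairs.
pairs = [("a", "b"), ("a", "b"), ("a", "c"), ("b", "a")]
model = train(pairs)
```

With a single feature this reduces to picking the recipient with the largest smoothed joint count; further features such as the date would each add another likelihood term.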

(6) BaggedSNA.R

Builds the bagged iteration of the social network analysis. Can be run in its entirety to produce the accuracy as the end result. Requires email_processed.csv
