ALDA_Project

Alda team project

Authors: Rohit Nambisan, Tyler Cannon, Anshuman Goel

The main objective of this project was to analyze the Enron Email Dataset and to predict communication between Enron employees by feeding the email data into prediction algorithms. The Naive Bayes classification algorithm served as the baseline and was compared against social network analysis.

File Descriptions:-

(1) names.py

This Python script generates a hashmap that maps each employee name to the alternative names and email addresses used for that employee in the dataset. It produces the alter_names.csv file.

(2) mail_list.py

This Python script performs the final processing of the dataset. It removes all records that contain multiple addresses or names on either the sender side or the recipient side. It also generalizes the employee names (each alternative name is replaced by a single employee name) in order to make the prediction algorithms more efficient. It reads alter_names.csv to build the hashmap and generates the email_processed.csv file.
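The two processing steps described above can be sketched roughly as follows. This is not the code from mail_list.py: the `ALIASES` dictionary, the function names, the rule used to detect multiple addresses, and the choice to keep unmapped addresses as-is are all assumptions made for illustration.

```python
# Hypothetical alias map, in the shape names.py is described as producing
# (alternative name or address -> employee name).
ALIASES = {
    "jane.doe@example.com": "jane doe",
    "john.roe@example.com": "john roe",
}


def is_single_address(field):
    """Assumed rule: a single address has exactly one '@' and no separators."""
    field = field.strip()
    return field.count("@") == 1 and not any(sep in field for sep in ",; ")


def process(records, aliases):
    """Keep one-to-one emails and replace each address with its employee name."""
    kept = []
    for sender, recipient, date in records:
        if not (is_single_address(sender) and is_single_address(recipient)):
            continue  # drop records with several senders or recipients
        # Assumption: addresses missing from the map are kept unchanged.
        sender_key = sender.strip().lower()
        recipient_key = recipient.strip().lower()
        sender_name = aliases.get(sender_key, sender_key)
        recipient_name = aliases.get(recipient_key, recipient_key)
        # Output order follows email_processed.csv: to, from, date.
        kept.append((recipient_name, sender_name, date))
    return kept


raw = [
    ("jane.doe@example.com", "john.roe@example.com", "2001-01-01"),
    ("jane.doe@example.com", "a@example.com, b@example.com", "2001-01-02"),
]
processed = process(raw, ALIASES)
print(processed)
```

Only the first record survives here, because the second one lists two recipients.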

(3) alter_names.csv

This file contains the hashmap-like structure. Each entry has the form 'employee name':'alternative_name_1|alternative_name_2|...|alternative_name_n'
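As a rough illustration of how an entry in this form could be turned back into a lookup table, here is a minimal sketch. It assumes one entry per line and that the single quotes appear literally in the file; the function names and the sample entry are hypothetical.

```python
def parse_alias_line(line):
    """Split one 'name':'alt_1|alt_2|...' entry into (name, [alternatives])."""
    name, _, alternatives = line.strip().partition(":")
    name = name.strip().strip("'")
    alternatives = alternatives.strip().strip("'")
    return name, [alt.strip() for alt in alternatives.split("|") if alt.strip()]


def build_reverse_lookup(lines):
    """Map every alternative name or address back to its employee name."""
    lookup = {}
    for line in lines:
        if not line.strip():
            continue  # ignore blank lines
        name, alternatives = parse_alias_line(line)
        for alt in alternatives:
            lookup[alt.lower()] = name
    return lookup


# Made-up entry for illustration only.
sample = ["'jane doe':'Jane Doe|jane.doe@example.com|Doe, Jane'"]
lookup = build_reverse_lookup(sample)
print(lookup["jane.doe@example.com"])
```

Lower-casing the keys is a design choice of this sketch so that lookups do not depend on how an address was capitalized in a given email.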

(4) email_processed.csv

This is the final processed dataset. Each record has the form 'to_address'|'from_address'|Date
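A record in this form can be read with Python's standard `csv` module, as in the sketch below. It assumes a pipe delimiter, single-quote quoting, and no header row; the date is left as a string because its format is not documented here. The `read_emails` name and the sample row are hypothetical.

```python
import csv
import io


def read_emails(handle):
    """Yield (to_address, from_address, date) tuples from a pipe-delimited file."""
    reader = csv.reader(handle, delimiter="|", quotechar="'")
    for row in reader:
        if len(row) != 3:
            continue  # skip blank or malformed rows
        to_addr, from_addr, date = (cell.strip() for cell in row)
        yield to_addr, from_addr, date


# In practice the handle would come from open("email_processed.csv", newline="").
sample = io.StringIO("'john roe'|'jane doe'|2001-05-14\n")
rows = list(read_emails(sample))
print(rows)
```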

(5) SocialNetwork.R

Builds the Bayesian baseline as well as the social network analysis. It can be run as a whole to do both, or split at the comments to run each part separately. Requires email_processed.csv.

(6) BaggedSNA.R

Builds the bagged iteration of the social network analysis. It can be run in its entirety to produce the accuracy as the end result. Requires email_processed.csv.
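For readers unfamiliar with the Naive Bayes baseline mentioned in the project summary, the idea can be sketched in Python as below. This is not a port of SocialNetwork.R: the class name, the use of the sender as the only feature, and the Laplace smoothing are assumptions chosen to keep the example small.

```python
from collections import Counter, defaultdict


class SenderNaiveBayes:
    """Categorical Naive Bayes with a single feature (the sender)."""

    def fit(self, pairs):
        # pairs: iterable of (recipient, sender), as in email_processed.csv
        self.recipient_counts = Counter()
        self.pair_counts = defaultdict(Counter)
        self.senders = set()
        for recipient, sender in pairs:
            self.recipient_counts[recipient] += 1
            self.pair_counts[recipient][sender] += 1
            self.senders.add(sender)
        self.total = sum(self.recipient_counts.values())
        return self

    def predict(self, sender):
        """Return the recipient maximizing P(recipient) * P(sender | recipient)."""
        best, best_score = None, -1.0
        n_senders = len(self.senders)
        for recipient, count in self.recipient_counts.items():
            prior = count / self.total
            # Laplace smoothing so unseen (recipient, sender) pairs are not zero.
            likelihood = (self.pair_counts[recipient][sender] + 1) / (count + n_senders)
            score = prior * likelihood
            if score > best_score:
                best, best_score = recipient, score
        return best


# Made-up training pairs for illustration only.
pairs = [("bob", "alice"), ("bob", "alice"), ("carol", "alice"), ("alice", "bob")]
model = SenderNaiveBayes().fit(pairs)
print(model.predict("alice"))
```

With these toy pairs the model predicts that alice most likely writes to bob, since that pair dominates the training data.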

anshuman-goel/Prediction-of-Communication-between-Employees-using-Supervised-Learning-Methods
