Detecting Hateful Speech in Social Media Comments

In this project, we apply machine learning to unstructured data to detect hate speech in comments from the Civil Comments dataset, with labeling informed by the Online Hate Index Research Project at D-Lab, University of California, Berkeley.

Goal

Our goal is to classify comments as hateful or not hateful. Historically, similar classifiers have tended to mislabel non-hateful comments as hateful when they merely mention identity groups that are frequent targets of hate speech. We hope to develop more nuanced models that correctly categorize both hateful speech and non-hateful identity references.
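One way to check whether a model over-flags identity mentions is to compare its false positive rate on non-hateful comments that mention an identity term against those that do not. The sketch below is illustrative only: the term list, function names, and toy data are assumptions, not part of this repository.

```python
from typing import Dict, Iterable, List, Sequence

# Hypothetical identity terms, chosen only for illustration.
IDENTITY_TERMS = {"muslim", "gay", "black", "jewish", "women"}


def mentions_identity(comment: str, terms: Iterable[str] = IDENTITY_TERMS) -> bool:
    """Return True if any identity term appears as a whole word in the comment."""
    words = {w.strip(".,!?;:\"'()").lower() for w in comment.split()}
    return any(t in words for t in terms)


def false_positive_rate(labels: Sequence[int], preds: Sequence[int]) -> float:
    """Share of truly non-hateful comments (label 0) that were predicted hateful (1)."""
    preds_on_negatives = [p for y, p in zip(labels, preds) if y == 0]
    if not preds_on_negatives:
        return 0.0
    return sum(preds_on_negatives) / len(preds_on_negatives)


def fpr_by_identity_mention(
    comments: List[str], labels: List[int], preds: List[int]
) -> Dict[str, float]:
    """Compare false positive rates for comments with and without identity terms."""
    groups = {"identity": ([], []), "no_identity": ([], [])}
    for comment, y, p in zip(comments, labels, preds):
        name = "identity" if mentions_identity(comment) else "no_identity"
        groups[name][0].append(y)
        groups[name][1].append(p)
    return {name: false_positive_rate(ys, ps) for name, (ys, ps) in groups.items()}


# Toy data: 1 = hateful, 0 = not hateful.
comments = [
    "I am a proud gay man.",
    "Muslim families moved in next door.",
    "You are all idiots.",
    "Nice weather today.",
]
labels = [0, 0, 1, 0]
preds = [1, 0, 1, 0]  # the first comment is a false positive
result = fpr_by_identity_mention(comments, labels, preds)
print(result)
```

A large gap between the two rates would suggest the model is reacting to the identity term itself rather than to hateful content.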

Team Members

Technologies

Python:

Amazon Web Services:

Google Cloud Services:

Files & Notebooks

Final Models

NB_final.ipynb: Naive Bayes Model (698 lines)
SVM_final.ipynb: Support Vector Machine Model (1818 lines)
neural_network.ipynb: Two-Layer Neural Network (536 lines)
final_lstm.ipynb: Three-Layer Bidirectional Long Short-Term Memory Recurrent Neural Network (7514 lines)
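For readers unfamiliar with the simplest of these models, here is a minimal multinomial Naive Bayes text classifier using only the standard library. It is a teaching sketch and not the implementation in NB_final.ipynb: the class name, whitespace tokenizer, smoothing value, and toy training data are all assumptions.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class TinyNaiveBayes:
    """Multinomial Naive Bayes with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def _log_score(self, text, label):
        # log P(label) + sum over known words of log P(word | label)
        total_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_docs)
        total_words = sum(self.word_counts[label].values())
        denom = total_words + self.alpha * len(self.vocab)
        for word in tokenize(text):
            if word in self.vocab:  # words never seen in training are skipped
                count = self.word_counts[label][word]
                score += math.log((count + self.alpha) / denom)
        return score

    def predict(self, text):
        return max(self.class_counts, key=lambda label: self._log_score(text, label))


# Toy training set: 1 = hateful, 0 = not hateful.
train_texts = [
    "you people are disgusting",
    "i hate you people",
    "what a lovely day",
    "i love this community",
]
train_labels = [1, 1, 0, 0]

model = TinyNaiveBayes().fit(train_texts, train_labels)
print(model.predict("you are disgusting"))
print(model.predict("what a lovely community"))
```

Because this model scores each word independently, it has no way to tell an identity term used neutrally from one used in an attack.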

Feature Generation

feature_generation_functions.py: Contains modules and functions used to generate text and numerical features for the models. (273 lines)
feature_generation.ipynb: Python 3 notebook that runs functions from feature_generation_functions.py and pickle_functions.py. Generates features, pickles the data frames, and sends them to an S3 bucket. (160 lines)
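To illustrate what "text and numerical features" can look like for a single comment, here is a small standard-library sketch. The feature names and the function generate_features are hypothetical; the features actually produced by feature_generation_functions.py may differ.

```python
import re


def generate_features(comment: str) -> dict:
    """Build one cleaned text feature and a few numerical features for a comment."""
    tokens = re.findall(r"[a-z']+", comment.lower())
    letters = [ch for ch in comment if ch.isalpha()]
    uppercase = sum(ch.isupper() for ch in letters)
    return {
        # Text feature: lowercased tokens with punctuation removed.
        "clean_text": " ".join(tokens),
        # Numerical features: simple counts and ratios.
        "word_count": len(tokens),
        "exclamation_count": comment.count("!"),
        "uppercase_ratio": uppercase / len(letters) if letters else 0.0,
    }


features = generate_features("STOP it now!!")
print(features)
```

Rows of such dictionaries can be collected into a data frame before being pickled and uploaded.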

Helper Functions

model_functions.py: Contains modules and functions to generate and test the Naive Bayes and SVM models and to compute metrics on them. (226 lines)
pickle_functions.py: Contains modules and functions used to read/write data from/to pickle files hosted in an AWS S3 bucket. (60 lines)
exploration/exploration_functions.py: Contains modules and functions used to explore the dataset. (103 lines)
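The pickle round trip through S3 can be sketched as follows. The function names, bucket name, and key are hypothetical and may not match pickle_functions.py. The helpers take any client exposing put_object and get_object, as boto3's S3 client does, and an in-memory stand-in is used here so the example runs without AWS credentials.

```python
import io
import pickle


def write_pickle(client, bucket, key, obj):
    """Serialize obj with pickle and upload it as a single S3 object."""
    client.put_object(Bucket=bucket, Key=key, Body=pickle.dumps(obj))


def read_pickle(client, bucket, key):
    """Download a single S3 object and deserialize it with pickle."""
    response = client.get_object(Bucket=bucket, Key=key)
    return pickle.loads(response["Body"].read())


class InMemoryS3:
    """Offline stand-in that mimics the two S3 client calls used above."""

    def __init__(self):
        self._store = {}

    def put_object(self, Bucket, Key, Body):
        self._store[(Bucket, Key)] = Body

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._store[(Bucket, Key)])}


# With real credentials, replace this with: boto3.client("s3")
client = InMemoryS3()
write_pickle(client, "example-bucket", "features/train.pkl", {"word_count": [3, 5]})
restored = read_pickle(client, "example-bucket", "features/train.pkl")
print(restored)
```

Unpickling executes code embedded in the file, so only load pickles from buckets you control.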

Intermediate Models

Stepping_Stones: Iterations of each model that were built before the final model design and assessment
- Initial_Models_Exploration.ipynb (1697 lines)
- NB_iter1.ipynb (726 lines)
- NB_iter2.ipynb (626 lines)
- NB_iter3.ipynb (865 lines)
- SVM_iter1.ipynb (657 lines)
- SVM_iter2.ipynb (691 lines)
- SVM iter3.ipynb (644 lines)
- initial_lstm.ipynb (1920 lines)
- exec_lstm (587 lines) and rcc_run_model.sh (27 lines)

If a notebook does not open, paste its link into the renderer at https://nbviewer.jupyter.org/

