A deep learning project that summarizes multiple Amazon reviews of a product into keywords and presents them in a word cloud.
Authors are ordered alphabetically by last name.
- Yuying Fan (@fyy26)
- Rehaan Furniturewala (@r3khaan)
- Andrew Raine (@andrewlr09)
Although online shoppers have thousands of reviews available to learn about each product, they rarely have the time to read them all before making a purchasing decision.
We used the Amazon Reviews dataset from a research group at the University of California San Diego. The subset we were able to access, a "small" slice of the full collection, still contains 75,257,650 reviews to train on.
Each review has the format: `(overall_score, verified_purchase, review_time, reviewer_id, product_id, style, reviewer_name, review_text, summary)`.
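For illustration, records of this shape can be streamed one at a time rather than loaded into memory. This is a minimal sketch that assumes gzipped JSON-lines storage and the field names listed above; the placeholder path and the actual key names in the UCSD release may differ.

```python
import gzip
import json

def iter_reviews(path="reviews.json.gz"):  # placeholder path
    """Stream (review_text, overall_score) pairs without loading the whole file."""
    with gzip.open(path, "rt") as f:
        for line in f:
            review = json.loads(line)
            yield review["review_text"], review["overall_score"]
```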
We used a BERT tokenizer across all our models to convert review text into the token IDs used for word embeddings.
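As an example, the tokenization step might look like the following with the Hugging Face `transformers` library and the `bert-base-uncased` checkpoint; both are assumptions, not necessarily what the project used.

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")  # assumed checkpoint
encoded = tokenizer("Great product, fast shipping!", truncation=True, max_length=128)
print(tokenizer.convert_ids_to_tokens(encoded["input_ids"]))
# ['[CLS]', 'great', 'product', ',', 'fast', 'shipping', '!', '[SEP]']
```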
- Build a classification model that predicts the star-rating (1-5) of a review based on its text.
- Run part-of-speech tagging on each review using the NLTK library.
- For each adjective in a given review, replace it with BERT's `[MASK]` token to produce a masked review.
- Run each masked review through the star-rating prediction model and compute the difference in predicted star rating between the original and the masked review.
- Collect the adjectives with the largest rating differences to generate a summary word cloud (see the sketch after this list).
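A minimal sketch of steps 2-5, using NLTK's Penn Treebank tags to find adjectives; `predict_stars` is a hypothetical stand-in for the trained star-rating model, and NLTK resource names can vary across versions.

```python
import nltk

nltk.download("punkt", quiet=True)
nltk.download("averaged_perceptron_tagger", quiet=True)

ADJECTIVE_TAGS = {"JJ", "JJR", "JJS"}  # Penn Treebank adjective tags used by NLTK

def mask_adjectives(review_text):
    """Yield (adjective, masked_review) pairs, one per adjective in the review."""
    tokens = nltk.word_tokenize(review_text)
    for i, (word, tag) in enumerate(nltk.pos_tag(tokens)):
        if tag in ADJECTIVE_TAGS:
            yield word, " ".join(tokens[:i] + ["[MASK]"] + tokens[i + 1:])

def predict_stars(text):
    """Hypothetical stand-in for the trained star-rating classifier."""
    return 3.0  # would return the model's predicted rating for `text`

review = "The battery life is great but the screen is dim."
baseline = predict_stars(review)
impact = {adj: abs(baseline - predict_stars(masked))
          for adj, masked in mask_adjectives(review)}
# Adjectives with the largest rating shift feed the word cloud.
top_adjectives = sorted(impact, key=impact.get, reverse=True)
```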
Random forest with 100 estimators; validation accuracy of 20.93% after training on 90% of the data (no better than random guessing, since a uniform guess over five classes yields 20% expected accuracy).
```python
from sklearn.ensemble import RandomForestClassifier

forest = RandomForestClassifier(n_estimators=100)
```
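For context, a hedged sketch of the 90/10 train/validation setup described above; the random feature matrix is a synthetic stand-in for whatever review representation the forest was actually trained on.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Stand-in data: 1,000 reviews as 768-dim vectors, star labels 1-5 (both synthetic).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 768))
y = rng.integers(1, 6, size=1000)

# 90% train / 10% validation, as in the baseline described above.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.1, random_state=0)
forest = RandomForestClassifier(n_estimators=100)
forest.fit(X_train, y_train)
print(f"validation accuracy: {forest.score(X_val, y_val):.2%}")
```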
Stacked RNN with 5 layers and a hidden size of 50, followed by a linear output layer; best validation accuracy of 39.65%.
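The stated architecture (5 stacked recurrent layers, hidden size 50, linear head over 5 classes) might look like the PyTorch sketch below; the embedding dimension and the choice of a vanilla `nn.RNN` (rather than, say, an LSTM) are assumptions.

```python
import torch
import torch.nn as nn

class StarRatingRNN(nn.Module):
    """Sketch of the stacked-RNN classifier; embedding size is assumed."""

    def __init__(self, vocab_size, embed_dim=128, hidden_size=50,
                 num_layers=5, num_classes=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.rnn = nn.RNN(embed_dim, hidden_size,
                          num_layers=num_layers, batch_first=True)
        self.out = nn.Linear(hidden_size, num_classes)

    def forward(self, token_ids):
        x = self.embed(token_ids)  # (batch, seq_len, embed_dim)
        _, h_n = self.rnn(x)       # h_n: (num_layers, batch, hidden_size)
        return self.out(h_n[-1])   # logits from the top layer's final state

model = StarRatingRNN(vocab_size=30522)  # 30522 = bert-base-uncased vocab size
logits = model(torch.randint(0, 30522, (8, 64)))  # 8 reviews, 64 tokens each
print(logits.shape)  # torch.Size([8, 5])
```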
Built on top of a pre-trained BERT; validation accuracy in the low-60% range after removing bias.
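A minimal sketch of a BERT-based classifier with a 5-way head, assuming the Hugging Face `BertForSequenceClassification` wrapper and the `bert-base-uncased` checkpoint; this is one plausible setup, not necessarily the project's exact implementation.

```python
import torch
from transformers import BertForSequenceClassification, BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=5)  # 5-way head for star ratings

batch = tokenizer(["Works great!", "Broke after a week."],
                  padding=True, truncation=True, return_tensors="pt")
with torch.no_grad():
    logits = model(**batch).logits
print(logits.argmax(dim=-1) + 1)  # predicted star ratings (1-5)
```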