This project involves sentiment analysis of Amazon reviews, focusing on home textiles and casual clothing. The objective is to classify reviews by sentiment, allowing for enhanced product insights and sentiment prediction for future reviews.
By analyzing customer reviews, the company aims to:
- Improve product features and customer satisfaction.
- Increase sales by addressing customer feedback and identifying areas of improvement.
The dataset, provided in an Excel file (amazon.xlsx
), contains reviews for specific product groups with the following fields:
- Review: Content of the review.
- Title: Short title or comment for the review.
- Helpful: Number of users who found the review helpful.
- Star: Star rating given to the product.
The project includes the following key steps:
- Text Preprocessing: Prepare text data for analysis by cleaning and structuring.
- Text Visualization: Visualize word frequency in reviews to identify common themes.
- Sentiment Modeling: Label and classify reviews based on sentiment.
- Model Evaluation: Evaluate the model's performance.
The following libraries are used in this project:
pandas
numpy
re
string
sklearn
matplotlib
seaborn
nltk
To prepare the Review
text data for analysis, the following preprocessing steps are applied:
- Convert text to lowercase.
- Remove punctuation.
- Remove numbers.
- Remove stopwords.
- Lemmatize words.
Calculate the frequency of words in the processed reviews and create a bar plot to display the top 20 most common words.
- Label Reviews: Label each review as positive, neutral, or negative based on its star rating.
- Data Splitting: Divide the data into training and testing sets.
- Vectorization: Vectorize the text data.
- Model Training: Train a logistic regression model to classify sentiment.
Evaluate the model’s performance by:
- Calculating accuracy.
- Printing a classification report.
- Displaying a confusion matrix.
This project delivers insights into customer sentiment on Amazon, providing actionable feedback for product improvement. By preprocessing text data, visualizing key words, building a sentiment classification model, and evaluating its performance, the company gains a valuable tool for predicting sentiment in future reviews. The logistic regression model serves as a robust predictor based on historical data patterns.