This project performs sentiment analysis on tweets about Dell's products and services. It fine-tunes XLNet to classify each tweet as positive, negative, or neutral. This kind of analysis helps a business understand customer feedback and adjust its strategy accordingly.
- Data Preprocessing: Includes cleaning of tweets to remove noise like emojis, URLs, mentions, and non-ASCII characters.
- Exploratory Data Analysis (EDA): Visualizes the dataset's characteristics, such as sentiment distribution, word and sentence lengths, and common stopwords, along with the results of named entity recognition and part-of-speech tagging.
- Deep Learning Model: Utilizes the pre-trained XLNet model for sentiment classification, with fine-tuning to adapt to our specific dataset.
- Hyperparameter Tuning: Employs Ray Tune to search for the hyperparameter values that give the best model performance.
- Evaluation: Reports model accuracy and F1 scores, and provides a confusion matrix to show where predictions go wrong.
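The preprocessing and evaluation steps above can be sketched with the standard library alone. This is a minimal illustration, not the notebook's actual code: the regular expressions, the function names (clean_tweet, confusion_matrix, accuracy_and_macro_f1), the label order, and the choice of macro-averaged F1 are all assumptions made for the example.

```python
import re

# Hypothetical patterns; the notebook's actual cleaning rules may differ.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")

# Assumed label order for the rows and columns of the confusion matrix.
LABELS = ("negative", "neutral", "positive")


def clean_tweet(text: str) -> str:
    """Strip URLs, @mentions, and non-ASCII characters from a tweet."""
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    # Dropping everything outside ASCII also removes emojis.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())


def confusion_matrix(y_true, y_pred, labels=LABELS):
    """Count predictions; rows are true labels, columns are predicted labels."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        matrix[index[true_label]][index[pred_label]] += 1
    return matrix


def accuracy_and_macro_f1(matrix):
    """Return (accuracy, macro-averaged F1) computed from a confusion matrix."""
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(n))
    f1_scores = []
    for i in range(n):
        tp = matrix[i][i]
        fp = sum(matrix[r][i] for r in range(n)) - tp  # predicted i, truly other
        fn = sum(matrix[i]) - tp                       # truly i, predicted other
        denom = 2 * tp + fp + fn
        # A class with no true or predicted samples gets an F1 of 0 here.
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return correct / total, sum(f1_scores) / n


if __name__ == "__main__":
    print(clean_tweet("@Dell my XPS is great 😍 https://t.co/abc"))
    y_true = ["positive", "negative", "neutral", "positive"]
    y_pred = ["positive", "negative", "positive", "positive"]
    cm = confusion_matrix(y_true, y_pred)
    print(cm, accuracy_and_macro_f1(cm))
```

Macro averaging weights every sentiment class equally, which is one reasonable choice when the classes are imbalanced; a weighted or per-class F1 may suit other goals better.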
To run this project, ensure you have the following installed:
- Python 3.8 or newer
- The Python packages listed in requirements.txt
To set up the project environment:
- Clone the repository to your local machine:
git clone https://github.com/AnnaTz/tweet-sentiment-classification
- Install the required Python packages:
pip install -r requirements.txt
Navigate to the project directory and launch the Jupyter notebook:
jupyter notebook sentiment_classification.ipynb
- The XLNet model was proposed in XLNet: Generalized Autoregressive Pretraining for Language Understanding by Yang, Z., Dai, Z., Yang, Y., Carbonell, J., Salakhutdinov, R., & Le, Q. V.
- The dataset used for this project is sourced from Kaggle: Sentiment and Emotions of Tweets.