Hey! This repository contains the work from my first virtual internship as a data scientist for British Airways. The project aims to better understand customers and their experience with the airline. To do this, I collected user reviews by web scraping and then ran a sentiment analysis on them.
The project is finished! Here you will find a summary of the analysis.
The analysis was done in a Jupyter notebook, and I used the following libraries: Requests, NLTK, BeautifulSoup, Pandas, re (regular expressions) and VADER. The dataset was collected by scraping and is stored in 'BA_reviews.csv'.
Steps:
- Scrape the 3564 British Airways reviews available on the Airline Quality website.
- Clean the data, leaving the text column of the reviews ready for analysis.
- Run a sentiment analysis using VADER polarity scores. VADER reports the negative, neutral and positive proportions of each review, plus a compound score that normalizes the summed word valences to a range from -1 (most negative) to +1 (most positive).
- Data viz: plot the key data!
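As a rough illustration of the cleaning and sentiment steps above, here is a minimal sketch, not the notebook's actual code. The function names `clean_review`, `label_sentiment` and `score_reviews` are made up for this example, and the "Verified |" prefix is only an assumption about how the scraped text looks. The ±0.05 cut-offs are the thresholds commonly suggested for VADER's compound score. `score_reviews` needs NLTK installed with the `vader_lexicon` resource downloaded.

```python
import re


def clean_review(text: str) -> str:
    """Light cleaning: drop an assumed 'Verified |' prefix and tidy whitespace."""
    # Assumption: a review may start with a prefix such as "Trip Verified |".
    text = re.sub(r"^[^|]*\bVerified\s*\|\s*", "", text)
    # Collapse runs of whitespace (including newlines) into single spaces.
    return re.sub(r"\s+", " ", text).strip()


def label_sentiment(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score in [-1, 1] to a coarse label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


def score_reviews(reviews):
    """Return one dict per review with VADER scores and a label."""
    # Imported here so the helpers above also work without NLTK installed.
    # Requires a one-time nltk.download("vader_lexicon").
    from nltk.sentiment.vader import SentimentIntensityAnalyzer

    analyzer = SentimentIntensityAnalyzer()
    results = []
    for raw in reviews:
        text = clean_review(raw)
        # polarity_scores returns the keys "neg", "neu", "pos" and "compound".
        scores = analyzer.polarity_scores(text)
        results.append(
            {"review": text, **scores, "label": label_sentiment(scores["compound"])}
        )
    return results
```

The cleaning is kept light on purpose: VADER takes punctuation and capitalization into account, so stripping them before scoring would throw away some of the signal.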
The frequency chart shows that the reviews focus on a few themes, including "service", "seat", "crew" and "staff", which indicates that customers mostly discuss their on-board experience and their interactions with the staff. It is also worth noting that the word 'good' is in third position, which could indicate a good user experience with British Airways.
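A word-frequency count like the one behind this chart can be sketched with the standard library alone. This is a simplified example under my own assumptions: `top_words` and the short `STOP_WORDS` set are hypothetical (a fuller stop-word list, such as NLTK's English one, would normally be used), and the tokenizer is a plain regular expression.

```python
import re
from collections import Counter

# Small illustrative stop-word list (an assumption, far from complete).
STOP_WORDS = {
    "the", "and", "was", "were", "for", "with", "but", "that", "this",
    "they", "have", "had", "not", "are", "from", "our", "you", "very",
}


def top_words(reviews, n=10):
    """Return the n most frequent non-stop-words as (word, count) pairs."""
    counts = Counter()
    for review in reviews:
        # Lowercase, then keep runs of letters and apostrophes as tokens.
        tokens = re.findall(r"[a-z']+", review.lower())
        # Skip stop words and very short tokens such as "a" or "to".
        counts.update(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    return counts.most_common(n)
```

Calling `top_words(reviews, 20)` on the cleaned review texts gives the pairs that can then be passed to a bar chart.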