we use a several different clustering algorithms (both HC and NHC) to assess AirBnB listings in the Cape Town region. The idea was to use a variety of features to cluster properties without knowing their spatial location.
This project aims to uncover trends and hidden relationships from the property listings (data points) in the Airbnb data set through the use of clustering algorithms. The main methods used in this project include hierarchical and non-hierarchical clustering analysis and self-organizing maps.
The data used for this project is the Airbnb property listings data in Cape Town. The data includes information about the properties such as location, price, and amenities. The data was scarped from AirBnB.
The following methods were used for this project:
- Hierarchical Clustering Analysis
- A type of clustering algorithm that creates a tree-like structure, called dendrogram, to represent the relationships between the data points. It starts with each data point being its own cluster and merges them into bigger clusters based on their similarity.
- Non-Hierarchical Clustering Analysis
- A type of clustering algorithm that does not create a dendrogram to represent the relationships between the data points. Instead, it divides the data into a pre-defined number of clusters based on the similarity between the data points.
- Self-Organizing Maps
- A type of artificial neural network used for clustering and dimensionality reduction. It creates a 2-dimensional grid of nodes and trains them to map the high-dimensional data points to the grid, grouping similar data points together.
These methods were used to cluster the properties based on their similarities and find any trends or hidden relationships.
The results of the clustering analysis showed some interesting trends and hidden relationships in the property listings data. The clustering algorithms were able to group similar properties together in the same wards and uncover patterns in the data. Hierarchical clustering resulted in a higher average silhouette width, 0.31 compared to Non-Hierarchical clustering achieving 0.29 with K-means algorithm. We notice that both results suggest weak fits, but Hierarchical clustering was slightly better than Non-Hierarchical clustering, with the clustering structure. Although the silhouette widths were relatively weak, we see there is clear room for improvement and even at this early stage are able to identify trends and patterns across the AirBnB listings in Cape Town region.
This project demonstrates the use of clustering algorithms to uncover trends and hidden relationships in the Airbnb property listings data. The results showed some interesting patterns in the data that can be useful for further analysis and decision making.
This project is implemented in R programming language. To run this project, the following software and packages are required:
- R
- R Studio
- tidyverse
- cluster
- factoextra
To run this project, clone or download the repository and open the R project file in R Studio. The code and data are included in the repository.
If you want to contribute to this project, feel free to submit a pull request or open an issue in the repository.
For any questions or queries, please email: [email protected]