Spotify EDA and Clustering

Project Overview

This project aims to answer the question "What makes a song popular?". It also seeks to understand which artists and genres perform well on the current 'track popularity' metric, and to identify patterns that set popular songs apart from unpopular ones. The analysis is based on a small sample of the entire Spotify ecosystem.

Business Problem Definition

Scope of Project

Spotify is the largest music streaming platform in the world. It has revolutionized music listening with various machine learning applications such as NLP and reinforcement learning. Furthermore, it has become a platform that helps new and upcoming artists reach and engage with an audience. The scope of this project is to determine what makes a particular song popular on Spotify. To do so, we explore the relationship between various audio features, genres, artists, and their respective track popularities. More specifically, we focus on the best and the worst of the tracks on Spotify to see whether each group shares common themes.

Motivation (Why does it matter?)

Discovering what specifically makes songs popular will aid upcoming artists in song production. Knowing how to construct a popular song would help them grow faster on the platform. Additionally, the Spotify marketing team can use this information in top-of-funnel strategies to drive more users to the app.

Key Features

Extensive EDA on Spotify track data
Utilization of Spotify API for data enrichment
Application of K-means clustering for song categorization
Analysis of factors influencing track popularity

Problem Statement

The primary goal of this project was to understand "What makes a song popular?" using Spotify's Track Popularity metric. We aimed to leverage this understanding to create value for artists and Spotify stakeholders.

Research Questions:

How can Track Popularity be utilized to create value for artists and Spotify stakeholders?
What variables in the dataset are driving Track Popularity?
Is there a relationship between specific audio features and Track Popularity?

Dataset

The dataset combines information from:

An existing Kaggle dataset of ~20,000 Spotify tracks
Additional data retrieved from the Spotify API (track and artist information)
A custom-created blend playlist for team analysis

Methodology

Data Collection: Utilized Spotify API to enrich existing dataset
Data Cleaning and Preprocessing: Handled semi-structured JSON data
Exploratory Data Analysis: Analyzed distributions and relationships between variables
Feature Engineering: Created new features and normalized Track Popularity
Clustering: Applied K-means clustering to categorize songs
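
The cleaning and feature engineering steps above can be sketched as follows. This is a minimal illustration, not the project's actual code: the record fields are hypothetical stand-ins for nested API responses, and min-max scaling is an assumed choice, since the normalization method is not specified here.

```python
import pandas as pd

# Hypothetical records shaped like nested API responses; the field names
# are illustrative, not the project's actual schema.
records = [
    {"id": "t1", "name": "Song A", "popularity": 80,
     "album": {"name": "Album A", "release_date": "2021-05-01"}},
    {"id": "t2", "name": "Song B", "popularity": 20,
     "album": {"name": "Album B", "release_date": "2019-11-15"}},
    {"id": "t3", "name": "Song C", "popularity": 50,
     "album": {"name": "Album C", "release_date": "2020-02-20"}},
]

# Flatten nested dictionaries into columns such as "album.name".
tracks = pd.json_normalize(records)

# Assumed normalization: min-max scale popularity to the range [0, 1].
pop = tracks["popularity"]
tracks["popularity_norm"] = (pop - pop.min()) / (pop.max() - pop.min())

print(tracks[["name", "album.name", "popularity", "popularity_norm"]])
```

pandas.json_normalize joins nested keys with a dot by default, which keeps the flattened column names traceable to the original JSON structure.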

Key Findings

Do Certain Audio Features Lead to an Increase in Track Popularity?

Exploring Archetypes

Because no single feature in the matrix stands out as a driver of track popularity, it makes sense to look at multiple features at once. We use our top and bottom artists and genres to explore this idea.

Interestingly, popular artists and genres share a common trend in their clusters. Generally, people enjoy songs in the middle range for Valence, Energy, and Danceability, and on the lower end for Acousticness, Instrumentalness, and Liveness.

Furthermore, unpopular genres and artists also seem to share some commonalities, but their clusters differ from the popular ones. Plotting the two groups against each other makes the discrepancy easy to see.
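
One possible way to compare the two groups across several features at once is to compute a mean feature profile per group. The sketch below uses made-up values purely for illustration (they are not the project's data or results), and the column name `group` is a hypothetical label.

```python
import pandas as pd

# Made-up values for illustration only; not the project's data.
df = pd.DataFrame({
    "group": ["popular", "popular", "unpopular", "unpopular"],
    "energy": [0.6, 0.5, 0.9, 0.3],
    "acousticness": [0.1, 0.2, 0.8, 0.7],
})

# Mean audio-feature profile for each group.
profile = df.groupby("group")[["energy", "acousticness"]].mean()

# Per-feature difference between the two profiles.
gap = profile.loc["popular"] - profile.loc["unpopular"]

print(profile)
print(gap)
```

Features with a large gap are candidates for distinguishing the two groups, though a difference in means alone does not establish that a feature drives popularity.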

Clustering

During our data analysis, we encountered a significant challenge: each song was associated with multiple genres, and a considerable number of these genres were redundant. To address this, we grouped songs with K-means clustering instead of relying on genre labels. The resulting clusters are:

Cluster 0: High-energy, danceable songs
Cluster 1: Songs with high acousticness and low valence, giving them a 'sad' vibe
Cluster 2: The highest mean track popularity; highly danceable, high-energy songs
Cluster 3: Also high in popularity, with an even mix of audio features; it may consist of mainstream songs
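
A clustering step of this kind can be sketched with scikit-learn as shown below. This is an illustration under stated assumptions rather than the notebook's code: the data is randomly generated, the feature list is a hypothetical subset, and standardizing the features before K-means is an assumed preprocessing choice. Only the choice of four clusters comes from the list above.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real track table (random values).
rng = np.random.default_rng(0)
features = ["danceability", "energy", "valence", "acousticness"]
songs = pd.DataFrame(rng.random((200, len(features))), columns=features)
songs["track_popularity"] = rng.integers(0, 101, size=200)

# Assumed preprocessing: put every feature on the same scale, because
# K-means relies on Euclidean distance.
scaled = StandardScaler().fit_transform(songs[features])

# Four clusters, matching the cluster list above.
model = KMeans(n_clusters=4, n_init=10, random_state=0)
songs["cluster"] = model.fit_predict(scaled)

# Per-cluster means, used to describe each cluster in words.
summary = songs.groupby("cluster")[features + ["track_popularity"]].mean()
print(summary)
```

Reading the per-cluster means row by row is one way to arrive at descriptions such as "high energy and danceable" or "high acousticness and low valence".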

API Work

The project involved extensive work with the Spotify API:

Set up a developer account and created an app to access the API
Utilized the Spotipy library for easier API interactions
Made requests for track, artist, and playlist data
Implemented token management for continuous data retrieval

For detailed information on the API work, refer to add_columns.ipynb and get_playlist.ipynb in the repository.
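
As a rough illustration of what token management for long-running retrieval can involve, the sketch below caches an access token and refreshes it shortly before it expires. It is library-independent and hypothetical: `TokenCache`, `fake_fetch`, the 60-second margin, and the 3600-second lifetime are assumptions made for the example, not the notebooks' implementation or the Spotipy API.

```python
import time
from typing import Callable, Optional, Tuple


class TokenCache:
    """Caches an access token and refreshes it shortly before it expires."""

    def __init__(
        self,
        fetch: Callable[[], Tuple[str, float]],
        clock: Callable[[], float] = time.monotonic,
        margin: float = 60.0,
    ) -> None:
        self._fetch = fetch    # returns (token, lifetime in seconds)
        self._clock = clock    # injectable so the demo needs no real waiting
        self._margin = margin  # refresh this many seconds before expiry
        self._token: Optional[str] = None
        self._expires_at = 0.0

    def get(self) -> str:
        now = self._clock()
        if self._token is None or now >= self._expires_at - self._margin:
            token, lifetime = self._fetch()
            self._token = token
            self._expires_at = now + lifetime
        return self._token


# Demo with a fake fetcher and a controllable clock (no network access).
calls = []


def fake_fetch() -> Tuple[str, float]:
    calls.append(1)
    return f"token-{len(calls)}", 3600.0


now = [0.0]
cache = TokenCache(fake_fetch, clock=lambda: now[0])

first = cache.get()   # no token yet, so one is fetched
now[0] = 1000.0
second = cache.get()  # still valid, so the cached token is reused
now[0] = 3550.0
third = cache.get()   # within the 60 s margin, so a new token is fetched
```

In a real run, the fetch callable would wrap whatever authentication request the project uses; passing it in as a parameter keeps the caching logic testable without credentials.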

Tools and Technologies

Python
Jupyter Notebook
Pandas for data manipulation
Matplotlib and Plotly for visualization
Scikit-learn for K-means clustering
Spotify API and Spotipy library

Repository Structure

A01_Decoding_the_Secret_to_Popularity_Spotify.ipynb: Main analysis notebook
add_columns.ipynb: Script for adding columns using Spotify API data
get_playlist.ipynb: Script for retrieving playlist data
images/: Directory containing visualizations
LICENSE: MIT License file
README.md: This file, containing project information

How to Use

Clone the repository
Install the required dependencies (the packages listed under Tools and Technologies)
Set up a Spotify Developer account and create an app to get API credentials
Run the Jupyter notebooks in the following order:
- get_playlist.ipynb (if you want to analyze a specific playlist)
- add_columns.ipynb
- A01_Decoding_the_Secret_to_Popularity_Spotify.ipynb

License

This project is licensed under the MIT License - see the LICENSE file for details.

YashvardhanRanawat7/A-Data-Driven-Exploration-of-Spotify-Hits
