Sparkify-Capstone

Sparkify Logo

Spark for Big Data - Project Description

Imagine you are working on the data team for a popular digital music service similar to Spotify or Pandora. Many users stream their favourite songs every day, either on the free tier, which places advertisements between songs, or on the premium subscription, where they stream ad-free but pay a monthly flat rate. Users can upgrade, downgrade or cancel their service at any time, so it's crucial to make sure they love the service. Every interaction with the service generates data: playing songs, logging out, liking a song with a thumbs up, hearing an ad, or downgrading a subscription. All this data contains the key insights for keeping your users happy and helping your business thrive. It's your job on the data team to predict which users are at risk of churning, either by downgrading from premium to the free tier or by cancelling their service altogether. If you can accurately identify these users before they leave, your business can offer them discounts and incentives, potentially saving millions in revenue.

To tackle this project, you are provided with a large dataset that contains the events described above. You will need to load, explore and clean this dataset with Spark. Based on your exploration, you will create features and build models with Spark to predict which users will churn from your digital music service. This project is all about demonstrating mastery of scalable data manipulation and machine learning with Spark. After completing it, you'll have built a useful model with a massive dataset, and you'll be able to apply the same skills to wrangle data and build models in your role as a data scientist.
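As a rough sketch of what the churn label can look like in PySpark (an illustration, not necessarily the notebook's exact code), a user can be flagged as churned once a "Cancellation Confirmation" event appears in their log. The file name and the userId/page column names below are assumptions based on the standard Sparkify event log:

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

# Assumptions: the unzipped event log is mini_sparkify_event_data.json and
# exposes `userId` and `page` columns, as in the standard Sparkify data.
spark = SparkSession.builder.appName("Sparkify").getOrCreate()
df = spark.read.json("mini_sparkify_event_data.json")

# One row per user: churn = 1 if the user ever hit "Cancellation Confirmation".
churn_labels = (df
    .withColumn("is_cancel",
                F.when(F.col("page") == "Cancellation Confirmation", 1).otherwise(0))
    .groupBy("userId")
    .agg(F.max("is_cancel").alias("churn")))

# Join the per-user label back onto every event row for feature engineering.
df_labeled = df.join(churn_labels, on="userId", how="left")
```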

Table of Contents

  • Data Cleaning
  • Exploratory Data Analysis
  • Feature Engineering
  • Modeling

Libraries used

  • pyspark
  • matplotlib
  • seaborn

Run jupyter notebook (local)

Start either jupyter lab or jupyter notebook and run

Sparkify.ipynb

This notebook uses a small subset (128 MB) of the full dataset (12 GB). The data file is compressed with 7-Zip; please unzip it before use.
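After extracting the archive, the mini event log can be loaded and sanity-checked with Spark along these lines (a sketch; the extracted JSON file name is an assumption):

```python
from pyspark.sql import SparkSession

# Assumed name of the extracted file; adjust if the archive unpacks differently.
EVENT_LOG = "mini_sparkify_event_data.json"

spark = SparkSession.builder.master("local[*]").appName("Sparkify").getOrCreate()
events = spark.read.json(EVENT_LOG)

events.printSchema()                      # inspect the raw event schema
print("rows:", events.count())            # quick row count on the 128 MB subset
events.select("page").distinct().show()   # event types such as NextSong, Thumbs Up, Logout
```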

Medium Blog Post

Can be found here: https://towardsdatascience.com/how-to-predict-churns-in-sparkify-ab9a5c3f218d
