
Big Data Project: The goal of this project is to develop a web application built on the Apache Kafka Streams API that analyzes data in real time, with a specific focus on predicting customer churn for a business. Apache Kafka is a distributed event streaming platform that handles large-scale data streams efficiently.


myyla/Customer-Churn-Prediction


Project Workflow:

Step 1: Real-time Data Ingestion with Apache Kafka Streams

  1. Launch and stream real-time data from the 'customer_churn.csv' file using Apache Kafka Streams.
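A minimal sketch of this ingestion step, assuming each CSV row is published as one JSON message. Kafka Streams itself is a Java/Scala library, so this Python sketch only covers the producer side. The `send` callable is a stand-in for a real producer, and the sample rows are made up for illustration.

```python
import csv
import io
import json
from typing import Callable, Iterable


def stream_csv_rows(lines: Iterable[str], send: Callable[[bytes], None]) -> int:
    """Read CSV text row by row and hand each row to `send` as UTF-8 JSON."""
    reader = csv.DictReader(lines)
    count = 0
    for row in reader:
        send(json.dumps(row).encode("utf-8"))
        count += 1
    return count


# Demo: an in-memory file replaces 'customer_churn.csv' and a list
# replaces the Kafka topic (both are assumptions for illustration).
sample = io.StringIO("Name,Age,Churn\nAlice,42.0,1\nBob,35.0,0\n")
sent = []
n_sent = stream_csv_rows(sample, sent.append)
print(n_sent, sent[0])
```

With the kafka-python package, `send` could be something like `lambda payload: producer.send("customer_churn", payload)` for a `KafkaProducer` instance; the topic name here is hypothetical.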

Step 2: Data Preprocessing with Machine Learning Libraries

  1. Perform the necessary data preprocessing using libraries such as scikit-learn, PySpark MLlib, or PyTorch.
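One possible preprocessing pass, sketched in plain Python so it runs without extra dependencies: keep the numeric columns and standardize them to zero mean and unit variance. The choice of columns and of standardization is an assumption, not something the project specifies; scikit-learn's `StandardScaler` performs the same scaling (it also uses the population standard deviation).

```python
from statistics import fmean, pstdev

# Assumed feature set: the numeric columns from the data description.
NUMERIC_COLUMNS = ["Age", "Total_Purchase", "Account_Manager", "Years", "Num_sites"]


def to_features(row):
    """Convert one raw record (dict of strings) to a list of floats."""
    return [float(row[col]) for col in NUMERIC_COLUMNS]


def fit_scaler(matrix):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*matrix))
    means = [fmean(col) for col in columns]
    stds = [pstdev(col) or 1.0 for col in columns]  # avoid division by zero
    return means, stds


def transform(matrix, means, stds):
    """Standardize every row with the fitted means and deviations."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in matrix]


# Two made-up records for illustration only.
rows = [
    {"Name": "A", "Age": "40", "Total_Purchase": "10000", "Account_Manager": "0", "Years": "5", "Num_sites": "8"},
    {"Name": "B", "Age": "50", "Total_Purchase": "12000", "Account_Manager": "1", "Years": "7", "Num_sites": "10"},
]
X = [to_features(r) for r in rows]
means, stds = fit_scaler(X)
scaled = transform(X, means, stds)
print(scaled)
```

The scaler should be fitted on the training data only and then reused unchanged on incoming records, so that training and prediction see the same scale.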

Step 3: Supervised Machine Learning Training

  1. Train supervised machine learning models (at least 3 models) on the 'customer_churn.csv' training dataset.
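Comparing several candidates can be sketched as a small selection harness that scores each model on held-out data and keeps the most accurate one. The three classes below are toy stand-ins, not the project's models; in practice they would be replaced by fitted estimators with a `predict` method, for example scikit-learn's `LogisticRegression` or `RandomForestClassifier`. Accuracy as the selection metric is also an assumption.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def select_best(models, X_val, y_val):
    """Score every model on the validation set and return the best name."""
    scores = {name: accuracy(y_val, m.predict(X_val)) for name, m in models.items()}
    best_name = max(scores, key=scores.get)
    return best_name, scores


# Hypothetical stand-in models for illustration only.
class AlwaysStay:
    def predict(self, X):
        return [0 for _ in X]


class AlwaysChurn:
    def predict(self, X):
        return [1 for _ in X]


class SitesThreshold:
    def __init__(self, threshold):
        self.threshold = threshold

    def predict(self, X):
        # Each row holds a single feature here: the number of sites.
        return [1 if row[0] > self.threshold else 0 for row in X]


models = {
    "always_stay": AlwaysStay(),
    "always_churn": AlwaysChurn(),
    "sites_threshold": SitesThreshold(10),
}
X_val = [[8], [12], [9], [13]]  # made-up validation data
y_val = [0, 1, 0, 1]
best_name, scores = select_best(models, X_val, y_val)
print(best_name, scores)
```

On imbalanced churn data, a metric such as recall or F1 may be a better selection criterion than plain accuracy.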

Step 4: Model Serialization and Storage

  1. Save the best-performing model in .pkl format.
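Saving and reloading with the standard `pickle` module could look like the sketch below. A plain dict stands in for the fitted model so the example runs on its own; any picklable object, such as a fitted scikit-learn estimator, is handled the same way. The file name `best_model.pkl` is hypothetical.

```python
import pickle
import tempfile
from pathlib import Path


def save_model(model, path):
    """Serialize the model object to a .pkl file."""
    with open(path, "wb") as f:
        pickle.dump(model, f)


def load_model(path):
    """Load a model from a .pkl file. Only unpickle files you trust."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Stand-in for the best-performing model (assumption for illustration).
model = {"kind": "threshold", "feature": "Num_sites", "threshold": 10.0}

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "best_model.pkl"
    save_model(model, path)
    restored = load_model(path)

print(restored == model)
```

Unpickling executes code embedded in the file, so the saved model should only be loaded from a trusted location.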

Step 5: Real-time Prediction using the Trained Model

  1. Use the trained and saved model to predict in real time, from the 'new_customers.csv' test data, whether each customer will leave the institution.
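The prediction loop can be sketched as a generator that decodes each incoming JSON message, builds the feature vector, and emits one prediction per customer. The message format, the feature order, and the toy `SitesThresholdModel` are assumptions for illustration; the real model would be the one loaded from the .pkl file.

```python
import json
from typing import Iterable, Iterator

# Assumed feature order; it must match the order used during training.
FEATURES = ["Age", "Total_Purchase", "Account_Manager", "Years", "Num_sites"]


class SitesThresholdModel:
    """Hypothetical stand-in for the loaded model."""

    def predict(self, X):
        return [1 if row[-1] > 10 else 0 for row in X]


def predict_stream(messages: Iterable[bytes], model) -> Iterator[dict]:
    """Yield one prediction per incoming JSON-encoded customer record."""
    for raw in messages:
        record = json.loads(raw)
        features = [float(record[col]) for col in FEATURES]
        churn = int(model.predict([features])[0])
        yield {"Company": record.get("Company"), "churn": churn}


# Made-up messages standing in for records read from a Kafka topic.
messages = [
    json.dumps({"Company": "Acme", "Age": "37", "Total_Purchase": "9935.53",
                "Account_Manager": "1", "Years": "7.71", "Num_sites": "8"}).encode(),
    json.dumps({"Company": "Globex", "Age": "46", "Total_Purchase": "11147.71",
                "Account_Manager": "0", "Years": "9.4", "Num_sites": "14"}).encode(),
]
predictions = list(predict_stream(messages, SitesThresholdModel()))
print(predictions)
```

With the kafka-python package, the `messages` iterable could be built from a `KafkaConsumer` by passing along each record's `value`.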

Step 6: Results Presentation with Web Application Dashboard

  1. Present the results in the form of a web application dashboard.

Step 7: Project Upload to GitHub

  1. Upload the entire project to GitHub for collaboration and version control.

Tools and Technologies:

  • Libraries: Apache Kafka Streams, PySpark MLlib, scikit-learn, PyTorch, Pandas, Matplotlib
  • Frameworks: Flask, Django
  • Languages: Python, Java, JavaScript
  • Editors: IntelliJ IDEA, Eclipse, VS Code
  • Operating Systems: Unix, macOS, or Windows

Data Description:

  • Name: Name of the latest contact at Company
  • Age: Customer Age
  • Total_Purchase: Total Ads Purchased
  • Account_Manager: Binary flag (0 = no account manager, 1 = account manager assigned)
  • Years: Total Years as a customer
  • Num_sites: Number of websites that use the service
  • Onboard_date: Date on which the latest contact was onboarded
  • Location: Client HQ Address
  • Company: Name of Client Company
  • Churn: Target (label)
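The fields above can be captured in a small typed record, which makes the expected type of each column explicit. The Python types, the timestamp format, and the sample values are assumptions and should be checked against the actual CSV files; `Churn` is optional because unlabeled test data may not include it.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed timestamp layout; adjust to match the real files.
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


@dataclass
class CustomerRecord:
    name: str
    age: float
    total_purchase: float
    account_manager: int
    years: float
    num_sites: float
    onboard_date: datetime
    location: str
    company: str
    churn: Optional[int]  # None when the label is absent


def parse_record(row: dict) -> CustomerRecord:
    """Convert one CSV row (dict of strings) into a typed record."""
    churn = row.get("Churn")
    return CustomerRecord(
        name=row["Name"],
        age=float(row["Age"]),
        total_purchase=float(row["Total_Purchase"]),
        account_manager=int(float(row["Account_Manager"])),
        years=float(row["Years"]),
        num_sites=float(row["Num_sites"]),
        onboard_date=datetime.strptime(row["Onboard_date"], DATE_FORMAT),
        location=row["Location"],
        company=row["Company"],
        churn=None if churn in (None, "") else int(float(churn)),
    )


# Made-up row for illustration only.
row = {
    "Name": "Jane Doe", "Age": "42.0", "Total_Purchase": "11066.8",
    "Account_Manager": "0", "Years": "7.22", "Num_sites": "8.0",
    "Onboard_date": "2013-08-30 07:00:40", "Location": "1 Example Street",
    "Company": "Example Inc", "Churn": "1",
}
record = parse_record(row)
print(record)
```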

Data Source:

Customer Churn Spark Notebook
