- Launch and stream real-time data from the 'customer_churn.csv' file using Apache Kafka Streams.
- Perform the necessary data preprocessing using libraries such as scikit-learn, PySpark MLlib, or PyTorch.
- Train at least three supervised machine learning models on the 'customer_churn.csv' training dataset.
- Save the best-performing model in .pkl format.
- Use the saved model to predict in real time, from the 'new_customers.csv' test data, whether each customer will leave the institution.
- Present the results in the form of a web application dashboard.
- Upload the entire project to GitHub for collaboration and version control.
- Libraries: Apache Kafka Streams, PySpark MLlib, scikit-learn, PyTorch, Pandas, Matplotlib
- Frameworks: Flask, Django
- Languages: Python, Java, JavaScript
- Editors: IntelliJ IDEA, Eclipse, VS Code
- Operating Systems: Unix, macOS, or Windows
- Name: Name of the latest contact at the company
- Age: Customer Age
- Total_Purchase: Total Ads Purchased
- Account_Manager: Binary (0 = no account manager, 1 = account manager assigned)
- Years: Total Years as a customer
- Num_sites: Number of websites that use the service.
- Onboard_date: Date the latest contact was onboarded
- Location: Client HQ Address
- Company: Name of Client Company
- Churn: Target (label)
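
The preprocessing step could start from something like the sketch below, which turns records with the columns listed above into numeric feature vectors and labels using only the Python standard library. Several things here are assumptions rather than facts about the dataset: that `Churn` is stored as the integer 0 or 1, that the numeric columns parse as floats, and that the text-like columns (`Name`, `Onboard_date`, `Location`, `Company`) are simply dropped. The helper names `row_to_features` and `load_dataset` are hypothetical, and the sample rows are invented for illustration.

```python
import csv
import io

# Numeric columns from the data dictionary used as model inputs (an assumption;
# Name, Onboard_date, Location and Company are dropped in this sketch).
FEATURE_COLUMNS = ["Age", "Total_Purchase", "Account_Manager", "Years", "Num_sites"]
LABEL_COLUMN = "Churn"


def row_to_features(row):
    """Convert one CSV record (a dict of strings) into a list of floats."""
    return [float(row[name]) for name in FEATURE_COLUMNS]


def load_dataset(csv_file):
    """Read labeled records from a file-like object and return (features, labels)."""
    features, labels = [], []
    for row in csv.DictReader(csv_file):
        features.append(row_to_features(row))
        # Assumes Churn is written as the integer 0 or 1.
        labels.append(int(row[LABEL_COLUMN]))
    return features, labels


# Invented sample rows for illustration only; not taken from customer_churn.csv.
SAMPLE_CSV = """Name,Age,Total_Purchase,Account_Manager,Years,Num_sites,Onboard_date,Location,Company,Churn
Jane Doe,42.0,11066.8,0,7.22,8.0,2013-08-30 07:00:40,"1 Example St, Springfield",Example Inc,1
John Roe,35.0,9000.5,1,5.0,10.0,2015-01-02 09:30:00,"2 Sample Ave, Shelbyville",Sample LLC,0
"""

X, y = load_dataset(io.StringIO(SAMPLE_CSV))
print(X)
print(y)
```

For the real data, `load_dataset(open("customer_churn.csv", newline=""))` would be the equivalent call. The unlabeled 'new_customers.csv' records can reuse `row_to_features` alone, since they presumably carry no `Churn` column.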