This repository holds the materials for the Machine Learning Prague 2017 workshop created by Gauss Algorithmic, focusing on methods and techniques of advanced data analysis we've used in enterprise environments.
The goal of this workshop is to give attendees a blueprint for building an end-to-end, enterprise-ready ML solution and to demonstrate its use on typical corporate ML use cases (telco, digital marketing).
Johnson Darkwah - Big Data Solution Architect - Gauss Algorithmic
Karel Vaculik - Data Scientist - Gauss Algorithmic
Jiri Polcar - Chief Data Scientist - Gauss Algorithmic
Balazs Gaspar - Pre-sales Engineer - Cloudera
To run the workshop successfully, we suggest forking this repo and then cloning your fork to a local machine or directly to your cloud instances. If you come across any mistakes, don't hesitate to contact us or open an issue on the GitHub repo.
The workshop material assumes you have knowledge and experience in:
- Preparing a Linux platform (CentOS) for production use
- Programming in Python and/or Scala
- Working with your cloud provider's environment
- Basics of production Hadoop ecosystems
- Challenges of production data science work
- Architecture and other concepts
- Cluster installation
- Telco churn use case