MLP 2017 - Advanced data analysis on Hadoop clusters workshop

This repository holds the materials for the Machine Learning Prague 2017 workshop created by Gauss Algorithmic, focusing on methods and techniques of advanced data analysis we've used in enterprise environments.

Goal of the workshop

The goal of this workshop is to give attendees a blueprint for building an end-to-end, enterprise-ready ML solution and to demonstrate its use on typical corporate ML use cases (telco, digital marketing).

Speakers & mentors

Johnson Darkwah - Big Data Solution Architect - Gauss Algorithmic - [email protected]

Karel Vaculik - Data Scientist - Gauss Algorithmic

Jiri Polcar - Chief Data Scientist - Gauss Algorithmic

Balazs Gaspar - Pre-sales Engineer - Cloudera

Setup

To run the workshop successfully, we suggest forking this repo and then cloning your fork to a local machine or directly to your cloud instances. If you come across any mistakes, don't hesitate to come to us or open an issue on the GitHub repo.

Workshop assumptions

The workshop material assumes you have sufficient knowledge and experience in:

  • Preparing a Linux platform for production use (CentOS)
  • Programming in Python and/or Scala
  • Understanding your cloud provider environment

Workshop topics

  • Basics of production Hadoop ecosystems.
  • Challenges of production data science work.
  • Architecture and other concepts.
  • Cluster installation.
  • Telco churn use case.