gaussalgo/MLP_2017_workshop_hadoop

MLP 2017 - Advanced data analysis on Hadoop clusters workshop

This repository holds the materials for the Machine Learning Prague 2017 workshop created by Gauss Algorithmic, focusing on methods and techniques of advanced data analysis we've used in enterprise environments.

Goal of the workshop

The goal of this workshop is to give attendees a blueprint for building an end-to-end, enterprise-ready ML solution and to demonstrate its use on typical corporate ML use cases (telco, digital marketing).

Speakers & mentors

Johnson Darkwah - Big Data Solution Architect - Gauss Algorithmic - [email protected]

Karel Vaculik - Data Scientist - Gauss Algorithmic

Jiri Polcar - Chief Data Scientist - Gauss Algorithmic

Balazs Gaspar - Pre-sales Engineer - Cloudera

Setup

To run the workshop successfully, we suggest forking this repo and then cloning your fork to a local machine or directly to your cloud instances. If you come across any mistakes, don't hesitate to come to us or open an issue in the GitHub repo.

Workshop assumptions

The workshop material assumes you have sufficient knowledge and experience to:

  • Prepare a Linux platform (CentOS) for production use
  • Program in Python and/or Scala
  • Work in your cloud provider's environment

Workshop topics

  • Basics of production Hadoop ecosystems.
  • Challenges of production data science work.
  • Architecture and other concepts.
  • Cluster installation.
  • Telco churn use case.
