
Iterative Reduce Programming Guide

jpatanooga edited this page Nov 1, 2012 · 10 revisions

Overview

  • Designed specifically for parallel iterative algorithms on Hadoop
  • Implemented directly on top of YARN
  • Parallelism is intrinsic to the programming model
  • Lets you focus on the problem rather than on distributed programming

Lifecycle of an IterativeReduce Application

Client
    Launches the YARN ApplicationMaster
Master
    Computes required resources
    Obtains resources from YARN
    Launches Workers
    Base class: ComputableMaster
Workers
    Computes on partial data (an input split)
    Synchronizes with Master
    Base class: ComputableWorker
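
The Master/Worker synchronization described above can be sketched in a single process. This is only an illustration of the pattern, not the framework's API: the `Worker` and `Master` interfaces, `MeanWorker`, `AveragingMaster`, and `run` are hypothetical stand-ins invented for this example, and the real `ComputableWorker` and `ComputableMaster` base classes may have different method signatures. No YARN containers are launched here; each "worker" is a plain object holding one in-memory split.

```java
import java.util.ArrayList;
import java.util.List;

final class IterativeReduceSketch {

    // Hypothetical stand-in for a worker; NOT the real ComputableWorker API.
    interface Worker {
        double compute();            // one pass over this worker's input split
        void update(double global);  // receive the master's merged state
    }

    // Hypothetical stand-in for a master; NOT the real ComputableMaster API.
    interface Master {
        double compute(List<Double> workerUpdates); // merge partial results
    }

    static final class MeanWorker implements Worker {
        private final double[] split;
        private double global = 0.0;

        MeanWorker(double[] split) { this.split = split; }

        @Override
        public double compute() {
            double sum = 0.0;
            for (double x : split) sum += x;
            double localMean = sum / split.length;
            // Move halfway from the current global estimate toward the local mean.
            return global + 0.5 * (localMean - global);
        }

        @Override
        public void update(double global) { this.global = global; }
    }

    static final class AveragingMaster implements Master {
        @Override
        public double compute(List<Double> workerUpdates) {
            double sum = 0.0;
            for (double u : workerUpdates) sum += u;
            return sum / workerUpdates.size();
        }
    }

    // Runs the compute / merge / broadcast loop for a fixed number of iterations.
    static double run(int iterations) {
        List<Worker> workers = new ArrayList<>();
        workers.add(new MeanWorker(new double[] {1, 2, 3}));
        workers.add(new MeanWorker(new double[] {4, 5, 6}));
        Master master = new AveragingMaster();

        double global = 0.0;
        for (int i = 0; i < iterations; i++) {
            List<Double> updates = new ArrayList<>();
            for (Worker w : workers) updates.add(w.compute()); // workers: partial data
            global = master.compute(updates);                  // master: merge
            for (Worker w : workers) w.update(global);         // synchronize
        }
        return global;
    }

    public static void main(String[] args) {
        System.out.println(run(20));
    }
}
```

Because both splits are the same size, the merged estimate approaches the overall mean of the data (3.5) as the number of iterations grows. In a real application, the framework rather than a local loop would be responsible for shipping updates between the Master and the Workers.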