Amber Prototype based on Orleans

Introduction

Long-running analytic tasks on big data frameworks often provide little or no feedback about the status of the execution. Some big data processing frameworks provide status updates for running jobs, but these systems only allow users to monitor their jobs passively. Even if users notice anomalies during the execution, their only options are to kill the job or wait for it to run to completion.

Amber is a distributed data processing engine built on top of an existing actor model implementation (Orleans, in this prototype). Its distinguishing capability is responsive debugging during the execution of a dataflow: users can pause and resume the execution, investigate the state of operators, change the behavior of an operator, and set conditional breakpoints. Amber provides these features together with fault tolerance. After a failure, it not only ensures the correctness of the final computation result but also recovers the same consistent debugging state.

Paper: Amber: A Debuggable Dataflow System Based on the Actor Model (VLDB 2020)

Contributors: Shengquan Ni, Avinash Kumar, Zuozhi Wang, Chen Li.

Affiliation: University of California, Irvine.

Install Frontend

Install Node.js

  • For Windows / Mac

    Download and install the LTS release of Node.js version 12.

  • For Linux

    sudo apt-get install curl software-properties-common
    curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -
    sudo apt-get install nodejs
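
After installing, it can help to confirm that the Node.js on your PATH really is version 12. The following is a minimal sketch; the function names `node_major_version` and `check_node_version` are illustrative helpers, not part of this repo, and the expected major version 12 is taken from the setup_12.x script above.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of the Amber repo).

# Extract the major version from a string such as "v12.18.3".
node_major_version() {
    local version="${1#v}"   # drop the leading "v" if present
    echo "${version%%.*}"    # keep everything before the first dot
}

# Succeed only if the given version string has major version 12.
check_node_version() {
    local actual
    actual="$(node_major_version "$1")"
    if [ "$actual" = "12" ]; then
        echo "Node.js $1 matches the expected major version 12"
    else
        echo "Node.js $1 found, but major version 12 was expected" >&2
        return 1
    fi
}
```

A typical call would be `check_node_version "$(node --version)"`, since `node --version` prints a string of the form `v12.x.y`.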
    

Build Frontend

Clone this repo, then run the following:

cd AmberOnOrleans/Frontend
npm install
npm run build

Running npm install takes a long time, usually 5 to 10 minutes. You can ignore the vulnerability warnings printed at the end.

Install Amber

  1. Install dotnet-sdk 3.0.
  2. Install MySQL and log in as admin. Use the following command to create a user with username "orleansbackend" and password "orleans-0519-2019" (this can be changed in Constants.cs):
    CREATE USER 'orleansbackend'@'%' IDENTIFIED BY 'orleans-0519-2019';
  3. Create a MySQL database called "amberorleans" and grant all privileges on it with the following commands:
    CREATE DATABASE amberorleans;
    GRANT ALL PRIVILEGES ON amberorleans.* TO 'orleansbackend'@'%';
    FLUSH PRIVILEGES;
    USE amberorleans;
  4. Run the scripts MySQL-Main.sql and MySQL-Clustering.sql to create the necessary tables and insert entries into the database.

  5. We have generated some sample datasets for you to benchmark Amber. Here are 2 datasets you can use:

    Download one dataset from the links above to your local machine.
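
The database setup steps above can also be scripted. This is a sketch only: `amber_setup_sql` is a hypothetical helper that prints the same statements for a given user, password and database, so that they can be reviewed before being piped to the `mysql` client.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Amber repo).
# Print the user, database and grant statements from the setup steps.
# Arguments: user name, password, database name.
amber_setup_sql() {
    local user="$1" password="$2" database="$3"
    cat <<EOF
CREATE USER '${user}'@'%' IDENTIFIED BY '${password}';
CREATE DATABASE ${database};
GRANT ALL PRIVILEGES ON ${database}.* TO '${user}'@'%';
FLUSH PRIVILEGES;
EOF
}
```

Assuming your MySQL admin account is `root`, you could run `amber_setup_sql orleansbackend orleans-0519-2019 amberorleans | mysql -u root -p`, and then load each script with `mysql -u root -p amberorleans < MySQL-Main.sql` followed by `mysql -u root -p amberorleans < MySQL-Clustering.sql`, run from the directory in your checkout that contains the two scripts.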

Run Amber on your local machine:

1. Start the MySQL server on your local machine.

2. Start Silo:

A silo is a container of actors in Orleans where all the computation takes place. The silo must be started first so that Amber knows where to allocate actors.

Open a terminal and enter:

cd AmberOnOrleans/SiloHost
dotnet run -c Release

You can ignore the warnings. Establishing the connection takes some time.

Make sure you see "Silo Started!" before proceeding to step 3.
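
If you start the silo from a script rather than watching the terminal, the wait for "Silo Started!" can be automated. The sketch below assumes that the silo's output has been redirected to a log file; the helper name `wait_for_marker` and the default timeout of 120 seconds are illustrative choices, not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Amber repo).
# Poll a log file once per second until a marker string appears
# or the timeout (in seconds, default 120) expires.
# Returns 0 if the marker was found, 1 on timeout.
wait_for_marker() {
    local log_file="$1" marker="$2" timeout="${3:-120}"
    local waited=0
    while true; do
        # -F treats the marker as a fixed string, not a regex.
        if [ -f "$log_file" ] && grep -qF -- "$marker" "$log_file"; then
            return 0
        fi
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

For example, after starting the silo in the background with `dotnet run -c Release > silo.log 2>&1 &` (where `silo.log` is an arbitrary file name), `wait_for_marker silo.log 'Silo Started!' 300` returns once the silo is ready or gives up after five minutes.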

3. Start Console Application:

Open another terminal and enter:

cd AmberOnOrleans/ConsoleApp
dotnet run

It will prompt you to choose a sample workflow and enter the path of the dataset on your local machine.

After entering all the parameters, the workflow will automatically run and the results will be displayed.

4. Create a workflow through the Web GUI (optional):

If you want to check out the web-based frontend of Amber, follow this step-by-step guide to create and run a sample workflow using one of the datasets above.

Open another terminal and enter:

cd AmberOnOrleans/WebApp
dotnet run

Go to http://localhost:7070 to see the web GUI for Amber. [Screenshot: web GUI]

Drag the Source -> Scan operator from the left panel and drop it on the canvas. [Screenshot: Scan]

Then drag and drop Utilities -> Comparison, LocalGroupBy, GlobalGroupBy and Sort -> Sort, in that order. Each operator is automatically linked to the previous one. Your workflow should look like this: [Screenshot: W1]

You can specify properties for each operator in the right panel. Each operator should have the following properties:

Scan: [Screenshot: Scan properties]

Comparison: [Screenshot: Comparison properties]

LocalGroupBy: [Screenshot: LocalGroupBy properties]

GlobalGroupBy: [Screenshot: GlobalGroupBy properties]

Sort: [Screenshot: Sort properties]

Click the "Run" button in the upper-right corner to run the workflow. After completion, the result pops up from the bottom. [Screenshot: result]

Run Amber on a cluster:

1. Clone this repo on one cluster machine (call it A) that has MySQL server installed, then make the following changes in Constants.cs:

public static string ClientIPAddress = <Current Machine's IP address>;
...
public volatile static int DefaultNumGrainsInOneLayer = <# of Machines in the cluster - 1>;
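
To make the arithmetic concrete, the sketch below prints the two edited lines for a given IP address and cluster size. `cluster_constants` is a hypothetical helper, not part of this repo; it assumes that `ClientIPAddress` takes a quoted string literal and that the cluster has at least two machines (machine A plus one silo machine).

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Amber repo).
# Print the two Constants.cs lines for a cluster.
# Arguments: IP address of machine A, total number of machines.
cluster_constants() {
    local client_ip="$1" machine_count="$2"
    # Assumption: machine A plus at least one machine running a silo.
    if [ "$machine_count" -lt 2 ]; then
        echo "expected at least 2 machines in the cluster" >&2
        return 1
    fi
    echo "public static string ClientIPAddress = \"${client_ip}\";"
    # One layer of grains per machine other than machine A.
    echo "public volatile static int DefaultNumGrainsInOneLayer = $((machine_count - 1));"
}
```

For a four-machine cluster whose machine A has the (made-up) address 10.0.0.5, `cluster_constants 10.0.0.5 4` prints a `ClientIPAddress` line with "10.0.0.5" and a `DefaultNumGrainsInOneLayer` line with the value 3.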

2. Start the MySQL server on machine A.

3. Copy the edited repo to all other machines in the cluster.

4. Start Silos:

A silo is a container of actors in Orleans where all the computation takes place. The silos must be started first so that Amber knows where to allocate actors.

On every other machine in the cluster, open a terminal and enter:

cd AmberOnOrleans/SiloHost
dotnet run -c Release

You can ignore the warnings. Establishing the connection takes some time.

Make sure you see "Silo Started!" on all the machines before proceeding to step 5.

5. On machine A, continue from step 3 or step 4 of the local-machine instructions above.

Note: The table file should be stored in HDFS so that the other machines can access it, and you need to use its HDFS RESTful (WebHDFS) link as the path of the table file (e.g. http://128.295.2.45:9870/webhdfs/v1/datasets/lineitem.tbl).
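
The link in the note follows the WebHDFS pattern `http://<host>:<port>/webhdfs/v1/<path>`. As a sketch, a small helper can assemble it from its parts; `webhdfs_url` and the host name `namenode.example.org` are illustrative assumptions, not part of this repo.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the Amber repo).
# Build a WebHDFS URL from the NameNode host, its HTTP port,
# and an absolute HDFS path such as /datasets/lineitem.tbl.
webhdfs_url() {
    local host="$1" port="$2" hdfs_path="$3"
    # Strip the leading slash so the URL does not contain "v1//".
    echo "http://${host}:${port}/webhdfs/v1/${hdfs_path#/}"
}
```

For instance, `webhdfs_url namenode.example.org 9870 /datasets/lineitem.tbl` prints `http://namenode.example.org:9870/webhdfs/v1/datasets/lineitem.tbl`. Before passing the link to Amber, you can check that the file is reachable by appending the standard WebHDFS query `?op=GETFILESTATUS` and fetching the result with `curl -s`.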