Long-running analytic tasks on big data frameworks often provide little or no feedback about the status of the execution. Some big data processing frameworks provide status updates for running jobs, but these systems only allow users to monitor their jobs passively. Even if users notice anomalies during the execution, their only options are to kill the job or wait for it to run to completion.
Amber is a distributed data processing engine built on top of an existing actor model implementation. It has the unique capability of supporting responsive debugging during the execution of a dataflow. Users can pause/resume the execution, investigate the state of operators, change the behavior of an operator, and set conditional breakpoints. Amber provides these features along with support for fault tolerance. In case of a failure, it not only ensures the correctness of the final computation result, but also recovers the same consistent debugging state.
Paper: Amber: A Debuggable Dataflow System Based on the Actor Model (VLDB 2020)
Contributors: Shengquan Ni, Avinash Kumar, Zuozhi Wang, Chen Li.
Affiliation: University of California, Irvine.
- For Windows / Mac
Download and install the latest LTS version of NodeJS (Version 12)
- For Linux
sudo apt-get install curl software-properties-common
curl -sL https://deb.nodesource.com/setup_12.x | sudo bash -
sudo apt-get install nodejs
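Before building the frontend, it can help to confirm that the installed Node.js really is version 12. The following is a minimal sketch, not part of the Amber repository: the helper names `node_major` and `check_node_version` are hypothetical, and the script only inspects the output of `node --version` when `node` is on the PATH.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Amber) for checking the Node.js version.

# Print the major version of a version string such as "v12.22.1".
node_major() {
    version="${1#v}"        # drop the leading "v"
    echo "${version%%.*}"   # keep everything before the first dot
}

# Succeed only if the version string ($1) has the expected major version ($2).
check_node_version() {
    [ "$(node_major "$1")" = "$2" ]
}

# Probe the real installation only when node is available.
if command -v node >/dev/null 2>&1; then
    if check_node_version "$(node --version)" 12; then
        echo "Node.js 12 detected"
    else
        echo "Expected Node.js 12, found $(node --version)" >&2
    fi
fi
```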
Clone this repo then do the following:
cd AmberOnOrleans/Frontend
npm install
npm run build
Running npm install will take a long time, usually 5 to 10 minutes. You can ignore the vulnerability warnings at the end.
- Install dotnet-sdk 3.0
- Install MySQL and log in as admin. Use the following command to create a user with username "orleansbackend" and password "orleans-0519-2019" (this can be changed in Constants.cs)
CREATE USER 'orleansbackend'@'%' IDENTIFIED BY 'orleans-0519-2019';
- Create a MySQL database called 'amberorleans' and grant all privileges on it by using the following commands.
CREATE DATABASE amberorleans;
GRANT ALL PRIVILEGES ON amberorleans.* TO 'orleansbackend'@'%';
FLUSH PRIVILEGES;
USE amberorleans;
- Run the scripts MySQL-Main.sql and MySQL-Clustering.sql to create the necessary tables and insert entries in the database.
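One way to apply the two scripts in order from a terminal is sketched below. This is only an illustration under assumptions: `run_sql_scripts` is a hypothetical helper, the scripts are assumed to be in the current directory, and the client command is passed in as a string so that it can be adjusted to your MySQL setup.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): feed SQL scripts to a client
# command one after another, stopping at the first failure.
# Usage: run_sql_scripts CLIENT_CMD SCRIPT...
run_sql_scripts() {
    client="$1"
    shift
    for script in "$@"; do
        if [ ! -f "$script" ]; then
            echo "missing script: $script" >&2
            return 1
        fi
        # $client is intentionally unquoted so its arguments are split.
        $client < "$script" || return 1
    done
}

# Example invocation (assumes the user and database created above;
# mysql will prompt for the password once per script):
#   run_sql_scripts "mysql -u orleansbackend -p amberorleans" \
#       MySQL-Main.sql MySQL-Clustering.sql
```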
- We have generated some sample datasets for you to benchmark Amber. Here are 2 datasets you can use:
Download one dataset from the links above to your local machine.
A Silo is a container of actors in Orleans where all the computation takes place. We need to start the Silo first so that Amber knows where to allocate actors.
Open a terminal and enter:
cd AmberOnOrleans/SiloHost
dotnet run -c Release
You can ignore all the warnings. It takes some time to establish the connection.
Make sure you see "Silo Started!" before proceeding to step 3.
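If you start the Silo in the background, waiting for the "Silo Started!" message can be scripted. The sketch below assumes that the message is written to standard output and that you redirect the output to a log file; the file name `silo.log` and the helper `wait_for_marker` are hypothetical and not part of Amber.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): poll a log file until a marker
# string appears or a timeout (in seconds) expires.
# Usage: wait_for_marker FILE MARKER TIMEOUT_SECONDS
wait_for_marker() {
    file="$1"
    marker="$2"
    timeout="$3"
    elapsed=0
    while :; do
        # -F treats the marker as a fixed string, -q suppresses output.
        if [ -f "$file" ] && grep -qF -- "$marker" "$file"; then
            return 0
        fi
        if [ "$elapsed" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        elapsed=$((elapsed + 1))
    done
}

# Example invocation, run from AmberOnOrleans/SiloHost:
#   dotnet run -c Release > silo.log 2>&1 &
#   wait_for_marker silo.log "Silo Started!" 300 && echo "Silo is up"
```

The marker is checked once before the first sleep, so a timeout of 0 performs a single check without waiting.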
Open another terminal and enter:
cd AmberOnOrleans/ConsoleApp
dotnet run
It will prompt you to choose a sample workflow and enter the path of the dataset on your local machine.
After entering all the parameters, the workflow will automatically run and the results will be displayed.
To check out the web-based frontend of Amber, follow this step-by-step guide for creating and running a sample workflow using one of the datasets above.
Open another terminal and enter:
cd AmberOnOrleans/WebApp
dotnet run
Go to http://localhost:7070 to see the web GUI for Amber:
Drag the Source -> Scan operator from the left panel and drop it on the canvas:
Then, drag and drop Utilities -> Comparison, LocalGroupBy, GlobalGroupBy and Sort -> Sort, in that order. Each one will automatically be linked to the previous operator. Your workflow should look like this:
You can specify properties for each operator on the right panel. Each operator should have the following properties:
Scan:
Comparison:
LocalGroupBy:
GlobalGroupBy:
Sort:
Click the "Run" button in the upper-right corner to run the workflow. After completion, the following result will pop up from the bottom:
On one machine in the cluster (call it A) that has MySQL Server installed, make the following changes in Constants.cs:
public static string ClientIPAddress = <Current Machine's IP address>;
...
public volatile static int DefaultNumGrainsInOneLayer = <# of Machines in the cluster - 1>;
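The value for DefaultNumGrainsInOneLayer is the number of machines in the cluster minus one. As a small sketch of that arithmetic, assuming you keep a list of cluster machines in a text file with one host per line (the file `hosts.txt` and the helper `grains_per_layer` are hypothetical, not part of Amber):

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): count the hosts listed in a file,
# ignoring blank lines and lines starting with "#", and subtract one.
# Usage: grains_per_layer HOSTS_FILE
grains_per_layer() {
    if [ ! -f "$1" ]; then
        echo "no such file: $1" >&2
        return 1
    fi
    # grep -c exits non-zero when the count is 0, so tolerate that case.
    total=$(grep -cvE '^[[:space:]]*(#|$)' "$1" || true)
    echo $((total - 1))
}

# Example invocation:
#   grains_per_layer hosts.txt
```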
A Silo is a container of actors in Orleans where all the computation takes place. We need to start the Silo first so that Amber knows where to allocate actors.
On all other machines in the cluster, open a terminal and enter:
cd AmberOnOrleans/SiloHost
dotnet run -c Release
You can ignore all the warnings. It takes some time to establish the connection.
Make sure you see "Silo Started!" on all the machines before proceeding to step 4.
Note: The table file should be stored in HDFS so that the other machines can access it, and you will need to use the HDFS RESTful link as the path of the table file (e.g. http://128.295.2.45:9870/webhdfs/v1/datasets/lineitem.tbl).
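The link in the note follows the pattern http://HOST:PORT/webhdfs/v1/PATH. A minimal sketch for assembling such a link is shown below; `webhdfs_url` is a hypothetical helper, and the host name in the example is a placeholder that you would replace with the address of your own NameNode.

```shell
#!/bin/sh
# Hypothetical helper (not part of Amber): build a WebHDFS link from the
# NameNode host, its HTTP port, and the absolute HDFS path of the file.
# Usage: webhdfs_url HOST PORT HDFS_PATH
webhdfs_url() {
    host="$1"
    port="$2"
    path="${3#/}"   # strip one leading slash so the URL has no "//"
    echo "http://${host}:${port}/webhdfs/v1/${path}"
}

# Example invocation (namenode.example is a placeholder host):
#   webhdfs_url namenode.example 9870 /datasets/lineitem.tbl
```

To check that the file is reachable before running a workflow, you can append `?op=GETFILESTATUS` to the link and request it with curl.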