
vHive Quickstart

This guide describes how to set up an N-node vHive serverless cluster.

Table of Contents

  1. Host platform requirements
    1. Hardware
    2. Software
    3. CloudLab Deployment Notes
      1. CloudLab Profile
      2. Nodes to Rent
  2. Set up a Serverless (Knative) Cluster
    1. Set up All Nodes
    2. Set up Worker Nodes
    3. Configure Master Node
    4. Configure Worker Nodes
    5. Finalise Master Node
  3. Set up a Single-Node Cluster
    1. Manual
    2. Clean Up
    3. Using a Script
  4. Deploying and Invoking Functions in vHive
    1. Deploy Functions
    2. Invoke Functions
    3. Delete Deployed Functions

I. Host platform requirements

1. Hardware

  1. Two x64 servers on the same network.
    • We have not tried vHive on Arm, but porting it may not be hard because Firecracker supports the Arm64 ISA.
  2. Hardware support for virtualization and KVM.
    • Nested virtualization is supported provided that KVM is available.
  3. The root partition of the host filesystem should be mounted on an SSD. This is critical for snapshot-based cold starts.
    • We expect vHive to work on machines that use HDDs but there could be timeout-related issues with large Docker images (>1GB).
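Before going further, a node can be checked against the virtualization/KVM requirement and the SSD recommendation with a few shell helpers. This is a minimal sketch, not part of vHive: the function names are hypothetical, and the file paths are parameters (defaulting to /proc/cpuinfo, /dev/kvm and /sys/block) so the checks can also be tried against fake files. It relies on the kernel reporting 0 in /sys/block/<dev>/queue/rotational for non-rotational (SSD/NVMe) devices.

```shell
#!/usr/bin/env bash
# Hypothetical preflight helpers (not shipped with vHive).

# Succeeds if the CPU advertises Intel VT-x (vmx) or AMD-V (svm).
has_virt_flags() {
  grep -qwE 'vmx|svm' "${1:-/proc/cpuinfo}"
}

# Succeeds if the KVM device node exists; Firecracker needs /dev/kvm.
has_kvm() {
  [ -e "${1:-/dev/kvm}" ]
}

# Succeeds if block device $1 (e.g. sda, nvme0n1) is non-rotational,
# i.e. the kernel reports it as an SSD/NVMe rather than a spinning disk.
is_ssd() {
  local flag
  flag=$(cat "${2:-/sys/block}/$1/queue/rotational" 2>/dev/null) || return 1
  [ "$flag" = 0 ]
}
```

For example, on a typical non-LVM layout, is_ssd "$(lsblk -no PKNAME "$(findmnt -no SOURCE /)")" checks the disk behind the root partition; with LVM or RAID the parent disk has to be looked up differently. Of the first two checks, has_kvm is the decisive one, since Firecracker uses KVM directly.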

2. Software

  1. Ubuntu/Debian with sudo access and the apt package manager on the host (tested on Ubuntu 18.04 with Linux kernel v4.15).
    • Other operating systems require changes to our setup scripts but should work in principle.
  2. Passwordless SSH. Copy the SSH keys that you use to authenticate on GitHub to all the nodes, and run eval "$(ssh-agent -s)" && ssh-add so that SSH authentication works in the background.

3. CloudLab Deployment Notes

We suggest renting nodes on CloudLab as their service is available to researchers world-wide.

A. CloudLab Profile

You can use our CloudLab profile RPerf/vHive-cluster-env.

It is recommended to use a base Ubuntu 18.04 image for each node and connect the nodes in a LAN.

B. Nodes to Rent

We tested the following instructions by setting up a 2-node cluster on CloudLab with each of the following SSD-equipped machine types: xl170 in Utah, rs440 in Mass, and m400 in OneLab. The xl170 nodes are normally less occupied than the other two; you can also consider other SSD-equipped machines.

SSD-equipped nodes are highly recommended. The full list of CloudLab nodes can be found here.

II. Set up a Serverless (Knative) Cluster

1. Set up All Nodes

On each node (both master and workers), execute the following instructions as a non-root user with sudo rights, using bash:

  1. Clone the vHive repository
    git clone --depth=1 https://github.com/ease-lab/vhive.git
  2. Change your working directory to the root of the repository:
    cd vhive
  3. Create a directory for vHive logs:
    mkdir -p /tmp/vhive-logs
  4. Run the node setup script:
    ./scripts/cloudlab/setup_node.sh > >(tee -a /tmp/vhive-logs/setup_node.stdout) 2> >(tee -a /tmp/vhive-logs/setup_node.stderr >&2)

    BEWARE:

    This script may print "Command failed" when creating the devmapper at the end. This can be safely ignored.
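Every long command in this guide repeats the same redirection: stdout and stderr are each appended to a file under /tmp/vhive-logs with tee while still being printed to the terminal. A small wrapper can factor this out. The sketch below is hypothetical (run_logged and the LOG_DIR variable are not part of vHive). Instead of the guide's process substitutions it uses pipes, so that the function only returns after tee has written both log files, and it returns the wrapped command's exit status.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run "$@", appending stdout to $LOG_DIR/<name>.stdout and
# stderr to $LOG_DIR/<name>.stderr, while still showing both on the terminal.
run_logged() (
  # Subshell body: pipefail stays local and makes the pipeline report the
  # command's exit status instead of tee's.
  set -o pipefail
  name=$1
  shift
  dir=${LOG_DIR:-/tmp/vhive-logs}
  mkdir -p "$dir" || exit 1
  # fd 3 keeps the original stdout. For the inner group, stderr (fd 2) is
  # routed into the pipe to the second tee, and stdout (fd 1) back to fd 3.
  { { "$@" | tee -a "$dir/$name.stdout"; } 2>&1 1>&3 3>&- |
      tee -a "$dir/$name.stderr" 1>&2; } 3>&1
)
```

With this helper, step 4 above could be written as run_logged setup_node ./scripts/cloudlab/setup_node.sh, producing the same two log files.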

2. Set up Worker Nodes

On each worker node, execute the following instructions as a non-root user with sudo rights, using bash:

  1. Run the script that sets up the kubelet:

    ./scripts/cluster/setup_worker_kubelet.sh > >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stdout) 2> >(tee -a /tmp/vhive-logs/setup_worker_kubelet.stderr >&2)
  2. Start containerd in a background terminal named containerd:

    sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"

    Note:

    screen is a terminal multiplexer similar to tmux but widely available by default.

    Starting long-running daemons in background screen sessions keeps your terminal (most likely an SSH session) free for other commands and ensures that the daemons are not terminated when you log out (voluntarily or because of connection issues).

    • To (re-)attach a background terminal:
      sudo screen -rd <name>
    • To detach (from an attached terminal):
      Ctrl+A then D
    • To kill a background terminal:
      sudo screen -XS <name> quit
    • To list all the sessions:
      sudo screen -ls
  3. Start firecracker-containerd in a background terminal named firecracker:

    sudo PATH=$PATH screen -dmS firecracker bash -c "/usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml > >(tee -a /tmp/vhive-logs/firecracker.stdout) 2> >(tee -a /tmp/vhive-logs/firecracker.stderr >&2)"
  4. Build the vHive host orchestrator:

    source /etc/profile && go build
  5. Start vHive in a background terminal named vhive:

    # EITHER
    sudo screen -dmS vhive bash -c "./vhive > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"
    # OR
    sudo screen -dmS vhive bash -c "./vhive -snapshots > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"
    # OR
    sudo screen -dmS vhive bash -c "./vhive -snapshots -upf > >(tee -a /tmp/vhive-logs/vhive.stdout) 2> >(tee -a /tmp/vhive-logs/vhive.stderr >&2)"

    Note:

    By default, microVMs are booted on each cold start; -snapshots enables snapshots after the 2nd invocation of each function.

    If -snapshots and -upf are specified, the snapshots are accelerated with the Record-and-Prefetch (REAP) technique that we described in our ASPLOS'21 paper (extended abstract, full paper).
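After steps 2 to 5, three detached screen sessions (containerd, firecracker and vhive) should be running. One way to confirm this is to look for each name in the output of sudo screen -ls. The sketch below is hypothetical: the function names are illustrative, and the parser assumes screen's usual listing format, in which each session appears as <pid>.<name> at the start of a line, followed by whitespace.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of vHive).

# Reads `screen -ls` output on stdin; succeeds if a session named $1 is listed.
has_screen_session() {
  awk -v want="$1" '
    $1 ~ /^[0-9]+\./ {
      name = $1
      sub(/^[0-9]+\./, "", name)   # drop the "<pid>." prefix
      if (name == want) found = 1
    }
    END { exit !found }
  '
}

# Report which of the daemons started above are running; non-zero if any is missing.
check_vhive_sessions() {
  local listing s missing=0
  listing=$(sudo screen -ls 2>/dev/null)
  for s in containerd firecracker vhive; do
    if printf '%s\n' "$listing" | has_screen_session "$s"; then
      echo "$s: running"
    else
      echo "$s: MISSING" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A missing session usually means the daemon exited right after starting; its stderr log under /tmp/vhive-logs is the first place to look.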

3. Configure Master Node

On the master node, execute the following instructions as a non-root user with sudo rights, using bash:

  1. Start containerd in a background terminal named containerd:
    sudo screen -dmS containerd bash -c "containerd > >(tee -a /tmp/vhive-logs/containerd.stdout) 2> >(tee -a /tmp/vhive-logs/containerd.stderr >&2)"
  2. Run the script that creates the multinode cluster:
    ./scripts/cluster/create_multinode_cluster.sh > >(tee -a /tmp/vhive-logs/create_multinode_cluster.stdout) 2> >(tee -a /tmp/vhive-logs/create_multinode_cluster.stderr >&2)

    BEWARE:

    The script will ask you the following:

    All nodes need to be joined in the cluster. Have you joined all nodes? (y/n)
    

    Leave this prompt waiting in the terminal; we will come back to it later.

    In the same terminal, you will also see a command in the following format:

    kubeadm join 128.110.154.221:6443 --token <token> \
        --discovery-token-ca-cert-hash sha256:<hash>
    

    Copy both lines of this command.
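Because the output of create_multinode_cluster.sh is also appended to /tmp/vhive-logs/create_multinode_cluster.stdout, the join command can be recovered from that log instead of being copied from the terminal by hand. The following helper is a hypothetical sketch (extract_join_cmd is not part of vHive): it prints the first kubeadm join command found in the log as a single line, merging backslash-continued lines, and assumes the command is printed in the two-line format shown above and has already been written to the log.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first (possibly backslash-continued)
# "kubeadm join ..." command found in a log file as one line.
extract_join_cmd() {
  awk '
    !collecting && /^[ \t]*kubeadm join / { collecting = 1 }
    collecting {
      line = $0
      sub(/^[ \t]+/, "", line)            # drop leading indentation
      cont = (line ~ /\\[ \t]*$/)         # ends with a line-continuation backslash?
      sub(/[ \t]*\\[ \t]*$/, "", line)    # strip the trailing backslash
      cmd = (cmd == "") ? line : cmd " " line
      if (!cont) { print cmd; exit }
    }
  ' "${1:-/tmp/vhive-logs/create_multinode_cluster.stdout}"
}
```

Run it on the master node and paste the printed line into step 4.1 on each worker.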

4. Configure Worker Nodes

On each worker node, execute the following instructions as a non-root user with sudo rights, using bash:

  1. Add the current worker to the Kubernetes cluster by executing, with sudo, the command you copied in step (3.2):
    sudo kubeadm join IP:PORT --token <token> --discovery-token-ca-cert-hash sha256:<hash> > >(tee -a /tmp/vhive-logs/kubeadm_join.stdout) 2> >(tee -a /tmp/vhive-logs/kubeadm_join.stderr >&2)

    Note:

    On success, you should see the following message:

    This node has joined the cluster:
    * Certificate signing request was sent to apiserver and a response was received.
    * The Kubelet was informed of the new secure connection details.
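Since the output of kubeadm join is appended to /tmp/vhive-logs/kubeadm_join.stdout, the success message can also be checked from a script, for example when joining several workers over SSH. A hypothetical sketch (join_succeeded, report_join and LOG_DIR are illustrative names), assuming kubeadm prints the message quoted above to stdout:

```shell
#!/usr/bin/env bash
# Hypothetical helpers (not part of vHive) for checking the logs written by
# the kubeadm join command above.

# Succeeds if the given stdout log contains kubeadm's success banner.
join_succeeded() {
  grep -qF 'This node has joined the cluster' "$1"
}

# Print a short verdict; on failure, show the tail of the stderr log.
report_join() {
  local dir=${LOG_DIR:-/tmp/vhive-logs}
  if join_succeeded "$dir/kubeadm_join.stdout"; then
    echo "worker joined the cluster"
  else
    echo "join not confirmed; last lines of $dir/kubeadm_join.stderr:" >&2
    tail -n 20 "$dir/kubeadm_join.stderr" >&2
    return 1
  fi
}
```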
    

5. Finalise Master Node

On the master node, execute the following instructions as a non-root user with sudo rights, using bash:

  1. Once all worker nodes have joined, answer y to the prompt left waiting in the terminal.
  2. The cluster is now being set up; wait until all pods show as Running or Completed:
    watch kubectl get pods --all-namespaces
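Instead of watching the pod list interactively, a script can poll it until every pod is Running or Completed. The sketch below is hypothetical: the function names, the 10-second poll interval and the default 900-second timeout are arbitrary choices, and it assumes the default column layout of kubectl get pods --all-namespaces --no-headers, where STATUS is the fourth column.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `kubectl get pods --all-namespaces --no-headers`
# output on stdin and succeeds only if there is at least one pod and every
# pod's STATUS (4th column) is Running or Completed.
all_pods_settled() {
  awk '
    { n++ }
    $4 != "Running" && $4 != "Completed" { bad++ }
    END { exit !(n > 0 && bad == 0) }
  '
}

# Poll until all pods have settled or the timeout (seconds, default 900) expires.
wait_for_pods() {
  local timeout=${1:-900} waited=0
  until kubectl get pods --all-namespaces --no-headers 2>/dev/null | all_pods_settled; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "pods not settled after ${timeout}s" >&2
      return 1
    fi
    sleep 10
    waited=$((waited + 10))
  done
}
```

Note that this follows the criterion above: a pod in the Running state is accepted even if its READY column is not yet full.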

Congrats, your Knative cluster is ready!

III. Set up a Single-Node Cluster

1. Manual

In essence, you will execute the same commands for master and worker setups but on a single node.

A 5-second delay is added between the commands to give the components enough time to initialize.

Execute the following instructions as a non-root user with sudo rights, using bash:

  1. Run the node setup script:
    ./scripts/cloudlab/setup_node.sh;
  2. Start containerd in a background terminal named containerd:
    sudo screen -dmS containerd containerd; sleep 5;

    Note:

    Regarding screen and starting daemons in background terminals, see the note in step 2 of subsection II.2 Set up Worker Nodes.

  3. Start firecracker-containerd in a background terminal named firecracker:
    sudo PATH=$PATH screen -dmS firecracker /usr/local/bin/firecracker-containerd --config /etc/firecracker-containerd/config.toml; sleep 5;
  4. Build the vHive host orchestrator:
    source /etc/profile && go build;
  5. Start vHive in a background terminal named vhive:
    sudo screen -dmS vhive ./vhive; sleep 5;
  6. Run the single node cluster setup script:
    ./scripts/cluster/create_one_node_cluster.sh
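The fixed sleep 5 delays are a simple heuristic. An alternative is to poll until a daemon's socket appears, with a timeout. The helper below is a hypothetical sketch (wait_for_path is not part of vHive). It only checks that the path exists, so a stale socket left over from an earlier run would also satisfy it.

```shell
#!/usr/bin/env bash
# Hypothetical helper: wait until path $1 exists, polling every 0.2 s for at
# most $2 seconds (default 30). Returns non-zero on timeout.
wait_for_path() {
  local path=$1 timeout=${2:-30} tries
  tries=$((timeout * 5))
  while [ ! -e "$path" ]; do
    if [ "$tries" -le 0 ]; then
      echo "timed out waiting for $path" >&2
      return 1
    fi
    sleep 0.2
    tries=$((tries - 1))
  done
}
```

For example, the sleep 5 after starting containerd could become wait_for_path /run/containerd/containerd.sock (containerd's default socket). For firecracker-containerd, the path to wait for is assumed to be the socket address configured in /etc/firecracker-containerd/config.toml. If your user cannot access the socket directory, the check has to run as root.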

2. Clean Up

./scripts/github_runner/clean_cri_runner.sh

3. Using a Script

This script stops the existing cluster (if any), cleans up, and then starts a fresh single-node cluster.

export GITHUB_VHIVE_ARGS="[-dbg] [-snapshots] [-upf]" # optional flags (omit the brackets): -dbg enables debug logs; -snapshots enables snapshot-based cold starts; add -upf as well for REAP snapshots
scripts/cloudlab/start_onenode_vhive_cluster.sh
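When scripting experiments across cold-start modes, the flag string can also be assembled programmatically. A hypothetical sketch: vhive_args is not part of vHive, and the mode names boot, snapshots and reap are illustrative labels for the three configurations described in subsection II.2, not vHive options.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the vHive flag string for a cold-start mode
# (boot | snapshots | reap), optionally with -dbg when $2 is "yes".
vhive_args() {
  local mode=$1 debug=${2:-no} args=""
  [ "$debug" = yes ] && args="-dbg"
  case "$mode" in
    boot) ;;                                  # default: microVMs are booted
    snapshots) args="$args -snapshots" ;;
    reap) args="$args -snapshots -upf" ;;     # REAP is used when both flags are given
    *) echo "unknown mode: $mode" >&2; return 1 ;;
  esac
  printf '%s\n' "${args# }"                   # drop a leading space, if any
}
```

For example, export GITHUB_VHIVE_ARGS="$(vhive_args reap yes)" before running the script above sets the flags to -dbg -snapshots -upf.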

IV. Deploying and Invoking Functions in vHive

1. Deploy Functions

On the master node, execute the following instructions using bash:

  1. Optionally, configure the types and the number of functions to deploy in examples/deployer/functions.json.
  2. Run the deployer client:
    source /etc/profile && go run examples/deployer/client.go

    Note:

    There are runtime arguments that you can specify if necessary.

    The script writes the deployed functions' URLs to a file (urls.txt by default).

2. Invoke Functions

On any node, execute the following instructions using bash:

  1. Run the invoker client:
    go run examples/invoker/client.go

    Note:

    There are runtime arguments (e.g., RPS or requests-per-second target, experiment duration) that you can specify if necessary.

    After invoking the functions from the input file (urls.txt by default), the script writes the measured latencies to an output file (rps<RPS>_lat.csv by default, where <RPS> is the observed requests-per-second value) for further analysis.
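For a first look at the results, latency percentiles can be computed directly from the output file with standard tools. This is a hypothetical sketch (latency_percentile is not part of vHive): it assumes each line holds one sample whose latency is the first comma-separated field, and it skips non-numeric lines such as a header. Check the actual layout and units of your rps<RPS>_lat.csv before relying on it. Percentiles use the nearest-rank definition, with p an integer from 0 to 100.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the p-th percentile (nearest-rank method) of the
# numeric values in the first comma-separated column of file $2.
latency_percentile() {
  local p=$1 file=$2
  cut -d, -f1 "$file" | grep -E '^[0-9]+([.][0-9]+)?$' | sort -n |
    awk -v p="$p" '
      { v[NR] = $1 }
      END {
        if (NR == 0) exit 1
        rank = int((p * NR + 99) / 100)   # ceil(p * N / 100) for integer p
        if (rank < 1) rank = 1
        print v[rank]
      }
    '
}
```

For example, latency_percentile 50 on the output file gives the median latency and latency_percentile 99 the tail latency, in whatever unit the file uses.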

3. Delete Deployed Functions

On the master node, execute the following instructions using bash:

  1. Delete all deployed functions:
    kn service delete --all