(If any problem occurs, please refer to the troubleshooting section at the end of this guide)
To provide the best user experience, we open-sourced VAIF: its codebase plus a CloudLab profile that sets up VAIF and the corresponding experiment environment.
- VAIF (Pythia) CloudLab Profile User Guide
- Sign in to CloudLab
- Open VAIF’s profile and click “Next”.
- In experiment creation steps
- Parametrize: simply leave the defaults and click “Next”,
- If one wants to create an experiment of more than two machines: change “number of compute nodes” to the desired number. The Pythia config must then be updated as well, following the instructions in the “config pythia” section.
- One may change other parameters, but we cannot guarantee they will work.
- Finalize: name your experiment, select the Utah cluster to run on, and click “Next”
- Schedule: click “Finish” to create the experiment immediately
- Your experiment will initialize; the internal setup scripts then take approximately 1-2 hours to run
- One will get an email from the system when the installation phase starts
- One will get another email when it completes.
- Then it will be ready to use
- If one wants more than one compute node in OpenStack and changed “number of compute nodes” to >1 during experiment creation, follow the steps below.
- If not, skip this section.
- After experiment creation and auto-setup have finished, SSH into the ctl node and check the config /etc/pythia/controller.toml -- it should reflect the actual ctl and cp nodes. If one has 1 compute node, fix the config (delete cp-2 and cp-3).
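To see at a glance which compute nodes the config currently references, a small helper can be used. This is a sketch: the file path comes from the step above, while the helper name and the `cp-` pattern are assumptions about the TOML layout.

```shell
# Hypothetical helper: list lines in Pythia's config that mention compute
# nodes (cp-1, cp-2, ...), with line numbers, so stale entries are easy to spot.
check_pythia_nodes() {
  grep -n 'cp-' "${1:-/etc/pythia/controller.toml}"
}
# check_pythia_nodes   # inspect the live config on the ctl node
```

If the output lists cp-2 or cp-3 but your experiment has only one compute node, delete those entries as described above.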
- SSH into ctl node
- sudo su geniuser
- geniuser is a dummy user created by CloudLab itself, which we use to avoid organization-specific dependencies.
- cd to geniuser’s directory and run
~/pythia/workloads/offline_profiling.sh
- It takes around an hour to run
- Go to pythia directory and run
pythia manifest ~/offline_traces.txt
- Run the continuous workload (an example workload) while Pythia’s continuous loop is up
- Open a new terminal window and ssh into the controller instance
- Go to geniuser’s directory and run
~/pythia/workloads/continuous_workload.sh
- Go to pythia directory and run pythia’s continuous loop
sudo RUST_BACKTRACE=1 cargo run --bin pythia_controller pythia_out 2>&1 | tee pythia_logs
- Pythia: simply exit (e.g., Ctrl-C)
- Continuous workload:
kill $(ps aux | grep workload | awk '{print $2}')
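A slightly more targeted alternative to grepping `ps` output is `pkill -f`, which matches the command line directly and avoids accidentally matching the grep process itself. This is a sketch; it assumes the workload’s command line contains `continuous_workload`.

```shell
# Stop the example workload by matching its command line (assumption: the
# script name from the steps above, continuous_workload.sh, is still running).
stop_workload() {
  pkill -f continuous_workload
}
```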
- Ready-made problems for Nova (note that when you apply the patch below, you need to choose one injected problem and change its sleep from 1 to 20)
- e.g., time.sleep(random.randint(0,20))
- See the git diff file in the drive (problem_injections_nova_diff)
- Apply this to your instance’s nova repo
- Then run
pip_install
under /local/nova, followed by
sudo systemctl restart nova-compute.service
- Create the dummy dir
sudo mkdir /users/output
sudo chmod ugo+rwx /users/output/
- Do the above steps on all nodes
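The per-node steps can be scripted from the ctl node over SSH. This is a sketch; the helper name is an assumption, and the node names passed in must match your experiment (ctl, cp-1, cp-2, ...).

```shell
# Create the dummy output dir (with open permissions) on every node over SSH.
setup_output_dirs() {
  for node in "$@"; do
    ssh "$node" 'sudo mkdir -p /users/output && sudo chmod ugo+rwx /users/output'
  done
}
# Example: setup_output_dirs ctl cp-1
```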
- Take a look at the trace (e.g., server_create), and determine the tracepoint to inject latency (e.g., /local/nova/nova/virt/libvirt/imagebackend.py:355)
- Inject with
import random
import time
time.sleep(random.randint(0, 20))
- Do this for all nodes (i.e., ctl, cp-1, ...)
- Create the dummy dir in all nodes
- sudo mkdir /users/output
- sudo chmod ugo+rwx /users/output/
- On all nodes, run
pip_install
then
sudo systemctl restart nova-compute.service
- Alternatively, run
restart_openstack_ctl
or
restart_openstack_compute
depending on whether it is the controller or a compute instance
- Then execute a workload
max_concurrent_builds: too low a limit on simultaneous server creations throttles performance (Problem 3 from the paper)
- Change the
max_concurrent_builds
option to a low number (e.g., 2)
- To do so, open the nova.conf file (/etc/nova/nova.conf)
- Then comment-in the
max_concurrent_builds
option and set it to 2
- Finally, restart all the services (including nova)
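The config change above can be done in one line. This is a sketch with an assumed helper name; it assumes the option appears (possibly commented out) on its own line in /etc/nova/nova.conf and that GNU sed is available.

```shell
# Uncomment (if needed) and set max_concurrent_builds in a nova.conf-style file.
set_max_builds() {
  conf="$1"; value="$2"
  sudo sed -i "s/^#\{0,1\}max_concurrent_builds.*/max_concurrent_builds = $value/" "$conf"
}
# set_max_builds /etc/nova/nova.conf 2   # then restart the services
```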
- Pythia will output results
curl --data-binary '{"jsonrpc":"2.0","id":"curltext","method":"read_node_stats","params":[]}' -H 'content-type:application/json' http://cp-1:3030
systemctl --type=service | grep pythia
sudo journalctl -u pythia.service
sudo systemctl restart pythia
- Or stop and start
cargo install --path /local/reconstruction/pythia_server
- Then
sudo systemctl stop pythia
sudo systemctl start pythia
- Run
cargo run --release -- --help
; this compiles Pythia in release mode and runs it with the help argument (note the `--` separator, which passes `--help` to Pythia rather than to cargo). Consequently, the Pythia binary is generated under target/release.
- Then simply copy that binary (target/release/pythia) to /users/geniuser/.cargo/bin/, and it is fixed.
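The rebuild-and-copy fix above can be wrapped in a small helper. This is a sketch: the helper name is an assumption, and the default repo and destination paths are taken from the steps above and may differ on your node.

```shell
# Recompile Pythia and copy the fresh binary onto geniuser's PATH.
rebuild_pythia() {
  repo="${1:-$HOME/pythia}"
  dest="${2:-/users/geniuser/.cargo/bin}"
  # Build in release mode; `--` passes --help to the Pythia binary itself.
  (cd "$repo" && cargo run --release -- --help) || return 1
  cp "$repo/target/release/pythia" "$dest/"
}
```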