Remote Optimization Example

Performing optimizations with remote execution allows you to farm out the work of running simulations to many machines in parallel. The pswarmdriver tool supports sending its objective evaluations (i.e. Cyclus simulations) to a running cloudlus server. This guide assumes you are familiar with running optimizations locally (i.e. the info in Simple-Optimization-Example) and that you have the cloudlus tools installed with their binaries on your $PATH.

Set Up the Server

First, you need to set up a cloudlus server running in a location where your workers can reach it over the network (i.e. a cloud virtual server, etc.). Just copy the cloudlus binary to the server and start it running on the port you want:

$ scp $(which cloudlus) your-server.com:./
$ ssh your-server.com
$ cloudlus -addr=0.0.0.0:4242 serve -dblimit 2000 &> server.log &
$ exit

The 0.0.0.0 address tells the server to accept incoming requests from any IP address, and the (arbitrarily chosen) port is 4242. Note that you can use your local machine as the server with external workers, but you need to be sure all your workers have network access to the server (e.g. by setting up port mapping).
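
Before pointing workers at the server, you can sanity check that it is reachable from another machine. A minimal check, assuming the web dashboard described below is served on the same address and port:

$ curl http://your-server.com:4242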

Set Up the Workers

Then you need to start up some workers. This step depends heavily on the computing infrastructure you are using. Each worker needs the cyclus and cycobj commands installed and available on its $PATH. Options include:

  • If your cluster has a shared file system, you can install the Cyclus and cycobj commands to a location there. You will then need to add their install locations to the $PATH of the cloudlus workers (the init.sh script below does exactly this).

  • You can use something like cde (http://www.pgbovine.net/cde.html) to package up a cyclus/cycobj environment from your local machine and copy it to each worker; see the sketch after this list.
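
As a rough sketch of the cde approach: running a command once under cde records everything it touches into a self-contained cde-package directory that you can copy to the workers (you would do the same capture step for cycobj). The my-input.xml file and worker-host name here are hypothetical, and the exact cde-exec invocation may vary with your cde version:

$ cde cyclus my-input.xml        # run once under cde; dependencies are captured into cde-package/
$ scp -r cde-package worker-host:./
$ ssh worker-host
$ ./cde-package/cde-exec cyclus my-input.xml    # verify the packaged cyclus runs on the worker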

For a high-throughput HTCondor environment with a shared file system, you might create a condor submit file like this:

universe = vanilla
executable = init.sh
should_transfer_files = yes
when_to_transfer_output = ON_EXIT_OR_EVICT
output = worker.$(PROCESS).output
error = worker.$(PROCESS).error
log = workers.log
Disk = 1048576
request_cpus = 1
request_memory = 1024
Rank = KFlops
+is_resumable = true
requirements = OpSys == "LINUX" && Arch == "x86_64" && (OpSysAndVer =?= "SL6") && (IsDedicated == true)  && KFlops >= 300000

queue 300

where init.sh is:

#!/bin/bash

# Add the shared cyclus/cycobj install location to PATH, then start a
# cloudlus worker that is only allowed to run the whitelisted commands.
env PATH=$PATH:/path/to/shared/dir/with/cyclus/and/cycobj/ /path/to/shared/dir/with/cloudlus -addr=your-server.com:4242 work -whitelist=cyclus,cycobj

and cloudlus is just the cloudlus binary. Then you would run condor_submit [your-condor-submit-file] to queue up 300 workers.
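
For example, if the submit file above were saved as workers.submit (a hypothetical name), submitting and monitoring the workers uses the standard HTCondor commands:

$ condor_submit workers.submit
$ condor_q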

Running the Optimization

Then you can start an optimization by running something like this:

$ pswarmdriver -addr=your-server.com:4242 -scen=my-scen-file.json &> optim.log

You can follow summary stats/progress of the running jobs/simulations on the server dashboard by visiting http://your-server.com:4242 in a web browser.
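
Since the command above redirects the driver's output to optim.log, you can also follow the optimization's progress from the shell:

$ tail -f optim.log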

Try it All Locally

If you don't have access to external computational resources and just want to test the remote execution setup locally, you are in luck. All you have to do (assuming Cyclus and cloudlus are installed and on your $PATH) is run:

$ cloudlus serve &> server.log &
$ cloudlus work &> worker1.log &
$ pswarmdriver -addr=127.0.0.1:9875 -scen=my-scen-file.json &> optim.log

Note that 127.0.0.1 means "the local machine" in networking lingo, and 9875 is the port cloudlus defaults to if you don't specify one manually. You can then watch your worker1.log file fill up with output from the Cyclus simulations being run by the pswarm optimizer. You can start up as many workers as you like:

$ cloudlus work &> worker2.log &
$ cloudlus work &> worker3.log &
$ cloudlus work &> worker4.log &
$ cloudlus work &> worker5.log &
$ cloudlus work &> worker6.log &
...

and the parallelism will be utilized automatically by the optimizer and cloudlus server.
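
If you are starting many local workers, a small shell loop is tidier than repeating the command by hand; this is just a compact form of the same commands shown above:

$ for i in $(seq 2 6); do
    cloudlus work &> "worker$i.log" &
  done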