consul-lock: a pack demonstrating Consul leadership election
tgross committed Apr 22, 2022
1 parent 9f8985b commit 5186b9f
Showing 10 changed files with 463 additions and 0 deletions.
3 changes: 3 additions & 0 deletions packs/consul_lock/CHANGELOG.md
@@ -0,0 +1,3 @@
# 0.1.0

Initial release
156 changes: 156 additions & 0 deletions packs/consul_lock/README.md
@@ -0,0 +1,156 @@
# Consul-Lock

A pack that demonstrates a script for ensuring that only a single
allocation of a Nomad job is running at a time. It is based on the
Consul Learn Guide for [application leader
elections](https://learn.hashicorp.com/tutorials/consul/application-leader-elections).

This pack runs a prestart sidecar task alongside the main task. The
sidecar runs a script that acquires a lock in Consul and periodically
renews it. Once it holds the lock, the script creates a lock directory
in the shared alloc dir. If the script exits, it releases the lock (or
the lock's TTL expires).

The main task waits for this lock directory to appear before executing
its application.
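
Under the hood, the sidecar script drives Consul's session and KV HTTP
APIs. Here is a minimal sketch of the acquire-and-renew flow, with a
placeholder address, key, and value; the real script in
`templates/script.sh` adds error handling and polling:

```bash
# create a session with a TTL so the lock self-expires if the holder dies
session_id=$(curl -s -X PUT \
    -d '{"Name": "example", "TTL": "10s"}' \
    "http://localhost:8500/v1/session/create" | jq -r '.ID')

# try to acquire the lock key; Consul returns "true" if we got it
curl -s -X PUT -d "my-alloc-id" \
    "http://localhost:8500/v1/kv/leader?acquire=${session_id}"

# while holding the lock, renew the session at half the TTL
curl -s -X PUT "http://localhost:8500/v1/session/renew/${session_id}"
```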

To adapt this approach for leader elections that transition back and
forth between allocations based on health checks, we recommend using
something other than shell scripts.

## Variables

* `job_name` (string "example") - The name of the job.
* `datacenters` (list(string) ["dc1"]) - A list of datacenters in the
region which are eligible for task placement.
* `region` (string "global") - The region where the job should be
placed.
* `namespace` (string "default") - The namespace for the job.
* `locker_image` (string "curlimages/curl:latest") - The container
image for the locking script. The task installs `bash`, `curl`, and
`jq` with `apk` at startup, so this image needs to be Alpine-based.
* `locker_script_path` (string "./templates/script.sh") - The path to
the locker script.
* `application_image` (string "busybox:1") - The container image for the
main task. This image needs to include a shell at `/bin/sh`.
* `application_args` (string "httpd -v -f -p 8001 -h /local") - The
command and args for the main task's application.
* `application_port_name` (string "port") - The name of the port the application listens on.
* `application_port` (number 8001) - The port the application listens on.
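
Any of these can be overridden on the command line with repeated
`-var` flags. For example (illustrative values only):

```bash
nomad-pack run \
  -var job_name=webapp \
  -var application_port=8080 \
  -var 'application_args=httpd -v -f -p 8080 -h /local' \
  packs/consul_lock
```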

#### `constraints` List of Objects

[Nomad job specification
constraints](https://www.nomadproject.io/docs/job-specification/constraint)
allow restricting the set of eligible nodes on which the tasks will
run. This pack automatically configures a constraint to run the tasks
on Linux hosts only.

You can set additional constraints with the `constraints` variable,
which takes a list of objects with the following fields:

* `attribute` (string) - Specifies the name or reference of the
attribute to examine for the constraint.
* `operator` (string) - Specifies the comparison operator. The
ordering is compared lexically.
* `value` (string) - Specifies the value to compare the attribute
against using the specified operation.

Below is an example of how to pass `constraints` on the CLI with the
`-var` argument.

```bash
nomad-pack run -var 'constraints=[{"attribute":"$${meta.my_custom_value}","operator":">","value":"3"}]' packs/consul_lock
```

#### `resources` Object

* `cpu` (number 500) - Specifies the CPU required to run the main task in
MHz.
* `memory` (number 256) - Specifies the memory required by the main
task in MB.
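
Like `constraints`, the whole `resources` object can be passed with
`-var`, assuming the same JSON-style syntax shown for `constraints`
above (values are illustrative):

```bash
nomad-pack run -var 'resources={"cpu":1000,"memory":512}' packs/consul_lock
```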

## Demonstration

Run two jobs from this same pack.

```
$ nomad-pack run -var job_name=left .
Evaluation ID: 7c8e6fc2-f0e3-8e2d-2c0a-6e7376d9b003
Job 'left' in pack deployment 'consul_lock' registered successfully
Pack successfully deployed. Use . to manage this this deployed instance with plan, stop,
destroy, or info
$ nomad-pack run -var job_name=right .
Evaluation ID: 404a4ead-8eee-3065-88cc-40a62f94717e
Job 'right' in pack deployment 'consul_lock' registered successfully
Pack successfully deployed. Use . to manage this this deployed instance with plan, stop,
destroy, or info
```

The `left` job will have the lock and its `main` task will be running
the webserver.

```
$ nomad job status left
...
Allocations
ID Node ID Task Group Version Desired Status Created Modified
fec2087e 9a68eb5e group 0 run running 1m13s ago 1m2s ago
$ nomad alloc logs -task block_for_lock fec2087e
...
got session lock 81c6852d-e8a3-4a7b-4725-c4a414b2bc6c
refreshing session every 5 seconds
$ nomad alloc exec -task main fec2087e ps
PID USER TIME COMMAND
1 root 0:00 httpd -v -f -p 8001 -h /local
10 root 0:00 ps
```

The `right` job will have running tasks, but they'll be blocked
waiting for the tasks from the `left` job to exit.

```
$ nomad job status right
...
Allocations
ID Node ID Task Group Version Desired Status Created Modified
676e2426 9a68eb5e group 0 run running 1m57s ago 1m45s ago
$ nomad alloc logs -task block_for_lock 676e2426
...
polling for session to be released every 5 seconds
$ nomad alloc exec -task main 676e2426 ps
PID USER TIME COMMAND
1 root 0:00 /bin/sh local/wait.sh
18 root 0:00 sleep 1
19 root 0:00 ps
```

Now stop the `left` job and watch it release the lock. Alternatively,
you could kill the task container and let the TTL expire, which has
the same effect.

```
$ nomad job stop left
$ nomad alloc logs -task block_for_lock fec2087e
...
releasing session 81c6852d-e8a3-4a7b-4725-c4a414b2bc6c
true%
```

And the `right` job will now have the lock:

```
$ nomad alloc logs -task block_for_lock 676e2426
...
got session lock 81d031db-8866-3703-e15b-b8c2a10e26c8
refreshing session every 5 seconds
$ nomad alloc exec -task main 676e2426 ps
PID USER TIME COMMAND
1 root 0:00 httpd -v -f -p 8001 -h /local
253 root 0:00 ps
```
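
To confirm from the Consul side which session currently holds the
lock, you can query the key with the same API the locker script uses;
this prints the session ID of the current holder (assuming Consul is
reachable at `localhost:8500`):

```
$ curl -s "http://localhost:8500/v1/kv/leader" | jq -r '.[0].Session'
```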
11 changes: 11 additions & 0 deletions packs/consul_lock/metadata.hcl
@@ -0,0 +1,11 @@
app {
url = "https://learn.hashicorp.com/tutorials/consul/application-leader-elections"
author = "HashiCorp, Inc."
}

pack {
name = "consul_lock"
description = "A pack demonstrating the use of Consul session locks for ensuring that only a single allocation of a job is running at a time."
url = "https://github.com/hashicorp/nomad-pack-community-registry/consul_lock"
version = "0.1.0"
}
Empty file added packs/consul_lock/outputs.tpl
17 changes: 17 additions & 0 deletions packs/consul_lock/templates/_constraints.tpl
@@ -0,0 +1,17 @@
[[- define "constraints" -]]
constraint {
attribute = "${attr.kernel.name}"
value = "linux"
}

[[ range $idx, $constraint := .my.constraints ]]
constraint {
attribute = [[ $constraint.attribute | quote ]]
[[- if $constraint.value ]]
value = [[ $constraint.value | quote ]]
[[- end ]]
[[- if $constraint.operator ]]
operator = [[ $constraint.operator | quote ]]
[[- end ]]
}
[[- end ]][[- end ]]
5 changes: 5 additions & 0 deletions packs/consul_lock/templates/_location.tpl
@@ -0,0 +1,5 @@
[[ define "location" ]]
namespace = "[[ .my.namespace ]]"
region = "[[ .my.region ]]"
datacenters = [[ .my.datacenters | toJson ]]
[[- end -]]
6 changes: 6 additions & 0 deletions packs/consul_lock/templates/_resources.tpl
@@ -0,0 +1,6 @@
[[- define "resources" ]]
resources {
cpu = [[ .my.resources.cpu ]]
memory = [[ .my.resources.memory ]]
}
[[- end -]]
85 changes: 85 additions & 0 deletions packs/consul_lock/templates/example.nomad.tpl
@@ -0,0 +1,85 @@
job "[[ .my.job_name ]]" {
[[ template "location" . ]]
group "group" {
[[ template "constraints" . ]]
network {
mode = "bridge"
port "[[ .my.application_port_name ]]" {
to = [[ .my.application_port ]]
}
}

service {
port = "[[ .my.application_port_name ]]"
}

task "block_for_lock" {
driver = "docker"
lifecycle {
hook = "prestart"
sidecar = true
}

env {
CONSUL_ADDR = "${attr.unique.network.ip-address}:8500"
}

config {
image = "[[ .my.locker_image ]]"
command = "/bin/sh"
args = ["-c", "apk add bash curl jq; bash local/lock.bash"]
}

template {
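# Embed the locker script at pack render time: `fileContents` reads the
# script from the pack and `b64enc` encodes it so it passes safely
# through templating; Consul Template's `base64Decode` then writes the
# original script to local/lock.bash when the task starts.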
data = <<EOT
{{ base64Decode "[[ fileContents .my.locker_script_path | b64enc ]]" }}

EOT

destination = "local/lock.bash"
}

resources {
cpu = 128
memory = 64
}
}

task "main" {
driver = "docker"
config {
image = "[[ .my.application_image ]]"
command = "/bin/sh"
args = ["local/wait.sh"]
ports = ["[[ .my.application_port_name ]]"]
}

[[ template "resources" . ]]

template {
data = <<EOT
while :
do
[ -d "${NOMAD_ALLOC_DIR}/${NOMAD_ALLOC_ID}.lock" ] && break
sleep 1
done
# the directory exists so we have the lock and can exec into the
# main application
exec [[ .my.application_args ]]
EOT
destination = "local/wait.sh"
}

template {
data = "<html>hello from {{ env \"NOMAD_ALLOC_ID\" }}</html>"
destination = "local/index.html"
}


}
}
}
99 changes: 99 additions & 0 deletions packs/consul_lock/templates/script.sh
@@ -0,0 +1,99 @@
#!/usr/bin/env bash

# A script for ensuring that a single Nomad allocation of a job is
# running at one time. Based on the Consul Learn Guide for
# application leader elections:
# https://learn.hashicorp.com/tutorials/consul/application-leader-elections
#
# This script is designed to be run as a prestart sidecar. If it exits it
# will release the lock (or the lock's TTL will expire). The main task
# should block waiting for a directory to appear named
# "${NOMAD_ALLOC_DIR}/${NOMAD_ALLOC_ID}.lock"
#
# To adapt this script for transitioning leader elections, we recommend
# using something other than shell scripts. =)

set -e

CONSUL_ADDR=${CONSUL_ADDR:-"http://localhost:8500"}
NOMAD_JOB_ID=${NOMAD_JOB_ID:-example}
NOMAD_ALLOC_ID=${NOMAD_ALLOC_ID:-$(uuidgen)}
NOMAD_ALLOC_DIR=${NOMAD_ALLOC_DIR:-./alloc}
TTL_IN_SEC=${TTL_IN_SEC:-10}
LEADER_KEY=${LEADER_KEY:-leader}
REFRESH_WINDOW=$(( $TTL_IN_SEC / 2))

# obtain a unique session identifier for this allocation. This has the
# name of the job so that operators can easily determine all the open
# sessions across the job
session_body=$(printf '{"Name": "%s", "TTL": "%ss"}' "$NOMAD_JOB_ID" "$TTL_IN_SEC")
session_id=$(curl -s \
-X PUT \
--fail \
-d "$session_body" \
"$CONSUL_ADDR/v1/session/create" | jq -r '.ID')

trap release EXIT

# release the session when this script exits. But we use a TTL on the
# session so that we don't have to rely on this script never failing
# to avoid deadlocking
release() {
echo "releasing session $session_id"
curl --fail -X PUT "$CONSUL_ADDR/v1/kv/$LEADER_KEY?release=$session_id"
}

# try to obtain the lock
try_lock() {
ok=$(curl -s -X PUT \
-d "$NOMAD_ALLOC_ID" \
"$CONSUL_ADDR/v1/kv/$LEADER_KEY?acquire=$session_id")

if [[ "$ok" == "true" ]]; then
echo "got session lock $session_id"
mkdir "${NOMAD_ALLOC_DIR}/${NOMAD_ALLOC_ID}.lock"
refresh
fi
}

# refresh the TTL at half the TTL length
refresh() {
echo "refreshing session every $REFRESH_WINDOW seconds"
while :
do
sleep $REFRESH_WINDOW
curl --fail -s \
-o /dev/null \
-X PUT \
"$CONSUL_ADDR/v1/session/renew/$session_id"
done
}

# we didn't obtain the lock, so poll the key at half the TTL length to
# see if we can get it later
poll() {
index="1"
echo "polling for session to be released every $REFRESH_WINDOW seconds"
while :
do
resp=$(curl -s -X GET \
-H "X-Consul-Index: $index" \
"$CONSUL_ADDR/v1/kv/$LEADER_KEY")
if [[ $(echo "$resp" | jq -r '.[0].Session') == "null" ]];
then
try_lock
fi

# we have to keep our session refreshed
curl --fail -s \
-o /dev/null \
-X PUT \
"$CONSUL_ADDR/v1/session/renew/$session_id"

index=$(echo "$resp" | jq -r '.[0].ModifyIndex')
sleep $REFRESH_WINDOW
done
}

try_lock
poll