gnatsd: -health, -rank, -lease, -beat
options to control health monitoring.

The InternalClient interface provides
a general plugin mechanism
for running internal clients
within a gnatsd process.

The -health flag to gnatsd starts
an internal client that
runs a leader election
among the available gnatsd
instances and publishes cluster
membership changes to a set
of cluster health topics.

The -beat and -lease flags
control how frequently health
checks are run, and how long
leader leases persist.
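
As an illustration only, a node might be
started with all four flags; the duration
syntax shown here is an assumption, not
taken from this commit:

    gnatsd -health -rank 1 -beat 3s -lease 12s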

The health agent can also be
run standalone as healthcmd.
See the main() function in
gnatsd/health/healthcmd.

The -rank flag to gnatsd
adds priority rank assignment
from the command line. The
lowest-ranked gnatsd instance wins
the lease in the current
election. The election
algorithm is described in
gnatsd/health/ALGORITHM.md
and is implemented in
gnatsd/health/health.go.

Fixes nats-io#433
glycerine committed Feb 14, 2017
1 parent 37ce250 commit cdb9cf7
Showing 34 changed files with 4,907 additions and 27 deletions.
148 changes: 148 additions & 0 deletions health/ALGORITHM.md
@@ -0,0 +1,148 @@
# The ALLCALL leader election algorithm.

Jason E. Aten

February 2017

definition, with example value
------------------------------

Let heartBeat = 3 sec. This is how frequently
we will assess cluster health by
sending out an allcall ping.

givens
--------

* Given: Let each server have an integer rank that is distinct
and unique to that server. If necessary, an extremely long
true-random number is used to break ties between server ranks, so
that we may assert, with probability 1, that all ranks are distinct.
The servers can therefore be put into a strict total order.

* Rule: The server with the lower rank is preferred as leader.


ALLCALL Algorithm
===========================

### I. In a continuous loop

The server always accepts and responds
to allcall broadcasts from other cluster members.

The allcall() ping asks each member of
the server cluster to reply. Replies
provide the responder's own assigned rank
and identity.
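
A hedged sketch of this broadcast/reply
step over NATS is shown below; the subject
names and the `AgentLoc` type are illustrative
stand-ins for what the health/ package
actually wires up.

```go
import "github.com/nats-io/go-nats"

// allcallRound is a sketch only: each member always answers the
// allcall ping with its own rank and identity, and periodically
// issues a ping of its own.
func allcallRound(nc *nats.Conn, me *AgentLoc) error {
	// Reply to allcall broadcasts from other cluster members.
	_, err := nc.Subscribe("health.allcall", func(m *nats.Msg) {
		_ = nc.Publish(m.Reply, []byte(me.String()))
	})
	if err != nil {
		return err
	}
	// Ping: ask every member to reply. A shared, well-known reply
	// subject lets every node observe every response.
	return nc.PublishRequest("health.allcall",
		"health.allcall.reply", []byte(me.String()))
}
```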

### II. Election

Elections are computed locally, after accumulating
responses.

After issuing an allcall, the server
listens for a heartbeat interval.

At the end of the interval, it sorts
the respondents by rank. Since
the replies are on a broadcast
channel, more than one simultaneous
allcall reply may be incorporated into
the set of respondents.

The lowest-ranking server is elected
as leader.
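
Under the same assumptions, the local
election step reduces to a sort; field
names follow the `AgentLoc` struct added
in health/aloc.go, and this is a sketch
rather than the shipped code.

```go
import "sort"

// electLeader picks the winner from the accumulated respondents:
// sort by rank, break ties by ID (the "strict total order" above),
// and take the first element.
func electLeader(got []AgentLoc) (AgentLoc, bool) {
	if len(got) == 0 {
		return AgentLoc{}, false // no respondents, no leader this term
	}
	sort.Slice(got, func(i, j int) bool {
		if got[i].Rank != got[j].Rank {
			return got[i].Rank < got[j].Rank
		}
		return got[i].ID < got[j].ID
	})
	return got[0], true
}
```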

## Safety/Convergence: ALLCALL converges to one leader

Suppose two nodes are partitioned and so both are leaders on
their own side of the network. Then suppose the network
is joined again, so the two leaders are brought together
by a healing of the network, or by adding a new link
between the networks. The two nodes exchange IDs and
ranks by responding to allcalls, and compute
who the new leader is.

Hence the two-leader situation persists for at
most one heartbeat term after the network join.

## Liveness: a leader will be chosen

Given the total order among nodes, exactly one
will be lowest rank and thus be the preferred
leader. If the leader fails, the next
heartbeat of allcall will omit that
server from the candidate list, and
the next-lowest-ranking server will be chosen.

Hence, with at least one live
node, the system can run for at most one
heartbeat term before electing a leader.
Since there is a total order on
all live (non-failed) servers, only
one will be chosen.

## commentary

ALLCALL does not guarantee that there will
never be more than one leader. Availability
in the face of network partition is
desirable in many cases, and ALLCALL is
appropriate for these. This is congruent
with the design of NATS as an always-on system.

ALLCALL does not guarantee that a
leader will always be present, but
given live nodes it does guarantee
that the cluster will have a leader
after one heartbeat term has
been initiated and completed.

By design, ALLCALL functions well
in a cluster with any number of nodes.
Clusters of one or two nodes, or of
any even number of nodes, work just fine.

Compared to quorum-based elections
like Raft and Paxos, which require a
cluster of at least three nodes
(typically an odd count) to make
progress after a failure, this can
be very desirable.

ALLCALL is appropriate for AP,
rather than CP, style systems, where
availability is more important
than having a single writer. When
writes are idempotent or deduplicated
downstream, this is typically preferred.

prior art
----------

ALLCALL is a simplified version of
the well-known Bully Algorithm[1][2]
for election in a distributed system
over an arbitrary graph.

ALLCALL does less bullying, and lets
nodes arrive at their own conclusions.
The essential broadcast and ranking
mechanism, however, is identical.

[1] Hector Garcia-Molina, "Elections in a
Distributed Computing System", IEEE
Transactions on Computers,
Vol. C-31, No. 1, January 1982, pp. 48–59.

[2] https://en.wikipedia.org/wiki/Bully_algorithm

implementation
------------

ALLCALL is implemented on top of
the NATS (https://nats.io) system; see the health/
subdirectory of

https://github.com/nats-io/gnatsd

20 changes: 20 additions & 0 deletions health/LICENSE
@@ -0,0 +1,20 @@
The MIT License (MIT)

Copyright (c) 2017 Jason E. Aten

Permission is hereby granted, free of charge, to any person obtaining a copy of
this software and associated documentation files (the "Software"), to deal in
the Software without restriction, including without limitation the rights to
use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
the Software, and to permit persons to whom the Software is furnished to do so,
subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
69 changes: 69 additions & 0 deletions health/agent.go
@@ -0,0 +1,69 @@
package health

import (
	"fmt"
	"net"
	"time"

	"github.com/nats-io/gnatsd/server"
)

// Agent implements the InternalClient interface.
// It provides health status checks and
// leader election from among the candidate
// gnatsd instances in a cluster.
type Agent struct {
	opts  *server.Options
	mship *Membership
}

// NewAgent makes a new Agent.
func NewAgent(opts *server.Options) *Agent {
	return &Agent{
		opts: opts,
	}
}

// Name should identify the internal client for logging.
func (h *Agent) Name() string {
	return "health-agent"
}

// Start makes an internal, entirely in-process
// client that monitors cluster health and
// manages group membership functions.
func (h *Agent) Start(
	info server.Info,
	opts server.Options,
	logger server.Logger,
) (net.Conn, error) {

	cli, srv, err := NewInternalClientPair()
	if err != nil {
		return nil, fmt.Errorf("NewInternalClientPair() returned error: %s", err)
	}

	rank := opts.HealthRank
	beat := opts.HealthBeat
	lease := opts.HealthLease

	cfg := &MembershipCfg{
		MaxClockSkew: time.Second,
		BeatDur:      beat,
		LeaseTime:    lease,
		MyRank:       rank,
		CliConn:      cli,
		Log:          logger,
	}
	h.mship = NewMembership(cfg)
	go h.mship.Start()
	return srv, nil
}

// Stop halts the background goroutine.
func (h *Agent) Stop() {
	h.mship.Stop()
}
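
A hedged usage sketch of the Agent lifecycle, with a zero-value server.Info and a nil logger standing in for what the embedding gnatsd process would actually pass:

```go
package main

import (
	"log"
	"time"

	"github.com/nats-io/gnatsd/health"
	"github.com/nats-io/gnatsd/server"
)

func main() {
	opts := &server.Options{
		HealthRank:  1,                // lower rank is preferred for leadership
		HealthBeat:  3 * time.Second,  // allcall ping interval
		HealthLease: 12 * time.Second, // illustrative lease duration
	}
	agent := health.NewAgent(opts)

	// Start returns the server side of the in-process connection
	// pair; gnatsd treats it like any other client connection.
	conn, err := agent.Start(server.Info{}, *opts, nil)
	if err != nil {
		log.Fatal(err)
	}
	defer agent.Stop()
	_ = conn
}
```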
89 changes: 89 additions & 0 deletions health/aloc.go
@@ -0,0 +1,89 @@
package health

import (
	"encoding/json"
	"os"
	"time"

	"github.com/nats-io/go-nats"
)

// AgentLoc conveys to interested parties
// the ID and location of one gnatsd
// server in the cluster.
type AgentLoc struct {
	ID   string `json:"serverId"`
	Host string `json:"host"`
	Port int    `json:"port"`

	// Are we the leader?
	IsLeader bool `json:"leader"`

	// LeaseExpires is zero for any
	// non-leader. For the leader,
	// LeaseExpires tells you when
	// the leader's lease expires.
	LeaseExpires time.Time `json:"leaseExpires"`

	// The lower rank is leader until the lease
	// expires. Ties are broken by ID.
	// Rank should be assignable on the
	// gnatsd command line with -rank to
	// let the operator prioritize
	// leadership for certain hosts.
	Rank int `json:"rank"`

	// Pid, the process id, is sometimes
	// the only way to tell two processes
	// apart, if they share the same nats server.
	//
	// Pid is the one difference between
	// a nats.ServerLoc and a health.AgentLoc.
	Pid int `json:"pid"`
}

func (s *AgentLoc) String() string {
	by, err := json.Marshal(s)
	panicOn(err)
	return string(by)
}

func (s *AgentLoc) fromBytes(by []byte) error {
	return json.Unmarshal(by, s)
}

// alocEqual reports whether a and b are
// equivalent under the AgentLocLessThan order.
func alocEqual(a, b *AgentLoc) bool {
	aless := AgentLocLessThan(a, b)
	bless := AgentLocLessThan(b, a)
	return !aless && !bless
}

// slocEqualIgnoreLease compares a and b
// after zeroing the lease-related fields.
func slocEqualIgnoreLease(a, b *AgentLoc) bool {
	a0 := *a
	b0 := *b
	a0.LeaseExpires = time.Time{}
	a0.IsLeader = false
	b0.LeaseExpires = time.Time{}
	b0.IsLeader = false

	aless := AgentLocLessThan(&a0, &b0)
	bless := AgentLocLessThan(&b0, &a0)
	return !aless && !bless
}

// natsLocConvert returns a brand new &AgentLoc{}
// with contents filled from loc. The two types
// should be kept in sync.
func natsLocConvert(loc *nats.ServerLoc) *AgentLoc {
	return &AgentLoc{
		ID:           loc.ID,
		Host:         loc.Host,
		Port:         loc.Port,
		IsLeader:     loc.IsLeader,
		LeaseExpires: loc.LeaseExpires,
		Rank:         loc.Rank,
		Pid:          os.Getpid(),
	}
}
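
AgentLocLessThan is referenced above but defined elsewhere in health/. A plausible ordering consistent with the comments in this file (lower rank first, ties broken by ID, with the remaining fields also inspected, since slocEqualIgnoreLease zeroes LeaseExpires and IsLeader before comparing) might look like this sketch:

```go
// AgentLocLessThan reports whether a orders before b. Illustrative
// only; the shipped comparison may differ in field order or detail.
func AgentLocLessThan(a, b *AgentLoc) bool {
	if a.Rank != b.Rank {
		return a.Rank < b.Rank
	}
	if a.ID != b.ID {
		return a.ID < b.ID
	}
	if a.Host != b.Host {
		return a.Host < b.Host
	}
	if a.Port != b.Port {
		return a.Port < b.Port
	}
	if a.Pid != b.Pid {
		return a.Pid < b.Pid
	}
	return a.LeaseExpires.Before(b.LeaseExpires)
}
```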
