Skip to content

Commit

Permalink
rpc: fix various test bugs
Browse files Browse the repository at this point in the history
TLDR: Prior to this patch, multiple unit tests in the `rpc` package
were misconfiguring their heartbeat interval and timeout. This patch
fixes this.

The issue was that there were two separate locations for these tunable
parameters:

- `RPCHeartbeat{Interval,Timeout}` fields in
  `ContextOptions` (previously via `*base.Config`, recently as
  direct fields in `ContextOptions`)
- also, two private fields `heartbeat{Interval,Timeout}`

The former was copied into the latter during `NewContext()`.
Only the latter (the private fields) matter for the actual behavior.

Prior to this commit, the tests would call `NewContext()` and then
afterwards would override the fields in `ContextOptions`. These
overrides were thus ineffective because the value was already copied
by that time.

In addition to this design bug, there were a couple of additional bugs
lurking: the overrides defined by the tests were problematic in some
cases. If these overrides had been effective, the tests would break.

The problem was discovered after
8d2d2bf merged, because in that
commit some of the overrides from broken tests were taken on board and
became effective. This started causing flakes.

The fix is to remove the ambiguity in configuration and remove the
problematic overrides from offending tests.

Release note: None
  • Loading branch information
knz committed Aug 2, 2023
1 parent a462321 commit 2135554
Show file tree
Hide file tree
Showing 4 changed files with 8 additions and 20 deletions.
15 changes: 5 additions & 10 deletions pkg/rpc/context.go
Original file line number Diff line number Diff line change
Expand Up @@ -213,9 +213,6 @@ type Context struct {
RemoteClocks *RemoteClockMonitor
MasterCtx context.Context // cancel on stopper quiesce

heartbeatInterval time.Duration
heartbeatTimeout time.Duration

rpcCompression bool

localInternalClient RestrictedInternalClient
Expand Down Expand Up @@ -531,13 +528,11 @@ func NewContext(ctx context.Context, opts ContextOptions) *Context {
secCtx.useNodeAuth = opts.UseNodeAuth

rpcCtx := &Context{
ContextOptions: opts,
SecurityContext: secCtx,
rpcCompression: enableRPCCompression,
MasterCtx: masterCtx,
metrics: makeMetrics(),
heartbeatInterval: opts.RPCHeartbeatInterval,
heartbeatTimeout: opts.RPCHeartbeatTimeout,
ContextOptions: opts,
SecurityContext: secCtx,
rpcCompression: enableRPCCompression,
MasterCtx: masterCtx,
metrics: makeMetrics(),
}

rpcCtx.dialbackMu.Lock()
Expand Down
8 changes: 1 addition & 7 deletions pkg/rpc/context_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -876,9 +876,6 @@ func TestOffsetMeasurement(t *testing.T) {
clientClock := &AdvancingClock{time: timeutil.Unix(0, 10)}
clientMaxOffset := time.Duration(0)
clientCtx := newTestContext(clusterID, clientClock, clientMaxOffset, stopper)
// Make the interval shorter to speed up the test.
clientCtx.RPCHeartbeatInterval = 1 * time.Millisecond
clientCtx.RPCHeartbeatTimeout = 1 * time.Millisecond
clientCtx.RemoteClocks.offsetTTL = 5 * clientClock.getAdvancementInterval()
if _, err := clientCtx.GRPCDialNode(remoteAddr, serverNodeID, DefaultClass).Connect(ctx); err != nil {
t.Fatal(err)
Expand Down Expand Up @@ -952,7 +949,7 @@ func TestFailedOffsetMeasurement(t *testing.T) {
clientCtx := newTestContext(clusterID, clock, maxOffset, stopper)
// Remove the timeout so that failure arises from exceeding the maximum
// clock reading delay, not the timeout.
clientCtx.heartbeatTimeout = 0
clientCtx.RPCHeartbeatTimeout = 0
// Allow two heartbeat for initialization. The first ping doesn't report an offset,
// the second one thus doesn't have an offset to work with, so it's only on the
// third one that's fully configured.
Expand Down Expand Up @@ -1022,9 +1019,6 @@ func TestLatencyInfoCleanupOnClosedConnection(t *testing.T) {
clientClock := &AdvancingClock{time: timeutil.Unix(0, 10)}
clientMaxOffset := time.Duration(0)
clientCtx := newTestContext(clusterID, clientClock, clientMaxOffset, stopper)
// Make the interval shorter to speed up the test.
clientCtx.RPCHeartbeatInterval = 1 * time.Millisecond
clientCtx.RPCHeartbeatTimeout = 1 * time.Millisecond

var hbDecommission atomic.Value
hbDecommission.Store(false)
Expand Down
1 change: 0 additions & 1 deletion pkg/rpc/datadriven_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -148,7 +148,6 @@ func setupEnv(t *testing.T) *ddEnv {
maxOffset := time.Duration(250)

clientCtx := newTestContext(clusterID, clock, maxOffset, stopper)
clientCtx.heartbeatInterval = 10 * time.Millisecond

env := &ddEnv{
clusterID: clusterID,
Expand Down
4 changes: 2 additions & 2 deletions pkg/rpc/peer.go
Original file line number Diff line number Diff line change
Expand Up @@ -184,8 +184,8 @@ func (rpcCtx *Context) newPeer(k peerKey) *peer {
dial: func(ctx context.Context, target string, class ConnectionClass) (*grpc.ClientConn, error) {
return rpcCtx.grpcDialRaw(ctx, target, class, rpcCtx.testingDialOpts...)
},
heartbeatInterval: rpcCtx.heartbeatInterval,
heartbeatTimeout: rpcCtx.heartbeatTimeout,
heartbeatInterval: rpcCtx.RPCHeartbeatInterval,
heartbeatTimeout: rpcCtx.RPCHeartbeatTimeout,
}
var b *circuit.Breaker

Expand Down

0 comments on commit 2135554

Please sign in to comment.