[Feature] Require ResignLeadership during upgrade
ajanikow committed Oct 4, 2024
1 parent 3ffda22 commit 192dfc5
Showing 23 changed files with 269 additions and 16 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -37,6 +37,7 @@
- (Feature) (Scheduler) Additional types
- (Feature) Alternative Upgrade Order Feature
- (Feature) (Scheduler) SchedV1 Integration
- (Feature) Require ResignLeadership during upgrade

## [1.2.42](https://github.com/arangodb/kube-arangodb/tree/1.2.42) (2024-07-23)
- (Maintenance) Go 1.22.4 & Kubernetes 1.29.6 libraries
1 change: 1 addition & 0 deletions README.md
@@ -150,6 +150,7 @@ Flags:
--deployment.feature.backup-cleanup Cleanup imported backups if required - Required ArangoDB 3.8.0 or higher
--deployment.feature.deployment-spec-defaults-restore Restore defaults from last accepted state of deployment - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.enforced-resign-leadership Enforce ResignLeadership and ensure that Leaders are moved from restarted DBServer - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.ensure-secured-resign-leadership Ensures that even if ResignLeadership job timeouted, data is still replicated on other servers - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.ephemeral-volumes Enables ephemeral volumes for apps and tmp directory - Required ArangoDB 3.8.0 or higher
--deployment.feature.failover-leadership Support for leadership in fail-over mode - Required ArangoDB 3.8.0 or higher
--deployment.feature.init-containers-copy-resources Copy resources spec to built-in init containers if they are not specified - Required ArangoDB 3.8.0 or higher (default true)
1 change: 1 addition & 0 deletions docs/cli/arangodb_operator.md
@@ -47,6 +47,7 @@ Flags:
--deployment.feature.backup-cleanup Cleanup imported backups if required - Required ArangoDB 3.8.0 or higher
--deployment.feature.deployment-spec-defaults-restore Restore defaults from last accepted state of deployment - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.enforced-resign-leadership Enforce ResignLeadership and ensure that Leaders are moved from restarted DBServer - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.ensure-secured-resign-leadership Ensures that even if ResignLeadership job timeouted, data is still replicated on other servers - Required ArangoDB 3.8.0 or higher (default true)
--deployment.feature.ephemeral-volumes Enables ephemeral volumes for apps and tmp directory - Required ArangoDB 3.8.0 or higher
--deployment.feature.failover-leadership Support for leadership in fail-over mode - Required ArangoDB 3.8.0 or higher
--deployment.feature.init-containers-copy-resources Copy resources spec to built-in init containers if they are not specified - Required ArangoDB 3.8.0 or higher (default true)
2 changes: 2 additions & 0 deletions docs/generated/actions.md
@@ -37,6 +37,7 @@ nav_order: 11
| EncryptionKeyRemove | no | 10m0s | no | Enterprise Only | Remove the encryption key to the pool |
| EncryptionKeyStatusUpdate | no | 10m0s | no | Enterprise Only | Update status of encryption propagation |
| EnforceResignLeadership | no | 45m0s | yes | Community & Enterprise | Run the ResignLeadership job on DBServer and checks data compatibility after |
| EnsureSecuredResignLeadership | no | 10m0s | no | Community & Enterprise | Ensures that data is still replicated on other servers |
| Idle | no | 10m0s | no | Community & Enterprise | Define idle operation in case if preconditions are not meet |
| JWTAdd | no | 10m0s | no | Enterprise Only | Adds new JWT to the pool |
| JWTClean | no | 10m0s | no | Enterprise Only | Remove JWT key from the pool |
@@ -133,6 +134,7 @@ spec:
EncryptionKeyRemove: 10m0s
EncryptionKeyStatusUpdate: 10m0s
EnforceResignLeadership: 45m0s
EnsureSecuredResignLeadership: 10m0s
Idle: 10m0s
JWTAdd: 10m0s
JWTClean: 10m0s
2 changes: 1 addition & 1 deletion internal/actions.config.go.tmpl
@@ -1,6 +1,6 @@
{{- $root := . -}}
//
// Copyright 2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2023-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion internal/actions.go.tmpl
@@ -1,6 +1,6 @@
{{- $root := . -}}
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion internal/actions.register.go.tmpl
@@ -1,6 +1,6 @@
{{- $root := . -}}
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
2 changes: 1 addition & 1 deletion internal/actions.register.test.go.tmpl
@@ -1,6 +1,6 @@
{{- $root := . -}}
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
3 changes: 3 additions & 0 deletions internal/actions.yaml
@@ -33,6 +33,9 @@ actions:
    description: Run the ResignLeadership job on DBServer and checks data compatibility after
    timeout: 45m
    optional: true
  EnsureSecuredResignLeadership:
    description: Ensures that data is still replicated on other servers
    timeout: 10m
  KillMemberPod:
    description: Execute Delete on Pod (put pod in Terminating state)
    scopes:
14 changes: 13 additions & 1 deletion pkg/apis/deployment/v1/actions.generated.go
@@ -1,5 +1,5 @@
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -101,6 +101,9 @@ const (
	// ActionEnforceResignLeadershipDefaultTimeout define default timeout for action ActionEnforceResignLeadership
	ActionEnforceResignLeadershipDefaultTimeout time.Duration = 2700 * time.Second // 45m0s

	// ActionEnsureSecuredResignLeadershipDefaultTimeout define default timeout for action ActionEnsureSecuredResignLeadership
	ActionEnsureSecuredResignLeadershipDefaultTimeout time.Duration = 600 * time.Second // 10m0s

	// ActionIdleDefaultTimeout define default timeout for action ActionIdle
	ActionIdleDefaultTimeout time.Duration = ActionsDefaultTimeout

@@ -362,6 +365,9 @@ const (
	// ActionTypeEnforceResignLeadership in scopes Normal. Run the ResignLeadership job on DBServer and checks data compatibility after
	ActionTypeEnforceResignLeadership ActionType = "EnforceResignLeadership"

	// ActionTypeEnsureSecuredResignLeadership in scopes Normal. Ensures that data is still replicated on other servers
	ActionTypeEnsureSecuredResignLeadership ActionType = "EnsureSecuredResignLeadership"

	// ActionTypeIdle in scopes Normal. Define idle operation in case if preconditions are not meet
	ActionTypeIdle ActionType = "Idle"

@@ -601,6 +607,8 @@ func (a ActionType) DefaultTimeout() time.Duration {
		return ActionEncryptionKeyStatusUpdateDefaultTimeout
	case ActionTypeEnforceResignLeadership:
		return ActionEnforceResignLeadershipDefaultTimeout
	case ActionTypeEnsureSecuredResignLeadership:
		return ActionEnsureSecuredResignLeadershipDefaultTimeout
	case ActionTypeIdle:
		return ActionIdleDefaultTimeout
	case ActionTypeJWTAdd:
@@ -779,6 +787,8 @@ func (a ActionType) Priority() ActionPriority {
		return ActionPriorityNormal
	case ActionTypeEnforceResignLeadership:
		return ActionPriorityNormal
	case ActionTypeEnsureSecuredResignLeadership:
		return ActionPriorityNormal
	case ActionTypeIdle:
		return ActionPriorityNormal
	case ActionTypeJWTAdd:
@@ -969,6 +979,8 @@ func (a ActionType) Optional() bool {
		return false
	case ActionTypeEnforceResignLeadership:
		return true
	case ActionTypeEnsureSecuredResignLeadership:
		return false
	case ActionTypeIdle:
		return false
	case ActionTypeJWTAdd:
14 changes: 13 additions & 1 deletion pkg/apis/deployment/v2alpha1/actions.generated.go
@@ -1,5 +1,5 @@
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -101,6 +101,9 @@ const (
	// ActionEnforceResignLeadershipDefaultTimeout define default timeout for action ActionEnforceResignLeadership
	ActionEnforceResignLeadershipDefaultTimeout time.Duration = 2700 * time.Second // 45m0s

	// ActionEnsureSecuredResignLeadershipDefaultTimeout define default timeout for action ActionEnsureSecuredResignLeadership
	ActionEnsureSecuredResignLeadershipDefaultTimeout time.Duration = 600 * time.Second // 10m0s

	// ActionIdleDefaultTimeout define default timeout for action ActionIdle
	ActionIdleDefaultTimeout time.Duration = ActionsDefaultTimeout

@@ -362,6 +365,9 @@ const (
	// ActionTypeEnforceResignLeadership in scopes Normal. Run the ResignLeadership job on DBServer and checks data compatibility after
	ActionTypeEnforceResignLeadership ActionType = "EnforceResignLeadership"

	// ActionTypeEnsureSecuredResignLeadership in scopes Normal. Ensures that data is still replicated on other servers
	ActionTypeEnsureSecuredResignLeadership ActionType = "EnsureSecuredResignLeadership"

	// ActionTypeIdle in scopes Normal. Define idle operation in case if preconditions are not meet
	ActionTypeIdle ActionType = "Idle"

@@ -601,6 +607,8 @@ func (a ActionType) DefaultTimeout() time.Duration {
		return ActionEncryptionKeyStatusUpdateDefaultTimeout
	case ActionTypeEnforceResignLeadership:
		return ActionEnforceResignLeadershipDefaultTimeout
	case ActionTypeEnsureSecuredResignLeadership:
		return ActionEnsureSecuredResignLeadershipDefaultTimeout
	case ActionTypeIdle:
		return ActionIdleDefaultTimeout
	case ActionTypeJWTAdd:
@@ -779,6 +787,8 @@ func (a ActionType) Priority() ActionPriority {
		return ActionPriorityNormal
	case ActionTypeEnforceResignLeadership:
		return ActionPriorityNormal
	case ActionTypeEnsureSecuredResignLeadership:
		return ActionPriorityNormal
	case ActionTypeIdle:
		return ActionPriorityNormal
	case ActionTypeJWTAdd:
@@ -969,6 +979,8 @@ func (a ActionType) Optional() bool {
		return false
	case ActionTypeEnforceResignLeadership:
		return true
	case ActionTypeEnsureSecuredResignLeadership:
		return false
	case ActionTypeIdle:
		return false
	case ActionTypeJWTAdd:
44 changes: 43 additions & 1 deletion pkg/deployment/agency/state/state.go
@@ -1,7 +1,7 @@
//
// DISCLAIMER
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -262,6 +262,48 @@ func (s State) PlanLeaderServersWithFailOver() Servers {
	return r
}

// IsServerWithShardBackup returns true if every replicated shard currently led by the server has at least one other server in Current
func (s State) IsServerWithShardBackup(server Server) bool {
	for db, dbData := range s.Plan.Collections {
		for collection, collectionData := range dbData {
			for shard, shardDetails := range collectionData.Shards {
				if len(shardDetails) <= 1 {
					// RF is 1, nothing to do
					continue
				}

				// Find current state
				currentDBs, ok := s.Current.Collections[db]
				if !ok {
					continue
				}

				currentCollection, ok := currentDBs[collection]
				if !ok {
					continue
				}

				currentShard, ok := currentCollection[shard]
				if !ok {
					continue
				}

				if len(currentShard.Servers) == 0 {
					continue
				}

				if currentShard.Servers[0] == server {
					if len(currentShard.Servers) == 1 {
						return false
					}
				}
			}
		}
	}

	return true
}
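
The rule implemented by `IsServerWithShardBackup` can be tried out in isolation. The following is a minimal sketch, assuming simplified stand-in types: `shards`, `hasShardBackup`, and the `one` helper are hypothetical names for illustration and are not part of kube-arangodb. Index 0 of each server list is treated as the shard leader, as in the diff above.

```go
package main

import "fmt"

// Server is a DBServer ID. All types here are simplified stand-ins for the
// operator's agency state, not its real definitions.
type Server string

// shards maps database -> collection -> shard -> ordered server list,
// where index 0 is the leader.
type shards map[string]map[string]map[string][]Server

// hasShardBackup reports whether every replicated shard currently led by
// server has at least one more server listed in current.
func hasShardBackup(plan, current shards, server Server) bool {
	for db, cols := range plan {
		for col, shs := range cols {
			for shard, planned := range shs {
				if len(planned) <= 1 {
					continue // replication factor 1: no follower is expected
				}
				// Indexing a missing (nil) map yields the zero value, so cur
				// is nil when any level is absent from current.
				cur := current[db][col][shard]
				if len(cur) == 0 {
					continue
				}
				if cur[0] == server && len(cur) == 1 {
					return false // server holds the only listed copy
				}
			}
		}
	}
	return true
}

// one builds a state with a single database, collection, and shard.
func one(servers ...Server) shards {
	return shards{"db": {"col": {"s1": servers}}}
}

func main() {
	plan := one("A", "B")
	fmt.Println(hasShardBackup(plan, one("A"), "A"))      // false: missing replica
	fmt.Println(hasShardBackup(plan, one("A", "B"), "A")) // true: follower present
	fmt.Println(hasShardBackup(plan, one("A"), "B"))      // true: B is not the leader
	fmt.Println(hasShardBackup(one("A"), one("A"), "A"))  // true: RF is 1
}
```

The four calls in `main` correspond to the "missing replica", "ready replica", "not affected replica", and "rf1" cases of the test added in this commit.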

type CollectionShardDetails []CollectionShardDetail

type CollectionShardDetail struct {
54 changes: 53 additions & 1 deletion pkg/deployment/agency/state/state_test.go
@@ -1,7 +1,7 @@
//
// DISCLAIMER
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -307,6 +307,58 @@ func Test_IsDBServerReadyToRestart(t *testing.T) {
	}
}

func Test_IsServerWithShardBackup(t *testing.T) {
	type testCase struct {
		generator Generator
		ready     bool
		server    Server
	}
	newDBWithCol := func(writeConcern int) CollectionGeneratorInterface {
		return NewDatabaseRandomGenerator().RandomCollection().WithWriteConcern(writeConcern)
	}
	tcs := map[string]testCase{
		"missing replica": {
			generator: newDBWithCol(1).WithShard().WithPlan("A", "B").WithCurrent("A").Add().Add().Add(),
			ready:     false,
			server:    "A",
		},
		"ready replica": {
			generator: newDBWithCol(1).WithShard().WithPlan("A", "B").WithCurrent("A", "B").Add().Add().Add(),
			ready:     true,
			server:    "A",
		},
		"not affected replica": {
			generator: newDBWithCol(1).WithShard().WithPlan("A", "B").WithCurrent("A").Add().Add().Add(),
			ready:     true,
			server:    "B",
		},
		"not affected nonexisting replica": {
			generator: newDBWithCol(1).WithShard().WithPlan("A", "B").WithCurrent("A").Add().Add().Add(),
			ready:     true,
			server:    "C",
		},
		"rf1": {
			generator: newDBWithCol(1).WithShard().WithPlan("A").WithCurrent("A").Add().Add().Add(),
			ready:     true,
			server:    "A",
		},
	}

	for name, tc := range tcs {
		t.Run(name, func(t *testing.T) {
			s := GenerateState(t, tc.generator)

			res := s.IsServerWithShardBackup(tc.server)

			if tc.ready {
				require.True(t, res)
			} else {
				require.False(t, res)
			}
		})
	}
}

func Test_GetCollectionDatabaseByID(t *testing.T) {
	var s DumpState
	require.NoError(t, json.Unmarshal(agencyDump39, &s))
13 changes: 13 additions & 0 deletions pkg/deployment/features/resign_leadership.go
@@ -22,6 +22,7 @@ package features

func init() {
	registerFeature(enforcedResignLeadership)
	registerFeature(ensureSecuredResignLeadership)
}

var enforcedResignLeadership = &feature{
@@ -31,7 +32,19 @@ var enforcedResignLeadership = &feature{
	enabledByDefault:   true,
}

var ensureSecuredResignLeadership = &feature{
	name:               "ensure-secured-resign-leadership",
	description:        "Ensures that even if ResignLeadership job timeouted, data is still replicated on other servers",
	enterpriseRequired: false,
	enabledByDefault:   true,
}

// EnforcedResignLeadership returns enforced ResignLeadership.
func EnforcedResignLeadership() Feature {
	return enforcedResignLeadership
}

// EnsureSecuredResignLeadership returns information if data is saved on other DBServers.
func EnsureSecuredResignLeadership() Feature {
	return ensureSecuredResignLeadership
}
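
How a feature entry like this one ends up as the `--deployment.feature.ensure-secured-resign-leadership` flag listed in the README can be illustrated with a small sketch. This is not the operator's implementation: it assumes the standard library `flag` package, and the `enabled` field, `register`, and `parseEnabled` are hypothetical names introduced only for this example. Only the `name`, `description`, and `enabledByDefault` values are taken from the diff.

```go
package main

import (
	"flag"
	"fmt"
)

// feature mirrors the fields visible in the diff; the enabled field and the
// flag wiring below are assumptions made for this sketch.
type feature struct {
	name             string
	description      string
	enabledByDefault bool
	enabled          bool
}

// register exposes the feature as --deployment.feature.<name> on fs,
// defaulting to enabledByDefault.
func register(fs *flag.FlagSet, f *feature) {
	fs.BoolVar(&f.enabled, "deployment.feature."+f.name, f.enabledByDefault, f.description)
}

// parseEnabled returns the effective state of the feature for the given
// command-line arguments.
func parseEnabled(args []string) (bool, error) {
	f := &feature{
		name:             "ensure-secured-resign-leadership",
		description:      "Ensures that even if ResignLeadership job timeouted, data is still replicated on other servers",
		enabledByDefault: true,
	}
	fs := flag.NewFlagSet("operator", flag.ContinueOnError)
	register(fs, f)
	if err := fs.Parse(args); err != nil {
		return false, err
	}
	return f.enabled, nil
}

func main() {
	on, _ := parseEnabled(nil)
	off, _ := parseEnabled([]string{"--deployment.feature.ensure-secured-resign-leadership=false"})
	fmt.Println(on, off) // true false
}
```

Because the feature is enabled by default, the check stays active unless the flag is explicitly set to `false`.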
2 changes: 1 addition & 1 deletion pkg/deployment/reconcile/action.config.generated.go
@@ -1,5 +1,5 @@
//
// Copyright 2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2023-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
19 changes: 18 additions & 1 deletion pkg/deployment/reconcile/action.register.generated.go
@@ -1,5 +1,5 @@
//
// Copyright 2016-2023 ArangoDB GmbH, Cologne, Germany
// Copyright 2016-2024 ArangoDB GmbH, Cologne, Germany
//
// Licensed under the Apache License, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
@@ -96,6 +96,9 @@ var (
	_ Action        = &actionEnforceResignLeadership{}
	_ actionFactory = newEnforceResignLeadershipAction

	_ Action        = &actionEnsureSecuredResignLeadership{}
	_ actionFactory = newEnsureSecuredResignLeadershipAction

	_ Action        = &actionIdle{}
	_ actionFactory = newIdleAction

@@ -619,6 +622,20 @@ func init() {
		registerAction(action, function)
	}

	// EnsureSecuredResignLeadership
	{
		// Get Action type
		action := api.ActionTypeEnsureSecuredResignLeadership

		// Get Action definition
		function := newEnsureSecuredResignLeadershipAction

		// Wrap action main function

		// Register action
		registerAction(action, function)
	}

	// Idle
	{
		// Get Action type