[main] Add new method to track originating cluster of Backup #613
base: main
Changes from all commits
08af6f3
4c40185
140a194
ad5b6dd
8ceba36
4c9237a
e63ec61
6b62c5e
a84e5f9
9ac7ecb
dfc646a
@@ -0,0 +1,109 @@
package backup

import (
	v1 "github.com/rancher/backup-restore-operator/pkg/apis/resources.cattle.io/v1"
	"github.com/rancher/wrangler/v3/pkg/condition"
)

type backupClusterOriginConditionMeta struct {
	backupName                     string
	hasClusterOriginID             bool
	clusterOriginID                string
	hasCurrentOriginCondition      bool
	currentOriginCondition         bool
	canInPlaceRestore              bool
	hasInPlaceRestoreCondition     bool
	currentInPlaceRestoreCondition bool

Comment on lines +9 to +16:

Instead of having boolean fields injected during construction, we could encapsulate this in a checker type:

    type backupClusterMdChecker struct {
        backupName    string
        clusterOrigin string
        b             *backupv1.Backup
        // Any controllers or client runtime needed for methods, if any
    }

    func (b backupClusterMdChecker) hasClusterOriginID() bool {
        // ...
    }

    func (b backupClusterMdChecker) hasCurrentOriginCondition() bool {
        // ...
    }

This allows us to isolate some of the more complicated logic, and it makes testing possible if we make these methods or the struct public, which we probably should, because we should be focused on stability. If we make them public, we could consider moving them to a sub-package. It also greatly simplifies the if statements below, IMO.
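For illustration only, a minimal sketch of how those methods could be backed by the data this PR already reads (using the PR's `v1` and `condition` imports; the type and method names follow the reviewer's hypothetical sketch and are not part of the change):

    // Hypothetical checker, not part of the PR; mirrors the lookups done in
    // newBackupClusterOriginConditionMeta below.
    type backupClusterMdChecker struct {
        controllerClusterID string
        b                   *v1.Backup
    }

    func (c backupClusterMdChecker) clusterOriginID() string {
        return c.b.Status.OriginCluster
    }

    func (c backupClusterMdChecker) hasClusterOriginID() bool {
        return c.clusterOriginID() != ""
    }

    func (c backupClusterMdChecker) hasCurrentOriginCondition() bool {
        return condition.Cond(v1.BackupConditionClusterOrigin).GetStatus(c.b) != ""
    }

    func (c backupClusterMdChecker) canInPlaceRestore() bool {
        return c.hasClusterOriginID() && c.clusterOriginID() == c.controllerClusterID
    }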
}

func newBackupClusterOriginConditionMeta(controllerClusterID string, backup *v1.Backup) backupClusterOriginConditionMeta {
	conditionMeta := backupClusterOriginConditionMeta{
		backupName:                     backup.Name,
		hasClusterOriginID:             false,
		hasCurrentOriginCondition:      false,
		currentOriginCondition:         false,
		canInPlaceRestore:              false,
		hasInPlaceRestoreCondition:     false,
		currentInPlaceRestoreCondition: false,
	}

	originalValue := backup.Status.OriginCluster
	conditionMeta.hasClusterOriginID = originalValue != ""
	if conditionMeta.hasClusterOriginID {
		conditionMeta.clusterOriginID = originalValue
	}

	currentOriginConditionString := condition.Cond(v1.BackupConditionClusterOrigin).GetStatus(backup)
	conditionMeta.hasCurrentOriginCondition = currentOriginConditionString != ""
	if !conditionMeta.hasCurrentOriginCondition {
		conditionMeta.currentOriginCondition = currentOriginConditionString == "True"
	}

	if conditionMeta.hasClusterOriginID {
		conditionMeta.canInPlaceRestore = conditionMeta.clusterOriginID == controllerClusterID
	}

	currentInPlaceRestoreString := condition.Cond(v1.BackupConditionInPlaceRestore).GetStatus(backup)
	conditionMeta.hasInPlaceRestoreCondition = currentInPlaceRestoreString != ""
	if !conditionMeta.hasInPlaceRestoreCondition {
		conditionMeta.currentInPlaceRestoreCondition = currentInPlaceRestoreString == "True"
	}

	return conditionMeta
}

// prepareClusterOriginConditions helps set the cluster origin conditions and reports if anything changed in this part of status.
func (h *handler) prepareClusterOriginConditions(backup *v1.Backup) bool {

Review comment: I think a lot of the logic in this function could also be simplified and made more readable if we encapsulate most of it in struct methods.

	conditionChanged := false
	if !h.canUseClusterOriginStatus {
		currentOriginConditionString := condition.Cond(v1.BackupConditionClusterOrigin).GetStatus(backup)
		if currentOriginConditionString != "False" {
			condition.Cond(v1.BackupConditionClusterOrigin).SetStatusBool(backup, false)
			condition.Cond(v1.BackupConditionClusterOrigin).Message(backup, "CRD not updated to include cluster UID yet.")

Review comment: When would this realistically happen?

Reply: Yeah, at least I think so. Granted it was during dev workflows, but I did run into it myself. Maybe not as big a concern in prod, but I think it could still happen in a chart like BRO if the CRDs aren't also updated, i.e. when installed via the helm CLI directly.

			conditionChanged = true
		}
		currentInPlaceRestoreString := condition.Cond(v1.BackupConditionInPlaceRestore).GetStatus(backup)
		if currentInPlaceRestoreString != "False" {
			condition.Cond(v1.BackupConditionInPlaceRestore).SetStatusBool(backup, false)
			condition.Cond(v1.BackupConditionInPlaceRestore).Message(backup, "Cannot determine if in-place Restore is viable.")
			conditionChanged = true
		}

		return conditionChanged
	}

	// TODO: We could add a fallback mode that uses filenames (and/or the annotation) when the CRD is not updated
	conditionMeta := newBackupClusterOriginConditionMeta(h.kubeSystemNS, backup)

	// First pass we only care to set BackupConditionClusterOrigin based on if the context is there
	if !conditionMeta.hasCurrentOriginCondition || conditionMeta.currentOriginCondition != conditionMeta.hasClusterOriginID {
		conditionChanged = true
		condition.Cond(v1.BackupConditionClusterOrigin).SetStatusBool(backup, conditionMeta.hasClusterOriginID)

		if conditionMeta.hasClusterOriginID {
			condition.Cond(v1.BackupConditionClusterOrigin).Message(backup, "Backup has cluster UID attached.")
		} else {
			condition.Cond(v1.BackupConditionClusterOrigin).Message(backup, "No cluster UID attached to backup.")
		}
	}

	// Second pass, we care about the specifics of the ClusterOrigin to set the InPlaceRestore condition
	if !conditionMeta.hasClusterOriginID {
		// When annotation is missing, we'll mark as unable to determine
		condition.Cond(v1.BackupConditionInPlaceRestore).SetStatusBool(backup, false)
		condition.Cond(v1.BackupConditionInPlaceRestore).Message(backup, "Unable to determine if in-place Restore is viable.")
	}

	if !conditionMeta.hasInPlaceRestoreCondition || conditionMeta.canInPlaceRestore != conditionMeta.currentInPlaceRestoreCondition {
		conditionChanged = true
		condition.Cond(v1.BackupConditionInPlaceRestore).SetStatusBool(backup, conditionMeta.canInPlaceRestore)
		if conditionMeta.canInPlaceRestore {
			condition.Cond(v1.BackupConditionInPlaceRestore).Message(backup, "In-place Restore appears viable.")
		} else {
			condition.Cond(v1.BackupConditionInPlaceRestore).Message(backup, "In-place Restore does not appear viable.")
		}
	}

	// When the annotation is present and not changed
	return conditionChanged
}

@@ -39,7 +39,9 @@ type handler struct {
	dynamicClient           dynamic.Interface
	defaultBackupMountPath  string
	defaultS3BackupLocation *v1.S3ObjectStore
-	kubeSystemNS            string
+	// TODO: rename to kubeSystemNamespaceUID; nit to improve clarity, it's not the string representation nor the NS resource
+	kubeSystemNS              string
+	canUseClusterOriginStatus bool

Review comment: Stylistically, we should avoid storing additional boolean fields in favor of a method. I also think this field might be useless? If we opt into using the backup CRD spec for storing cluster metadata, and older backup-restore versions don't set this new nullable field, then we cannot determine where the backup is from.

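A minimal sketch of the method-based alternative, assuming the per-backup value itself is treated as the source of truth (the helper name is hypothetical, not part of the PR):

    // Hypothetical helper, not part of the PR. If older operator versions never
    // wrote the origin value, its absence already means the origin is unknown,
    // so no handler-level flag needs to be cached.
    func (h *handler) clusterOriginKnown(backup *v1.Backup) bool {
        return backup.Status.OriginCluster != ""
    }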
}

const DefaultRetentionCount = 10

@@ -56,15 +58,16 @@ func Register(
	defaultS3 *v1.S3ObjectStore) {

	controller := &handler{
-		ctx:                     ctx,
-		backups:                 backups,
-		resourceSets:            resourceSets,
-		secrets:                 secrets,
-		namespaces:              namespaces,
-		discoveryClient:         clientSet.Discovery(),
-		dynamicClient:           dynamicInterface,
-		defaultBackupMountPath:  defaultLocalBackupLocation,
-		defaultS3BackupLocation: defaultS3,
+		ctx:                       ctx,
+		backups:                   backups,
+		resourceSets:              resourceSets,
+		secrets:                   secrets,
+		namespaces:                namespaces,
+		discoveryClient:           clientSet.Discovery(),
+		dynamicClient:             dynamicInterface,
+		defaultBackupMountPath:    defaultLocalBackupLocation,
+		defaultS3BackupLocation:   defaultS3,
+		canUseClusterOriginStatus: util.VerifyBackupCrdHasClusterStatus(clientSet.ApiextensionsV1()),

Review comment: I would avoid passing the handler a value returned by a function that makes API calls. Instead, I would include that logic in the setup itself rather than passing it in here. See comment below as well.

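A rough sketch of that rearrangement inside Register, shown only as an illustration under the assumption that the check stays in util but runs before the handler literal so any failure surfaces at setup:

    // Hypothetical setup-time wiring, not part of the PR.
    canUseClusterOriginStatus := util.VerifyBackupCrdHasClusterStatus(clientSet.ApiextensionsV1())
    if !canUseClusterOriginStatus {
        logrus.Warn("backups CRD does not expose originCluster in status; cluster origin tracking disabled")
    }
    controller := &handler{
        // ... fields as above ...
        canUseClusterOriginStatus: canUseClusterOriginStatus,
    }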
	}
	if controller.defaultBackupMountPath != "" {
		logrus.Infof("Default location for storing backups is %v", controller.defaultBackupMountPath)

@@ -74,12 +77,13 @@ func Register(
	}

	// Use the kube-system NS.UID as the unique ID for a cluster
-	kubeSystemNS, err := controller.namespaces.Get("kube-system", k8sv1.GetOptions{})
+	kubeSystemNamespaceUID, err := util.FetchClusterUID(namespaces)
	if err != nil {
		// fatal log here, because we need the kube-system ns UID while creating any backup file
		logrus.Fatalf("Error getting namespace kube-system %v", err)
	}
-	controller.kubeSystemNS = string(kubeSystemNS.UID)
+	// TODO: rename to kubeSystemNamespaceUID
+	controller.kubeSystemNS = kubeSystemNamespaceUID
	// Register handlers
	backups.OnChange(ctx, "backups", controller.OnBackupChange)
}

@@ -94,27 +98,36 @@ func (h *handler) OnBackupChange(_ string, backup *v1.Backup) (*v1.Backup, error)
		return h.setReconcilingCondition(backup, err)
	}

	// Handle updates made on Backup CRs with existing backup files
	if backup.Status.LastSnapshotTS != "" {
		if backup.Spec.Schedule == "" {
			// Backup CR was meant for one-time backup, and the backup has been completed. Probably here from UpdateStatus call
			logrus.Infof("Backup CR %v has been processed for one-time backup, returning", backup.Name)
			// This could also mean backup CR was updated from recurring to one-time, in which case observedGeneration needs to be updated
-			updBackupStatus := false
+			shouldUpdateStatus := false
			if backup.Generation != backup.Status.ObservedGeneration {
				backup.Status.ObservedGeneration = backup.Generation
-				updBackupStatus = true
+				shouldUpdateStatus = true
			}
			// check if the backup-type needs to be changed too
			if backup.Status.BackupType != "One-time" {
				backup.Status.BackupType = "One-time"
-				updBackupStatus = true
+				shouldUpdateStatus = true
			}
-			if updBackupStatus {
+			// check if the origin cluster status needs updating
+			clusterOriginChanged := h.prepareClusterOriginConditions(backup)
+			if clusterOriginChanged {
+				shouldUpdateStatus = true
+			}

Comment on lines +118 to +121:

Nit: this can be simplified using:

    if originChanged := h.prepareClusterOriginConditions(backup); originChanged {
        shouldUpdateStatus = true
    }

+			if shouldUpdateStatus {
				return h.backups.UpdateStatus(backup)
			}
			return backup, nil
		}
		if backup.Status.NextSnapshotAt != "" {
+			// TODO: Verify how recurring backups work after a migration today
+			// Then decide how/where to call prepareClusterOriginConditions for that.

			currTime := time.Now().Format(time.RFC3339)
			logrus.Infof("Next snapshot is scheduled for: %v, current time: %v", backup.Status.NextSnapshotAt, currTime)

@@ -173,20 +186,25 @@ func (h *handler) OnBackupChange(_ string, backup *v1.Backup) (*v1.Backup, error)
			return h.setReconcilingCondition(backup, err)
		}
	}

	storageLocationType := backup.Status.StorageLocation
	updateErr := retry.RetryOnConflict(retry.DefaultRetry, func() error {
		var err error
		backup, err = h.backups.Get(backup.Name, k8sv1.GetOptions{})
		if err != nil {
			return err
		}
+		// Set the Cluster origin reference on backup
+		backup.Status.OriginCluster = h.kubeSystemNS
		// reset conditions to remove the reconciling condition, because as per kstatus lib its presence is considered an error
		backup.Status.Conditions = []genericcondition.GenericCondition{}

		condition.Cond(v1.BackupConditionReady).SetStatusBool(backup, true)
		condition.Cond(v1.BackupConditionReady).Message(backup, "Completed")
		condition.Cond(v1.BackupConditionUploaded).SetStatusBool(backup, true)

+		h.prepareClusterOriginConditions(backup)
+
		backup.Status.LastSnapshotTS = time.Now().Format(time.RFC3339)
		if cronSchedule != nil {
			nextBackupAt := cronSchedule.Next(time.Now())

@@ -6,9 +6,13 @@ import (
"os" | ||
"reflect" | ||
|
||
apiextensionsv1 "k8s.io/apiextensions-apiserver/pkg/apis/apiextensions/v1" | ||
apiextensionsClientSetv1 "k8s.io/apiextensions-apiserver/pkg/client/clientset/clientset/typed/apiextensions/v1" | ||
|
||
v1core "github.com/rancher/wrangler/v3/pkg/generated/controllers/core/v1" | ||
"github.com/sirupsen/logrus" | ||
k8sv1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
"k8s.io/apimachinery/pkg/runtime/schema" | ||
"k8s.io/apiserver/pkg/server/options/encryptionconfig" | ||
"k8s.io/apiserver/pkg/storage/value" | ||
|

@@ -74,3 +78,47 @@ func ErrList(e []error) error {
	}
	return nil
}

func FetchClusterUID(namespaces v1core.NamespaceController) (string, error) {
	kubesystemNamespace, err := namespaces.Get("kube-system", k8sv1.GetOptions{})
	if err != nil {
		return "", err
	}

	return string(kubesystemNamespace.UID), nil
}

// Define the GroupVersionResource for CRDs
var crdGVR = schema.GroupVersionResource{
	Group:    "apiextensions.k8s.io",
	Version:  "v1",
	Resource: "customresourcedefinitions",
}

func getCRDDefinition(dynamicClient apiextensionsClientSetv1.ApiextensionsV1Interface, crdName string) (*apiextensionsv1.CustomResourceDefinition, error) {
	crd, err := dynamicClient.CustomResourceDefinitions().Get(context.TODO(), crdName, metav1.GetOptions{})
	if err != nil {
		return nil, err
	}
	return crd, nil
}

func VerifyBackupCrdHasClusterStatus(client apiextensionsClientSetv1.ApiextensionsV1Interface) bool {
	crdName := "backups.resources.cattle.io"

	crd, err := getCRDDefinition(client, crdName)
	if err != nil {
		logrus.Infof("Error fetching CRD: %v", err)
		return false
	}

Comment on lines +110 to +113:

In a more production-ready version of this, I'd expect any errors like this to be surfaced in the setup step, and potentially to exit, since an error like that signifies either a more serious problem or a transient one (in the latter case we could consider a backoff mechanism).
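A rough sketch of what a setup-time check with a simple backoff could look like, assuming `k8s.io/apimachinery/pkg/util/wait` and `time` are imported alongside the packages above; the function name and retry parameters are illustrative, not part of the PR:

    // Hypothetical variant of VerifyBackupCrdHasClusterStatus, not part of the PR.
    // Transient API errors are retried with exponential backoff; a persistent
    // failure is returned so the setup code can decide to exit.
    func verifyBackupCrdHasClusterStatus(client apiextensionsClientSetv1.ApiextensionsV1Interface) (bool, error) {
        const crdName = "backups.resources.cattle.io"
        found := false
        backoff := wait.Backoff{Duration: time.Second, Factor: 2.0, Steps: 5}
        err := wait.ExponentialBackoff(backoff, func() (bool, error) {
            crd, err := client.CustomResourceDefinitions().Get(context.TODO(), crdName, metav1.GetOptions{})
            if err != nil {
                logrus.Warnf("Retrying CRD lookup for %s: %v", crdName, err)
                return false, nil // treat as transient and retry
            }
            _, found = crd.Spec.Versions[0].Schema.OpenAPIV3Schema.Properties["status"].Properties["originCluster"]
            return true, nil
        })
        return found, err
    }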

	// Inspect the status schema, for example
	_, found := crd.Spec.Versions[0].Schema.OpenAPIV3Schema.Properties["status"].Properties["originCluster"]
	if found {
		logrus.Debugf("Status schema contains `originCluster` on CRD `%s`.\n", crdName)
		return true
	}

Comment on lines +109 to +120:

Going back to the above discussion, storing the cluster metadata in the backup CRD spec as a nullable object shouldn't be a breaking change, and we can instead check whether it is nil rather than doing dynamic CRD reflection. IMO it is also easier to make this testable if we avoid checking CRD definitions and focus on the presence or absence of a new field.
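For illustration, a minimal sketch of that nullable-field approach; none of these type or field names exist in the PR, they are purely hypothetical:

    // Hypothetical metadata object, not part of the PR. Older operator versions
    // never set the pointer, so a nil check replaces the CRD-schema inspection.
    type ClusterMetadata struct {
        OriginClusterUID string `json:"originClusterUID,omitempty"`
    }

    // Assumed (hypothetical) addition to the Backup spec:
    //   ClusterMetadata *ClusterMetadata `json:"clusterMetadata,omitempty"`

    func backupHasClusterMetadata(backup *v1.Backup) bool {
        return backup.Spec.ClusterMetadata != nil
    }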

	logrus.Debugf("`originCluster` not found on status schema for CRD `%s`.\n", crdName)
	return false
}

Review comment:

I'm not convinced the origin cluster metadata should be stored in the Status field. I can see from the original CRD definition that the semantic difference between spec (the desired state of the object) and status (the current state of the object) is a little bit muddied in this operator. IMO, the origin cluster (and additional metadata) could be set in the Spec and be fully managed by the operator. In any case, the backup CRD's status should definitely be the one surfacing whether or not a backup is viable 👍
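A small sketch of how that split might look, with the origin recorded in a hypothetical operator-managed Spec field while viability keeps being surfaced through the existing status conditions (the Spec field and helper name are illustrative only):

    // Hypothetical helper, not part of the PR: Spec carries the operator-managed
    // origin, Status conditions keep reporting whether an in-place restore is
    // viable on the cluster this controller runs in.
    func (h *handler) surfaceInPlaceRestoreViability(backup *v1.Backup) {
        viable := backup.Spec.OriginClusterUID != "" && backup.Spec.OriginClusterUID == h.kubeSystemNS
        condition.Cond(v1.BackupConditionInPlaceRestore).SetStatusBool(backup, viable)
        if viable {
            condition.Cond(v1.BackupConditionInPlaceRestore).Message(backup, "In-place Restore appears viable.")
        } else {
            condition.Cond(v1.BackupConditionInPlaceRestore).Message(backup, "In-place Restore does not appear viable.")
        }
    }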