Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove ExistingManagement field from Cluster type #7262

Merged
merged 1 commit into from
Jan 12, 2024

Conversation

g-gaston
Copy link
Member

@g-gaston g-gaston commented Jan 5, 2024

Description of changes:
The function of this field was unclear and its name was confusing to say
the least. It seems like we now have a better way to achieve the same
thing and the resulting code is simpler and more expressive.

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@eks-distro-bot
Copy link
Collaborator

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please ask for approval from g-gaston. For more information see the Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@eks-distro-bot eks-distro-bot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jan 5, 2024
@g-gaston g-gaston force-pushed the remove-existing-management-field branch from f429590 to 24fc728 Compare January 5, 2024 13:22
Copy link

codecov bot commented Jan 5, 2024

Codecov Report

Attention: 6 lines in your changes are missing coverage. Please review.

Comparison is base (2414b66) 71.63% compared to head (dc370d5) 71.64%.
Report is 8 commits behind head on main.

Files Patch % Lines
...eksctl-anywhere/cmd/upgrademanagementcomponents.go 0.00% 2 Missing ⚠️
pkg/types/cluster.go 0.00% 2 Missing ⚠️
pkg/workflows/upgrade.go 66.66% 1 Missing and 1 partial ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #7262      +/-   ##
==========================================
+ Coverage   71.63%   71.64%   +0.01%     
==========================================
  Files         556      556              
  Lines       43199    43190       -9     
==========================================
- Hits        30945    30943       -2     
+ Misses      10543    10537       -6     
+ Partials     1711     1710       -1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@@ -126,15 +126,15 @@ func (f *Flux) Bootstrap(ctx context.Context, cluster *types.Cluster, clusterSpe
}

func (f *Flux) BootstrapGithub(ctx context.Context, cluster *types.Cluster, clusterSpec *cluster.Spec) error {
if cluster.ExistingManagement || clusterSpec.FluxConfig.Spec.Github == nil {
if clusterSpec.FluxConfig.Spec.Github == nil {
Copy link
Contributor

@chrisdoherty4 chrisdoherty4 Jan 5, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not entirely sure what ExistingManagement ever represented (it always felt hyperloaded) but I would've thought it needs replacing with a new thing here as opposed to removed? I can't work out why removing is OK?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I extracted from the code, ExistingManagement was set to true in management clusters once the management components were installed.

Since I couldn't a find a test that stressed this condition, I assumed it was unnecessary.

That said, you are right and maybe I shouldn't have assumed that fast. Unfortunately I lack context about this code and I can't seem to fully understand how this is orchestrated by just looking at the code.

maybe @jiayiwang7 you would know more? Was this necessary? And if so, what was trying to do?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe initially ExistingManagement bool // true is the cluster has EKS Anywhere management components and we use this field to determine if this cluster is a management cluster. We only bootstrap/install Flux controllers on the management cluster, not workload cluster. This condition is required to ensure that (if the ExistingManagement is still being used to determine if its a management cluster)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does that mean that we can swap it for cluster.IsSelfManaged()?

What trips me about that is that this code was returning early if ExistingManagement was true. Which means we would never bootstrap management clusters. And that is the opposite of what you are describing.

Copy link
Contributor

@chrisdoherty4 chrisdoherty4 Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Side note: this conversation demonstrates why the overall change here is good; this field is confusing.

@vivek-koppuru @tatlat I see you added some code that used ExistingManagement in the upgradevalidations/versions.go for version skew handling semi-recently. How did you interpret ExistingManagement?

Copy link
Member

@vivek-koppuru vivek-koppuru Jan 9, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Based on what @jiayiwang7 is saying and what the condition seems to show here,

ExistingManagement: false,
, it seems as if ExisitingManagement means if there is a management cluster associated with the cluster in question, hence we are returning nil if we see that as true right? So in reality, the condition to replace with should be !cluster.IsSelfManaged()? But we need to change this because this is very confusing. The comment next to the field sounds misleading based on these conditions too, and inline with what @g-gaston was suggesting here:

What trips me about that is that this code was returning early if ExistingManagement was true. Which means we would never bootstrap management clusters. And that is the opposite of what you are describing.

Copy link
Contributor

@chrisdoherty4 chrisdoherty4 Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we think about including an IsManagement() method on Cluster (or a helper function if we need to do something dynamic)?

IsSelfManaged() is a fantastic improvement on ExistingManagement but when developing I tend to find myself wanting to know if the Cluster is a management cluster and IsSelfManaged still feels like a convoluted way of querying that.

Copy link
Member Author

@g-gaston g-gaston Jan 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok i figured out. The meaning (or the use) of ExistingManagement changes based on what the target cluster is.

If we are creating a management cluster ExistingManagement is false.
If we are creating a workload cluster, ExistingManagement is true for the management cluster and false for the workload cluster.

If we are upgrading a management cluster ExistingManagement is false.
If we are creating a workload cluster, ExistingManagement is true for the management cluster and false for the workload cluster.

Based on that, @jiayiwang7 and I figured that the right logic here is to just skip the bootstrap operation for workload clusters. For management clusters, we want to run bootstrap no matter if it's an upgrade or a create (bootstrap is idempotent).

I adjusted the code and added a couple unit tests to protect this behavior.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What do we think about including an IsManagement() method on Cluster (or a helper function if we need to do something dynamic)?

Technically IsManagement can't be determined thought the cluster spec. Because the cluster spec doesn't say if a cluster is managing other clusters (and a cluster needs to be managing other cluster in order to be a management). So IsSelfManaged is more accurate in this situation.

That said, in practice, we use these two concepts interchangeably. In fact, although not accurate, "management cluster" is a more common term, regardless if the cluster is managing workload clusters or not.

I'm in favor of trading correctness for expresiveness/understability. So if we consider that IsSelfManaged is convoluted, I would suggest just renaming the method as opposed to have two methods to do the same thing. I would prefer if this was addressed in a separate PR though, if y'all are ok with that.

Copy link
Contributor

@chrisdoherty4 chrisdoherty4 Jan 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer if this was addressed in a separate PR though, if y'all are ok with that.

No problem.

Nice work diving deep @jiayiwang7 @g-gaston

@g-gaston g-gaston force-pushed the remove-existing-management-field branch from 24fc728 to d11c2ec Compare January 10, 2024 20:24
The function of this field was unclear and its name was confusing to say
the least. It seems like we now have a better way to achieve the same
thing and the resulting code is simpler and more expressive.
@g-gaston g-gaston force-pushed the remove-existing-management-field branch from d11c2ec to dc370d5 Compare January 10, 2024 20:35
@g-gaston
Copy link
Member Author

Overriding codepatch manually: path coverage is at 76% percent but the only file not well covered is the upgrade workflow, which will be removed in the release cycle in favor of separate workflows for management and workload clusters. In fact this PR increases the overall coverage thanks to the flux tests.

@g-gaston g-gaston merged commit 18094e9 into aws:main Jan 12, 2024
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lgtm size/L Denotes a PR that changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants