Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

delete legacy acls when upgrading to v1.13.x (#4742) #4768

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

hackerain
Copy link
Contributor

@hackerain hackerain commented Nov 27, 2024

the acls in v1.13.x are in tier 2 rather than tier 0 in v1.12.x, the legacy acls
may cause some unexpected behaviors because acls in tier 0 have the higest priority.
we should delete legacy acls and recreate them when upgrading to v1.13.x.

Pull Request

What type of this PR

Examples of user facing changes:

  • Features
  • Bug fixes
  • Docs
  • Tests

Which issue(s) this PR fixes

Fixes #(issue-number)

@dosubot dosubot bot added the size:S This PR changes 10-29 lines, ignoring generated files. label Nov 27, 2024
@hackerain hackerain force-pushed the bugfix branch 2 times, most recently from 677f4dd to 8aa71e8 Compare November 27, 2024 06:14
@dosubot dosubot bot added size:M This PR changes 30-99 lines, ignoring generated files. and removed size:S This PR changes 10-29 lines, ignoring generated files. labels Nov 27, 2024
@@ -277,6 +277,32 @@ func (c *OVNNbClient) CreateNodeACL(pgName, nodeIPStr, joinIPStr string) error {
func (c *OVNNbClient) CreateSgDenyAllACL(sgName string) error {
pgName := GetSgPortGroupName(sgName)

// for upgrading from v1.12.x to v1.13.x
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

请把这整个部分放到一个函数中吧,比如 cleanOldACL?

只有默认的 denyall 受影响了嘛? 还有别的安全组 acl 也受同样的原因影响嘛?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我盘点了下跟acl相关的逻辑,新旧版本的对比如下:

acl(v1.12.x)
* networkpolicy(tier 0)
    * handleUpdateNp  // Informer
        * UpdateIngressACLOps
        * UpdateEgressACLOps
        * CreateGatewayACL
* node(tier 0)
    * CheckNodePortGroup //周期性任务
        * CreateNodeACL
* sg(tier 0)
    * initDefaultDenyAllSecurityGroup // 启动时执行一次
        * CreateSgDenyAllACL
    * handleAddOrUpdateSg  // Informer
        * UpdateSgACL
        * CreateSgBaseACL
* subnet(tier 0)
    * handleAddOrUpdateSubnet  // Informer
        * UpdateLogicalSwitchACL
        * SetLogicalSwitchPrivate

acl(v1.13.x)
* networkpolicy(tier 2)
    * handleUpdateNp  // Informer
        * UpdateIngressACLOps
        * UpdateEgressACLOps
        * CreateGatewayACL
* node(tier 2)
    * CheckNodePortGroup //周期性任务
        * CreateNodeACL
* sg(tier 2)
    * initDefaultDenyAllSecurityGroup // 启动时执行一次
        * CreateSgDenyAllACL
    * handleAddOrUpdateSg  // Informer
        * UpdateSgACL
        * CreateSgBaseACL
* subnet(tier 2)
    * handleAddOrUpdateSubnet  // Informer
        *  UpdateLogicalSwitchACL
        *  SetLogicalSwitchPrivate
* admin network policy(tier 1)
    * handleAddAnp  // Informer
        * UpdateAnpRuleACLOps
* baseline admin network policy(tier 3)
    * handleAddBanp // Informer
        * UpdateAnpRuleACLOps

应该是我们做admin network policy功能时,引入的tier划分,原来的tier 0现在都切换到了tier 2,我觉得一个比较好的处理这块升级的逻辑的地方就是在控制器启动的时候,在worker启动以及各项初始化工作之前,将所有这些旧的acls规则删除,然后在初始化以及worker启动之后它会自动重建新的acl规则。

类似这种的升级逻辑,以后可能还会遇到,可以考虑将它做成一个固定的模式,在我们的代码中加入升级的原生支持,将特定的升级步骤加在upgradeXXX()中。

@bobz965 @zhangzujian 看这种方式是否可行?目前该PR是按照这种思路提的,如果可行,我再补充一些UT

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

我感觉可以,统一放到 upgradeXXX XXX可以是版本号,acl 这里如果检查到 tier0 就触发一次清理吧

@dosubot dosubot bot added size:L This PR changes 100-499 lines, ignoring generated files. and removed size:M This PR changes 30-99 lines, ignoring generated files. labels Dec 2, 2024
@hackerain hackerain changed the title delete legacy denyall acls when upgrading to v1.13.x (#4742) [WIP]delete legacy acls when upgrading to v1.13.x (#4742) Dec 2, 2024
@coveralls
Copy link

coveralls commented Dec 4, 2024

Pull Request Test Coverage Report for Build 12177594217

Details

  • 64 of 135 (47.41%) changed or added relevant lines in 9 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.07%) to 18.746%

Changes Missing Coverage Covered Lines Changed/Added Lines %
pkg/controller/admin_network_policy.go 0 2 0.0%
pkg/controller/baseline_admin_network_policy.go 0 2 0.0%
pkg/controller/node.go 12 23 52.17%
pkg/controller/subnet.go 10 21 47.62%
pkg/controller/security_group.go 13 25 52.0%
pkg/controller/network_policy.go 15 30 50.0%
pkg/controller/controller.go 0 18 0.0%
Totals Coverage Status
Change from base Build 12164445128: 0.07%
Covered Lines: 8735
Relevant Lines: 46596

💛 - Coveralls

@hackerain hackerain changed the title [WIP]delete legacy acls when upgrading to v1.13.x (#4742) delete legacy acls when upgrading to v1.13.x (#4742) Dec 4, 2024
npName = "np" + np.Name
}
pgName := strings.ReplaceAll(fmt.Sprintf("%s.%s", npName, np.Namespace), "-", ".")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

移除空行

}

func (c *Controller) upgradeNetworkPolicies() error {
if err := c.upgradeNetworkPoliciesForV1_13(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log err


for _, node := range nodes {
pgName := strings.ReplaceAll(node.Annotations[util.PortNameAnnotation], "-", ".")

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

移除空行

}

func (c *Controller) upgradeNodes() error {
if err := c.upgradeNodesForV1_13(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log err

}

func (c *Controller) upgradeSecurityGroups() error {
if err := c.upgradeSecurityGroupsForV1_13(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log err

}

func (c *Controller) upgradeSubnets() error {
if err := c.upgradeSubnetsForV1_13(); err != nil {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

log err

@@ -1118,6 +1122,10 @@ func aclFilter(direction string, externalIDs map[string]string) func(acl *ovnnb.
return false
}

if tier != -1 && acl.Tier != tier {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

这个 -1 比较难理解,常量化维护吧

@hackerain hackerain force-pushed the bugfix branch 3 times, most recently from 383cd19 to 4a4ba55 Compare December 5, 2024 02:26
@zhangzujian
Copy link
Member

If network policy support is disabled by command line parameter --enable-np=false, kube-ovn-controller will panic:

I1205 02:39:41.184134      13 init.go:660] start to sync subnets
W1205 02:39:41.184146      13 init.go:672] subnet join is not ready
W1205 02:39:41.184152      13 init.go:672] subnet ovn-default is not ready
I1205 02:39:41.184157      13 init.go:749] start to sync vlans
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x20 pc=0x5611b713c1f3]

@hackerain
Copy link
Contributor Author

@zhangzujian ok, thank you, I will fix it

the acls in v1.13.x are in tier 2 rather than tier 0 in v1.12.x, the legacy acls
may cause some unexpected behaviors because acls in tier 0 have the higest priority.
we should delete legacy acls and recreate them when upgrading to v1.13.x.

Signed-off-by: suo <[email protected]>
Comment on lines +165 to +166
DeleteAcls(parentName, parentType, direction string, externalIDs map[string]string, tier int) error
DeleteAclsOps(parentName, parentType, direction string, externalIDs map[string]string, tier int) ([]ovsdb.Operation, error)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
DeleteAcls(parentName, parentType, direction string, externalIDs map[string]string, tier int) error
DeleteAclsOps(parentName, parentType, direction string, externalIDs map[string]string, tier int) ([]ovsdb.Operation, error)
DeleteAcls(parentName, parentType, direction string, externalIDs map[string]string, tier *int) error
DeleteAclsOps(parentName, parentType, direction string, externalIDs map[string]string, tier *int) ([]ovsdb.Operation, error)

这里用指针是不是更好一点?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
size:L This PR changes 100-499 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants