[BUG] When the cluster has multiple subnets and IP pools, Kube-OVN fails to honor attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 when allocating an IP address for a KubeVirt VirtualMachine #4573

Closed
hexiaodai opened this issue Sep 29, 2024 · 5 comments · Fixed by #4777
Labels
bug Something isn't working

Comments

@hexiaodai

Kube-OVN Version

v1.12

Kubernetes Version

v1.25.3

Operation-system/Kernel Version

❯ awk -F '=' '/PRETTY_NAME/ { print $2 }' /etc/os-release
"Ubuntu 22.04.2 LTS"
❯ uname -r
6.8.0-40-generic

Description

When the cluster has multiple subnets and IP pools, Kube-OVN fails to honor attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 when allocating an IP address for a KubeVirt VirtualMachine.

Steps To Reproduce

  1. Create the subnet-10-66 and subnet-10-69 subnets:
NAME           PROVIDER                 VPC           PROTOCOL   CIDR            PRIVATE   NAT     DEFAULT   GATEWAYTYPE   V4USED   V4AVAILABLE   V6USED   V6AVAILABLE   EXCLUDEIPS                                             U2OINTERCONNECTIONIP
subnet-10-69   attachnet.default.ovn    ovn-cluster   IPv4       10.69.0.0/16    false     true    false     distributed   3        65470         0        0             ["10.69.0.1..10.69.0.10","10.69.0.101..10.69.0.151"]
subnet-10-70   attachnet2.default.ovn   ovn-cluster   IPv4       10.70.0.0/16    false     true    false     distributed   0        65473         0        0             ["10.70.0.1..10.70.0.10","10.70.0.101..10.70.0.151"]

# subnet-10-66
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: subnet-10-66
spec:
  cidrBlock: 10.66.0.0/16
  default: false
  enableLb: true
  excludeIps:
  - 10.66.0.1..10.66.0.10
  - 10.66.0.101..10.66.0.151
  gateway: 10.66.0.1
  gatewayNode: ""
  gatewayType: distributed
  namespaces:
  - default
  natOutgoing: true
  private: false
  protocol: IPv4
  provider: attachnet.default.ovn
  vpc: ovn-cluster

# subnet-10-69
apiVersion: kubeovn.io/v1
kind: Subnet
metadata:
  name: subnet-10-69
spec:
  cidrBlock: 10.69.0.0/16
  default: false
  enableLb: true
  excludeIps:
  - 10.69.0.1..10.69.0.10
  - 10.69.0.101..10.69.0.151
  gateway: 10.69.0.1
  gatewayNode: ""
  gatewayType: distributed
  namespaces:
  - default
  natOutgoing: true
  private: false
  protocol: IPv4
  provider: attachnet.default.ovn
  vpc: ovn-cluster
  2. Create the subnet-10-66-6 and subnet-10-69-9 IP pools, setting their subnet fields to subnet-10-66 and subnet-10-69 respectively:
NAME             SUBNET         IPS                           V4USED   V4AVAILABLE   V6USED   V6AVAILABLE
subnet-10-66-6   subnet-10-66   ["10.66.6.10..10.66.6.240"]   8        223           0        0
subnet-10-69-9   subnet-10-69   ["10.69.9.10..10.69.9.240"]   2        229           0        0

# subnet-10-66-6
apiVersion: kubeovn.io/v1
kind: IPPool
metadata:
  name: subnet-10-66-6
spec:
  ips:
  - 10.66.6.10..10.66.6.240
  namespaces:
  - default
  subnet: subnet-10-66

# subnet-10-69-9
apiVersion: kubeovn.io/v1
kind: IPPool
metadata:
  name: subnet-10-69-9
spec:
  ips:
  - 10.69.9.10..10.69.9.240
  namespaces:
  - default
  subnet: subnet-10-69
  3. Create a KubeVirt VirtualMachine and set attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
...
  template:
    metadata:
      annotations:
        attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69
  4. Inspect the virt-launcher-vm pod:
Events:
  Type     Reason                  Age                 From                 Message
  ----     ------                  ----                ----                 -------
  Normal   Scheduled               27s                 default-scheduler    Successfully assigned default/virt-launcher-vm-k8s-qz8ff to ubuntu
  Warning  AcquireAddressFailed    23s (x14 over 33s)  kube-ovn-controller  NoAvailableAddress
  Warning  FailedCreatePodSandBox  6s                  kubelet              Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "c4f6501ee3eb4c28742213fab5e8f560ee39fe01d9796142142bc7d64de845cc": plugin type="multus" name="multus-cni-network" failed (add): [default/virt-launcher-vm-k8s-qz8ff/7acbcc49-8d37-40c1-a0ff-46e3efb81599:attachnet]: error adding container to network "attachnet": RPC failed; request ip return 500 no address allocated to pod default/virt-launcher-vm-k8s-qz8ff provider attachnet.default.ovn, please see kube-ovn-controller logs to find errors
  5. Check the kube-ovn-controller logs. The controller tries to allocate the IP from the subnet-10-66-6 IP pool. The expected behavior is to allocate from the subnet-10-69-9 IP pool, because that pool is bound to the subnet-10-69 subnet:
I0929 18:05:47.763725       7 pod.go:347] enqueue update pod default/virt-launcher-vm-k8s-qz8ff
I0929 18:05:47.763828       7 pod.go:519] handle add/update pod default/virt-launcher-vm-k8s-qz8ff
I0929 18:05:47.793967       7 pod.go:576] sync pod default/virt-launcher-vm-k8s-qz8ff allocated
I0929 18:05:47.794121       7 ipam.go:62] allocate v4 , v6 , mac  for default/vm-k8s from ippool subnet-10-66-6 in subnet subnet-10-69
E0929 18:05:47.794255       7 pod.go:589] NoAvailableAddress
I0929 18:05:47.795021       7 event.go:377] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"virt-launcher-vm-k8s-qz8ff", UID:"7acbcc49-8d37-40c1-a0ff-46e3efb81599", APIVersion:"v1", ResourceVersion:"19804583", FieldPath:""}): type: 'Warning' reason: 'AcquireAddressFailed' NoAvailableAddress
E0929 18:05:47.795328       7 pod.go:406] error syncing 'default/virt-launcher-vm-k8s-qz8ff': NoAvailableAddress, requeuing
  6. Modify the KubeVirt VirtualMachine to specify both attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 and attachnet.default.ovn.kubernetes.io/ip_pool: subnet-10-69-9. IP allocation succeeds and the VirtualMachine starts normally:
apiVersion: kubevirt.io/v1
kind: VirtualMachine
...
  template:
    metadata:
      annotations:
        attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69
        attachnet.default.ovn.kubernetes.io/ip_pool: subnet-10-69-9
  7. Delete the subnet-10-66-6 and subnet-10-69-9 IP pools and specify only attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 on the KubeVirt VirtualMachine. IP allocation succeeds and the VirtualMachine starts normally.
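
Because the failure surfaces as NoAvailableAddress, it is worth ruling out a misconfigured pool first. The following is a minimal Go sketch, not part of Kube-OVN, that checks whether an IPPool range lies inside its subnet's CIDR and how many addresses it holds. The helper names (parseRange, rangeSize, rangeInCIDR) are hypothetical and the sketch handles IPv4 only.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// parseRange splits a "start..end" range, as used in the IPPool spec above,
// into its two addresses. Hypothetical helper, not a Kube-OVN function.
func parseRange(r string) (netip.Addr, netip.Addr, error) {
	parts := strings.Split(r, "..")
	if len(parts) != 2 {
		return netip.Addr{}, netip.Addr{}, fmt.Errorf("invalid range %q", r)
	}
	start, err := netip.ParseAddr(parts[0])
	if err != nil {
		return netip.Addr{}, netip.Addr{}, err
	}
	end, err := netip.ParseAddr(parts[1])
	if err != nil {
		return netip.Addr{}, netip.Addr{}, err
	}
	return start, end, nil
}

// v4ToUint converts an IPv4 address to its 32-bit value.
// As4 panics for non-IPv4 input, so this sketch is IPv4 only.
func v4ToUint(a netip.Addr) uint32 {
	b := a.As4()
	return uint32(b[0])<<24 | uint32(b[1])<<16 | uint32(b[2])<<8 | uint32(b[3])
}

// rangeSize returns the number of addresses in the inclusive range.
func rangeSize(start, end netip.Addr) uint32 {
	return v4ToUint(end) - v4ToUint(start) + 1
}

// rangeInCIDR reports whether both ends of the range lie inside the CIDR.
func rangeInCIDR(cidr string, start, end netip.Addr) (bool, error) {
	p, err := netip.ParsePrefix(cidr)
	if err != nil {
		return false, err
	}
	return p.Contains(start) && p.Contains(end), nil
}

func main() {
	start, end, err := parseRange("10.69.9.10..10.69.9.240")
	if err != nil {
		panic(err)
	}
	inside, err := rangeInCIDR("10.69.0.0/16", start, end)
	if err != nil {
		panic(err)
	}
	fmt.Println(rangeSize(start, end), inside) // prints "231 true"
}
```

The result of 231 addresses matches the IPPool listing in step 2, where V4USED plus V4AVAILABLE is 2 + 229 = 231 for subnet-10-69-9. The pool is therefore valid and has free addresses, which suggests the problem is in pool selection rather than pool capacity.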

Current Behavior

When the cluster has multiple subnets and IP pools at the same time, Kube-OVN cannot allocate an IP address for a KubeVirt VirtualMachine when only attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 is specified.

See the kube-ovn-controller logs:

I0929 18:05:47.763725       7 pod.go:347] enqueue update pod default/virt-launcher-vm-k8s-qz8ff
I0929 18:05:47.763828       7 pod.go:519] handle add/update pod default/virt-launcher-vm-k8s-qz8ff
I0929 18:05:47.793967       7 pod.go:576] sync pod default/virt-launcher-vm-k8s-qz8ff allocated
I0929 18:05:47.794121       7 ipam.go:62] allocate v4 , v6 , mac  for default/vm-k8s from ippool subnet-10-66-6 in subnet subnet-10-69
E0929 18:05:47.794255       7 pod.go:589] NoAvailableAddress
I0929 18:05:47.795021       7 event.go:377] Event(v1.ObjectReference{Kind:"Pod", Namespace:"default", Name:"virt-launcher-vm-k8s-qz8ff", UID:"7acbcc49-8d37-40c1-a0ff-46e3efb81599", APIVersion:"v1", ResourceVersion:"19804583", FieldPath:""}): type: 'Warning' reason: 'AcquireAddressFailed' NoAvailableAddress
E0929 18:05:47.795328       7 pod.go:406] error syncing 'default/virt-launcher-vm-k8s-qz8ff': NoAvailableAddress, requeuing
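
The log line "from ippool subnet-10-66-6 in subnet subnet-10-69" pairs a pool with a subnet it does not belong to. The Go sketch below is a simplified model of one way such a mismatch could arise. It is an assumption made for illustration and is not Kube-OVN's actual selection code. The IPPool struct and both functions are hypothetical.

```go
package main

import "fmt"

// IPPool is a simplified stand-in for the IPPool spec fields used in this
// issue. It is not the real Kube-OVN type.
type IPPool struct {
	Name       string
	Subnet     string
	Namespaces []string
}

// hasNamespace reports whether the pool is bound to the namespace.
func hasNamespace(p IPPool, ns string) bool {
	for _, n := range p.Namespaces {
		if n == ns {
			return true
		}
	}
	return false
}

// pickByNamespace returns the first pool bound to the namespace and ignores
// which subnet the pool belongs to (assumed faulty behavior).
func pickByNamespace(pools []IPPool, ns string) (string, bool) {
	for _, p := range pools {
		if hasNamespace(p, ns) {
			return p.Name, true
		}
	}
	return "", false
}

// pickBySubnetAndNamespace also requires the pool to belong to the subnet
// requested through the logical_switch annotation.
func pickBySubnetAndNamespace(pools []IPPool, subnet, ns string) (string, bool) {
	for _, p := range pools {
		if p.Subnet == subnet && hasNamespace(p, ns) {
			return p.Name, true
		}
	}
	return "", false
}

func main() {
	pools := []IPPool{
		{Name: "subnet-10-66-6", Subnet: "subnet-10-66", Namespaces: []string{"default"}},
		{Name: "subnet-10-69-9", Subnet: "subnet-10-69", Namespaces: []string{"default"}},
	}
	naive, _ := pickByNamespace(pools, "default")
	scoped, _ := pickBySubnetAndNamespace(pools, "subnet-10-69", "default")
	fmt.Println(naive, scoped) // prints "subnet-10-66-6 subnet-10-69-9"
}
```

In this model the namespace-only lookup returns subnet-10-66-6, whose range 10.66.6.10..10.66.6.240 has no addresses inside subnet-10-69, which would be consistent with the NoAvailableAddress error. Filtering by subnet first selects subnet-10-69-9, the expected pool.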

Expected Behavior

When the cluster has multiple subnets and IP pools at the same time, Kube-OVN correctly allocates an IP address for a KubeVirt VirtualMachine based on attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69.

@hexiaodai hexiaodai added the bug Something isn't working label Sep 29, 2024

dosubot bot commented Sep 29, 2024

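@hexiaodai Hello! I'm Dosu, a bot that can help you solve bugs, answer questions, and become a contributor. I'm here to help while you wait for a human maintainer.

To correctly allocate an IP address for a KubeVirt VirtualMachine through attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 when the cluster has multiple subnets and IP pools, make sure the VirtualMachine is correctly annotated to specify the logical switch (subnet).

Here is an example of how to annotate the VirtualMachine: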

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  name: my-vm
  annotations:
    attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69
spec:
  running: false
  template:
    metadata:
      labels:
        kubevirt.io/domain: my-vm
    spec:
      domain:
        devices:
          disks:
          - disk:
              bus: virtio
            name: containerdisk
      volumes:
      - name: containerdisk
        containerDisk:
          image: kubevirt/cirros-container-disk-demo

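In this example, the annotation attachnet.default.ovn.kubernetes.io/logical_switch: subnet-10-69 specifies that the VirtualMachine should be attached to the logical switch (subnet) named subnet-10-69.

In addition, make sure the subnet-10-69 subnet is configured correctly in Kube-OVN and that the IPAM (IP address management) system knows about it. The Kube-OVN controller parses these annotations and allocates the IP address from the specified subnet. The controller code parses the network annotations and ensures that the correct subnet is used for IP allocation: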

func (c *Controller) getPodAttachmentNet(pod *v1.Pod) ([]*kubeovnNet, error) {
    // ... existing code ...
    subnetName := pod.Annotations[fmt.Sprintf(util.LogicalSwitchAnnotationTemplate, providerName)]
    if subnetName == "" {
        for _, subnet := range subnets {
            if subnet.Spec.Provider == providerName {
                subnetName = subnet.Name
                break
            }
        }
    }
    // ... existing code ...
}

This configuration ensures that the VirtualMachine's IP address is allocated from the specified subnet even when the cluster has multiple subnets and IP pools [1][2].

To continue talking to Dosu, mention @dosu.

@zhangzujian
Member

# subnet-10-69-9
apiVersion: kubeovn.io/v1
kind: IPPool
metadata:
  name: subnet-10-69-9
spec:
  ips:
  - 10.69.9.10..10.69.9.240
  namespaces:
  - default
  subnet: subnet-10-69

Binding different IPPools to the same namespace at the same time is not supported.
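
Given that restriction, a cluster can be checked for namespaces that are bound by more than one IPPool. The Go sketch below is a hypothetical helper written for illustration, not a Kube-OVN API. The IPPool struct keeps only the fields needed for the check.

```go
package main

import (
	"fmt"
	"sort"
)

// IPPool is a simplified stand-in for the IPPool resource. It is not the
// real Kube-OVN type.
type IPPool struct {
	Name       string
	Namespaces []string
}

// namespaceConflicts returns every namespace that is bound by more than one
// pool, mapped to the sorted names of those pools.
func namespaceConflicts(pools []IPPool) map[string][]string {
	byNS := map[string][]string{}
	for _, p := range pools {
		seen := map[string]bool{} // ignore a namespace repeated within one pool
		for _, ns := range p.Namespaces {
			if seen[ns] {
				continue
			}
			seen[ns] = true
			byNS[ns] = append(byNS[ns], p.Name)
		}
	}
	for ns, names := range byNS {
		if len(names) < 2 {
			delete(byNS, ns) // deleting during range is safe in Go
			continue
		}
		sort.Strings(names)
	}
	return byNS
}

func main() {
	pools := []IPPool{
		{Name: "subnet-10-66-6", Namespaces: []string{"default"}},
		{Name: "subnet-10-69-9", Namespaces: []string{"default"}},
	}
	fmt.Println(namespaceConflicts(pools))
	// prints "map[default:[subnet-10-66-6 subnet-10-69-9]]"
}
```

Applied to the two pools from this issue, the check reports that both bind the default namespace, which is the unsupported configuration described above.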

Contributor

github-actions bot commented Dec 8, 2024

Issues go stale after 60d of inactivity. Please comment or re-open the issue if you are still interested in getting this issue fixed.

@oilbeater
Collaborator

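> Binding different IPPools to the same namespace at the same time is not supported.

@zhangzujian What is the reason for this restriction, and could it be removed?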

@zhangzujian
Member

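> Binding different IPPools to the same namespace at the same time is not supported.
>
> @zhangzujian What is the reason for this restriction, and could it be removed?

IPPool was not designed for this scenario, so it is not supported. A PR that adds this support is now in progress.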

@zhangzujian zhangzujian linked a pull request Dec 9, 2024 that will close this issue