[BUG] Restarting the nat-gw pod of a custom VPC breaks existing external traffic to the EIP, and the traffic does not recover automatically #4690

Open
kldancer opened this issue Nov 4, 2024 · 7 comments
Labels: bug (Something isn't working), eip, vpc

Comments

kldancer commented Nov 4, 2024

Kube-OVN Version

v1.12.22

Kubernetes Version

v1.27.6

Operation-system/Kernel Version

5.10.0

Description

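The problem appears to be that the macvlan sub-interface net1 in the recreated nat-gw pod does not automatically announce the mapping between the EIP and its new MAC address to the switch.
Traffic recovers only after manually entering the nat-gw pod and running arping -c 1 -A -I net1 .
Is this behavior expected, or should it be treated as an item to improve?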

Steps To Reproduce

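  1. Create the EIP, then start pinging the EIP from a management node.
  2. Manually delete the corresponding nat-gw pod and observe whether ICMP traffic is interrupted and fails to recover automatically.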

Current Behavior

Traffic is interrupted and does not recover automatically.

Expected Behavior

Restarting the nat-gw pod should automatically restore existing connections.

kldancer added the bug label Nov 4, 2024

dosubot bot commented Nov 4, 2024

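@kldancer Hello! I'm Dosu, a bot that can help you solve problems, answer questions, and become a contributor. I'll do my best to help while you wait for a human maintainer.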

I found a similar open issue that might be related to your problem. The issue discusses a scenario where after restarting the kube-ovn network plugin, EIP connections cannot be restored automatically. The suggested solution involves addressing conntrack entries that might not perform SNAT correctly after a NAT Gateway restart. It suggests preventing traffic before routes and iptables rules are configured or flushing conntrack entries without SNAT/DNAT after configuration. Additionally, using the release-1.12-mc branch or the master branch of kube-ovn might help, as many related bugs were fixed in these versions [1].

Your issue seems to involve the need to manually update ARP entries using the arping command to restore traffic, which indicates that automatic ARP updates might not be functioning as expected. This could be considered for optimization to ensure automatic recovery of connections after a nat-gw pod restart.


kldancer (Author) commented Nov 4, 2024

The command I ran manually: arping -c 1 -A -I net1 172.40.0.6, where 172.40.0.6 is the EIP.
Alternatively: arping -c 1 -I net1 -s 172.40.0.6 172.40.0.1. Either command restores traffic.
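For readers unfamiliar with what the first command puts on the wire, here is a minimal Python sketch of a gratuitous ARP frame in which the sender IP and target IP are both the EIP. It is only an illustration, not the code path used by arping or Kube-OVN. The MAC address is hypothetical, and the choice of target hardware address (the sender's MAC for replies, zeros for requests) is an assumption, since implementations differ on that field.

```python
import socket
import struct

ARP_REQUEST = 1
ARP_REPLY = 2


def mac_to_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_gratuitous_arp(mac: str, ip: str, op: int = ARP_REPLY) -> bytes:
    """Build a broadcast Ethernet frame carrying a gratuitous ARP for `ip`."""
    sha = mac_to_bytes(mac)       # sender hardware address (new MAC of net1)
    spa = socket.inet_aton(ip)    # sender protocol address (the EIP)
    # Assumption: replies reuse the sender MAC as target, requests use zeros.
    tha = sha if op == ARP_REPLY else b"\x00" * 6
    eth = b"\xff" * 6 + sha + struct.pack("!H", 0x0806)  # dst, src, EtherType ARP
    # htype=1 (Ethernet), ptype=0x0800 (IPv4), hlen=6, plen=4, then the opcode
    arp = struct.pack("!HHBBH", 1, 0x0800, 6, 4, op) + sha + spa + tha + spa
    return eth + arp


def send_frame(iface: str, frame: bytes) -> None:
    """Send a raw frame on `iface` (Linux only, requires CAP_NET_RAW)."""
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW) as s:
        s.bind((iface, 0))
        s.send(frame)


if __name__ == "__main__":
    # Hypothetical MAC; 172.40.0.6 is the EIP from this thread.
    frame = build_gratuitous_arp("02:00:00:00:00:01", "172.40.0.6")
    print(frame.hex())
```

Calling send_frame("net1", frame) inside the nat-gw pod would broadcast the frame. The second command in the comment above works differently: it is an ordinary ARP request to the gateway 172.40.0.1 with the EIP as source address, and the gateway can refresh its cache from the sender fields of that request.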

bobz965 (Collaborator) commented Nov 4, 2024

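The problem appears to be that the macvlan sub-interface net1 in the recreated nat-gw pod does not automatically announce the mapping between the EIP and its new MAC address to the switch.

If that is the problem, a gratuitous ARP packet could be sent when the EIP is created. You could open a PR to try to fix it.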
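As a rough sketch of that suggestion (not the actual Kube-OVN nat-gw code, which is not shown in this thread), the announcement step could reuse the arping invocation reported above for each EIP configured on net1. The function name build_announce_commands and the way the EIP list is obtained are hypothetical.

```python
import shlex
from typing import Iterable, List


def build_announce_commands(
    eips: Iterable[str], iface: str = "net1", count: int = 1
) -> List[List[str]]:
    """Return one arping argv per EIP, using the -A form from this thread."""
    return [
        ["arping", "-c", str(count), "-A", "-I", iface, eip]
        for eip in eips
    ]


if __name__ == "__main__":
    # 172.40.0.6 is the EIP from this issue; a real hook would list all EIPs.
    for argv in build_announce_commands(["172.40.0.6"]):
        print(shlex.join(argv))
```

Inside the pod, each argv could be passed to subprocess.run(argv, check=True) after the EIP has been added to net1, both on EIP creation and when the pod starts.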

bobz965 (Collaborator) commented Nov 4, 2024

The FIP in OVS should have a similar implementation: the FIP is announced when it is bound.

kldancer (Author) commented Nov 4, 2024

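The FIP in OVS should have a similar implementation: the FIP is announced when it is bound.

Is there a concrete implementation of this? Could you share a link? 🤔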

bobz965 (Collaborator) commented Nov 5, 2024

The FIP namespace sends out a gratuitous ARP reply and request to update the ARP cache of devices in the network: https://access.redhat.com/solutions/5185931

bobz965 (Collaborator) commented Nov 5, 2024
