UDP checksum errors cause intermittent failures when a host accesses a SVC
Oilbeater edited this page Mar 31, 2021 · 4 revisions
Kube-OVN 1.6.0+
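When a SVC's Endpoints are Pods on the container network, requests from a host to the SVC ClusterIP and port intermittently hang and take about half a minute to return.
Check the system log on the host where the Endpoint Pod runs. If dmesg shows entries like the following, the requests are failing because of the UDP checksum problem: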
[ 8702.057455] UDP: bad checksum. From 192.168.16.44:13066 to 192.168.16.45:6081 ulen 98
[ 8702.097551] UDP: bad checksum. From 192.168.16.44:4234 to 192.168.16.45:6081 ulen 98
[ 8702.128824] UDP: bad checksum. From 192.168.16.44:11537 to 192.168.16.45:6081 ulen 98
[ 8702.385434] UDP: bad checksum. From 192.168.16.44:32102 to 192.168.16.45:6081 ulen 98
[ 8703.099713] UDP: bad checksum. From 192.168.16.44:4234 to 192.168.16.45:6081 ulen 98
[ 8703.388079] UDP: bad checksum. From 192.168.16.44:32102 to 192.168.16.45:6081 ulen 98
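On a busy node these messages can be mixed with other kernel output. The following is a minimal sketch for summarizing them, assuming IPv4 addresses and the log format shown above. The function name `count_bad_checksum` is hypothetical and is not part of Kube-OVN.

```shell
#!/bin/sh
# Count "UDP: bad checksum" kernel messages per source IP for one
# destination port (6081 is the Geneve port seen in the log above).
# Reads kernel log text on stdin, so it works with dmesg or a saved file.
count_bad_checksum() {
    port="${1:-6081}"
    awk -v port="$port" '
        /UDP: bad checksum\./ {
            src = ""; dst = ""
            # Find the "From <src> to <dst>" fields wherever they appear.
            for (i = 1; i < NF; i++) {
                if ($i == "From") src = $(i + 1)
                if ($i == "to")   dst = $(i + 1)
            }
            n = split(dst, d, ":")
            if (n < 2 || d[n] != port) next   # keep only the requested port
            split(src, s, ":")
            hits[s[1]]++                      # assumes an IPv4 source address
        }
        END { for (ip in hits) print ip, hits[ip] }
    '
}

# Usage (as root on the host running the Endpoint Pod):
#   dmesg | count_bad_checksum 6081
```

Each output line is a source IP followed by the number of bad-checksum messages it produced, which helps identify the sending node.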
On the Kylin V10 operating system, dmesg does not show these messages. Instead, run netstat -us and check whether the InCsumErrors counter keeps increasing:
# netstat -us
IcmpMsg:
    InType0: 22
    InType3: 24
    InType8: 117852
    OutType0: 117852
    OutType3: 29
    OutType8: 22
Udp:
    3040636 packets received
    0 packets to unknown port received.
    4 packet receive errors
    602 packets sent
    0 receive buffer errors
    0 send buffer errors
    InCsumErrors: 4
UdpLite:
IpExt:
    InBcastPkts: 10244
    InOctets: 4446320361
    OutOctets: 1496815600
    InBcastOctets: 3095950
    InNoECTPkts: 7683903
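A single reading of the counter does not show whether it is still growing, so it helps to sample it twice. The sketch below extracts the Udp section's InCsumErrors value from `netstat -us` output. The function name `udp_csum_errors` is hypothetical, and the sketch assumes that a missing counter means zero.

```shell
#!/bin/sh
# Print the InCsumErrors value from the "Udp:" section of `netstat -us`
# output read on stdin. Prints 0 when the counter is not listed
# (assumption: netstat omits it while it is zero).
udp_csum_errors() {
    awk '
        # A section header is a single word ending in ":" (for example "Udp:").
        NF == 1 && $1 ~ /:$/ { section = $1; next }
        section == "Udp:" && $1 == "InCsumErrors:" { print $2; found = 1 }
        END { if (!found) print 0 }
    '
}

# Usage: sample twice, a few seconds apart, while reproducing the problem.
#   before=$(netstat -us | udp_csum_errors)
#   sleep 10
#   after=$(netstat -us | udp_csum_errors)
#   [ "$after" -gt "$before" ] && echo "InCsumErrors is still increasing"
```

Matching on the section header keeps a same-named counter under UdpLite from being picked up by mistake.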
Disable UDP checksums for Geneve: edit the startup arguments of the kube-system/kube-ovn-cni DaemonSet and set --encap-checksum=false
spec:
  containers:
  - args:
    - --enable-mirror=false
    - --encap-checksum=false
    - --service-cluster-ip-range=10.96.0.0/12
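After editing the DaemonSet it can be worth confirming that the flag is actually present in the live object. This is a rough sketch, assuming the args appear one per line as YAML list items, as in the fragment above. The function name `encap_checksum_disabled` is hypothetical.

```shell
#!/bin/sh
# Succeed (exit status 0) if the DaemonSet YAML read on stdin contains
# "--encap-checksum=false" as an args list item.
encap_checksum_disabled() {
    grep -Eq '^[[:space:]]*-[[:space:]]+--encap-checksum=false[[:space:]]*$'
}

# Usage:
#   kubectl -n kube-system edit ds kube-ovn-cni        # make the change
#   kubectl -n kube-system get ds kube-ovn-cni -o yaml \
#       | encap_checksum_disabled && echo "encap checksum disabled"
```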
Disable TX offloading on the Kube-OVN related network interfaces on every node
ethtool -K ovn0 tx off
ethtool -K genev_sys_6081 tx off
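To confirm the change took effect, the offload state can be read back with `ethtool -k` (lowercase). The helper below is a small sketch that parses that output. The function name `tx_checksum_state` is hypothetical. Settings applied with ethtool are generally not persistent, so they may need to be reapplied after a reboot or when the interface is recreated.

```shell
#!/bin/sh
# Print the tx-checksumming state ("on" or "off") from `ethtool -k <dev>`
# output read on stdin.
tx_checksum_state() {
    awk '$1 == "tx-checksumming:" { print $2; exit }'
}

# Usage (as root, on each node):
#   for dev in ovn0 genev_sys_6081; do
#       ethtool -K "$dev" tx off
#       echo "$dev tx-checksumming: $(ethtool -k "$dev" | tx_checksum_state)"
#   done
```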