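I. Overview
While developing a Laravel/PHP project we used several third-party SDKs that make HTTP requests. Those requests were very slow: they took about 2.5 s, sometimes a little over 5 s, and sometimes timed out altogether. This post records the debugging process.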
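II. Server environment
Main physical host: Debian 9
Kernel: 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 (2020-07-05) x86_64 GNU/Linux
k3s master: v1.20.5+k3s1 (355fff30)
III. How the bug arises
0x00. Network topology of the k3s environment
0x01. Routing information on the main physical host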
liuxu@master:~$ ip route
default via 10.158.3.1 dev eth0 onlink
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
10.158.3.0/24 dev eth0 proto kernel scope link src 10.158.3.24
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
liuxu@master:~$ ip -d link show flannel.1
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
link/ether 42:c0:6a:0f:fc:ad brd ff:ff:ff:ff:ff:ff promiscuity 0
vxlan id 1 local 10.158.3.24 dev eth0 srcport 0 0 dstport 8472 nolearning ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
liuxu@master:~$ ip -d link show cni0
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
link/ether a6:e6:d3:93:36:60 brd ff:ff:ff:ff:ff:ff promiscuity 0
bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.a6:e6:d3:93:36:60 designated_root 8000.a6:e6:d3:93:36:60 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer 0.00 tcn_timer 0.00 topology_change_timer 0.00 gc_timer 246.38 vlan_default_pvid 1 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4 mcast_hash_max 512 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3124 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535
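From the output above:
- At the overlay network level, the coreDNS pod and the business pod are in different subnets, which means they run on two different physical servers: coreDNS is on the master server and the business pod is on the agent server.
- Containers in a pod attach to the cni0 bridge through veth pairs, and traffic for pod subnets on other nodes is routed to flannel.1 (VXLAN), so the business pod and coreDNS communicate through flannel.1.
- flannel.1 sends its encapsulated traffic through eth0.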
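To make the routing argument concrete, here is a minimal Python sketch of a longest-prefix-match lookup over the routes printed by ip route above. The ROUTES table is copied from that output; lookup and same_subnet are hypothetical helper names, and the pod addresses 10.42.0.16 (coreDNS) and 10.42.2.87 (business pod) are the ones that appear in the captures later in this post. This is a simplified illustration, not how the kernel implements routing.

```python
import ipaddress

# Routes copied from the `ip route` output on the master server.
# Each entry is (destination network, outgoing device).
ROUTES = [
    ("0.0.0.0/0", "eth0"),          # default via 10.158.3.1
    ("10.42.0.0/24", "cni0"),
    ("10.42.2.0/24", "flannel.1"),
    ("10.158.3.0/24", "eth0"),
    ("172.17.0.0/16", "docker0"),
]


def lookup(dst):
    """Return the device of the most specific route that contains dst."""
    addr = ipaddress.ip_address(dst)
    candidates = [
        (ipaddress.ip_network(net), dev)
        for net, dev in ROUTES
        if addr in ipaddress.ip_network(net)
    ]
    # The longest prefix wins, as in an ordinary routing table lookup.
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


def same_subnet(a, b, prefix=24):
    """True if both addresses fall in the same /prefix network."""
    net_a = ipaddress.ip_network(f"{a}/{prefix}", strict=False)
    return ipaddress.ip_address(b) in net_a


print(lookup("10.42.0.16"))   # coreDNS pod, local to the master
print(lookup("10.42.2.87"))   # business pod, on the agent
print(same_subnet("10.42.0.16", "10.42.2.87"))
```

On the master, the coreDNS address resolves to cni0 and the business pod address to flannel.1, which is the split the list above describes.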
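0x02. DNS lookups and libc.so
a. The libc.so issue.
In practice there are two common implementations of libc.so. One is glibc, used by Ubuntu, Debian, CentOS and similar systems. The other is musl libc, used by Alpine Linux. The business pod's container image is built on Alpine Linux.
There are many small differences between the two: https://wiki.musl-libc.org/functional-differences-from-glibc.html
The ones relevant to this bug are:
- When resolving a name, both glibc and musl libc send the A and AAAA queries in parallel, in order to support both IPv4 and IPv6.
- musl libc does not support options such as single-request-reopen and single-request. These are glibc options, available only since glibc 2.9 and 2.10 respectively.
- When /etc/resolv.conf lists several nameserver entries, glibc uses them in order from top to bottom and only moves on to the second nameserver if the first one is unreachable. musl libc instead sends the query to all listed nameservers in parallel and uses whichever reply arrives first.
- PHP's curl extension and the curl command both use libcurl.so, which in turn uses libc.so, so the curl command can stand in for PHP's curl when debugging.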
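The difference in nameserver handling can be illustrated with a small timing model. The sketch below is a simplification under stated assumptions: each nameserver either answers after a fixed delay or never answers (None), retries are ignored, and the function names and reply times are hypothetical. It only shows why a lost packet costs a full timeout in the sequential model but not in the parallel one.

```python
def sequential_lookup(servers, timeout):
    """glibc-like model: try nameservers in order, waiting `timeout`
    seconds on each one that does not answer."""
    elapsed = 0.0
    for name, reply_time in servers:
        if reply_time is not None and reply_time <= timeout:
            return name, elapsed + reply_time
        elapsed += timeout
    return None, elapsed


def parallel_lookup(servers, timeout):
    """musl-like model: query every nameserver at once and take
    whichever reply arrives first."""
    answered = [
        (reply_time, name)
        for name, reply_time in servers
        if reply_time is not None and reply_time <= timeout
    ]
    if not answered:
        return None, timeout
    reply_time, name = min(answered)
    return name, reply_time


# Hypothetical reply times in seconds; None means the query was lost.
servers = [("10.43.0.100", None), ("223.5.5.5", 0.03)]
print(sequential_lookup(servers, timeout=5.0))
print(parallel_lookup(servers, timeout=5.0))
```

In this model the sequential resolver pays the whole timeout before it reaches the second nameserver, while the parallel resolver returns as soon as any server replies.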
b. About /etc/resolv.conf.
liuxu@master:~$ cat /etc/resolv.conf
nameserver 10.43.0.100
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
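With this file, resolving a domain such as test.example.cn goes through the following steps (the name has fewer than 5 dots, so the search domains are tried first):
- Query 10.43.0.100 for the A and AAAA records of test.example.cn.default.svc.cluster.local. The default timeout is 5 seconds; in practice an unanswered query is resent after 2.5 seconds, because musl libc divides the timeout by the number of attempts (2 by default).
- Query 10.43.0.100 for the A and AAAA records of test.example.cn.svc.cluster.local.
- Query 10.43.0.100 for the A and AAAA records of test.example.cn.cluster.local.
- Query 10.43.0.100 for the A and AAAA records of test.example.cn. For this name, coreDNS reads the host's /etc/resolv.conf, forwards the request to the host's nameserver, and returns the answer.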
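The expansion described in the steps above can be sketched in a few lines of Python. This is a simplified model, not the resolver's actual code: candidate_names is a hypothetical helper, and for names that already have at least ndots dots it only returns the name as given.

```python
# Values taken from the /etc/resolv.conf shown above.
SEARCH = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
NDOTS = 5


def candidate_names(name, search, ndots):
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        # A trailing dot marks the name as absolute: no search list.
        return [name]
    if name.count(".") >= ndots:
        # Enough dots: this sketch queries the name as given only.
        return [name + "."]
    # Too few dots: try each search domain first, then the bare name.
    return [f"{name}.{domain}." for domain in search] + [name + "."]


for fqdn in candidate_names("test.example.cn", SEARCH, NDOTS):
    print(fqdn)
```

For test.example.cn this yields the same four names, in the same order, as the queries visible in the captures in section 0x04. Each name is queried for both A and AAAA, so a single lookup produces eight queries.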
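0x03. Key capture points
When the business pod resolves test.example.cn, the packets travel along this path:
- The business pod (10.42.2.87/24) sends the A/AAAA queries to the coreDNS service (10.43.0.100/16).
- iptables DNAT rewrites the destination IP from 10.43.0.100 to 10.42.0.16 (the coreDNS pod IP).
- The packets enter cni0 on the agent server, which forwards them to the agent's flannel.1.
- flannel.1 on the agent server encapsulates the request in a VXLAN packet and hands it to the agent's eth0 (10.158.3.35/24), which sends it over the cloud provider's network to eth0 on the master server (10.158.3.24/24).
- eth0 on the master server receives the VXLAN packet (flannel.1, port 8472) and hands it to the master's flannel.1.
- flannel.1 on the master server decapsulates the packet and obtains the DNS request. The destination address falls in 10.42.0.0/24, so according to the routing table the packet is handed to the master's cni0. (The packet is dropped here; this is the cause of the bug.)
- cni0 on the master server delivers the request to the coreDNS pod through its veth.
- coreDNS resolves the request and the reply travels back along the same path.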
0x04. Packet captures
Packets captured on cni0 of the agent server:
14:21:47.812950 IP (tos 0x0, ttl 64, id 10589, offset 0, flags [DF], proto UDP (17), length 95)
10.42.2.87.48303 > 10.42.0.16.domain: [bad udp cksum 0x17df -> 0xe467!] 63998+ A? test.example.cn.default.svc.cluster.local. (67)
14:21:47.813014 IP (tos 0x0, ttl 64, id 10590, offset 0, flags [DF], proto UDP (17), length 95)
10.42.2.87.48303 > 10.42.0.16.domain: [bad udp cksum 0x17df -> 0xc754!] 64529+ AAAA? test.example.cn.default.svc.cluster.local. (67)
14:21:47.813477 IP (tos 0x0, ttl 62, id 60151, offset 0, flags [DF], proto UDP (17), length 188)
10.43.0.100.domain > 10.42.2.87.48303: [udp sum ok] 64529 NXDomain*- q: AAAA? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.813525 IP (tos 0x0, ttl 62, id 60152, offset 0, flags [DF], proto UDP (17), length 188)
10.43.0.100.domain > 10.42.2.87.48303: [udp sum ok] 63998 NXDomain*- q: A? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.813578 IP (tos 0x0, ttl 64, id 10591, offset 0, flags [DF], proto UDP (17), length 87)
10.42.2.87.49709 > 10.42.0.16.domain: [bad udp cksum 0x17d7 -> 0x1089!] 804+ A? test.example.cn.svc.cluster.local. (59)
14:21:47.813624 IP (tos 0x0, ttl 64, id 10592, offset 0, flags [DF], proto UDP (17), length 87)
10.42.2.87.49709 > 10.42.0.16.domain: [bad udp cksum 0x17d7 -> 0xf339!] 1395+ AAAA? test.example.cn.svc.cluster.local. (59)
14:21:47.813971 IP (tos 0x0, ttl 62, id 60153, offset 0, flags [DF], proto UDP (17), length 180)
10.43.0.100.domain > 10.42.2.87.49709: [udp sum ok] 1395 NXDomain*- q: AAAA? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.814008 IP (tos 0x0, ttl 62, id 60154, offset 0, flags [DF], proto UDP (17), length 180)
10.43.0.100.domain > 10.42.2.87.49709: [udp sum ok] 804 NXDomain*- q: A? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.814064 IP (tos 0x0, ttl 64, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x6300!] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.814096 IP (tos 0x0, ttl 64, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x468e!] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:47.814367 IP (tos 0x0, ttl 62, id 60155, offset 0, flags [DF], proto UDP (17), length 176)
10.43.0.100.domain > 10.42.2.87.35181: [udp sum ok] 25419 NXDomain*- q: A? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.316083 IP (tos 0x0, ttl 64, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x468e!] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:50.316573 IP (tos 0x0, ttl 62, id 60543, offset 0, flags [DF], proto UDP (17), length 176)
10.43.0.100.domain > 10.42.2.87.35181: [udp sum ok] 25789 NXDomain*- q: AAAA? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.316672 IP (tos 0x0, ttl 64, id 10993, offset 0, flags [DF], proto UDP (17), length 69)
10.42.2.87.34314 > 10.42.0.16.domain: [bad udp cksum 0x17c5 -> 0x5bc9!] 37116+ A? test.example.cn. (41)
14:21:50.316717 IP (tos 0x0, ttl 64, id 10994, offset 0, flags [DF], proto UDP (17), length 69)
10.42.2.87.34314 > 10.42.0.16.domain: [bad udp cksum 0x17c5 -> 0x3f7f!] 37446+ AAAA? test.example.cn. (41)
14:21:50.363460 IP (tos 0x0, ttl 62, id 60552, offset 0, flags [DF], proto UDP (17), length 108)
10.43.0.100.domain > 10.42.2.87.34314: [udp sum ok] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [30s] A 121.xx.xx.xx (80)
14:21:50.510398 IP (tos 0x0, ttl 62, id 60559, offset 0, flags [DF], proto UDP (17), length 152)
10.43.0.100.domain > 10.42.2.87.34314: [udp sum ok] 37446 q: AAAA? test.example.cn. 0/1/0 ns: mbgadev.cn. [30s] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 (124)
Packets captured on flannel.1 of the agent server:
14:21:47.812974 IP (tos 0x0, ttl 63, id 10589, offset 0, flags [DF], proto UDP (17), length 95)
10.42.2.87.48303 > 10.42.0.16.domain: [bad udp cksum 0x17df -> 0xe467!] 63998+ A? test.example.cn.default.svc.cluster.local. (67)
14:21:47.813018 IP (tos 0x0, ttl 63, id 10590, offset 0, flags [DF], proto UDP (17), length 95)
10.42.2.87.48303 > 10.42.0.16.domain: [bad udp cksum 0x17df -> 0xc754!] 64529+ AAAA? test.example.cn.default.svc.cluster.local. (67)
14:21:47.813458 IP (tos 0x0, ttl 63, id 60151, offset 0, flags [DF], proto UDP (17), length 188)
10.42.0.16.domain > 10.42.2.87.48303: [udp sum ok] 64529 NXDomain*- q: AAAA? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.813512 IP (tos 0x0, ttl 63, id 60152, offset 0, flags [DF], proto UDP (17), length 188)
10.42.0.16.domain > 10.42.2.87.48303: [udp sum ok] 63998 NXDomain*- q: A? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.813595 IP (tos 0x0, ttl 63, id 10591, offset 0, flags [DF], proto UDP (17), length 87)
10.42.2.87.49709 > 10.42.0.16.domain: [bad udp cksum 0x17d7 -> 0x1089!] 804+ A? test.example.cn.svc.cluster.local. (59)
14:21:47.813635 IP (tos 0x0, ttl 63, id 10592, offset 0, flags [DF], proto UDP (17), length 87)
10.42.2.87.49709 > 10.42.0.16.domain: [bad udp cksum 0x17d7 -> 0xf339!] 1395+ AAAA? test.example.cn.svc.cluster.local. (59)
14:21:47.813962 IP (tos 0x0, ttl 63, id 60153, offset 0, flags [DF], proto UDP (17), length 180)
10.42.0.16.domain > 10.42.2.87.49709: [udp sum ok] 1395 NXDomain*- q: AAAA? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.814003 IP (tos 0x0, ttl 63, id 60154, offset 0, flags [DF], proto UDP (17), length 180)
10.42.0.16.domain > 10.42.2.87.49709: [udp sum ok] 804 NXDomain*- q: A? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.814077 IP (tos 0x0, ttl 63, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x6300!] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.814100 IP (tos 0x0, ttl 63, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x468e!] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:47.814358 IP (tos 0x0, ttl 63, id 60155, offset 0, flags [DF], proto UDP (17), length 176)
10.42.0.16.domain > 10.42.2.87.35181: [udp sum ok] 25419 NXDomain*- q: A? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.316100 IP (tos 0x0, ttl 63, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x468e!] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:50.316552 IP (tos 0x0, ttl 63, id 60543, offset 0, flags [DF], proto UDP (17), length 176)
10.42.0.16.domain > 10.42.2.87.35181: [udp sum ok] 25789 NXDomain*- q: AAAA? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.316692 IP (tos 0x0, ttl 63, id 10993, offset 0, flags [DF], proto UDP (17), length 69)
10.42.2.87.34314 > 10.42.0.16.domain: [bad udp cksum 0x17c5 -> 0x5bc9!] 37116+ A? test.example.cn. (41)
14:21:50.316720 IP (tos 0x0, ttl 63, id 10994, offset 0, flags [DF], proto UDP (17), length 69)
10.42.2.87.34314 > 10.42.0.16.domain: [bad udp cksum 0x17c5 -> 0x3f7f!] 37446+ AAAA? test.example.cn. (41)
14:21:50.363441 IP (tos 0x0, ttl 63, id 60552, offset 0, flags [DF], proto UDP (17), length 108)
10.42.0.16.domain > 10.42.2.87.34314: [udp sum ok] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [30s] A 121.xx.xx.xx (80)
14:21:50.510381 IP (tos 0x0, ttl 63, id 60559, offset 0, flags [DF], proto UDP (17), length 152)
10.42.0.16.domain > 10.42.2.87.34314: [udp sum ok] 37446 q: AAAA? test.example.cn. 0/1/0 ns: mbgadev.cn. [30s] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 (124)
14:21:52.633975 IP (tos 0x0, ttl 63, id 11543, offset 0, flags [DF], proto UDP (17), length 111)
Packets captured on flannel.1 of the master server:
14:21:47.818265 IP (tos 0x0, ttl 63, id 10589, offset 0, flags [DF], proto UDP (17), length 95)
10.42.2.87.48303 > 10.42.0.16.domain: [udp sum ok] 63998+ A? test.example.cn.default.svc.cluster.local. (67)
14:21:47.818291 IP (tos 0x0, ttl 63, id 10590, offset 0, flags [DF], proto UDP (17), length 95)
10.42.2.87.48303 > 10.42.0.16.domain: [udp sum ok] 64529+ AAAA? test.example.cn.default.svc.cluster.local. (67)
14:21:47.818509 IP (tos 0x0, ttl 63, id 60151, offset 0, flags [DF], proto UDP (17), length 188)
10.42.0.16.domain > 10.42.2.87.48303: [bad udp cksum 0x183c -> 0x239b!] 64529 NXDomain*- q: AAAA? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.818606 IP (tos 0x0, ttl 63, id 60152, offset 0, flags [DF], proto UDP (17), length 188)
10.42.0.16.domain > 10.42.2.87.48303: [bad udp cksum 0x183c -> 0x40ae!] 63998 NXDomain*- q: A? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.818847 IP (tos 0x0, ttl 63, id 10591, offset 0, flags [DF], proto UDP (17), length 87)
10.42.2.87.49709 > 10.42.0.16.domain: [udp sum ok] 804+ A? test.example.cn.svc.cluster.local. (59)
14:21:47.818872 IP (tos 0x0, ttl 63, id 10592, offset 0, flags [DF], proto UDP (17), length 87)
10.42.2.87.49709 > 10.42.0.16.domain: [udp sum ok] 1395+ AAAA? test.example.cn.svc.cluster.local. (59)
14:21:47.819023 IP (tos 0x0, ttl 63, id 60153, offset 0, flags [DF], proto UDP (17), length 180)
10.42.0.16.domain > 10.42.2.87.49709: [bad udp cksum 0x1834 -> 0x4f80!] 1395 NXDomain*- q: AAAA? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.819102 IP (tos 0x0, ttl 63, id 60154, offset 0, flags [DF], proto UDP (17), length 180)
10.42.0.16.domain > 10.42.2.87.49709: [bad udp cksum 0x1834 -> 0x6ccf!] 804 NXDomain*- q: A? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.819316 IP (tos 0x0, ttl 63, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.819323 IP (tos 0x0, ttl 63, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:47.819436 IP (tos 0x0, ttl 63, id 60155, offset 0, flags [DF], proto UDP (17), length 176)
10.42.0.16.domain > 10.42.2.87.35181: [bad udp cksum 0x1830 -> 0xbf46!] 25419 NXDomain*- q: A? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.321330 IP (tos 0x0, ttl 63, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:50.321598 IP (tos 0x0, ttl 63, id 60543, offset 0, flags [DF], proto UDP (17), length 176)
10.42.0.16.domain > 10.42.2.87.35181: [bad udp cksum 0x1830 -> 0xa2d4!] 25789 NXDomain*- q: AAAA? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.321934 IP (tos 0x0, ttl 63, id 10993, offset 0, flags [DF], proto UDP (17), length 69)
10.42.2.87.34314 > 10.42.0.16.domain: [udp sum ok] 37116+ A? test.example.cn. (41)
14:21:50.321951 IP (tos 0x0, ttl 63, id 10994, offset 0, flags [DF], proto UDP (17), length 69)
10.42.2.87.34314 > 10.42.0.16.domain: [udp sum ok] 37446+ AAAA? test.example.cn. (41)
14:21:50.368431 IP (tos 0x0, ttl 63, id 60552, offset 0, flags [DF], proto UDP (17), length 108)
10.42.0.16.domain > 10.42.2.87.34314: [bad udp cksum 0x17ec -> 0x968f!] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [30s] A 121.xx.xx.xx (80)
14:21:50.515406 IP (tos 0x0, ttl 63, id 60559, offset 0, flags [DF], proto UDP (17), length 152)
10.42.0.16.domain > 10.42.2.87.34314: [bad udp cksum 0x1818 -> 0xa75e!] 37446 q: AAAA? test.example.cn. 0/1/0 ns: mbgadev.cn. [30s] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 (124)
Packets captured on cni0 of the master server:
14:21:47.818281 IP (tos 0x0, ttl 62, id 10589, offset 0, flags [DF], proto UDP (17), length 95)
10.42.2.87.48303 > 10.42.0.16.domain: [udp sum ok] 63998+ A? test.example.cn.default.svc.cluster.local. (67)
14:21:47.818294 IP (tos 0x0, ttl 62, id 10590, offset 0, flags [DF], proto UDP (17), length 95)
10.42.2.87.48303 > 10.42.0.16.domain: [udp sum ok] 64529+ AAAA? test.example.cn.default.svc.cluster.local. (67)
14:21:47.818491 IP (tos 0x0, ttl 64, id 60151, offset 0, flags [DF], proto UDP (17), length 188)
10.42.0.16.domain > 10.42.2.87.48303: [bad udp cksum 0x183c -> 0x239b!] 64529 NXDomain*- q: AAAA? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.818598 IP (tos 0x0, ttl 64, id 60152, offset 0, flags [DF], proto UDP (17), length 188)
10.42.0.16.domain > 10.42.2.87.48303: [bad udp cksum 0x183c -> 0x40ae!] 63998 NXDomain*- q: A? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.818860 IP (tos 0x0, ttl 62, id 10591, offset 0, flags [DF], proto UDP (17), length 87)
10.42.2.87.49709 > 10.42.0.16.domain: [udp sum ok] 804+ A? test.example.cn.svc.cluster.local. (59)
14:21:47.818878 IP (tos 0x0, ttl 62, id 10592, offset 0, flags [DF], proto UDP (17), length 87)
10.42.2.87.49709 > 10.42.0.16.domain: [udp sum ok] 1395+ AAAA? test.example.cn.svc.cluster.local. (59)
14:21:47.819019 IP (tos 0x0, ttl 64, id 60153, offset 0, flags [DF], proto UDP (17), length 180)
10.42.0.16.domain > 10.42.2.87.49709: [bad udp cksum 0x1834 -> 0x4f80!] 1395 NXDomain*- q: AAAA? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.819099 IP (tos 0x0, ttl 64, id 60154, offset 0, flags [DF], proto UDP (17), length 180)
10.42.0.16.domain > 10.42.2.87.49709: [bad udp cksum 0x1834 -> 0x6ccf!] 804 NXDomain*- q: A? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.819326 IP (tos 0x0, ttl 62, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.819433 IP (tos 0x0, ttl 64, id 60155, offset 0, flags [DF], proto UDP (17), length 176)
10.42.0.16.domain > 10.42.2.87.35181: [bad udp cksum 0x1830 -> 0xbf46!] 25419 NXDomain*- q: A? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.321340 IP (tos 0x0, ttl 62, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:50.321585 IP (tos 0x0, ttl 64, id 60543, offset 0, flags [DF], proto UDP (17), length 176)
10.42.0.16.domain > 10.42.2.87.35181: [bad udp cksum 0x1830 -> 0xa2d4!] 25789 NXDomain*- q: AAAA? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.321947 IP (tos 0x0, ttl 62, id 10993, offset 0, flags [DF], proto UDP (17), length 69)
10.42.2.87.34314 > 10.42.0.16.domain: [udp sum ok] 37116+ A? test.example.cn. (41)
14:21:50.321954 IP (tos 0x0, ttl 62, id 10994, offset 0, flags [DF], proto UDP (17), length 69)
10.42.2.87.34314 > 10.42.0.16.domain: [udp sum ok] 37446+ AAAA? test.example.cn. (41)
14:21:50.322158 IP (tos 0x0, ttl 64, id 11649, offset 0, flags [DF], proto UDP (17), length 80)
10.42.0.16.40309 > 183.60.83.19.domain: [bad udp cksum 0x153b -> 0xbeb2!] 37116+ [1au] A? test.example.cn. ar: . OPT UDPsize=2048 DO (52)
14:21:50.322314 IP (tos 0x0, ttl 64, id 17443, offset 0, flags [DF], proto UDP (17), length 80)
10.42.0.16.42824 > 183.60.82.98.domain: [bad udp cksum 0x148a -> 0x9946!] 37446+ [1au] AAAA? test.example.cn. ar: . OPT UDPsize=2048 DO (52)
14:21:50.368233 IP (tos 0x0, ttl 250, id 9965, offset 0, flags [DF], proto UDP (17), length 85)
183.60.83.19.domain > 10.42.0.16.40309: [udp sum ok] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [1m] A 121.xx.xx.xx (57)
14:21:50.368419 IP (tos 0x0, ttl 64, id 60552, offset 0, flags [DF], proto UDP (17), length 108)
10.42.0.16.domain > 10.42.2.87.34314: [bad udp cksum 0x17ec -> 0x968f!] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [30s] A 121.xx.xx.xx (80)
14:21:50.515262 IP (tos 0x0, ttl 58, id 25193, offset 0, flags [none], proto UDP (17), length 150)
183.60.82.98.domain > 10.42.0.16.42824: [udp sum ok] 37446 q: AAAA? test.example.cn. 0/1/1 ns: mbgadev.cn. [6m] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 ar: . OPT UDPsize=4096 DO (122)
14:21:50.515398 IP (tos 0x0, ttl 64, id 60559, offset 0, flags [DF], proto UDP (17), length 152)
10.42.0.16.domain > 10.42.2.87.34314: [bad udp cksum 0x1818 -> 0xa75e!] 37446 q: AAAA? test.example.cn. 0/1/0 ns: mbgadev.cn. [30s] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 (124)
14:21:52.639212 IP (tos 0x0, ttl 62, id 11543, offset 0, flags [DF], proto UDP (17), length 111)
Comparing the captures above, one packet that was seen on the master's flannel.1 is missing from the capture on the master's cni0:
14:21:47.819323 IP (tos 0x0, ttl 63, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
Because this packet never arrived, coreDNS did not receive that DNS query and sent no reply, so 2.5 seconds later the business pod resent the query:
14:21:50.321340 IP (tos 0x0, ttl 62, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
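Spotting the missing packet by eye is tedious, so a small script can compare two capture points instead. The following Python sketch assumes the captures are saved as the tcpdump text shown above; QUERY_RE, count_queries and missing_queries are hypothetical names, and the regular expression only covers the query lines of this particular output format. The two sample strings contain detail lines copied from the master captures, with the header lines omitted.

```python
import re
from collections import Counter

# Matches the DNS query part of a tcpdump detail line, for example:
# "... > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? name. (55)"
QUERY_RE = re.compile(r"\.domain: .*?(\d+)\+ (A|AAAA)\? (\S+)")


def count_queries(capture_text):
    """Count DNS queries by (id, type, name) in tcpdump text output."""
    counts = Counter()
    for line in capture_text.splitlines():
        match = QUERY_RE.search(line)
        if match:
            dns_id, qtype, qname = match.groups()
            counts[(int(dns_id), qtype, qname)] += 1
    return counts


def missing_queries(upstream_text, downstream_text):
    """Queries seen at the upstream capture point but not downstream."""
    return count_queries(upstream_text) - count_queries(downstream_text)


FLANNEL_SAMPLE = """\
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25419+ A? test.example.cn.cluster.local. (55)
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
"""

CNI0_SAMPLE = """\
10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25419+ A? test.example.cn.cluster.local. (55)
"""

for query, count in missing_queries(FLANNEL_SAMPLE, CNI0_SAMPLE).items():
    print(query, count)
```

Counting occurrences rather than using a set matters here, because the resent query carries the same DNS id as the lost one.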
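A side note: when /etc/resolv.conf does not set timeout, the default is 5 seconds. The 2.5-second delay seen here is the retry interval: musl libc divides the timeout by the number of attempts (2 by default), so a query that gets no answer is resent after 2.5 seconds.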
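The 2.5-second figure can be checked against the capture timestamps. The sketch below assumes the resolver resends an unanswered query after timeout / attempts seconds, with the defaults of 5 seconds and 2 attempts; that is an assumption about musl libc's behaviour, not something the capture itself proves. The helper names are hypothetical, and the two timestamps are the first AAAA query for test.example.cn.cluster.local. and its resend in the agent cni0 capture.

```python
from datetime import datetime

# Defaults assumed when resolv.conf sets neither timeout nor attempts.
TIMEOUT_SECONDS = 5
ATTEMPTS = 2


def retry_interval(timeout, attempts):
    """Seconds to wait before resending an unanswered query."""
    return timeout / attempts


def seconds_between(first, second):
    """Difference between two tcpdump timestamps (HH:MM:SS.ffffff)."""
    fmt = "%H:%M:%S.%f"
    delta = datetime.strptime(second, fmt) - datetime.strptime(first, fmt)
    return delta.total_seconds()


# Original AAAA query and its resend, from the agent cni0 capture.
observed = seconds_between("14:21:47.814096", "14:21:50.316083")
print(retry_interval(TIMEOUT_SECONDS, ATTEMPTS))
print(observed)
```

The observed gap is within a few milliseconds of the computed interval, which is consistent with a single lost query followed by one retry.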
0x05. conntrack insert_failed counters
Master server:
liuxu@master:~$ sudo conntrack -S
cpu=0 found=0 invalid=1333 ignore=1963913 insert=0 insert_failed=17478 drop=17478 early_drop=0 error=2 search_restart=27053
cpu=1 found=0 invalid=615 ignore=1912454 insert=0 insert_failed=41030 drop=41030 early_drop=0 error=1 search_restart=14663
Agent server:
liuxu@master:~$ sudo conntrack -S
cpu=0 found=304 invalid=136 ignore=145233 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3208
cpu=1 found=269 invalid=115 ignore=172201 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=3267
cpu=2 found=300 invalid=140 ignore=160182 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3134
cpu=3 found=281 invalid=143 ignore=167805 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3263
The master server shows a large number of insert_failed and drop packets, while the agent server shows none.
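When there are many CPUs, it is easier to total the counters than to read them line by line. Here is a minimal Python sketch that parses the conntrack -S text shown above and sums one counter across CPUs. The function names are hypothetical, and MASTER_STATS is copied from the master server output.

```python
def parse_conntrack_stats(text):
    """Parse `conntrack -S` output into one dict of counters per CPU."""
    rows = []
    for line in text.splitlines():
        fields = dict(
            token.split("=", 1) for token in line.split() if "=" in token
        )
        if "cpu" in fields:
            rows.append({key: int(value) for key, value in fields.items()})
    return rows


def total(rows, counter):
    """Sum one counter (for example insert_failed) across all CPUs."""
    return sum(row.get(counter, 0) for row in rows)


MASTER_STATS = """\
cpu=0 found=0 invalid=1333 ignore=1963913 insert=0 insert_failed=17478 drop=17478 early_drop=0 error=2 search_restart=27053
cpu=1 found=0 invalid=615 ignore=1912454 insert=0 insert_failed=41030 drop=41030 early_drop=0 error=1 search_restart=14663
"""

rows = parse_conntrack_stats(MASTER_STATS)
print(total(rows, "insert_failed"), total(rows, "drop"))
```

On the master server the two totals are equal, meaning every failed insert in this sample corresponds to a dropped packet.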
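0x06. One more observation
If the business pod is scheduled onto the master server, where coreDNS runs, the problem does not occur. From the analysis above, the likely reason is that the pod then sits behind cni0 (10.42.0.1/24) on the master, in the same subnet as coreDNS, and can presumably reach the DNS service without DNAT.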
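IV. Solutions
References:
Differences between musl libc and glibc: https://wiki.musl-libc.org/functional-differences-from-glibc.html
A cloud provider's container team describing this problem: https://tencentcloudcontainerteam.github.io/2018/10/26/DNS-5-seconds-delay/
Weave's analysis of the problem and their Linux kernel patches: https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts
The solution given on the Kubernetes website: https://kubernetes.io/zh/docs/tasks/administer-cluster/nodelocaldns/
The possible solutions are:
a. Because the business pod is based on Alpine Linux, add an extra nameserver 223.5.5.5 inside the container so that libc.so queries coreDNS and Alibaba Cloud DNS in parallel. Even if the master server drops a packet, the query from the agent server to Alibaba Cloud DNS is unlikely to be dropped as well.
b. Upgrade the master server's kernel. The master runs Debian 9 (kernel 4.9), and Weave's kernel patch was merged in 4.19, so upgrading to Debian 10 (kernel 4.19) mitigates the bug.
c. Add a DNS cache to every server or pod, so that lookups do not always go to coreDNS on the master server. (Tencent Cloud's TKE can install the nodelocaldns add-on directly.)
In the end we chose solution b.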
Master server after the upgrade:
liuxu@master:~$ sudo conntrack -S
cpu=0 found=0 invalid=155 ignore=156433 insert=0 insert_failed=13241 drop=0 early_drop=0 error=3 search_restart=28343
cpu=1 found=0 invalid=53 ignore=182256 insert=0 insert_failed=23420 drop=0 early_drop=0 error=5 search_restart=15742
Packets are no longer dropped: insert_failed is still non-zero, but drop is now 0.