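Investigating and resolving a DNS problem on k3s

I. Overview

While developing a Laravel/PHP project I used some third-party SDKs that make HTTP requests. Those requests were very slow, taking around 2.5 s or a little over 5 s and sometimes timing out, which led to this debugging session.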

II. Server environment

Master host: Debian 9

Kernel: 4.9.0-13-amd64 #1 SMP Debian 4.9.228-1 (2020-07-05) x86_64 GNU/Linux

k3s master:v1.20.5+k3s1 (355fff30)

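III. How the bug works

0x00. Network topology of the k3s environment

[Figure img1: k3s network topology]

0x01. Routing information on the master host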

$ ip route
default via 10.158.3.1 dev eth0 onlink
10.42.0.0/24 dev cni0 proto kernel scope link src 10.42.0.1
10.42.2.0/24 via 10.42.2.0 dev flannel.1 onlink
10.158.3.0/24 dev eth0 proto kernel scope link src 10.158.3.24
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown

$ ip -d link show flannel.1
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN mode DEFAULT group default
    link/ether 42:c0:6a:0f:fc:ad brd ff:ff:ff:ff:ff:ff promiscuity 0
    vxlan id 1 local 10.158.3.24 dev eth0 srcport 0 0 dstport 8472 nolearning ageing 300 udpcsum noudp6zerocsumtx noudp6zerocsumrx addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

$ ip -d link show cni0
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP mode DEFAULT group default qlen 1000
    link/ether a6:e6:d3:93:36:60 brd ff:ff:ff:ff:ff:ff promiscuity 0
    bridge forward_delay 1500 hello_time 200 max_age 2000 ageing_time 30000 stp_state 0 priority 32768 vlan_filtering 0 vlan_protocol 802.1Q bridge_id 8000.a6:e6:d3:93:36:60 designated_root 8000.a6:e6:d3:93:36:60 root_port 0 root_path_cost 0 topology_change 0 topology_change_detected 0 hello_timer    0.00 tcn_timer    0.00 topology_change_timer    0.00 gc_timer  246.38 vlan_default_pvid 1 group_fwd_mask 0 group_address 01:80:c2:00:00:00 mcast_snooping 1 mcast_router 1 mcast_query_use_ifaddr 0 mcast_querier 0 mcast_hash_elasticity 4 mcast_hash_max 512 mcast_last_member_count 2 mcast_startup_query_count 2 mcast_last_member_interval 100 mcast_membership_interval 26000 mcast_querier_interval 25500 mcast_query_interval 12500 mcast_query_response_interval 1000 mcast_startup_query_interval 3124 nf_call_iptables 0 nf_call_ip6tables 0 nf_call_arptables 0 addrgenmode eui64 numtxqueues 1 numrxqueues 1 gso_max_size 65536 gso_max_segs 65535

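From the above we can see:

  1. At the overlay network layer, coreDNS and the application pod are on different subnets, which means they run on two different physical servers: coreDNS on the master server and the application pod on the agent server.
  2. Containers in a pod are attached to the cni0 bridge through veth pairs, and traffic for other nodes' pod subnets is routed on to flannel.1 (vxlan), so the application pod and coreDNS communicate through flannel.1.
  3. flannel.1 sends its traffic through eth0.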
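The subnet reasoning above can be checked mechanically. The following is a minimal sketch, assuming the two pod subnets from the routing table and the two pod addresses that appear in the captures later in this article; the function names are made up for illustration.

```python
import ipaddress

# Subnets taken from the `ip route` output of the master host.
MASTER_POD_SUBNET = ipaddress.ip_network("10.42.0.0/24")  # local, on cni0
AGENT_POD_SUBNET = ipaddress.ip_network("10.42.2.0/24")   # reached via flannel.1


def node_for(ip: str) -> str:
    """Return which node's pod subnet an address belongs to."""
    addr = ipaddress.ip_address(ip)
    if addr in MASTER_POD_SUBNET:
        return "master"
    if addr in AGENT_POD_SUBNET:
        return "agent"
    return "unknown"


def crosses_vxlan(src: str, dst: str) -> bool:
    """True when the two pods sit on different nodes, so traffic uses flannel.1."""
    return node_for(src) != node_for(dst)


print(node_for("10.42.0.16"))                     # coreDNS pod
print(node_for("10.42.2.87"))                     # application pod
print(crosses_vxlan("10.42.2.87", "10.42.0.16"))
```

With these addresses the coreDNS pod maps to the master, the application pod to the agent, and the pair is reported as crossing the vxlan tunnel.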

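0x02. DNS lookups and libc.so

a. The libc.so issue.

In practice there are two kinds of libc.so. One is glibc, used by Ubuntu, Debian, CentOS and similar systems. The other is musl libc, used by Alpine Linux. The application pod's container image is built on Alpine Linux.

There are many small differences between the two: https://wiki.musl-libc.org/functional-differences-from-glibc.html

The ones relevant to this bug:

  1. Both glibc and musl libc send the A and AAAA queries in parallel when resolving a name, in order to support both IPv4 and IPv6.
  2. musl libc does not support options such as single-request-reopen and single-request; these options were only added in glibc 2.9 and 2.10.
  3. When /etc/resolv.conf lists several nameservers, glibc uses them in order from top to bottom, moving on to the second only if the first is unreachable. musl libc instead sends the DNS query to all listed nameservers in parallel and uses whichever reply arrives first.
  4. PHP's curl module and the curl command both use libcurl.so, which in turn uses libc.so, so the curl command can stand in directly when debugging PHP's curl.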
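To make the nameserver difference in point 3 concrete, here is a deliberately simplified model, not the real resolver code of either library. It assumes one latency per nameserver in milliseconds (`None` meaning the packet was lost) and ignores retries, search domains and the A/AAAA pairing; both function names are hypothetical.

```python
from typing import Optional, Sequence


def sequential_lookup(latencies_ms: Sequence[Optional[int]], timeout_ms: int) -> Optional[int]:
    """glibc-style model: try nameservers in order, waiting a full timeout on each loss."""
    elapsed = 0
    for latency in latencies_ms:
        if latency is not None and latency <= timeout_ms:
            return elapsed + latency
        elapsed += timeout_ms
    return None  # every nameserver failed


def parallel_lookup(latencies_ms: Sequence[Optional[int]], timeout_ms: int) -> Optional[int]:
    """musl-style model: query all nameservers at once and take the first reply."""
    answered = [l for l in latencies_ms if l is not None and l <= timeout_ms]
    return min(answered) if answered else None


# First nameserver loses the packet, second answers in 30 ms.
print(sequential_lookup([None, 30], timeout_ms=5000))
print(parallel_lookup([None, 30], timeout_ms=5000))
```

In this model a single lost packet costs the sequential strategy a full timeout, while the parallel strategy is only as slow as the fastest nameserver that answers.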

b. About /etc/resolv.conf.

$ cat /etc/resolv.conf
nameserver 10.43.0.100
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5

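This file means that a lookup for the domain test.example.cn goes through the following steps:

  1. Query 10.43.0.100 for the A and AAAA records of test.example.cn.default.svc.cluster.local. The default timeout is 5 seconds, but in practice a retry is sent after 2.5 seconds (with musl, the 5-second timeout is split across its 2 default attempts).
  2. Query 10.43.0.100 for the A and AAAA records of test.example.cn.svc.cluster.local.
  3. Query 10.43.0.100 for the A and AAAA records of test.example.cn.cluster.local.
  4. Query 10.43.0.100 for the A and AAAA records of test.example.cn. For this query coreDNS reads the host's /etc/resolv.conf, forwards the request to the host's nameservers and returns the answer.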
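The order of these four steps follows from `search` and `ndots:5`. The sketch below models that expansion, under stated assumptions: `candidate_names` is a hypothetical helper, not a libc API, and for names with at least `ndots` dots it follows glibc's order (musl, as far as I know, skips the search list entirely in that case).

```python
from typing import List


def candidate_names(name: str, search: List[str], ndots: int) -> List[str]:
    """Return the fully qualified names a resolver would try, in order."""
    if name.endswith("."):
        return [name]  # already absolute: the search list is not used
    absolute = name + "."
    expanded = [f"{name}.{domain}." for domain in search]
    if name.count(".") < ndots:
        # Too few dots: walk the search list first, the bare name last.
        return expanded + [absolute]
    # Enough dots: try the bare name first (glibc-like order).
    return [absolute] + expanded


search = ["default.svc.cluster.local", "svc.cluster.local", "cluster.local"]
for candidate in candidate_names("test.example.cn", search, ndots=5):
    print(candidate)
```

Since test.example.cn has only two dots, the three cluster-local names are tried before the real one, which is why a single lost packet in any of those steps delays the whole lookup.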

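0x03. Key capture points along the packet path

When the application pod resolves test.example.cn, the packets take the following path:

  1. The application pod (10.42.2.87/24) sends the A/AAAA queries to the coreDNS service (10.43.0.100/16).
  2. iptables DNATs the destination IP from 10.43.0.100 to 10.42.0.16 (the coreDNS pod IP).
  3. The packet enters cni0 on the agent server, which forwards the request to the agent's flannel.1.
  4. flannel.1 on the agent server encapsulates the request as a vxlan packet and hands it to the agent's eth0 (10.158.3.35/24), which sends it across the cloud provider's network to eth0 on the master server (10.158.3.24/24).
  5. eth0 on the master server receives the vxlan packet (flannel.1, port 8472) and hands it to the master's flannel.1.
  6. flannel.1 on the master server decapsulates the packet to recover the DNS request. Because the destination address is in 10.42.0.0/24, the routing table sends the packet to the master's cni0. (The packet is dropped here; this is the cause of the bug.)
  7. cni0 on the master server delivers the request to the coreDNS pod through veth.
  8. coreDNS resolves the request and the reply travels back along the same path.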

0x04. Packet captures

Packets captured on the agent server's cni0:

14:21:47.812950 IP (tos 0x0, ttl 64, id 10589, offset 0, flags [DF], proto UDP (17), length 95)
    10.42.2.87.48303 > 10.42.0.16.domain: [bad udp cksum 0x17df -> 0xe467!] 63998+ A? test.example.cn.default.svc.cluster.local. (67)
14:21:47.813014 IP (tos 0x0, ttl 64, id 10590, offset 0, flags [DF], proto UDP (17), length 95)
    10.42.2.87.48303 > 10.42.0.16.domain: [bad udp cksum 0x17df -> 0xc754!] 64529+ AAAA? test.example.cn.default.svc.cluster.local. (67)
14:21:47.813477 IP (tos 0x0, ttl 62, id 60151, offset 0, flags [DF], proto UDP (17), length 188)
    10.43.0.100.domain > 10.42.2.87.48303: [udp sum ok] 64529 NXDomain*- q: AAAA? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.813525 IP (tos 0x0, ttl 62, id 60152, offset 0, flags [DF], proto UDP (17), length 188)
    10.43.0.100.domain > 10.42.2.87.48303: [udp sum ok] 63998 NXDomain*- q: A? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.813578 IP (tos 0x0, ttl 64, id 10591, offset 0, flags [DF], proto UDP (17), length 87)
    10.42.2.87.49709 > 10.42.0.16.domain: [bad udp cksum 0x17d7 -> 0x1089!] 804+ A? test.example.cn.svc.cluster.local. (59)
14:21:47.813624 IP (tos 0x0, ttl 64, id 10592, offset 0, flags [DF], proto UDP (17), length 87)
    10.42.2.87.49709 > 10.42.0.16.domain: [bad udp cksum 0x17d7 -> 0xf339!] 1395+ AAAA? test.example.cn.svc.cluster.local. (59)
14:21:47.813971 IP (tos 0x0, ttl 62, id 60153, offset 0, flags [DF], proto UDP (17), length 180)
    10.43.0.100.domain > 10.42.2.87.49709: [udp sum ok] 1395 NXDomain*- q: AAAA? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.814008 IP (tos 0x0, ttl 62, id 60154, offset 0, flags [DF], proto UDP (17), length 180)
    10.43.0.100.domain > 10.42.2.87.49709: [udp sum ok] 804 NXDomain*- q: A? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.814064 IP (tos 0x0, ttl 64, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x6300!] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.814096 IP (tos 0x0, ttl 64, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x468e!] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:47.814367 IP (tos 0x0, ttl 62, id 60155, offset 0, flags [DF], proto UDP (17), length 176)
    10.43.0.100.domain > 10.42.2.87.35181: [udp sum ok] 25419 NXDomain*- q: A? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.316083 IP (tos 0x0, ttl 64, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x468e!] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:50.316573 IP (tos 0x0, ttl 62, id 60543, offset 0, flags [DF], proto UDP (17), length 176)
    10.43.0.100.domain > 10.42.2.87.35181: [udp sum ok] 25789 NXDomain*- q: AAAA? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.316672 IP (tos 0x0, ttl 64, id 10993, offset 0, flags [DF], proto UDP (17), length 69)
    10.42.2.87.34314 > 10.42.0.16.domain: [bad udp cksum 0x17c5 -> 0x5bc9!] 37116+ A? test.example.cn. (41)
14:21:50.316717 IP (tos 0x0, ttl 64, id 10994, offset 0, flags [DF], proto UDP (17), length 69)
    10.42.2.87.34314 > 10.42.0.16.domain: [bad udp cksum 0x17c5 -> 0x3f7f!] 37446+ AAAA? test.example.cn. (41)
14:21:50.363460 IP (tos 0x0, ttl 62, id 60552, offset 0, flags [DF], proto UDP (17), length 108)
    10.43.0.100.domain > 10.42.2.87.34314: [udp sum ok] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [30s] A 121.xx.xx.xx (80)
14:21:50.510398 IP (tos 0x0, ttl 62, id 60559, offset 0, flags [DF], proto UDP (17), length 152)
    10.43.0.100.domain > 10.42.2.87.34314: [udp sum ok] 37446 q: AAAA? test.example.cn. 0/1/0 ns: mbgadev.cn. [30s] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 (124)

Packets captured on the agent server's flannel.1:

14:21:47.812974 IP (tos 0x0, ttl 63, id 10589, offset 0, flags [DF], proto UDP (17), length 95)
    10.42.2.87.48303 > 10.42.0.16.domain: [bad udp cksum 0x17df -> 0xe467!] 63998+ A? test.example.cn.default.svc.cluster.local. (67)
14:21:47.813018 IP (tos 0x0, ttl 63, id 10590, offset 0, flags [DF], proto UDP (17), length 95)
    10.42.2.87.48303 > 10.42.0.16.domain: [bad udp cksum 0x17df -> 0xc754!] 64529+ AAAA? test.example.cn.default.svc.cluster.local. (67)
14:21:47.813458 IP (tos 0x0, ttl 63, id 60151, offset 0, flags [DF], proto UDP (17), length 188)
    10.42.0.16.domain > 10.42.2.87.48303: [udp sum ok] 64529 NXDomain*- q: AAAA? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.813512 IP (tos 0x0, ttl 63, id 60152, offset 0, flags [DF], proto UDP (17), length 188)
    10.42.0.16.domain > 10.42.2.87.48303: [udp sum ok] 63998 NXDomain*- q: A? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.813595 IP (tos 0x0, ttl 63, id 10591, offset 0, flags [DF], proto UDP (17), length 87)
    10.42.2.87.49709 > 10.42.0.16.domain: [bad udp cksum 0x17d7 -> 0x1089!] 804+ A? test.example.cn.svc.cluster.local. (59)
14:21:47.813635 IP (tos 0x0, ttl 63, id 10592, offset 0, flags [DF], proto UDP (17), length 87)
    10.42.2.87.49709 > 10.42.0.16.domain: [bad udp cksum 0x17d7 -> 0xf339!] 1395+ AAAA? test.example.cn.svc.cluster.local. (59)
14:21:47.813962 IP (tos 0x0, ttl 63, id 60153, offset 0, flags [DF], proto UDP (17), length 180)
    10.42.0.16.domain > 10.42.2.87.49709: [udp sum ok] 1395 NXDomain*- q: AAAA? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.814003 IP (tos 0x0, ttl 63, id 60154, offset 0, flags [DF], proto UDP (17), length 180)
    10.42.0.16.domain > 10.42.2.87.49709: [udp sum ok] 804 NXDomain*- q: A? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.814077 IP (tos 0x0, ttl 63, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x6300!] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.814100 IP (tos 0x0, ttl 63, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x468e!] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:47.814358 IP (tos 0x0, ttl 63, id 60155, offset 0, flags [DF], proto UDP (17), length 176)
    10.42.0.16.domain > 10.42.2.87.35181: [udp sum ok] 25419 NXDomain*- q: A? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.316100 IP (tos 0x0, ttl 63, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [bad udp cksum 0x17d3 -> 0x468e!] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:50.316552 IP (tos 0x0, ttl 63, id 60543, offset 0, flags [DF], proto UDP (17), length 176)
    10.42.0.16.domain > 10.42.2.87.35181: [udp sum ok] 25789 NXDomain*- q: AAAA? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.316692 IP (tos 0x0, ttl 63, id 10993, offset 0, flags [DF], proto UDP (17), length 69)
    10.42.2.87.34314 > 10.42.0.16.domain: [bad udp cksum 0x17c5 -> 0x5bc9!] 37116+ A? test.example.cn. (41)
14:21:50.316720 IP (tos 0x0, ttl 63, id 10994, offset 0, flags [DF], proto UDP (17), length 69)
    10.42.2.87.34314 > 10.42.0.16.domain: [bad udp cksum 0x17c5 -> 0x3f7f!] 37446+ AAAA? test.example.cn. (41)
14:21:50.363441 IP (tos 0x0, ttl 63, id 60552, offset 0, flags [DF], proto UDP (17), length 108)
    10.42.0.16.domain > 10.42.2.87.34314: [udp sum ok] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [30s] A 121.xx.xx.xx (80)
14:21:50.510381 IP (tos 0x0, ttl 63, id 60559, offset 0, flags [DF], proto UDP (17), length 152)
    10.42.0.16.domain > 10.42.2.87.34314: [udp sum ok] 37446 q: AAAA? test.example.cn. 0/1/0 ns: mbgadev.cn. [30s] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 (124)
14:21:52.633975 IP (tos 0x0, ttl 63, id 11543, offset 0, flags [DF], proto UDP (17), length 111)

Packets captured on the master server's flannel.1:

14:21:47.818265 IP (tos 0x0, ttl 63, id 10589, offset 0, flags [DF], proto UDP (17), length 95)
    10.42.2.87.48303 > 10.42.0.16.domain: [udp sum ok] 63998+ A? test.example.cn.default.svc.cluster.local. (67)
14:21:47.818291 IP (tos 0x0, ttl 63, id 10590, offset 0, flags [DF], proto UDP (17), length 95)
    10.42.2.87.48303 > 10.42.0.16.domain: [udp sum ok] 64529+ AAAA? test.example.cn.default.svc.cluster.local. (67)
14:21:47.818509 IP (tos 0x0, ttl 63, id 60151, offset 0, flags [DF], proto UDP (17), length 188)
    10.42.0.16.domain > 10.42.2.87.48303: [bad udp cksum 0x183c -> 0x239b!] 64529 NXDomain*- q: AAAA? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.818606 IP (tos 0x0, ttl 63, id 60152, offset 0, flags [DF], proto UDP (17), length 188)
    10.42.0.16.domain > 10.42.2.87.48303: [bad udp cksum 0x183c -> 0x40ae!] 63998 NXDomain*- q: A? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.818847 IP (tos 0x0, ttl 63, id 10591, offset 0, flags [DF], proto UDP (17), length 87)
    10.42.2.87.49709 > 10.42.0.16.domain: [udp sum ok] 804+ A? test.example.cn.svc.cluster.local. (59)
14:21:47.818872 IP (tos 0x0, ttl 63, id 10592, offset 0, flags [DF], proto UDP (17), length 87)
    10.42.2.87.49709 > 10.42.0.16.domain: [udp sum ok] 1395+ AAAA? test.example.cn.svc.cluster.local. (59)
14:21:47.819023 IP (tos 0x0, ttl 63, id 60153, offset 0, flags [DF], proto UDP (17), length 180)
    10.42.0.16.domain > 10.42.2.87.49709: [bad udp cksum 0x1834 -> 0x4f80!] 1395 NXDomain*- q: AAAA? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.819102 IP (tos 0x0, ttl 63, id 60154, offset 0, flags [DF], proto UDP (17), length 180)
    10.42.0.16.domain > 10.42.2.87.49709: [bad udp cksum 0x1834 -> 0x6ccf!] 804 NXDomain*- q: A? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.819316 IP (tos 0x0, ttl 63, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.819323 IP (tos 0x0, ttl 63, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:47.819436 IP (tos 0x0, ttl 63, id 60155, offset 0, flags [DF], proto UDP (17), length 176)
    10.42.0.16.domain > 10.42.2.87.35181: [bad udp cksum 0x1830 -> 0xbf46!] 25419 NXDomain*- q: A? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.321330 IP (tos 0x0, ttl 63, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:50.321598 IP (tos 0x0, ttl 63, id 60543, offset 0, flags [DF], proto UDP (17), length 176)
    10.42.0.16.domain > 10.42.2.87.35181: [bad udp cksum 0x1830 -> 0xa2d4!] 25789 NXDomain*- q: AAAA? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.321934 IP (tos 0x0, ttl 63, id 10993, offset 0, flags [DF], proto UDP (17), length 69)
    10.42.2.87.34314 > 10.42.0.16.domain: [udp sum ok] 37116+ A? test.example.cn. (41)
14:21:50.321951 IP (tos 0x0, ttl 63, id 10994, offset 0, flags [DF], proto UDP (17), length 69)
    10.42.2.87.34314 > 10.42.0.16.domain: [udp sum ok] 37446+ AAAA? test.example.cn. (41)
14:21:50.368431 IP (tos 0x0, ttl 63, id 60552, offset 0, flags [DF], proto UDP (17), length 108)
    10.42.0.16.domain > 10.42.2.87.34314: [bad udp cksum 0x17ec -> 0x968f!] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [30s] A 121.xx.xx.xx (80)
14:21:50.515406 IP (tos 0x0, ttl 63, id 60559, offset 0, flags [DF], proto UDP (17), length 152)
    10.42.0.16.domain > 10.42.2.87.34314: [bad udp cksum 0x1818 -> 0xa75e!] 37446 q: AAAA? test.example.cn. 0/1/0 ns: mbgadev.cn. [30s] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 (124)

Packets captured on the master server's cni0:

14:21:47.818281 IP (tos 0x0, ttl 62, id 10589, offset 0, flags [DF], proto UDP (17), length 95)
    10.42.2.87.48303 > 10.42.0.16.domain: [udp sum ok] 63998+ A? test.example.cn.default.svc.cluster.local. (67)
14:21:47.818294 IP (tos 0x0, ttl 62, id 10590, offset 0, flags [DF], proto UDP (17), length 95)
    10.42.2.87.48303 > 10.42.0.16.domain: [udp sum ok] 64529+ AAAA? test.example.cn.default.svc.cluster.local. (67)
14:21:47.818491 IP (tos 0x0, ttl 64, id 60151, offset 0, flags [DF], proto UDP (17), length 188)
    10.42.0.16.domain > 10.42.2.87.48303: [bad udp cksum 0x183c -> 0x239b!] 64529 NXDomain*- q: AAAA? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.818598 IP (tos 0x0, ttl 64, id 60152, offset 0, flags [DF], proto UDP (17), length 188)
    10.42.0.16.domain > 10.42.2.87.48303: [bad udp cksum 0x183c -> 0x40ae!] 63998 NXDomain*- q: A? test.example.cn.default.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (160)
14:21:47.818860 IP (tos 0x0, ttl 62, id 10591, offset 0, flags [DF], proto UDP (17), length 87)
    10.42.2.87.49709 > 10.42.0.16.domain: [udp sum ok] 804+ A? test.example.cn.svc.cluster.local. (59)
14:21:47.818878 IP (tos 0x0, ttl 62, id 10592, offset 0, flags [DF], proto UDP (17), length 87)
    10.42.2.87.49709 > 10.42.0.16.domain: [udp sum ok] 1395+ AAAA? test.example.cn.svc.cluster.local. (59)
14:21:47.819019 IP (tos 0x0, ttl 64, id 60153, offset 0, flags [DF], proto UDP (17), length 180)
    10.42.0.16.domain > 10.42.2.87.49709: [bad udp cksum 0x1834 -> 0x4f80!] 1395 NXDomain*- q: AAAA? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.819099 IP (tos 0x0, ttl 64, id 60154, offset 0, flags [DF], proto UDP (17), length 180)
    10.42.0.16.domain > 10.42.2.87.49709: [bad udp cksum 0x1834 -> 0x6ccf!] 804 NXDomain*- q: A? test.example.cn.svc.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (152)
14:21:47.819326 IP (tos 0x0, ttl 62, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.819433 IP (tos 0x0, ttl 64, id 60155, offset 0, flags [DF], proto UDP (17), length 176)
    10.42.0.16.domain > 10.42.2.87.35181: [bad udp cksum 0x1830 -> 0xbf46!] 25419 NXDomain*- q: A? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.321340 IP (tos 0x0, ttl 62, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
14:21:50.321585 IP (tos 0x0, ttl 64, id 60543, offset 0, flags [DF], proto UDP (17), length 176)
    10.42.0.16.domain > 10.42.2.87.35181: [bad udp cksum 0x1830 -> 0xa2d4!] 25789 NXDomain*- q: AAAA? test.example.cn.cluster.local. 0/1/0 ns: cluster.local. [5s] SOA ns.dns.cluster.local. hostmaster.cluster.local. 1641665463 7200 1800 86400 5 (148)
14:21:50.321947 IP (tos 0x0, ttl 62, id 10993, offset 0, flags [DF], proto UDP (17), length 69)
    10.42.2.87.34314 > 10.42.0.16.domain: [udp sum ok] 37116+ A? test.example.cn. (41)
14:21:50.321954 IP (tos 0x0, ttl 62, id 10994, offset 0, flags [DF], proto UDP (17), length 69)
    10.42.2.87.34314 > 10.42.0.16.domain: [udp sum ok] 37446+ AAAA? test.example.cn. (41)
14:21:50.322158 IP (tos 0x0, ttl 64, id 11649, offset 0, flags [DF], proto UDP (17), length 80)
    10.42.0.16.40309 > 183.60.83.19.domain: [bad udp cksum 0x153b -> 0xbeb2!] 37116+ [1au] A? test.example.cn. ar: . OPT UDPsize=2048 DO (52)
14:21:50.322314 IP (tos 0x0, ttl 64, id 17443, offset 0, flags [DF], proto UDP (17), length 80)
    10.42.0.16.42824 > 183.60.82.98.domain: [bad udp cksum 0x148a -> 0x9946!] 37446+ [1au] AAAA? test.example.cn. ar: . OPT UDPsize=2048 DO (52)
14:21:50.368233 IP (tos 0x0, ttl 250, id 9965, offset 0, flags [DF], proto UDP (17), length 85)
    183.60.83.19.domain > 10.42.0.16.40309: [udp sum ok] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [1m] A 121.xx.xx.xx (57)
14:21:50.368419 IP (tos 0x0, ttl 64, id 60552, offset 0, flags [DF], proto UDP (17), length 108)
    10.42.0.16.domain > 10.42.2.87.34314: [bad udp cksum 0x17ec -> 0x968f!] 37116 q: A? test.example.cn. 1/0/0 test.example.cn. [30s] A 121.xx.xx.xx (80)
14:21:50.515262 IP (tos 0x0, ttl 58, id 25193, offset 0, flags [none], proto UDP (17), length 150)
    183.60.82.98.domain > 10.42.0.16.42824: [udp sum ok] 37446 q: AAAA? test.example.cn. 0/1/1 ns: mbgadev.cn. [6m] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 ar: . OPT UDPsize=4096 DO (122)
14:21:50.515398 IP (tos 0x0, ttl 64, id 60559, offset 0, flags [DF], proto UDP (17), length 152)
    10.42.0.16.domain > 10.42.2.87.34314: [bad udp cksum 0x1818 -> 0xa75e!] 37446 q: AAAA? test.example.cn. 0/1/0 ns: mbgadev.cn. [30s] SOA vip3.alidns.com. hostmaster.hichina.com. 2020091420 3600 1200 86400 360 (124)
14:21:52.639212 IP (tos 0x0, ttl 62, id 11543, offset 0, flags [DF], proto UDP (17), length 111)

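Comparing the captures above, one packet that is present on the master server's flannel.1 is missing from the capture on the master server's cni0: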

14:21:47.819323 IP (tos 0x0, ttl 63, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)

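Because this packet never arrived, coreDNS never received that DNS request and never replied, so 2.5 seconds later the application pod resent the same query (same source port 35181 and DNS id 25789):

14:21:50.321340 IP (tos 0x0, ttl 62, id 10992, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)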
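Spotting the missing packet by eye is tedious. The sketch below does the same comparison on `tcpdump -v` text, keyed on the IP `id` field, which stays the same across the hops in the captures above. The regular expressions and function names are assumptions written for this output format only, and the sample data is trimmed from the two master-side captures.

```python
import re
from typing import Dict, Optional

HEADER = re.compile(r"^\d\d:\d\d:\d\d\.\d+ IP \(.*\bid (\d+),")
DETAIL = re.compile(r"^\s+(\d+\.\d+\.\d+\.\d+)\.\S+ > \S+: .*?(\d+\+ (?:A|AAAA)\? \S+)")


def client_queries(capture: str, client_ip: str) -> Dict[str, str]:
    """Map IP id -> DNS query text for queries sent by client_ip."""
    queries: Dict[str, str] = {}
    ipid: Optional[str] = None
    for line in capture.splitlines():
        header = HEADER.match(line)
        if header:
            ipid = header.group(1)
            continue
        detail = DETAIL.match(line)
        if detail and ipid is not None and detail.group(1) == client_ip:
            queries[ipid] = detail.group(2)
        ipid = None  # a detail line belongs only to the header right before it
    return queries


def missing_downstream(upstream: str, downstream: str, client_ip: str) -> Dict[str, str]:
    """Queries seen at the upstream capture point but not at the downstream one."""
    seen = client_queries(downstream, client_ip)
    return {i: q for i, q in client_queries(upstream, client_ip).items() if i not in seen}


MASTER_FLANNEL = """\
14:21:47.819316 IP (tos 0x0, ttl 63, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25419+ A? test.example.cn.cluster.local. (55)
14:21:47.819323 IP (tos 0x0, ttl 63, id 10594, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25789+ AAAA? test.example.cn.cluster.local. (55)
"""
MASTER_CNI0 = """\
14:21:47.819326 IP (tos 0x0, ttl 62, id 10593, offset 0, flags [DF], proto UDP (17), length 83)
    10.42.2.87.35181 > 10.42.0.16.domain: [udp sum ok] 25419+ A? test.example.cn.cluster.local. (55)
"""

print(missing_downstream(MASTER_FLANNEL, MASTER_CNI0, "10.42.2.87"))
```

On this excerpt the only entry reported is IP id 10594, the AAAA query for test.example.cn.cluster.local.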

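A side note: when /etc/resolv.conf does not set timeout, the default is 5 seconds. The retry nevertheless goes out after 2.5 seconds, because musl splits the timeout across its attempts (2 by default).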

0x05. conntrack insert_failed statistics:

Master server:

$ sudo conntrack -S
cpu=0           found=0 invalid=1333 ignore=1963913 insert=0 insert_failed=17478 drop=17478 early_drop=0 error=2 search_restart=27053
cpu=1           found=0 invalid=615 ignore=1912454 insert=0 insert_failed=41030 drop=41030 early_drop=0 error=1 search_restart=14663

Agent server:

$ sudo conntrack -S
cpu=0           found=304 invalid=136 ignore=145233 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3208
cpu=1           found=269 invalid=115 ignore=172201 insert=0 insert_failed=0 drop=0 early_drop=0 error=1 search_restart=3267
cpu=2           found=300 invalid=140 ignore=160182 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3134
cpu=3           found=281 invalid=143 ignore=167805 insert=0 insert_failed=0 drop=0 early_drop=0 error=0 search_restart=3263

The master server shows a large number of insert_failed and drop packets, while the agent server shows none.
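To track these counters over time it helps to sum them across CPUs. Here is a small sketch that parses the text format shown above; `sum_conntrack_stats` is a hypothetical helper, and the input is the master server output pasted as a string rather than a live call to `conntrack`.

```python
from typing import Dict


def sum_conntrack_stats(output: str) -> Dict[str, int]:
    """Sum every key=value counter of `conntrack -S` output across all CPUs."""
    totals: Dict[str, int] = {}
    for line in output.splitlines():
        for field in line.split():
            key, sep, value = field.partition("=")
            # Skip the cpu index itself and anything that is not a counter.
            if sep and key != "cpu" and value.isdigit():
                totals[key] = totals.get(key, 0) + int(value)
    return totals


MASTER = """\
cpu=0           found=0 invalid=1333 ignore=1963913 insert=0 insert_failed=17478 drop=17478 early_drop=0 error=2 search_restart=27053
cpu=1           found=0 invalid=615 ignore=1912454 insert=0 insert_failed=41030 drop=41030 early_drop=0 error=1 search_restart=14663
"""

totals = sum_conntrack_stats(MASTER)
print(totals["insert_failed"], totals["drop"])
```

For the master output above both totals come to 58508, so every failed insert corresponded to a dropped packet.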

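0x06. One thing worth noting:

If I move the application pod onto the master server where coreDNS runs, the problem does not occur. Based on the mechanism above, the likely reason is that the pod then sits under cni0 (10.42.0.1/24), on the same subnet as coreDNS, and can reach the DNS service without DNAT.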

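IV. Solutions

Based on the following documents:

Differences between musl libc.so and glibc.so: https://wiki.musl-libc.org/functional-differences-from-glibc.html

A cloud provider's container team describing this problem: https://tencentcloudcontainerteam.github.io/2018/10/26/DNS-5-seconds-delay/

weave's research into the problem and their Linux kernel patches: https://www.weave.works/blog/racy-conntrack-and-dns-lookup-timeouts

The solution given on the official k8s site: https://kubernetes.io/zh/docs/tasks/administer-cluster/nodelocaldns/

there are the following solutions: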

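a. Since the application pod is based on Alpine Linux, add an extra nameserver 223.5.5.5 inside the container so that libc.so queries coreDNS and Alibaba Cloud DNS in parallel. Even if the master server drops a packet, the agent server and Alibaba Cloud DNS will not necessarily drop theirs.

b. Upgrade the master server's kernel. The master server runs Debian 9 (kernel 4.9), while the weave kernel patches were merged in 4.19, so upgrading to Debian 10 (kernel 4.19) mitigates the bug.

c. Add a DNS cache service to every server or pod, which avoids querying coreDNS on the master server for every lookup. (Worth mentioning: Tencent Cloud's TKE can install the nodelocaldns add-on directly.)

In the end, option b was chosen to solve the problem.

Master server after the upgrade: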

$ sudo conntrack -S
cpu=0           found=0 invalid=155 ignore=156433 insert=0 insert_failed=13241 drop=0 early_drop=0 error=3 search_restart=28343
cpu=1           found=0 invalid=53 ignore=182256 insert=0 insert_failed=23420 drop=0 early_drop=0 error=5 search_restart=15742

As you can see, insert_failed is still non-zero, but packets are no longer being dropped.
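A simple way to double-check a fix like this from inside the application pod is to time repeated lookups and count how many stall. The sketch below is one possible approach, not part of the original investigation: `time_lookups` and `count_slow` are hypothetical names, the 2-second threshold is an assumption chosen to sit just under the 2.5-second retry, and `socket.getaddrinfo` goes through the system libc resolver.

```python
import socket
import time
from typing import Callable, List


def time_lookups(
    host: str,
    rounds: int = 10,
    resolve: Callable[[str], object] = lambda h: socket.getaddrinfo(h, 80),
) -> List[float]:
    """Resolve `host` repeatedly and return the duration of each attempt in seconds."""
    durations: List[float] = []
    for _ in range(rounds):
        start = time.monotonic()
        try:
            resolve(host)
        except OSError:
            pass  # a failed lookup still tells us how long it took
        durations.append(time.monotonic() - start)
    return durations


def count_slow(durations: List[float], threshold: float = 2.0) -> int:
    """Count lookups that took at least `threshold` seconds."""
    return sum(1 for d in durations if d >= threshold)
```

Running `count_slow(time_lookups("test.example.cn", rounds=50))` in the pod before and after the kernel upgrade gives a rough measure of how often the retry path is still being hit.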

Creative Commons license: this article is licensed by the author under CC BY-SA 4.0.