高并发情况下backlog过低出现的问题

问题

  • 最近重构一个以前写的FTP Server,压测时,服务器的backlog是20,然后client总是有一些链接,已经new Socket()成功(根据测试,这意味着三次握手完成),但是就是收不到welcome信息,服务端的log看到的accept的链接数目少于client打开的链接数目(服务端的accept也没有报错),少的数量刚好是client端hang住在读welcome信息那些链接的数目。后来尝试了调整ulimit无果,调整了backlog后就好了。看起来就像是一些链接被悄无声息(没有返回FIN或RST之类的报文)地从 等待被应用accept的队列 移除。
  • 我抽查的十几个hang住的端口对应的报文都是三次握手成功,然后还hang住时是没有新的报文的,client一关闭,发送了FIN后,一种情况是,server端就发送数据给client,另一种情况是server端对client的FIN报文返回RST。
  • 报文如下

    • 第一种情况的报文,这种是client hang住没多久就关掉整个client进程时出现的情况

      1
      2
      3
      4
      5
      6
      7
      8
       117890 271.658989117  127.0.0.1             127.0.0.1             TCP      74     [TCP Port numbers reused] 40900 → 8083 [SYN, ECN, CWR] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298647083 TSecr=0 WS=128
      201498 272.673187012 127.0.0.1 127.0.0.1 TCP 74 [TCP Retransmission] 40900 → 8083 [SYN] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298648097 TSecr=0 WS=128
      201591 272.673232077 127.0.0.1 127.0.0.1 TCP 74 8083 → 40900 [SYN, ACK] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298648068 TSecr=298648097 WS=128
      201643 272.673288084 127.0.0.1 127.0.0.1 TCP 66 40900 → 8083 [ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=298648097 TSecr=298648068
      203783 297.152188039 127.0.0.1 127.0.0.1 TCP 66 40900 → 8083 [FIN, ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=298672576 TSecr=298648068
      203823 297.152451323 127.0.0.1 127.0.0.1 TCP 73 8083 → 40900 [PSH, ACK] Seq=1 Ack=2 Win=29312 Len=7 TSval=298672576 TSecr=298672576
      0000 68 65 6c 6c 6f 0d 0a hello..
      203825 297.152464118 127.0.0.1 127.0.0.1 TCP 54 40900 → 8083 [RST] Seq=2 Win=0 Len=0
    • 第二种情况的报文,这种是client hang住很久才关掉整个client进程时出现的情况

      1
      2
      3
      4
      5
      6
       21004 108.400081512  127.0.0.1             127.0.0.1             TCP      74     46538 → 8083 [SYN, ECN, CWR] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298483826 TSecr=0 WS=128
      92060 109.409186851 127.0.0.1 127.0.0.1 TCP 74 [TCP Retransmission] 46538 → 8083 [SYN] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298484836 TSecr=0 WS=128
      92108 109.409228584 127.0.0.1 127.0.0.1 TCP 74 8083 → 46538 [SYN, ACK] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298484804 TSecr=298484836 WS=128
      92155 109.409268850 127.0.0.1 127.0.0.1 TCP 66 46538 → 8083 [ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=298484836 TSecr=298484804
      112908 254.340209126 127.0.0.1 127.0.0.1 TCP 66 46538 → 8083 [FIN, ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=298629764 TSecr=298484804
      112909 254.340214301 127.0.0.1 127.0.0.1 TCP 54 8083 → 46538 [RST] Seq=1 Win=0 Len=0

复现

  • server

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    import java.io.IOException;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.util.concurrent.Executor;
    import java.util.concurrent.Executors;

    public class TestBacklogServer {
    public static void main(String[] args) throws IOException {
    Executor executor = Executors.newFixedThreadPool(128);
    ServerSocket socket = new ServerSocket(8083, 10);
    int cnt = 0;
    while (true) {
    final Socket client = socket.accept();
    System.out.println(cnt + ": accept one: " + client.getRemoteSocketAddress());
    cnt++;
    executor.execute(() -> {
    try {
    client.getOutputStream().write("hello\r\n".getBytes());
    client.close();
    } catch (IOException e) {
    e.printStackTrace();
    }
    });
    }
    }
    }
  • client。之所以设置了timeout,是为了拿到hang住的端口号,测试时可以不设置timeout,然后在没有新的输出时,threadDump,可以看到有多个链接hang在read那里

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    28
    29
    30
    31
    32
    33
    34
    35
    36
    37
    38
    39
    40
    41
    42
    43
    44
    45
    46
    47
    48
    49
    50
    51
    52
    53
    54
    55
    56
    57
    58
    59
    60
    61
    62
    63
    64
    65
    66
    67
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.io.InputStream;
    import java.net.ServerSocket;
    import java.net.Socket;
    import java.net.SocketTimeoutException;
    import java.util.HashSet;
    import java.util.Vector;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;
    import java.util.concurrent.atomic.LongAdder;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class TestBacklogClient {
    private static final LongAdder adder = new LongAdder();

    public static void main(String[] args) throws IOException, InterruptedException {
    Vector<Integer> timeoutPorts = new Vector<>(1240);
    ExecutorService pool = Executors.newFixedThreadPool(128);
    for (int i = 0; i < 10240; i++) {
    int finalI = i;
    pool.execute(() -> {
    try {
    Socket socket = new Socket("localhost", 8083);
    socket.setSoTimeout(1024 * 10);
    System.out.println("connect Success one: " + finalI);
    String r = readLine(socket.getInputStream());
    assert r == null || "hello\r\n".equals(r);
    if (r == null) {
    timeoutPorts.add(socket.getLocalPort());
    }
    System.out.println("read success one: " + finalI);
    } catch (IOException e) {
    e.printStackTrace();
    }
    });
    }
    pool.shutdown();
    pool.awaitTermination(1024, TimeUnit.DAYS);
    System.out.println(adder.sum());
    System.out.println(timeoutPorts);
    }

    private static String readLine(InputStream in) throws IOException {
    byte[] buf = new byte[1024];
    StringBuilder builder = new StringBuilder();
    try {
    while (true) {
    int t = in.read(buf);
    if (t < 0) {
    break;
    }
    builder.append(new String(buf, 0, t));
    }
    return builder.toString();
    } catch (SocketTimeoutException ex) {
    adder.add(1);
    System.out.println("time out, read: " + builder.length()
    + ", " + builder.toString()
    + ", hasCRLF: " + (builder.lastIndexOf("\r\n") >= 0));
    }
    return null;
    }
    }
  • localhost的系统配置信息

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    $ uname -r
    4.18.0-13-generic

    $ /bin/cat /proc/sys/net/ipv4/tcp_max_syn_backlog
    512

    $ lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 18.10
    Release: 18.10
    Codename: cosmic

    $ java -version
    openjdk version "11.0.1" 2018-10-16
    OpenJDK Runtime Environment 18.9 (build 11.0.1+13)
    OpenJDK 64-Bit Server VM 18.9 (build 11.0.1+13, mixed mode)

    $ /bin/cat /proc/sys/net/core/somaxconn
    128
  • vps的配置信息

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    root@vultr:~# ./jdk-11.0.2/bin/java -version
    java version "11.0.2" 2019-01-15 LTS
    Java(TM) SE Runtime Environment 18.9 (build 11.0.2+9-LTS)
    Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.2+9-LTS, mixed mode)

    root@vultr:~# uname -r
    4.18.0-10-generic

    root@vultr:~# lsb_release -a
    No LSB modules are available.
    Distributor ID: Ubuntu
    Description: Ubuntu 18.10
    Release: 18.10
    Codename: cosmic

    root@vultr:~# /bin/cat /proc/sys/net/ipv4/tcp_max_syn_backlog
    128

    root@vultr:~# /bin/cat /proc/sys/net/core/somaxconn
    128
  • 复现过程出现过这个现象:client端ctrl-c关掉后,server端突然就接收到了剩下的所有链接。这个在server运行在localhost时并且 一旦所有其他没有被hang住的链接完成后马上就ctrl-c时出现过,但是在server运行在vps上没出现过。

猜测原因一

  • 按照TCP/IP详解卷二的说法,如果listen的那个端点的队列满了(已经完成三次握手的队列,其大小由backlog确定),那么就不回复syn。但是,有个问题,如果多个syn同时到达,那么即使这时候队列未满,也可能接受了这些syn的一部分后队列就满了,那么linux系统如何处理——是否会存在有些链接三次握手完成,但是放不进队列里。

  • 按照这篇文章(http://veithen.io/2014/01/01/how-tcp-backlog-works-in-linux.html)的说法,如果遇到这种情况,服务器会不断发送syn/ack报文(如果/proc/sys/net/ipv4/tcp_abort_on_overflow1,则发送RST)

  • 我自己也抓到了这种报文,其中8083是server,server的backlog只有10,然后有10240个client企图链接该server。

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    No.     Time           Source                Destination           Protocol Length Info
    117974 271.659812650 127.0.0.1 127.0.0.1 TCP 74 [TCP Port numbers reused] 40970 → 8083 [SYN, ECN, CWR] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298647084 TSecr=0 WS=128
    117976 271.659818367 127.0.0.1 127.0.0.1 TCP 74 8083 → 40970 [SYN, ACK, ECN] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298647084 TSecr=298647084 WS=128
    117977 271.659825016 127.0.0.1 127.0.0.1 TCP 66 40970 → 8083 [ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=298647084 TSecr=298647084
    201482 272.673178662 127.0.0.1 127.0.0.1 TCP 74 [TCP Retransmission] 8083 → 40970 [SYN, ACK, ECN] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298648097 TSecr=298647084 WS=128
    201563 272.673220860 127.0.0.1 127.0.0.1 TCP 66 [TCP Dup ACK 117977#1] 40970 → 8083 [ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=298648097 TSecr=298647084
    201993 274.693205982 127.0.0.1 127.0.0.1 TCP 74 [TCP Retransmission] 8083 → 40970 [SYN, ACK, ECN] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=298650117 TSecr=298648097 WS=128
    201995 274.693247830 127.0.0.1 127.0.0.1 TCP 66 [TCP Dup ACK 117977#2] 40970 → 8083 [ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=298650117 TSecr=298647084
    201997 274.693527487 127.0.0.1 127.0.0.1 TCP 73 8083 → 40970 [PSH, ACK] Seq=1 Ack=1 Win=43776 Len=7 TSval=298650117 TSecr=298650117
    0000 68 65 6c 6c 6f 0d 0a hello..
    201999 274.693556287 127.0.0.1 127.0.0.1 TCP 66 40970 → 8083 [ACK] Seq=1 Ack=8 Win=43776 Len=0 TSval=298650117 TSecr=298650117
    202001 274.695599421 127.0.0.1 127.0.0.1 TCP 66 8083 → 40970 [FIN, ACK] Seq=8 Ack=1 Win=43776 Len=0 TSval=298650119 TSecr=298650117
    202009 274.737168658 127.0.0.1 127.0.0.1 TCP 66 40970 → 8083 [ACK] Seq=1 Ack=9 Win=43776 Len=0 TSval=298650161 TSecr=298650119
    203731 297.151676910 127.0.0.1 127.0.0.1 TCP 66 40970 → 8083 [FIN, ACK] Seq=1 Ack=9 Win=43776 Len=0 TSval=298672575 TSecr=298650119
    203732 297.151682855 127.0.0.1 127.0.0.1 TCP 66 8083 → 40970 [ACK] Seq=9 Ack=2 Win=43776 Len=0 TSval=298672575 TSecr=298672575

    /proc/sys/net/ipv4/tcp_abort_on_overflow设为1之后,抓到了如下报文。需要注意,那个RST并不是服务器关闭了,服务器后面还对其他链接响应了很多报文

    1
    2
    3
    4
    5
    6
    7
    No.     Time           Source                Destination           Protocol Length Info
    5904 9.174057671 127.0.0.1 127.0.0.1 TCP 74 54448 → 8083 [SYN, ECN, CWR] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=300065429 TSecr=0 WS=128
    92075 10.190560916 127.0.0.1 127.0.0.1 TCP 74 [TCP Retransmission] 54448 → 8083 [SYN] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=300066446 TSecr=0 WS=128
    92209 12.206628025 127.0.0.1 127.0.0.1 TCP 74 [TCP Retransmission] 54448 → 8083 [SYN] Seq=0 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=300068462 TSecr=0 WS=128
    92258 12.206814283 127.0.0.1 127.0.0.1 TCP 74 8083 → 54448 [SYN, ACK] Seq=0 Ack=1 Win=43690 Len=0 MSS=65495 SACK_PERM=1 TSval=300068462 TSecr=300068462 WS=128
    92319 12.207072233 127.0.0.1 127.0.0.1 TCP 66 54448 → 8083 [ACK] Seq=1 Ack=1 Win=43776 Len=0 TSval=300068462 TSecr=300068462
    92345 12.207466000 127.0.0.1 127.0.0.1 TCP 54 8083 → 54448 [RST] Seq=1 Win=0 Len=0

    需要注意,那个RST并不是服务器关闭了,服务器后面还对其他链接响应了很多报文——如下面所示,最后的那个No.为92345后面还有很多报文,都是8083端口的server与client的通信

    1
    2
    3
    4
    5
    6
    No.     Time           Source                Destination           Protocol Length Info
    92345 12.207466000 127.0.0.1 127.0.0.1 TCP 54 8083 → 54448 [RST] Seq=1 Win=0 Len=0
    92346 12.207985091 127.0.0.1 127.0.0.1 TCP 73 8083 → 54168 [PSH, ACK] Seq=1 Ack=1 Win=43776 Len=7 TSval=300068463 TSecr=300068462
    92347 12.208004676 127.0.0.1 127.0.0.1 TCP 66 54168 → 8083 [ACK] Seq=1 Ack=8 Win=43776 Len=0 TSval=300068463 TSecr=300068463
    92348 12.208041224 127.0.0.1 127.0.0.1 TCP 66 8083 → 54168 [FIN, ACK] Seq=8 Ack=1 Win=43776 Len=0 TSval=300068463 TSecr=300068463
    92349 12.208052138 127.0.0.1 127.0.0.1 TCP 73 8083 → 54172 [PSH, ACK] Seq=1 Ack=1 Win=43776 Len=7 TSval=300068463 TSecr=300068462
  • 然而,这个并不是上述问题的原因,因为打印出来的hang住的端口号,跟发生重传syn/ack的端口号对不上

猜测原因二

  • 网上搜到这篇文章http://www.10tiao.com/html/749/201411/201005717/1.html
  • 案例分析(二)那里描述的现象刚好跟跟上述问题非常相似——也是client握手成功但是读不到信息,作者分析的原因是,第三次握手的ACK因为AcceptQueue溢出而被丢弃,导致client进入了ESTABLISHED状态,但是server只是在SYN_RECV状态,并且因为net.ipv4.tcp_synack_retries=1(我的测试机器的这个参数是5),所以如果Server重传的SYN+ACK报文对应的ACK还是被丢弃,那么就会进入client以为连接成功,但是server并没有连接成功的状态
  • 然而,我把server架设到另一个机子上,然后使用watch -n 0.5 "netcat -n | grep 8083 | grep -i "syn""检查server那里是否有链接是一直处于syn_recv状态,但是整个过程中看到的处于SYN_RECV状态的链接端口号都是在不断变动(说明它们不是hang住的链接),并且client端所有没有hang住的链接都完成后,只剩下hang住的链接时,server端是看不到处于syn_recv状态的链接的
  • 所以我认为可能也不是这个原因(欢迎指正!)

结尾

  • 这个问题并没有解决,如果有大佬知道原因,或者有一些想法,欢迎与我交流,谢谢!