NetBSD Problem Report #55567

From kardel@kardel.name  Thu Aug 13 05:10:31 2020
Return-Path: <kardel@kardel.name>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 57C2B1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 13 Aug 2020 05:10:31 +0000 (UTC)
Message-Id: <20200813051025.D3055DA0DEF@pip.kardel.name>
Date: Thu, 13 Aug 2020 07:10:25 +0200 (CEST)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: tcp-send slows down to slow single byte transfers
X-Send-Pr-Version: 3.95

>Number:         55567
>Notify-List:    kim
>Category:       kern
>Synopsis:       tcp-send slows down to slow single byte transfers
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 13 05:15:00 +0000 2020
>Closed-Date:    Sun Sep 13 15:06:56 +0000 2020
>Last-Modified:  Sun Sep 13 15:06:56 +0000 2020
>Originator:     Frank Kardel
>Release:        NetBSD 9.99.70
>Organization:

>Environment:


System: NetBSD pip 9.99.70 NetBSD 9.99.70 (PIPGEN) #32: Wed Aug 12 15:29:33 CEST 2020 kardel@pip:/src/NetBSD/cur/src/obj.amd64/sys/arch/amd64/compile/PIPGEN amd64
Architecture: x86_64
Machine: amd64
>Description:
While running a java application on -current(9.99.60+70) with JDBC talking to a postgres11 database I observed the following:

The application runs and processes large number of queries via the tcp back-end connection via the loopback interface. After a while the communication

slows down significantly.

The database back-end gets a single character of a query string once every five seconds and later on once per minute. At the same time netstat -an

shows that the send-queue has more than one character available.

#root@Alpine:/home/kardel 20:42 [28]# tcpdump -i lo0 -n port 5433 and port 65525
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
21:45:24.330991 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [P.], seq 1517533791:1517533826, ack 31819, win 4096, options [nop,nop,TS val 1066 ecr 1066], length 35
21:45:24.525198 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], ack 1517533826, win 4096, options [nop,nop,TS val 1066 ecr 1066], length 0
21:45:31.723722 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [P.], seq 31819:31978, ack 1517533826, win 4096, options [nop,nop,TS val 1080 ecr 1066], length 159
21:45:31.725197 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [P.], seq 1517533826:1517534277, ack 31978, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 451
21:45:31.725459 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [P.], seq 31978:32074, ack 1517534277, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 96
21:45:31.726212 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [P.], seq 1517534277:1517534718, ack 32074, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 441
21:45:31.726477 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [P.], seq 32074:32233, ack 1517534718, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 159
21:45:31.727340 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [P.], seq 1517534718:1517535152, ack 32233, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 434
21:45:31.727536 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [P.], seq 32233:32329, ack 1517535152, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 96
21:45:31.728592 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [P.], seq 1517535152:1517535600, ack 32329, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 448
21:45:31.729087 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [P.], seq 32329:32552, ack 1517535600, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 223
21:45:31.730461 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [P.], seq 1517535600:1517535664, ack 32552, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 64
21:45:31.730994 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [P.], seq 32552:32622, ack 1517535664, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 70
21:45:31.731107 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [P.], seq 1517535664:1517535700, ack 32622, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 36
21:45:31.731309 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [P.], seq 32622:32683, ack 1517535700, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 61
21:45:31.731444 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [P.], seq 1517535700:1517535785, ack 32683, win 4096, options [nop,nop,TS val 1080 ecr 1080], length 85
21:45:31.925506 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32683:32769, ack 1517535785, win 4096, options [nop,nop,TS val 1081 ecr 1080], length 86
21:45:32.125511 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [.], ack 32769, win 4096, options [nop,nop,TS val 1081 ecr 1081], length 0
21:45:37.125738 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32769:32770, ack 1517535785, win 4096, options [nop,nop,TS val 1091 ecr 1081], length 1

### What did just happen?

21:45:37.325728 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [.], ack 32770, win 4096, options [nop,nop,TS val 1092 ecr 1091], length 0
21:45:42.126167 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32770:32771, ack 1517535785, win 4096, options [nop,nop,TS val 1101 ecr 1092], length 1
21:45:42.326164 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [.], ack 32771, win 4096, options [nop,nop,TS val 1102 ecr 1101], length 0
21:45:47.126311 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32771:32772, ack 1517535785, win 4096, options [nop,nop,TS val 1111 ecr 1102], length 1
21:45:47.326311 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [.], ack 32772, win 4096, options [nop,nop,TS val 1112 ecr 1111], length 0
21:45:52.126460 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32772:32773, ack 1517535785, win 4096, options [nop,nop,TS val 1121 ecr 1112], length 1
21:45:52.326463 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [.], ack 32773, win 4096, options [nop,nop,TS val 1122 ecr 1121], length 0
21:45:57.126615 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32773:32774, ack 1517535785, win 4096, options [nop,nop,TS val 1131 ecr 1122], length 1
21:45:57.326619 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [.], ack 32774, win 4096, options [nop,nop,TS val 1132 ecr 1131], length 0
21:46:02.126768 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32774:32775, ack 1517535785, win 4096, options [nop,nop,TS val 1141 ecr 1132], length 1
21:46:02.326766 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [.], ack 32775, win 4096, options [nop,nop,TS val 1142 ecr 1141], length 0
21:46:07.126920 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32775:32776, ack 1517535785, win 4096, options [nop,nop,TS val 1151 ecr 1142], length 1
21:46:07.326932 IP 127.0.0.1.5433 > 127.0.0.1.65525: Flags [.], ack 32776, win 4096, options [nop,nop,TS val 1152 ecr 1151], length 0
21:46:12.127070 IP 127.0.0.1.65525 > 127.0.0.1.5433: Flags [.], seq 32776:32777, ack 1517535785, win 4096, options [nop,nop,TS val 1161 ecr 1152], length 1
#root@Alpine:/home/kardel 20:40 [27]# netstat -na | grep 5433

tcp        0      0  127.0.0.1.5433 127.0.0.1.65525        ESTABLISHED
tcp        0    506  127.0.0.1.65525        127.0.0.1.5433 ESTABLISHED

I suspect something triggered the slow sending and it seems to be stuck in that state even though

the window is 4096 and all packets are (slowly - 200ms) acked. 

>How-To-Repeat:
	No simple method yet known. The Java application / database reproduces it in 5 minutes. There
	are reports that this also happens with iperf3 runs.
>Fix:
	not yet - state handling needs to be analysed

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
Date: Fri, 14 Aug 2020 01:17:30 +0000

 Not sent to gnats.

    ------

 From: Frank Kardel <kardel@netbsd.org>
 To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
 Date: Thu, 13 Aug 2020 14:44:56 +0200

 Here is some information on the state when we are in the slowed down mode:

 Connection: (netstat -Ana | grep 65454
 ffffd17391db63a8 tcp        0      0 127.0.0.1.5433 127.0.0.1.65454
 ESTABL
 ffffd17391db6060 tcp        0    163 127.0.0.1.65454 127.0.0.1.5433
 ESTABL

 Server side TCP-PCB:
 TCP Protocol Control Block at 0xffffd17391db63a8:

 Timers:
         REXMT   0x100<MPSAFE>
         PERSIST 0x100<MPSAFE>
         KEEP    0x102<PENDING,MPSAFE>   719818
         2MSL    0x100<MPSAFE>


 State: ESTABLISHED, flags 0x9e4, inpcb 0xffffd17355b48c70, in6pcb 0x0

 rxtshift 0, rxtcur 3, dupacks 0
 peermss 33584, ourmss 33584, segsz 20480, segqlen 0

 snd_una 72877108, snd_nxt 72877108, snd_up 72877108
 snd_wl1 2183613798, snd_wl2 72877023, iss 2198870750, snd_wnd 32768

 rcv_wnd 32768, rcv_nxt 2183614269, rcv_up 2183613798, irs 2183533130

 rcv_adv 2183647037, snd_max 72877108, snd_cwnd 78135, snd_ssthresh 536854528
 rcvtime 43067, rtttime 0, rtseq 72877023, srtt 14, rttvar 9, rttmin 2,
 max_sndwnd 32768

 oobflags 0, iobc 0, softerror 0

 snd_scale 3, rcv_scale 3, req_r_scale 3, req_s_scale 3
 ts_recent 2123, ts_regent_age 43067, last_ack_sent 2183614269

 Client side TCP_PCB:
 TCP Protocol Control Block at 0xffffd17391db6060:

 Timers:
         REXMT   0x100<MPSAFE>
         PERSIST 0x102<PENDING,MPSAFE>   57
         KEEP    0x102<PENDING,MPSAFE>   719552
         2MSL    0x100<MPSAFE>


 State: ESTABLISHED, flags 0xbe4, inpcb 0xffffd17355b48b80, in6pcb 0x0

 rxtshift 0, rxtcur 3, dupacks 0
 peermss 33584, ourmss 33584, segsz 16384, segqlen 0

 snd_una 2183614282, snd_nxt 2183614282, snd_up 2183614281
 snd_wl1 2199032056, snd_wl2 2183579549, iss 2183533130, snd_wnd 0

 rcv_wnd 32768, rcv_nxt 72877108, rcv_up 2199032056, irs 2198870750

 rcv_adv 72909876, snd_max 2183614282, snd_cwnd 32769, snd_ssthresh 536854528
 rcvtime 43198, rtttime 0, rtseq 2183613798, srtt 32, rttvar 3, rttmin 2,
 max_sndwnd 32768

 oobflags 0, iobc 0, softerror 0

 snd_scale 3, rcv_scale 3, req_r_scale 3, req_s_scale 3
 ts_recent 2254, ts_regent_age 43198, last_ack_sent 72877108

 Current packet trace:
 14:40:25.770035 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq
 2183614290:2183614291, ack 72877108, win 4096, options [nop,nop,TS val 2343
 ecr 2334], length 1
 14:40:25.970039 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 1, win
 4096, options [nop,nop,TS val 2344 ecr 2343], length 0
 14:40:30.770195 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 1:2, ack
 1, win 4096, options [nop,nop,TS val 2353 ecr 2344], length 1
 14:40:30.970199 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 2, win
 4096, options [nop,nop,TS val 2354 ecr 2353], length 0
 14:40:35.770256 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 2:3, ack
 1, win 4096, options [nop,nop,TS val 2363 ecr 2354], length 1
 14:40:35.970358 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 3, win
 4096, options [nop,nop,TS val 2364 ecr 2363], length 0
 14:40:40.770512 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 3:4, ack
 1, win 4096, options [nop,nop,TS val 2373 ecr 2364], length 1
 14:40:40.970516 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 4, win
 4096, options [nop,nop,TS val 2374 ecr 2373], length 0
 14:40:45.770674 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 4:5, ack
 1, win 4096, options [nop,nop,TS val 2383 ecr 2374], length 1
 14:40:45.970676 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 5, win
 4096, options [nop,nop,TS val 2384 ecr 2383], length 0
 14:40:50.770831 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 5:6, ack
 1, win 4096, options [nop,nop,TS val 2393 ecr 2384], length 1
 14:40:50.970837 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 6, win
 4096, options [nop,nop,TS val 2394 ecr 2393], length 0
 14:40:55.770990 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 6:7, ack
 1, win 4096, options [nop,nop,TS val 2403 ecr 2394], length 1
 14:40:55.970977 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 7, win
 4096, options [nop,nop,TS val 2404 ecr 2403], length 0
 14:41:00.771132 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 7:8, ack
 1, win 4096, options [nop,nop,TS val 2413 ecr 2404], length 1
 14:41:00.971155 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 8, win
 4096, options [nop,nop,TS val 2414 ecr 2413], length 0

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
Date: Fri, 14 Aug 2020 06:23:10 +0200

 Thanks for catching that!

 Frank


 On 08/14/20 03:20, David Holland wrote:
 > The following reply was made to PR kern/55567; it has been noted by GNATS.
 >
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
 > Date: Fri, 14 Aug 2020 01:17:30 +0000
 >
 >   Not sent to gnats.
 >   
 >      ------
 >   
 >   From: Frank Kardel <kardel@netbsd.org>
 >   To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 >   Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
 >   Date: Thu, 13 Aug 2020 14:44:56 +0200
 >   
 >   Here is some information on the state when we are in the slowed down mode:
 >   
 >   Connection: (netstat -Ana | grep 65454
 >   ffffd17391db63a8 tcp        0      0 127.0.0.1.5433 127.0.0.1.65454
 >   ESTABL
 >   ffffd17391db6060 tcp        0    163 127.0.0.1.65454 127.0.0.1.5433
 >   ESTABL
 >   
 >   Server side TCP-PCB:
 >   TCP Protocol Control Block at 0xffffd17391db63a8:
 >   
 >   Timers:
 >           REXMT   0x100<MPSAFE>
 >           PERSIST 0x100<MPSAFE>
 >           KEEP    0x102<PENDING,MPSAFE>   719818
 >           2MSL    0x100<MPSAFE>
 >   
 >   
 >   State: ESTABLISHED, flags 0x9e4, inpcb 0xffffd17355b48c70, in6pcb 0x0
 >   
 >   rxtshift 0, rxtcur 3, dupacks 0
 >   peermss 33584, ourmss 33584, segsz 20480, segqlen 0
 >   
 >   snd_una 72877108, snd_nxt 72877108, snd_up 72877108
 >   snd_wl1 2183613798, snd_wl2 72877023, iss 2198870750, snd_wnd 32768
 >   
 >   rcv_wnd 32768, rcv_nxt 2183614269, rcv_up 2183613798, irs 2183533130
 >   
 >   rcv_adv 2183647037, snd_max 72877108, snd_cwnd 78135, snd_ssthresh 536854528
 >   rcvtime 43067, rtttime 0, rtseq 72877023, srtt 14, rttvar 9, rttmin 2,
 >   max_sndwnd 32768
 >   
 >   oobflags 0, iobc 0, softerror 0
 >   
 >   snd_scale 3, rcv_scale 3, req_r_scale 3, req_s_scale 3
 >   ts_recent 2123, ts_regent_age 43067, last_ack_sent 2183614269
 >   
 >   Client side TCP_PCB:
 >   TCP Protocol Control Block at 0xffffd17391db6060:
 >   
 >   Timers:
 >           REXMT   0x100<MPSAFE>
 >           PERSIST 0x102<PENDING,MPSAFE>   57
 >           KEEP    0x102<PENDING,MPSAFE>   719552
 >           2MSL    0x100<MPSAFE>
 >   
 >   
 >   State: ESTABLISHED, flags 0xbe4, inpcb 0xffffd17355b48b80, in6pcb 0x0
 >   
 >   rxtshift 0, rxtcur 3, dupacks 0
 >   peermss 33584, ourmss 33584, segsz 16384, segqlen 0
 >   
 >   snd_una 2183614282, snd_nxt 2183614282, snd_up 2183614281
 >   snd_wl1 2199032056, snd_wl2 2183579549, iss 2183533130, snd_wnd 0
 >   
 >   rcv_wnd 32768, rcv_nxt 72877108, rcv_up 2199032056, irs 2198870750
 >   
 >   rcv_adv 72909876, snd_max 2183614282, snd_cwnd 32769, snd_ssthresh 536854528
 >   rcvtime 43198, rtttime 0, rtseq 2183613798, srtt 32, rttvar 3, rttmin 2,
 >   max_sndwnd 32768
 >   
 >   oobflags 0, iobc 0, softerror 0
 >   
 >   snd_scale 3, rcv_scale 3, req_r_scale 3, req_s_scale 3
 >   ts_recent 2254, ts_regent_age 43198, last_ack_sent 72877108
 >   
 >   Current packet trace:
 >   14:40:25.770035 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq
 >   2183614290:2183614291, ack 72877108, win 4096, options [nop,nop,TS val 2343
 >   ecr 2334], length 1
 >   14:40:25.970039 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 1, win
 >   4096, options [nop,nop,TS val 2344 ecr 2343], length 0
 >   14:40:30.770195 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 1:2, ack
 >   1, win 4096, options [nop,nop,TS val 2353 ecr 2344], length 1
 >   14:40:30.970199 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 2, win
 >   4096, options [nop,nop,TS val 2354 ecr 2353], length 0
 >   14:40:35.770256 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 2:3, ack
 >   1, win 4096, options [nop,nop,TS val 2363 ecr 2354], length 1
 >   14:40:35.970358 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 3, win
 >   4096, options [nop,nop,TS val 2364 ecr 2363], length 0
 >   14:40:40.770512 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 3:4, ack
 >   1, win 4096, options [nop,nop,TS val 2373 ecr 2364], length 1
 >   14:40:40.970516 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 4, win
 >   4096, options [nop,nop,TS val 2374 ecr 2373], length 0
 >   14:40:45.770674 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 4:5, ack
 >   1, win 4096, options [nop,nop,TS val 2383 ecr 2374], length 1
 >   14:40:45.970676 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 5, win
 >   4096, options [nop,nop,TS val 2384 ecr 2383], length 0
 >   14:40:50.770831 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 5:6, ack
 >   1, win 4096, options [nop,nop,TS val 2393 ecr 2384], length 1
 >   14:40:50.970837 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 6, win
 >   4096, options [nop,nop,TS val 2394 ecr 2393], length 0
 >   14:40:55.770990 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 6:7, ack
 >   1, win 4096, options [nop,nop,TS val 2403 ecr 2394], length 1
 >   14:40:55.970977 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 7, win
 >   4096, options [nop,nop,TS val 2404 ecr 2403], length 0
 >   14:41:00.771132 IP 127.0.0.1.65454 > 127.0.0.1.5433: Flags [.], seq 7:8, ack
 >   1, win 4096, options [nop,nop,TS val 2413 ecr 2404], length 1
 >   14:41:00.971155 IP 127.0.0.1.5433 > 127.0.0.1.65454: Flags [.], ack 8, win
 >   4096, options [nop,nop,TS val 2414 ecr 2413], length 0
 >   

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
Date: Mon, 17 Aug 2020 11:52:57 +0200

 This is a multi-part message in MIME format.
 --------------7998CBB9F2A2AE5CA07C4A61
 Content-Type: text/plain; charset=utf-8; format=flowed
 Content-Transfer-Encoding: 7bit

 Updating observations:

 Testcase database connection - request/reply pattern:
 Client side while everything is working quickly - send windo is 134336:
 TCP Protocol Control Block at 0xffff97010a313a20:

 Timers:
          REXMT   0x100<MPSAFE>
          PERSIST 0x100<MPSAFE>
          KEEP    0x102<PENDING,MPSAFE>   695872
          2MSL    0x100<MPSAFE>


 State: ESTABLISHED, flags 0xbe4, inpcb 0xffff97010a27b780, in6pcb 0x0

 rxtshift 0, rxtcur 3, dupacks 0
 peermss 33584, ourmss 33584, segsz 33572, segqlen 0

 snd_una 751851283, snd_nxt 751851283, snd_up 751850985
 snd_wl1 760580867, snd_wl2 751851283, iss 751812995, snd_wnd 134336

 rcv_wnd 134336, rcv_nxt 760580867, rcv_up 760580867, irs 760431634

 rcv_adv 760715203, snd_max 751851283, snd_cwnd 105455, snd_ssthresh 
 1073725440
 rcvtime 119937, rtttime 0, rtseq 751850985, srtt 11, rttvar 7, rttmin 2, 
 max_sndwnd 134336

 oobflags 0, iobc 0, softerror 0

 snd_scale 3, rcv_scale 3, req_r_scale 3, req_s_scale 3
 ts_recent 4, ts_regent_age 119937, last_ack_sent 760580867

 The server side around that time - send window is 191680:

 TCP Protocol Control Block at 0xffff970177875070:

 Timers:
          REXMT   0x102<PENDING,MPSAFE>   100
          PERSIST 0x100<MPSAFE>
          KEEP    0x102<PENDING,MPSAFE>   720000
          2MSL    0x100<MPSAFE>


 State: ESTABLISHED, flags 0x9e4, inpcb 0xffff97010a27b868, in6pcb 0x0

 rxtshift 0, rxtcur 2, dupacks 0
 peermss 33584, ourmss 33584, segsz 33572, segqlen 0

 snd_una 2723953510, snd_nxt 2723961702, snd_up 2723953510
 snd_wl1 751861149, snd_wl2 2723609446, iss 760431634, snd_wnd 191680

 rcv_wnd 134336, rcv_nxt 751861149, rcv_up 751861149, irs 751812995

 rcv_adv 751995485, snd_max 2723961702, snd_cwnd 524280, snd_ssthresh 67144
 rcvtime 120706, rtttime 120706, rtseq 2723953510, srtt 7, rttvar 3, 
 rttmin 2, max_sndwnd 199520

 oobflags 0, iobc 0, softerror 0

 snd_scale 3, rcv_scale 3, req_r_scale 3, req_s_scale 3
 ts_recent 773, ts_regent_age 120706, last_ack_sent 751861149

 So all is well ans communication is quick - then after some time the 
 client side closes the send window:

 TCP Protocol Control Block at 0xffff97010a313a20:

 Timers:
          REXMT   0x100<MPSAFE>
          PERSIST 0x102<PENDING,MPSAFE>   272
          KEEP    0x102<PENDING,MPSAFE>   719791
          2MSL    0x100<MPSAFE>


 State: ESTABLISHED, flags 0xbe4, inpcb 0xffff97010a27b780, in6pcb 0x0

 rxtshift 0, rxtcur 3, dupacks 0
 peermss 33584, ourmss 33584, segsz 33572, segqlen 0

 snd_una 751995524, snd_nxt 751995524, snd_up 751995523
 snd_wl1 760593254, snd_wl2 751861149, iss 751812995, snd_wnd 0

 rcv_wnd 199872, rcv_nxt 2929580456, rcv_up 760593254, irs 760431634

 rcv_adv 2929780328, snd_max 751995524, snd_cwnd 67144, snd_ssthresh 67144
 rcvtime 123247, rtttime 0, rtseq 751995341, srtt 32, rttvar 3, rttmin 2, 
 max_sndwnd 134336

 oobflags 0, iobc 0, softerror 0

 snd_scale 3, rcv_scale 3, req_r_scale 3, req_s_scale 3
 ts_recent 3314, ts_regent_age 123247, last_ack_sent 2929580456

 and the server is still fine:
 TCP Protocol Control Block at 0xffff970177875070:

 Timers:
          REXMT   0x100<MPSAFE>
          PERSIST 0x100<MPSAFE>
          KEEP    0x102<PENDING,MPSAFE>   719715
          2MSL    0x100<MPSAFE>


 State: ESTABLISHED, flags 0x9e4, inpcb 0xffff97010a27b868, in6pcb 0x0

 rxtshift 0, rxtcur 3, dupacks 0
 peermss 33584, ourmss 33584, segsz 33572, segqlen 0

 snd_una 2929580456, snd_nxt 2929580456, snd_up 2929580456
 snd_wl1 751995341, snd_wl2 2929580456, iss 760431634, snd_wnd 199872

 rcv_wnd 134336, rcv_nxt 751995511, rcv_up 751995341, irs 751812995

 rcv_adv 752129847, snd_max 2929580456, snd_cwnd 235004, snd_ssthresh 67144
 rcvtime 123116, rtttime 0, rtseq 2929580196, srtt 11, rttvar 7, rttmin 
 2, max_sndwnd 199872

 oobflags 0, iobc 0, softerror 0

 snd_scale 3, rcv_scale 3, req_r_scale 3, req_s_scale 3
 ts_recent 3183, ts_regent_age 123116, last_ack_sent 751995511

 But at this state the persist timer sends 1 byte window probes which get 
 acknowledgments with the correct sequence numbers.

 The advertised window size is not picked up in tcp_input ACK processing 
 because  SEQ_LT(tp->snd_wl1, th->th_seq)
 at this point SEQ_LT(760593254, 2929580456) returns 0 in the window 
 update part (step6). The question is
 how it happened that snd_wl1 reached 760593254 though one would expect 
 it around (rcv_nxt)2929580456 =< (rcv_adv)2929780328
 if I didn't confuse things here.

 At that state it is unlikely the the connection will recover.

 The pcap trace for this is included.


 --------------7998CBB9F2A2AE5CA07C4A61
 Content-Type: application/gzip;
  name="trace.pcap.gz"
 Content-Transfer-Encoding: base64
 Content-Disposition: attachment;
  filename="trace.pcap.gz"

 H4sICGZPNl8AA3RyYWNlLnBjYXAApdRNaNNgHAbwf9ZRhh+wwwY7Brx0EgS1c63sYKc9CDkI
 DjzIlmZJ6tItqbTpKl4W8OBaEMFTi0RQhkwGOfiF6GHqHMyNbRIctNvUi0cvngep79u0je1e
 ZV0Cb0MLeZ7nByVfl54/7oBOqF+x2v3t8DlutXSs+t1ApwOdKECR5QAu+AFmACh8Kgu9Yca6
 mzCNlTt6H1+w76Efqa4jAEcT+FxBz52FtDQlCRp9ko6nkgodEYRkRtW4y5qk0NkJKSXRMu67
 X3b6NF+9z3eTjTX39YYrC7gLdW7qfZECJTX3nUbPdZ7BHyMoJAA92fGMLN5CPwz/BgrdeqCr
 gi7cd2Kb4Bsn+jZNY23Rq+/1NsEnkHxri4w1G/TqG9oh+ESSbzZoGht7Xn2lHYJPIvk29lBn
 0avv2i7BFyf6iqZhDXr12bW+hOuLszdIPmsQddpV3/XD+25/I/gmiD7bNLZkr77j352+tOtL
 sjLJtyUzVk6o+sTD+wZ+OH1PXN8jViH5coJplOaqvgf7fef/4buUVHhZdYS471mtb4iq91Eh
 ViX5SnOocx37Ksx/fPAQ/K7v19++d6iPeeq8T1dc3yc2S/Stm0Z5Xu9GvjeNvsmaj4rA1Sgb
 vThCK7IakHj51DQ/lZFEXpP6aT5Nx+VUWsN9X174IIR6Qm5fEL/L9/vK84yVH9W7IwU70NQ3
 +R7/7176IIyeC7s5A605zu78aGN3fyMng3N4HPT5g7+dQWMtg6Zx2DLKWT74oDHCoBmcpeIg
 9WNbg7iWQToOW0U5P18deBBHGJTDWSIOCiy1NSjWMiiPw/4AdXQSGsMHAAA=
 --------------7998CBB9F2A2AE5CA07C4A61--

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org, tech-net@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
 (analysis and fix)
Date: Wed, 2 Sep 2020 10:01:30 +0200

 Analysis:

 The issue is that on a tcp connection that is only sending data but not 
 receiving data it
 can happen that the send window closes. In this state bytes are 
 transferred to the receiver
 via one by one via the zero window probes but snd_wnd updates are skipped.
 The reason for the skipped updates is that SND.WL2 (last window update 
 ACK) has left
 the valid range of SND.UNA =< SEG.ACK <= SND.MAXSEND by a large amount.

 Data flows again when the receiving side sends some data (that's why you 
 can get
 remote login session unstuck by typing.

 Tracking a connection finds that SND.WL2 is basically starving in this 
 scenario and reaches
 a point where the window update code does not update the send window any 
 more:

      /*
       * Update window information.
       * Don't look at window if no ACK: TAC's send garbage on first SYN.
       */
      if ((tiflags & TH_ACK) && (SEQ_LT(tp->snd_wl1, th->th_seq) ||
          (tp->snd_wl1 == th->th_seq && (SEQ_LT(tp->snd_wl2, th->th_ack) ||
          (tp->snd_wl2 == th->th_ack && tiwin > tp->snd_wnd))))) {

 It is an ACK, yes, th_seq has not changed (no data from the recieving side).
 SEQ_LT(tp->snd_wl2, th->th_ack) returns 0 in the stuck state and it is 
 not a true window
 update,, thus the send window updates are stuck until the comparison 
 returns 1 again.
 This can take a very long time at a data rate of 0.2 Bytes/second (the 
 rate will be scaled
 back further later on).

 So, why are updates to SND.WL2 not happening. The cause is an 
 optimization common
 cases of unidirectional transfers:
      /*
       * Fast path: check for the two common cases of a uni-directional
       * data transfer. If:
       *    o We are in the ESTABLISHED state, and
       *    o The packet has no control flags, and
       *    o The packet is in-sequence, and
       *    o The window didn't change, and
       *    o We are not retransmitting
       * It's a candidate.
       *
       * If the length (tlen) is zero and the ack moved forward, we're
       * the sender side of the transfer. Just free the data acked and
       * wake any higher level process that was blocked waiting for
       * space.
       *
       * If the length is non-zero and the ack didn't move, we're the
       * receiver side. If we're getting packets in-order (the reassembly
       * queue is empty), add the data to the socket buffer and note
       * that we need a delayed ack.
       */

 The path taken that leads to SND.WL2 being starved out is the pure ACK 
 section
 which adjusts the send buffer, updates SND.UNA, SND.FACK, SND.HIGH, 
 frees mbuf
 , send more data if available and returns.
 SND.WL2 is never touched here an so a longer sequence can leave SND.WL2 
 far enough
 behind for the stuck zero window size to occur.

 Proposed fix:
 diff -u -r1.418 tcp_input.c
 --- tcp_input.c    6 Jul 2020 18:49:12 -0000    1.418
 +++ tcp_input.c    2 Sep 2020 07:59:46 -0000
 @@ -1897,6 +1897,19 @@
                   tp->snd_fack = tp->snd_una;
                   if (SEQ_LT(tp->snd_high, tp->snd_una))
                       tp->snd_high = tp->snd_una;
 +                /*
 +                 * drag snd_wl2 along so only newer
 +                 * ACKs can update the window size.
 +                 * also avoids the state where snd_wl2
 +                 * is eventually larger than th_ack and thus
 +                 * blocking the window update mechanism and
 +                 * the connection gets stuck for a loooong
 +                 * time in the zero sized send window state.
 +                 *
 +                 * see PR/kern 55567
 +                 */
 +                tp->snd_wl2 = tp->snd_una;
 +
                   m_freem(m);

                   /*

 Frank

State-Changed-From-To: open->analyzed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Wed, 02 Sep 2020 08:48:59 +0000
State-Changed-Why:
Analysis complete, fix tested


From: Jukka Ruohonen <jruohonen@iki.fi>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, kardel@NetBSD.org
Subject: Re: kern/55567 (tcp-send slows down to slow single byte transfers)
Date: Wed, 2 Sep 2020 12:22:36 +0300

 On Wed, Sep 02, 2020 at 08:49:00AM +0000, kardel@NetBSD.org wrote:
 > Analysis complete, fix tested

 Can we have pullups to netbsd-9? The issue is serious, and seriously
 annoying.

 - Jukka

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/55567 (tcp-send slows down to slow single byte transfers)
Date: Wed, 2 Sep 2020 11:29:37 +0200

 On 09/02/20 11:25, Jukka Ruohonen wrote:
 > The following reply was made to PR kern/55567; it has been noted by GNATS.
 >
 > From: Jukka Ruohonen <jruohonen@iki.fi>
 > To: gnats-bugs@netbsd.org
 > Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
 > 	gnats-admin@netbsd.org, kardel@NetBSD.org
 > Subject: Re: kern/55567 (tcp-send slows down to slow single byte transfers)
 > Date: Wed, 2 Sep 2020 12:22:36 +0300
 >
 >   On Wed, Sep 02, 2020 at 08:49:00AM +0000, kardel@NetBSD.org wrote:
 >   > Analysis complete, fix tested
 >   
 >   Can we have pullups to netbsd-9? The issue is serious, and seriously
 >   annoying.
 >   
 >   - Jukka
 >   

 Will do - I am just waiting for a second pair of eyes+brain having a 
 look on the analysis and fix

 before I commit and request the pullups for -9 and -8 a bit later.

 I am pretty confident though as I now managed to move ~6Tb without any 
 issues.

From: Jason Thorpe <thorpej@me.com>
To: Frank Kardel <kardel@netbsd.org>
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 gnats-bugs@netbsd.org
Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
 (analysis and fix)
Date: Wed, 2 Sep 2020 07:58:52 -0700

 > On Sep 2, 2020, at 1:05 AM, Frank Kardel <kardel@netbsd.org> wrote:
 >=20
 > The path taken that leads to SND.WL2 being starved out is the pure ACK=20=

 > section
 > which adjusts the send buffer, updates SND.UNA, SND.FACK, SND.HIGH,=20
 > frees mbuf
 > , send more data if available and returns.
 > SND.WL2 is never touched here an so a longer sequence can leave =
 SND.WL2=20
 > far enough
 > behind for the stuck zero window size to occur.
 >=20
 > Proposed fix:

 Frank --

 This analysis looks right to me, and your proposed fix looks correct.  =
 It looks like FreeBSD also has an equivalent assignment to SND.WL2 in =
 the same spot for quite a while now, so I would say the change has had =
 some solid bake time already.  Ship it!

 -- thorpej

From: Frank Kardel <kardel@netbsd.org>
To: Jason Thorpe <thorpej@me.com>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
 (analysis and fix)
Date: Wed, 2 Sep 2020 17:00:35 +0200

 Thanks for double checking!

 Frank


 On 09/02/20 16:58, Jason Thorpe wrote:
 >> On Sep 2, 2020, at 1:05 AM, Frank Kardel <kardel@netbsd.org> wrote:
 >>
 >> The path taken that leads to SND.WL2 being starved out is the pure ACK
 >> section
 >> which adjusts the send buffer, updates SND.UNA, SND.FACK, SND.HIGH,
 >> frees mbuf
 >> , send more data if available and returns.
 >> SND.WL2 is never touched here an so a longer sequence can leave SND.WL2
 >> far enough
 >> behind for the stuck zero window size to occur.
 >>
 >> Proposed fix:
 > Frank --
 >
 > This analysis looks right to me, and your proposed fix looks correct.  It looks like FreeBSD also has an equivalent assignment to SND.WL2 in the same spot for quite a while now, so I would say the change has had some solid bake time already.  Ship it!
 >
 > -- thorpej
 >

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55567: tcp-send slows down to slow single byte transfers
 (analysis and fix)
Date: Wed, 2 Sep 2020 18:58:38 +0200

 Pullups requested:

 [pullup-9 #1074] Pullup critical TCP fix.
 [pullup-8 #1602] Pullup critical TCP fix.

State-Changed-From-To: analyzed->needs-pullups
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Wed, 02 Sep 2020 17:03:12 +0000
State-Changed-Why:
pull ups have been requested for -8 and -9


State-Changed-From-To: needs-pullups->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Thu, 03 Sep 2020 13:44:12 +0000
State-Changed-Why:
pull ups for -9 and -8 completed.
Thanks.


State-Changed-From-To: closed->needs-pullups
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Fri, 11 Sep 2020 09:10:50 +0000
State-Changed-Why:
second failure mode was fixed


From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55567 (tcp-send slows down to slow single byte transfers)
Date: Fri, 11 Sep 2020 12:29:39 +0200

 Module Name:	src
 Committed By:	kardel
 Date:		Fri Sep 11 09:08:47 UTC 2020

 Modified Files:
 	src/sys/netinet: tcp_input.c

 Log Message:
 PR/kern 55567

 fix the data-only fast path. RCV.UP and SND.WL1 could be left behind
 on long sequences of data only packets. pull them along to avoid relative
 sequence wraps.

 consistent with FreeBSD

 addresses second failure mode of PR/kern 55567.

 pullup to netbsd-8
 pullup to netbsd-9

 To generate a diff of this commit:
 cvs rdiff -u -r1.419 -r1.420 src/sys/netinet/tcp_input.c

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/55567 (tcp-send slows down to slow single byte transfers)
Date: Fri, 11 Sep 2020 17:00:44 +0200

 [pullup-9 #1081] Pullup another critical TCP fix.
 [pullup-8 #1604] Pullup another critical TCP fix.

State-Changed-From-To: needs-pullups->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Sun, 13 Sep 2020 15:06:56 +0000
State-Changed-Why:
pullups done.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.