NetBSD Problem Report #54194

From www@netbsd.org  Sat May 11 12:22:35 2019
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A34227A156
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 11 May 2019 12:22:35 +0000 (UTC)
Message-Id: <20190511122234.51FAD7A177@mollari.NetBSD.org>
Date: Sat, 11 May 2019 12:22:34 +0000 (UTC)
From: venture37@geeklan.co.uk
Reply-To: venture37@geeklan.co.uk
To: gnats-bugs@NetBSD.org
Subject: TX stall in mec(4)
X-Send-Pr-Version: www-1.0

>Number:         54194
>Category:       port-sgimips
>Synopsis:       TX stall in mec(4)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-sgimips-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat May 11 12:25:00 +0000 2019
>Last-Modified:  Wed Apr 01 14:25:01 +0000 2020
>Originator:     Sevan Janiyan
>Release:        NetBSD-HEAD
>Organization:
>Environment:
>Description:
On the R5k O2 (IP32), mec(4) handles inbound traffic fine but fails on outbound traffic, e.g. when another computer tries to scp a file from the O2, or when the O2 replies to ICMP echo requests with a payload of 4068 bytes or more (ping -s 4067 o2 works). scp transfers stall at the start and stay stalled.

from tsutsui@:

tcpdump(8) with the -v option on the destination host shows
that ssh packets have incorrect checksums after some transfers.

scp:
---
15:35:00.448879 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [S], cksum 0x8b9a (correct), seq 432109867, win 32768, options [mss 1460,nop,wscale 3,sackOK,TS val 1 ecr 0], length 0
15:35:00.448916 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.20.1.ssh > 192.168.20.20.65535: Flags [S.], cksum 0x3adf (correct), seq 401356989, ack 432109868, win 32768, options [mss 1460,nop,wscale 3,sackOK,TS val 1 ecr 1], length 0

 :
(all packets have correct cksums)
 :

15:35:06.906495 IP (tos 0x8, ttl 64, id 84, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0xa626 (correct), seq 2067:3515, ack 2999, win 4197, options [nop,nop,TS val 14 ecr 14], length 1448
15:35:06.906500 IP (tos 0x8, ttl 64, id 85, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0xc835 (correct), seq 3515:4963, ack 2999, win 4197, options [nop,nop,TS val 14 ecr 14], length 1448
15:35:06.906533 IP (tos 0x20, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.20.1.ssh > 192.168.20.20.65535: Flags [.], cksum 0xbb7a (correct), ack 4963, win 3835, options [nop,nop,TS val 14 ecr 14], length 0
15:35:06.906568 IP (tos 0x20, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.20.1.ssh > 192.168.20.20.65535: Flags [.], cksum 0xba10 (correct), ack 4963, win 4197, options [nop,nop,TS val 14 ecr 14], length 0
15:35:06.906742 IP (tos 0x8, ttl 64, id 86, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0xb87b (incorrect -> 0xf03b), seq 4963:6411, ack 2999, win 4197, options [nop,nop,TS val 14 ecr 14], length 1448
15:35:06.907451 IP (tos 0x8, ttl 64, id 87, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0x1060 (correct), seq 6411:7859, ack 2999, win 4197, options [nop,nop,TS val 14 ecr 14], length 1448
15:35:06.907453 IP (tos 0x8, ttl 64, id 88, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0x92dc (correct), seq 7859:9307, ack 2999, win 4197, options [nop,nop,TS val 14 ecr 14], length 1448
15:35:06.907465 IP (tos 0x20, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
    192.168.20.1.ssh > 192.168.20.20.65535: Flags [.], cksum 0x2662 (correct), ack 4963, win 4197, options [nop,nop,TS val 14 ecr 14,nop,nop,sack 1 {6411:7859}], length 0
15:35:06.907471 IP (tos 0x20, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
    192.168.20.1.ssh > 192.168.20.20.65535: Flags [.], cksum 0x20ba (correct), ack 4963, win 4197, options [nop,nop,TS val 14 ecr 14,nop,nop,sack 1 {6411:9307}], length 0
15:35:06.907700 IP (tos 0x8, ttl 64, id 89, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0x4382 (correct), seq 9307:10755, ack 2999, win 4197, options [nop,nop,TS val 14 ecr 14], length 1448
15:35:06.907702 IP (tos 0x8, ttl 64, id 90, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0x4057 (correct), seq 10755:12203, ack 2999, win 4197, options [nop,nop,TS val 14 ecr 14], length 1448
15:35:06.907716 IP (tos 0x20, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
    192.168.20.1.ssh > 192.168.20.20.65535: Flags [.], cksum 0x1b12 (correct), ack 4963, win 4197, options [nop,nop,TS val 14 ecr 14,nop,nop,sack 1 {6411:10755}], length 0
15:35:06.907721 IP (tos 0x20, ttl 64, id 0, offset 0, flags [DF], proto TCP (6), length 64)
    192.168.20.1.ssh > 192.168.20.20.65535: Flags [.], cksum 0x156a (correct), ack 4963, win 4197, options [nop,nop,TS val 14 ecr 14,nop,nop,sack 1 {6411:12203}], length 0
15:35:06.908576 IP (tos 0x8, ttl 64, id 91, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0xb87b (incorrect -> 0xf03b), seq 4963:6411, ack 2999, win 4197, options [nop,nop,TS val 14 ecr 14], length 1448
15:35:07.904877 IP (tos 0x8, ttl 64, id 92, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0xb879 (incorrect -> 0xf039), seq 4963:6411, ack 2999, win 4197, options [nop,nop,TS val 16 ecr 14], length 1448
15:35:09.904732 IP (tos 0x8, ttl 64, id 93, offset 0, flags [DF], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0xb875 (incorrect -> 0xf035), seq 4963:6411, ack 2999, win 4197, options [nop,nop,TS val 20 ecr 14], length 1448
15:35:13.904721 IP (tos 0x8, ttl 64, id 94, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0xb86d (incorrect -> 0xf02d), seq 4963:6411, ack 2999, win 4197, options [nop,nop,TS val 28 ecr 14], length 1448
15:35:21.904492 IP (tos 0x8, ttl 64, id 95, offset 0, flags [none], proto TCP (6), length 1500)
    192.168.20.20.65535 > 192.168.20.1.ssh: Flags [.], cksum 0xb85d (incorrect -> 0xf01d), seq 4963:6411, ack 2999, win 4197, options [nop,nop,TS val 44 ecr 14], length 1448

 :
---

ping -s 4067 (works):
---
15:45:15.409823 IP (tos 0x0, ttl 255, id 141, offset 0, flags [+], proto ICMP (1), length 1500)
    192.168.20.20 > 192.168.20.1: ICMP echo request, id 59290, seq 0, length 1480
15:45:15.409830 IP (tos 0x0, ttl 255, id 141, offset 1480, flags [+], proto ICMP (1), length 1500)
    192.168.20.20 > 192.168.20.1: icmp
15:45:15.410023 IP (tos 0x0, ttl 255, id 141, offset 2960, flags [none], proto ICMP (1), length 1135)
    192.168.20.20 > 192.168.20.1: icmp
15:45:15.410053 IP (tos 0x0, ttl 255, id 48283, offset 0, flags [+], proto ICMP (1), length 1500)
    192.168.20.1 > 192.168.20.20: ICMP echo reply, id 59290, seq 0, length 1480
15:45:15.410056 IP (tos 0x0, ttl 255, id 48283, offset 1480, flags [+], proto ICMP (1), length 1500)
    192.168.20.1 > 192.168.20.20: icmp
15:45:15.410060 IP (tos 0x0, ttl 255, id 48283, offset 2960, flags [none], proto ICMP (1), length 1135)
    192.168.20.1 > 192.168.20.20: icmp
15:45:16.421871 IP (tos 0x0, ttl 255, id 142, offset 0, flags [+], proto ICMP (1), length 1500)
---

ping -s 4068 (no echo reply):
---
15:45:21.372487 IP (tos 0x0, ttl 255, id 145, offset 0, flags [+], proto ICMP (1), length 1500)
    192.168.20.20 > 192.168.20.1: ICMP echo request, id 49440, seq 0, length 1480
15:45:21.372493 IP (tos 0x0, ttl 255, id 145, offset 1480, flags [+], proto ICMP (1), length 1500)
    192.168.20.20 > 192.168.20.1: icmp
15:45:21.372685 IP (tos 0x0, ttl 255, id 145, offset 2960, flags [none], proto ICMP (1), length 1136)
    192.168.20.20 > 192.168.20.1: icmp
15:45:22.381713 IP (tos 0x0, ttl 255, id 146, offset 0, flags [+], proto ICMP (1), length 1500)
    192.168.20.20 > 192.168.20.1: ICMP echo request, id 49440, seq 1, length 1480
15:45:22.381719 IP (tos 0x0, ttl 255, id 146, offset 1480, flags [+], proto ICMP (1), length 1500)
    192.168.20.20 > 192.168.20.1: icmp
15:45:22.381912 IP (tos 0x0, ttl 255, id 146, offset 2960, flags [none], proto ICMP (1), length 1136)
    192.168.20.20 > 192.168.20.1: icmp
 :
---

>How-To-Repeat:
ping -s 4068 o2
or
scp o2:somefile .

>Fix:

>Audit-Trail:
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: gnats-bugs@netbsd.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: Re: port-sgimips/54194: TX stall in mec(4)
Date: Fri, 3 Jan 2020 07:53:53 +0900

 > >How-To-Repeat:
 > ping -s 4068 o2

 Note there is a report that ping -s 4068 works with the INSTALL32_IP3x kernel.

 I guess something is wrong in the cache flush ops.

 ---
 Izumi Tsutsui

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: gnats-bugs@netbsd.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: Re: port-sgimips/54194: TX stall in mec(4)
Date: Wed, 1 Apr 2020 23:23:51 +0900

 (Finally I've managed to repair my O2 PSU)

 > > >How-To-Repeat:
 > > ping -s 4068 o2
 > 
 > Note there is a report that ping -s 4068 works on INSTALL32_IP3x kernel.

 More notes:

 - "ping -s 4068" fails on GENERIC32_IP3x with "root on sd0a"
   but works with "root on nfs" (not only on RAMDISK)
 - it fails on both 8.0 and 9.0 GENERIC32_IP3x
 - it works on a GENERIC32_IP3x kernel built with "options ENABLE_MIPS_4KB_PAGE"

 ---
 Izumi Tsutsui
