NetBSD Problem Report #53940
From www@NetBSD.org Mon Feb 4 05:26:25 2019
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id CA0307A111
for <gnats-bugs@gnats.NetBSD.org>; Mon, 4 Feb 2019 05:26:25 +0000 (UTC)
Message-Id: <20190204052624.8060E7A175@mollari.NetBSD.org>
Date: Mon, 4 Feb 2019 05:26:24 +0000 (UTC)
From: aravind_m1@dell.com
Reply-To: aravind_m1@dell.com
To: gnats-bugs@NetBSD.org
Subject: WM0 device timeout issue in NetBSD 7.1
X-Send-Pr-Version: www-1.0
>Number: 53940
>Category: kern
>Synopsis: WM0 device timeout issue in NetBSD 7.1
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Feb 04 05:30:00 +0000 2019
>Closed-Date: Tue May 03 00:04:42 +0000 2022
>Last-Modified: Tue May 03 00:04:42 +0000 2022
>Originator: Aravind Mani
>Release: NetBSD 7.1
>Organization:
Dell
>Environment:
WM0 timeout issue in NetBSD 7.1
>Description:
We use the WM_T_I354 chip type. When we reload the switch continuously, we can observe a device timeout issue. Neither wm_init() nor wm_reset() recovers the device from the problem state; the only way to recover is to reload the switch. There was no initialization error.
From wm_print_stats() and wm_pkt_stats(), I don't see any nonzero values in the listed registers, and the packets are not reaching the hardware.
wm_reset() also did not help to recover from the issue.
We did not remove wm_print_stats() and wm_pkt_stats() in NetBSD 7.1.
The last time we faced this issue, I took a live kernel core and could see that the wm PHY was active.
wm0 device timeouts keep piling up after wm_reset().
The management port (wm0) is UP but cannot ping the external network.
Please share your view on this issue. Do you need any other logs to investigate?
Logs:
wm0: device timeout (txfree 4095 txsfree 63 txnext 141)
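Because the timeouts keep piling up after wm_reset(), it can help to count them and compare the txfree/txsfree/txnext values across occurrences. A minimal Python sketch, assuming every message follows exactly the format of the line above; `parse_timeout` and `count_timeouts` are hypothetical helper names, and the sketch does not interpret what the counters mean inside the driver:

```python
import re

# Matches the watchdog message exactly as it appears in the log above.
# The counter names are copied verbatim; this sketch does not interpret them.
TIMEOUT_RE = re.compile(
    r"(?P<dev>wm\d+): device timeout "
    r"\(txfree (?P<txfree>\d+) txsfree (?P<txsfree>\d+) txnext (?P<txnext>\d+)\)"
)


def parse_timeout(line):
    """Return the device name and counters from one timeout line, or None."""
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return {
        "dev": m.group("dev"),
        "txfree": int(m.group("txfree")),
        "txsfree": int(m.group("txsfree")),
        "txnext": int(m.group("txnext")),
    }


def count_timeouts(lines, dev="wm0"):
    """Count timeout messages for one interface in an iterable of log lines."""
    count = 0
    for line in lines:
        parsed = parse_timeout(line)
        if parsed is not None and parsed["dev"] == dev:
            count += 1
    return count
```

`search()` is used rather than `match()` so that lines carrying a syslog timestamp prefix still match.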
SStk-1 # vmstat -e
event                                total rate type
bus_dma loads                     95451577  319 misc
vmcmd kills                            661    0 misc
vmcmd calls                           3731    0 misc
vmem static_bt_inuse                   200    0 misc
vmem static_bt_count                   200    0 misc
TLB shootdown                       182842    0 intr
cpu0 runqueue pull                16763601   56 misc
cpu0 runqueue push                  218455    0 misc
cpu0 runqueue stay                29807214   99 misc
cpu0 runqueue localize           199719304  669 misc
softint net/0                      1172158    3 misc
softint net block/0                  46424    0 misc
softint bio/0                         6245    0 misc
softint bio block/0                      4    0 misc
softint clk/0                     29819349   99 misc
softint clk block/0                 145137    0 misc
softint ser/0                        44794    0 misc
callout late/0                       38366    0 misc
crosscall unicast                       11    0 misc
crosscall broadcast                      4    0 misc
namecache entries collected          13850    0 misc
namecache under scan target         298154    0 misc
cpu0 timer                        29826661   99 intr
cpu0 generic IPI                    548755    1 misc
cpu0 FPU synch IPI                    3116    0 misc
cpu0 kpreempt IPI                   235125    0 misc
cpu1 runqueue pull                18640375   62 misc
cpu1 runqueue push                 2168053    7 misc
cpu1 runqueue stay                30124219  100 misc
cpu1 runqueue localize           158923916  532 misc
softint net/1                          365    0 misc
softint net block/1                    360    0 misc
softint clk/1                     29817170   99 misc
softint clk block/1                  28745    0 misc
softint ser/1                         8658    0 misc
callout late/1                       18516    0 misc
cpu1 timer                        29826661   99 misc
cpu1 FPU synch IPI                    4340    0 misc
cpu1 kpreempt IPI                   173706    0 misc
ioapic0 pin 20                      172536    0 intr
wm0 txsstall                          1088    0 misc
wm0 txdw                            183747    0 intr
wm0 txseg0                          255914    0 misc
ioapic0 pin 23                          18    0 intr
ioapic0 pin 19                        6797    0 intr
ioapic0 pin 4                        33936    0 intr
kpreempt defer: critical section      7776    0 misc
kpreempt defer: kernel_lock        2793374    9 misc
kpreempt immediate                  493760    1 misc
SStk-1 # sysctl -w ddb.command="call wm_pkt_stats(0)"
Total Pkts Recv =0
Missed Pkts Recv =0
Good Pkts Recv =0
No Buff Pkts Recv =0
Mgmt Pkt Recv =0
Mgmt Buff Drop Recv =0
Interrupt Assertion =80
wm_print_stats:
0x4000 : 0
0x4004 : 0
0x4008 : 0
0x400c : 0
0x4010 : 0
0x4014 : 0
0x4018 : 0
0x401c : 0
0x4020 : 0
0x4024 : 0
0x4028 : 0
0x402c : 0
0x4030 : 0
0x4034 : 0
0x4038 : 0
0x403c : 0
0x4040 : 0
0x4044 : 0
0x4048 : 0
0x404c : 0
0x4050 : 0
0x4054 : 0
0x4058 : 0
0x405c : 0
0x4060 : 0
0x4064 : 0
0x4068 : 0
0x406c : 0
0x4070 : 0
0x4074 : 0
0x4078 : 0
0x407c : 0
0x4080 : 0
0x4084 : 0
0x4088 : 0
0x408c : 0
0x4090 : 0
0x4094 : 0
0x4098 : 0
0x409c : 0
0x40a0 : 0
0x40a4 : 0
0x40a8 : 0
0x40ac : 0
0x40b0 : 0
0x40b4 : 0
0x40b8 : 0
0x40bc : 0
0x40c0 : 0
0x40c4 : 0
0x40c8 : 0
0x40cc : 0
0x40d0 : 0
0x40d4 : 0
0x40d8 : 0
0x40dc : 0
0x40e0 : 0
0x40e4 : 0
0x40e8 : 0
0x40ec : 0
0x40f0 : 0
0x40f4 : 0
0x40f8 : 0
0x40fc : 0
0x4100 : 0x24
0x4104 : 0
0x4108 : 0
0x410c : 0
0x4110 : 0
0x4114 : 0
0x4118 : 0
0x411c : 0
0x4120 : 0
0x4124 : 0
0x4128 : 0
0x412c : 0
0x4130 : 0
0x4134 : 0
0x4138 : 0
0x413c : 0
0x4140 : 0
0x4144 : 0
0x4148 : 0
0x414c : 0
0x4150 : 0
0x4154 : 0
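In the dump above, only offset 0x4100 holds a nonzero value (0x24). When comparing dumps taken before and after a reload or wm_reset(), reducing each dump to its nonzero registers makes changes easier to spot. A minimal Python sketch, assuming the `offset : value` layout printed here; `nonzero_registers` is a hypothetical helper, not part of wm(4):

```python
def nonzero_registers(dump):
    """Map register offset -> value for every nonzero 'offset : value' line."""
    regs = {}
    for line in dump.splitlines():
        offset, sep, value = line.partition(" : ")
        if not sep:
            continue  # not a register line, e.g. the 'wm_print_stats:' header
        val = int(value.strip(), 0)  # base 0 accepts both '0' and '0x24'
        if val != 0:
            regs[int(offset.strip(), 16)] = val
    return regs


if __name__ == "__main__":
    sample = "wm_print_stats:\n0x40fc : 0\n0x4100 : 0x24\n0x4104 : 0\n"
    for off, val in sorted(nonzero_registers(sample).items()):
        print(f"{off:#06x} : {val:#x}")  # prints "0x4100 : 0x24"
```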
>How-To-Repeat:
Reload the switch continuously.
>Fix:
>Release-Note:
>Audit-Trail:
From: <Aravind.M1@dell.com>
To: <gnats-bugs@NetBSD.org>, <kern-bug-people@netbsd.org>,
<gnats-admin@netbsd.org>, <netbsd-bugs@netbsd.org>
Cc:
Subject: RE: kern/53940: WM0 device timeout issue in NetBSD 7.1
Date: Thu, 14 Feb 2019 17:10:09 +0000
Hi Team,
Is anyone looking into this issue?
Do you need any other output to investigate further?
Regards,
Aravind.
-----Original Message-----
From: gnats-admin@netbsd.org <gnats-admin@netbsd.org>
Sent: Monday, February 4, 2019 11:00 AM
To: M1, Aravind
Subject: Re: kern/53940: WM0 device timeout issue in NetBSD 7.1
[EXTERNAL EMAIL]
Thank you very much for your problem report.
It has the internal identification `kern/53940'.
The individual assigned to look at your
report is: kern-bug-people.
>Category: kern
>Responsible: kern-bug-people
>Synopsis: WM0 device timeout issue in NetBSD 7.1
>Arrival-Date: Mon Feb 04 05:30:00 +0000 2019
From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc:
Subject: Re: kern/53940: WM0 device timeout issue in NetBSD 7.1
Date: Thu, 14 Feb 2019 20:27:50 +0100
Can you please test with a later kernel? There were some changes in 8.0
which might not be in 7.1, and many further changes in -current.
You can download -current kernel from the daily builds and boot it
with your existing userland.
Jaromir
State-Changed-From-To: open->closed
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Tue, 03 May 2022 00:04:42 +0000
State-Changed-Why:
NetBSD 7 is long EOL, and improvements have been made to wm(4) since. (We can re-open if still reproducible.)
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.