NetBSD Problem Report #53940

From www@NetBSD.org  Mon Feb  4 05:26:25 2019
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id CA0307A111
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  4 Feb 2019 05:26:25 +0000 (UTC)
Message-Id: <20190204052624.8060E7A175@mollari.NetBSD.org>
Date: Mon,  4 Feb 2019 05:26:24 +0000 (UTC)
From: aravind_m1@dell.com
Reply-To: aravind_m1@dell.com
To: gnats-bugs@NetBSD.org
Subject: WM0 device timeout issue in NetBSD 7.1
X-Send-Pr-Version: www-1.0

>Number:         53940
>Category:       kern
>Synopsis:       WM0 device timeout issue in NetBSD 7.1
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 04 05:30:00 +0000 2019
>Closed-Date:    Tue May 03 00:04:42 +0000 2022
>Last-Modified:  Tue May 03 00:04:42 +0000 2022
>Originator:     Aravind Mani
>Release:        NetBSD 7.1
>Organization:
Dell
>Environment:
WM0 timeout issue in NetBSD 7.1
>Description:
We use the WM_T_I354 chip type. When we reload the switch continuously, we observe a device timeout. Neither wm_init() nor wm_reset() recovers the device from the problem state; the only way to recover is to reload the switch. There was no initialization error.
From wm_print_stats() and wm_pkt_stats(), I don't see any value in the registers listed, and the packets are not hitting the hardware.
wm_reset() also did not help to recover from the issue.
We did not remove wm_print_stats and wm_pkt_stats in NetBSD 7.1.

I took a live kernel core the last time we hit this issue, and I could see that the WM PHY was active.
WM0 device timeouts keep piling up after wm_reset().
The management port (WM0) is UP but cannot ping the external network.
Please share your view on this issue. Do you need any other logs to investigate?


logs:
wm0: device timeout (txfree 4095 txsfree 63 txnext 141)

SStk-1 # vmstat -e
 event                                         total     rate type
 bus_dma loads                              95451577      319 misc
 vmcmd kills                                     661        0 misc
 vmcmd calls                                    3731        0 misc
 vmem static_bt_inuse                            200        0 misc
 vmem static_bt_count                            200        0 misc
 TLB shootdown                                182842        0 intr
 cpu0 runqueue pull                         16763601       56 misc
 cpu0 runqueue push                           218455        0 misc
 cpu0 runqueue stay                         29807214       99 misc
 cpu0 runqueue localize                    199719304      669 misc
 softint net/0                               1172158        3 misc
 softint net block/0                           46424        0 misc
 softint bio/0                                  6245        0 misc
 softint bio block/0                               4        0 misc
 softint clk/0                              29819349       99 misc
 softint clk block/0                          145137        0 misc
 softint ser/0                                 44794        0 misc
 callout late/0                                38366        0 misc
 crosscall unicast                                11        0 misc
 crosscall broadcast                               4        0 misc
 namecache entries collected                   13850        0 misc
 namecache under scan target                  298154        0 misc
 cpu0 timer                                 29826661       99 intr
 cpu0 generic IPI                             548755        1 misc
 cpu0 FPU synch IPI                             3116        0 misc
 cpu0 kpreempt IPI                            235125        0 misc
 cpu1 runqueue pull                         18640375       62 misc
 cpu1 runqueue push                          2168053        7 misc
 cpu1 runqueue stay                         30124219      100 misc
 cpu1 runqueue localize                    158923916      532 misc
 softint net/1                                   365        0 misc
 softint net block/1                             360        0 misc
 softint clk/1                              29817170       99 misc
 softint clk block/1                           28745        0 misc
 softint ser/1                                  8658        0 misc
 callout late/1                                18516        0 misc
 cpu1 timer                                 29826661       99 misc
 cpu1 FPU synch IPI                             4340        0 misc
 cpu1 kpreempt IPI                            173706        0 misc
 ioapic0 pin 20                               172536        0 intr
 wm0 txsstall                                   1088        0 misc
 wm0 txdw                                     183747        0 intr
 wm0 txseg0                                   255914        0 misc
 ioapic0 pin 23                                   18        0 intr
 ioapic0 pin 19                                 6797        0 intr
 ioapic0 pin 4                                 33936        0 intr
 kpreempt defer: critical section               7776        0 misc
 kpreempt defer: kernel_lock                 2793374        9 misc
 kpreempt immediate                           493760        1 misc


 SStk-1 # sysctl -w ddb.command="call wm_pkt_stats(0)"
 Total Pkts Recv     =0
 Missed Pkts Recv    =0
 Good Pkts Recv      =0
 No Buff Pkts Recv   =0
 Mgmt Pkt Recv       =0
 Mgmt Buff Drop Recv =0
 Interrupt Assertion =80

 wm_print_stats:

 0x4000 : 0
 0x4004 : 0
 0x4008 : 0
 0x400c : 0
 0x4010 : 0
 0x4014 : 0
 0x4018 : 0
 0x401c : 0
 0x4020 : 0
 0x4024 : 0
 0x4028 : 0
 0x402c : 0
 0x4030 : 0
 0x4034 : 0
 0x4038 : 0
 0x403c : 0
 0x4040 : 0
 0x4044 : 0
 0x4048 : 0
 0x404c : 0
 0x4050 : 0
 0x4054 : 0
 0x4058 : 0
 0x405c : 0
 0x4060 : 0
 0x4064 : 0
 0x4068 : 0
 0x406c : 0
 0x4070 : 0
 0x4074 : 0
 0x4078 : 0
 0x407c : 0
 0x4080 : 0
 0x4084 : 0
 0x4088 : 0
 0x408c : 0
 0x4090 : 0
 0x4094 : 0
 0x4098 : 0
 0x409c : 0
 0x40a0 : 0
 0x40a4 : 0
 0x40a8 : 0
 0x40ac : 0
 0x40b0 : 0
 0x40b4 : 0
 0x40b8 : 0
 0x40bc : 0
 0x40c0 : 0
 0x40c4 : 0
 0x40c8 : 0
 0x40cc : 0
 0x40d0 : 0
 0x40d4 : 0
 0x40d8 : 0
 0x40dc : 0
 0x40e0 : 0
 0x40e4 : 0
 0x40e8 : 0
 0x40ec : 0
 0x40f0 : 0
 0x40f4 : 0
 0x40f8 : 0
 0x40fc : 0
 0x4100 : 0x24
 0x4104 : 0
 0x4108 : 0
 0x410c : 0
 0x4110 : 0
 0x4114 : 0
 0x4118 : 0
 0x411c : 0
 0x4120 : 0
 0x4124 : 0
 0x4128 : 0
 0x412c : 0
 0x4130 : 0
 0x4134 : 0
 0x4138 : 0
 0x413c : 0
 0x4140 : 0
 0x4144 : 0
 0x4148 : 0
 0x414c : 0
 0x4150 : 0
 0x4154 : 0
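
Because nearly every register in the dump above reads zero, a small filter makes the non-zero entries easier to spot. The following is a minimal sketch and not part of the original report: the function name nonzero_registers is hypothetical, and it assumes only the "offset : value" layout shown above, with unprefixed values treated as decimal.

```python
import re

# Matches lines such as "0x4100 : 0x24" or "0x4000 : 0"
# (layout assumed from the wm_print_stats output above).
_LINE = re.compile(r"^\s*(0x[0-9a-fA-F]+)\s*:\s*(0x[0-9a-fA-F]+|\d+)\s*$")


def nonzero_registers(dump):
    """Return {offset: value} for every register whose value is not zero."""
    result = {}
    for line in dump.splitlines():
        m = _LINE.match(line)
        if not m:
            continue  # skip headers, blank lines, and unrelated log output
        offset = int(m.group(1), 16)
        text = m.group(2)
        # Assumption: "0x"-prefixed values are hex, anything else is decimal.
        value = int(text, 16) if text.lower().startswith("0x") else int(text, 10)
        if value != 0:
            result[offset] = value
    return result


if __name__ == "__main__":
    sample = """
     0x40fc : 0
     0x4100 : 0x24
     0x4104 : 0
    """
    for off, val in sorted(nonzero_registers(sample).items()):
        print(f"{off:#06x} : {val:#x}")  # prints "0x4100 : 0x24"
```

Run against the full dump, this would single out 0x4100 as the only offset with a non-zero value.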


>How-To-Repeat:
Reload the switch continuously.
>Fix:

>Release-Note:

>Audit-Trail:
From: <Aravind.M1@dell.com>
To: <gnats-bugs@NetBSD.org>, <kern-bug-people@netbsd.org>,
        <gnats-admin@netbsd.org>, <netbsd-bugs@netbsd.org>
Cc: 
Subject: RE: kern/53940: WM0 device timeout issue in NetBSD 7.1
Date: Thu, 14 Feb 2019 17:10:09 +0000

 Hi Team,

 Is anyone looking into this issue?
 Do you need any other output to investigate further?

 Regards,
 Aravind.

 -----Original Message-----
 From: gnats-admin@netbsd.org <gnats-admin@netbsd.org>
 Sent: Monday, February 4, 2019 11:00 AM
 To: M1, Aravind
 Subject: Re: kern/53940: WM0 device timeout issue in NetBSD 7.1


 Thank you very much for your problem report.
 It has the internal identification `kern/53940'.
 The individual assigned to look at your
 report is: kern-bug-people.

 >Category:       kern
 >Responsible:    kern-bug-people
 >Synopsis:       WM0 device timeout issue in NetBSD 7.1
 >Arrival-Date:   Mon Feb 04 05:30:00 +0000 2019

From: Jaromír Doleček <jaromir.dolecek@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/53940: WM0 device timeout issue in NetBSD 7.1
Date: Thu, 14 Feb 2019 20:27:50 +0100

 Can you please test with a later kernel? There were some changes in 8.0
 which might not be in 7.1, and many further changes in -current.

 You can download a -current kernel from the daily builds and boot it
 with your existing userland.

 Jaromir


State-Changed-From-To: open->closed
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Tue, 03 May 2022 00:04:42 +0000
State-Changed-Why:
NetBSD 7 is long EOL, and improvements have been made to wm(4) since. (We can re-open if still reproducible.)

>Unformatted:
