NetBSD Problem Report #57561

From www@netbsd.org  Fri Aug  4 05:32:20 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EF87D1A9238
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  4 Aug 2023 05:32:19 +0000 (UTC)
Message-Id: <20230804053148.D62EB1A923A@mollari.NetBSD.org>
Date: Fri,  4 Aug 2023 05:31:48 +0000 (UTC)
From: bsdprg@tuta.io
Reply-To: bsdprg@tuta.io
To: gnats-bugs@NetBSD.org
Subject: iwm device timeouts
X-Send-Pr-Version: www-1.0

>Number:         57561
>Category:       kern
>Synopsis:       iwm device timeouts
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 04 05:35:00 +0000 2023
>Last-Modified:  Fri Aug 11 08:55:01 +0000 2023
>Originator:     Salil Wadnerkar
>Release:        HEAD
>Organization:
>Environment:
NetBSD latitude 10.99.7 NetBSD 10.99.7 (MYKERNEL) #1: Thu Aug  3 19:24:20 PDT 2023  swadnerkar@latitude:/usr/obj/sys/arch/amd64/compile/MYKERNEL amd64
>Description:
Every few minutes to every few tens of minutes, the iwm interface times out, resulting in loss of connectivity.
>How-To-Repeat:
This is an intermittent problem. It occurs at least once an hour, and sometimes several times an hour. I believe anyone using an iwm interface should be experiencing these frustrating timeouts, which result in loss of connectivity.
>Fix:
The workaround is to execute the following twice:
doas ifconfig iwm0 down
doas ifconfig iwm0 up

It needs to be executed twice, because after the first down and up, I get:
ifconfig: exec_matches: Resource temporarily unavailable
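
The two rounds above can be wrapped in a small script. This is only a sketch of the workaround, not a fix: the function name bounce_iface and the IFCONFIG variable are made up for illustration, and the script simply repeats the down/up pair a fixed number of times (two by default), because it is not known whether ifconfig returns a non-zero exit status when it prints the error above.

```shell
#!/bin/sh
# Sketch of the manual workaround from this report (hypothetical helper).
# IFCONFIG is an assumed override hook; it defaults to the command used above.
: "${IFCONFIG:=doas ifconfig}"

# bounce_iface IFACE [ROUNDS]
# Runs "down" then "up" on IFACE, ROUNDS times (default 2, as in the report).
# Failures in earlier rounds are ignored; the exit status is that of the
# final "up".
bounce_iface() {
    iface=$1
    rounds=${2:-2}
    status=0
    n=0
    while [ "$n" -lt "$rounds" ]; do
        # $IFCONFIG is left unquoted on purpose so that "doas ifconfig"
        # splits into a command and its first argument.
        $IFCONFIG "$iface" down
        $IFCONFIG "$iface" up
        status=$?
        n=$((n + 1))
    done
    return "$status"
}
```

After sourcing the file, "bounce_iface iwm0" performs the two rounds described above.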

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/57561: iwm device timeouts
Date: Wed, 9 Aug 2023 08:01:47 +0200

 [please reply to gnats-bugs, do not mail netbsd-bugs directly, and make
 sure to keep the subject formatted properly]

 I don't understand two things:

  - why do you consider a TX queue "getting stuck" not a serious issue?
    Are the queues just blocked for longer than our single watchdog
    expects? Do they recover later? How fast?

  - why does the reset of the interface cause loss of network connection?
    (sounds like a serious issue in the driver reset code, probably not
    syncing current queue pointers with the hardware or something like
    that) - independent of the issue you see, this bug needs to be fixed.

 This sounds like several bugs and your patch just papering over it.
 Maybe (in the end) it is the best way forward, but it should be analyzed
 in more detail.

 Martin

From: bsdprg@tuta.io
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57561: iwm device timeouts
Date: Thu, 10 Aug 2023 21:27:00 +0200 (CEST)

 Aug 9, 2023, 06:05 by martin@duskware.de:

 > The following reply was made to PR kern/57561; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: kern/57561: iwm device timeouts
 > Date: Wed, 9 Aug 2023 08:01:47 +0200
 >
 > [please reply to gnats-bugs, do not mail netbsd-bugs directly, and make
 > sure to keep the subject formatted properly]
 >
 >
 First of all, thanks a lot for letting me know the right mailing list, looking into the PR and providing feedback.

 > I don't understand two things:
 >
 > - why do you consider a TX queue "getting stuck" not a serious issue?
 > Are the queues just blocked for longer than our single watchdog
 > expects? Do they recover later? How fast?
 >
 >
 You are right. It's a serious issue. After thinking about it, I don't understand why this patch seems to work. I will dig deeper and add more diagnostics before creating a patch.


 > - why does the reset of the interface cause loss of network connection?
 > (sounds like a serious issue in the driver reset code, probably not
 > syncing current queue pointers with the hardware or something like
 > that) - independent of the issue you see, this bug needs to be fixed.
 >

 I think I used the word "reset" when I meant "stop". When the watchdog kicks in, the watchdog handler function (iwm_watchdog) calls iwm_stop, which is the "if_stop" handler, and it stops the interface.


 > This sounds like several bugs and your patch just papering over it.
 > Maybe (in the end) it is the best way forward, but it should be analyzed
 > in more detail.
 >
 > Martin
 >

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57561: iwm device timeouts
Date: Fri, 11 Aug 2023 10:51:01 +0200

 On Thu, Aug 10, 2023 at 07:30:03PM +0000, bsdprg@tuta.io wrote:
 >  I think I used the word "reset" when I meant "stop". When the
 > watchdog kicks in, the watchdog handler function (iwm_watchdog) calls
 > iwm_stop, which is the "if_stop" handler, and it stops the interface.

 But the next transmit packet queued should then start it again, at least
 that is the usual construct used in similar drivers.

 Martin
