NetBSD Problem Report #57965

From www@netbsd.org  Mon Feb 26 19:48:03 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DF0331A9239
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 26 Feb 2024 19:48:03 +0000 (UTC)
Message-Id: <20240226194732.C9E531A923A@mollari.NetBSD.org>
Date: Mon, 26 Feb 2024 19:47:32 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: urtwn(4) command queue can overflow and deadlock
X-Send-Pr-Version: www-1.0

>Number:         57965
>Category:       kern
>Synopsis:       urtwn(4) command queue can overflow and deadlock
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb 26 19:50:00 +0000 2024
>Closed-Date:    Mon Mar 11 22:24:04 +0000 2024
>Last-Modified:  Mon Mar 11 22:24:04 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, 9, 8
>Organization:
The NetURTWN Foundation
>Environment:
>Description:
urtwn_do_async does nothing to prevent the command queue from overflowing, which leads to ring->queued exceeding the number that the task will ever decrement back to zero:

	mutex_spin_enter(&sc->sc_task_mtx);
	cmd = &ring->cmd[ring->cur];
	cmd->cb = cb;
	KASSERT(len <= sizeof(cmd->data));
	memcpy(cmd->data, arg, len);
	ring->cur = (ring->cur + 1) % URTWN_HOST_CMD_RING_COUNT;

	/* If there is no pending command already, schedule a task. */
	if (!sc->sc_dying && ++ring->queued == 1) {
		mutex_spin_exit(&sc->sc_task_mtx);
		usb_add_task(sc->sc_udev, &sc->sc_task, USB_TASKQ_DRIVER);
	} else
		mutex_spin_exit(&sc->sc_task_mtx);
	splx(s);

When this happens, urtwn_wait_async hangs forever, so you can't bring the interface down or do anything else with the interface once you make the mistake of trying.
>How-To-Repeat:
exist in a wifi environment that provokes a bunch of state changes (and whatever else) faster than the kernel can process them
>Fix:
(a) stop-gap measure: if ring->queued is already URTWN_HOST_CMD_RING_COUNT, drop the oldest queued command and don't increment ring->queued (but do print a message about overflow)

(b) real fix: stop using a command ring and just use three separate tasks, since two commands of the same type are redundant and can be compressed into one; that way there's no overflow condition

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: netbsd-bugs@NetBSD.org
Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
Date: Mon, 26 Feb 2024 20:24:18 +0000

 This is a multi-part message in MIME format.
 --=_QJyEfqlAfncoM8fB56hLDQdX5vxg3caH

 The attached patch is a candidate workaround, not yet tested.

 --=_QJyEfqlAfncoM8fB56hLDQdX5vxg3caH
 Content-Type: text/plain; charset="ISO-8859-1"; name="pr57965-urtwncmdqoverflow"
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename="pr57965-urtwncmdqoverflow.patch"

 From 491269b43e8cdbe70b174550cbd70fe8369baacf Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh@NetBSD.org>
 Date: Mon, 26 Feb 2024 19:51:13 +0000
 Subject: [PATCH] urtwn(4): Ditch old queued commands on overflow.

 Don't increment ring->queued past what the task will decrement.

 This is a stop-gap measure; really, we should just have one task for
 each operation that is deferred to the task thread.

 PR kern/57965
 ---
  sys/dev/usb/if_urtwn.c | 47 ++++++++++++++++++++++++++++++++++++------
  1 file changed, 41 insertions(+), 6 deletions(-)

 diff --git a/sys/dev/usb/if_urtwn.c b/sys/dev/usb/if_urtwn.c
 index 7e7c6c176d6d..f9fd12143fca 100644
 --- a/sys/dev/usb/if_urtwn.c
 +++ b/sys/dev/usb/if_urtwn.c
 @@ -871,6 +871,24 @@ urtwn_tx_beacon(struct urtwn_softc *sc, struct mbuf *m,
  	return urtwn_tx(sc, m, ni, data);
  }
 =20
 +static void
 +urtwn_cmdq_invariants(struct urtwn_softc *sc)
 +{
 +	struct urtwn_host_cmd_ring *const ring =3D &sc->cmdq;
 +
 +	KASSERT(mutex_owned(&sc->sc_task_mtx));
 +	KASSERTMSG((ring->cur >=3D 0 && ring->cur < URTWN_HOST_CMD_RING_COUNT),
 +	    "%s: cur=3D%d next=3D%d queued=3D%d",
 +	    device_xname(sc->sc_dev), ring->cur, ring->next, ring->queued);
 +	KASSERTMSG((ring->next >=3D 0 && ring->next < URTWN_HOST_CMD_RING_COUNT),
 +	    "%s: cur=3D%d next=3D%d queued=3D%d",
 +	    device_xname(sc->sc_dev), ring->cur, ring->next, ring->queued);
 +	KASSERTMSG((ring->queued >=3D 0 &&
 +		ring->queued <=3D URTWN_HOST_CMD_RING_COUNT),
 +	    "%s: %d commands queued",
 +	    device_xname(sc->sc_dev), ring->queued);
 +}
 +
  static void
  urtwn_task(void *arg)
  {
 @@ -903,7 +921,11 @@ urtwn_task(void *arg)
  	/* Process host commands. */
  	s =3D splusb();
  	mutex_spin_enter(&sc->sc_task_mtx);
 +	urtwn_cmdq_invariants(sc);
  	while (ring->next !=3D ring->cur) {
 +		KASSERTMSG(ring->queued > 0, "%s: cur=3D%d next=3D%d queued=3D%d",
 +		    device_xname(sc->sc_dev),
 +		    ring->cur, ring->next, ring->queued);
  		cmd =3D &ring->cmd[ring->next];
  		mutex_spin_exit(&sc->sc_task_mtx);
  		splx(s);
 @@ -911,6 +933,10 @@ urtwn_task(void *arg)
  		cmd->cb(sc, cmd->data);
  		s =3D splusb();
  		mutex_spin_enter(&sc->sc_task_mtx);
 +		urtwn_cmdq_invariants(sc);
 +		KASSERTMSG(ring->queued > 0, "%s: cur=3D%d next=3D%d queued=3D%d",
 +		    device_xname(sc->sc_dev),
 +		    ring->cur, ring->next, ring->queued);
  		ring->queued--;
  		ring->next =3D (ring->next + 1) % URTWN_HOST_CMD_RING_COUNT;
  	}
 @@ -925,6 +951,7 @@ urtwn_do_async(struct urtwn_softc *sc, void (*cb)(struc=
 t urtwn_softc *, void *),
  {
  	struct urtwn_host_cmd_ring *ring =3D &sc->cmdq;
  	struct urtwn_host_cmd *cmd;
 +	bool schedtask =3D false;
  	int s;
 =20
  	URTWNHIST_FUNC();
 @@ -933,19 +960,27 @@ urtwn_do_async(struct urtwn_softc *sc, void (*cb)(str=
 uct urtwn_softc *, void *),
 =20
  	s =3D splusb();
  	mutex_spin_enter(&sc->sc_task_mtx);
 +	urtwn_cmdq_invariants(sc);
  	cmd =3D &ring->cmd[ring->cur];
  	cmd->cb =3D cb;
  	KASSERT(len <=3D sizeof(cmd->data));
  	memcpy(cmd->data, arg, len);
  	ring->cur =3D (ring->cur + 1) % URTWN_HOST_CMD_RING_COUNT;
 =20
 -	/* If there is no pending command already, schedule a task. */
 -	if (!sc->sc_dying && ++ring->queued =3D=3D 1) {
 -		mutex_spin_exit(&sc->sc_task_mtx);
 -		usb_add_task(sc->sc_udev, &sc->sc_task, USB_TASKQ_DRIVER);
 -	} else
 -		mutex_spin_exit(&sc->sc_task_mtx);
 +	/*
 +	 * Schedule a task to process the command if need be.
 +	 */
 +	if (!sc->sc_dying) {
 +		if (ring->queued =3D=3D URTWN_HOST_CMD_RING_COUNT)
 +			device_printf(sc->sc_dev, "command queue overflow\n");
 +		else if (ring->queued++ =3D=3D 0)
 +			schedtask =3D true;
 +	}
 +	mutex_spin_exit(&sc->sc_task_mtx);
  	splx(s);
 +
 +	if (schedtask)
 +		usb_add_task(sc->sc_udev, &sc->sc_task, USB_TASKQ_DRIVER);
  }
 =20
  static void

 --=_QJyEfqlAfncoM8fB56hLDQdX5vxg3caH--

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57965 CVS commit: src/sys/dev/usb
Date: Wed, 28 Feb 2024 20:18:13 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Wed Feb 28 20:18:13 UTC 2024

 Modified Files:
 	src/sys/dev/usb: if_urtwn.c

 Log Message:
 urtwn(4): Ditch old queued commands on overflow.

 Don't increment ring->queued past what the task will decrement.

 This is a stop-gap measure; really, we should just have one task for
 each operation that is deferred to the task thread.

 PR kern/57965


 To generate a diff of this commit:
 cvs rdiff -u -r1.108 -r1.109 src/sys/dev/usb/if_urtwn.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 28 Feb 2024 21:02:03 +0000
State-Changed-Why:
needs pullup-10, pullup-9, pullup-8


From: Ramiro Aceves <ea1abz@gmail.com>
To: gnats-bugs@netbsd.org, Taylor R Campbell <riastradh@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
Date: Thu, 29 Feb 2024 13:05:09 +0100

 El lun, 26 feb 2024 a las 20:50, <campbell+netbsd@mumble.net> escribi=C3=B3=
 :
 >
 > >Number:         57965
 > >Category:       kern
 > >Synopsis:       urtwn(4) command queue can overflow and deadlock
 > >Confidential:   no
 > >Severity:       serious
 > >Priority:       medium
 > >Responsible:    kern-bug-people
 > >State:          open
 > >Class:          sw-bug
 > >Submitter-Id:   net
 > >Arrival-Date:   Mon Feb 26 19:50:00 +0000 2024
 > >Originator:     Taylor R Campbell
 > >Release:        current, 10, 9, 8
 > >Organization:
 > The NetURTWN Foundation
 > >Environment:
 > >Description:
 > urtwn_do_async does nothing to prevent the command queue from overflowing=
 , which leads to ring->queued exceeding the number that the task will ever =
 decrement back to zero:
 >
 >         mutex_spin_enter(&sc->sc_task_mtx);
 >         cmd =3D &ring->cmd[ring->cur];
 >         cmd->cb =3D cb;
 >         KASSERT(len <=3D sizeof(cmd->data));
 >         memcpy(cmd->data, arg, len);
 >         ring->cur =3D (ring->cur + 1) % URTWN_HOST_CMD_RING_COUNT;
 >
 >         /* If there is no pending command already, schedule a task. */
 >         if (!sc->sc_dying && ++ring->queued =3D=3D 1) {
 >                 mutex_spin_exit(&sc->sc_task_mtx);
 >                 usb_add_task(sc->sc_udev, &sc->sc_task, USB_TASKQ_DRIVER)=
 ;
 >         } else
 >                 mutex_spin_exit(&sc->sc_task_mtx);
 >         splx(s);
 >
 > When this happens, urtwn_wait_async hangs forever, so you can't bring the=
  interface down or do anything else with the interface once you make the mi=
 stake of trying.
 > >How-To-Repeat:
 > exist in a wifi environment that provokes a bunch of state changes (and w=
 hatever else) faster than the kernel can process them
 > >Fix:
 > (a) stop-gap measure: if ring->queued is already URTWN_HOST_CMD_RING_COUN=
 T, drop the oldest queued command and don't increment ring->queued (but do =
 print a message about overflow)
 >
 > (b) real fix: stop using a command ring and just use three separate tasks=
 , since two commands of the same type are redundant and can be compressed i=
 nto one; that way there's no overflow condition
 >

 Hello Taylor,

 I do not know if this could be related to the problem you describe
 with urtwn driver but just in case it is useful I tell you my
 findings.
 My RaspberryPi Zero W is using now the urtwn driver with a USB WIFI
 adaptor because of bwfm driver does not function well. I made a
 service that checks www.google.com and reboots the system if there is
 no ping,  checking 3 times during  5 min intervals.

 With urtwn, network worked fine but router automatic changing channels
 leaded to complete network loss, needing reboot.

 I then changed the router settings to fixed 11th channel and the
 system was running very fine for 12 days, but I let the rebooting
 script for recovering the network if everything failed.

 Tonight and after 12 days and 17:42 hours, the script restarted the system:


 Thu Feb 29 01:32:11 CET 2024
  1:32AM  up 12 days, 17:42, 0 users, load averages: 0.05, 0.01, 0.00

 After running 46 minutes, restarted again:

 reboot at:
 Thu Feb 29 02:18:00 CET 2024
  2:18AM  up 46 mins, 0 users, load averages: 0.00, 0.00, 0.00

 After running 2:01 hours, restarted again:

 reboot at:
 Thu Feb 29 04:19:08 CET 2024
  4:19AM  up  2:01, 0 users, load averages: 0.99, 0.56, 0.25


 After running 1:01 hours, restarted again:

 Thu Feb 29 05:20:08 CET 2024
  5:20AM  up  1:01, 0 users, load averages: 0.08, 0.05, 0.02

 Now it is running fine for 7 hours:

 netbsd-raspa# uptime
 12:55PM  up  7:35, 2 users, load averages: 0.00, 0.00, 0.00



 netbsd-raspa# uname -a
 NetBSD netbsd-raspa 10.0_RC3 NetBSD 10.0_RC3 (RPI) #0: Tue Jan 16
 08:28:51 UTC 2024
 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/RPI evbarm
 netbsd-raspa#

 There is nothing useful in NetBSD  /var/log/messages and /var
 directory, but I have a Debian GNU/Linux Raspberry Zero W also running
 and recorded this interesting wpa_supplicant events and the times of
 the automatic script reboots:


 feb 24 03:10:03 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DCORE type=3DWORLD
 feb 24 03:10:03 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DUSER type=3DCOUNTRY alpha2=3DGB
 feb 24 03:10:04 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 Associated with 60:8d:26:32:34:24
 feb 24 03:10:04 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
 id_str=3D]
 feb 24 03:10:04 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
 feb 24 03:10:04 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DCOUNTRY_IE type=3DCOUNTRY alpha2=3DES
 feb 29 01:16:06 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-DISCONNECTED bssid=3D60:8d:26:32:34:24 reason=3D0
 locally_generated=3D1
 feb 29 01:16:06 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DCORE type=3DWORLD
 feb 29 01:16:06 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DUSER type=3DCOUNTRY alpha2=3DGB
 feb 29 01:16:18 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:16:23 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:16:29 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:16:35 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:16:40 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:16:46 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:16:52 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:16:57 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:03 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:09 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:14 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:20 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:25 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:31 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:37 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:42 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:48 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:54 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:17:59 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:05 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:10 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:16 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:22 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:27 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:33 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:39 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:44 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:50 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:18:55 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:19:01 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:19:07 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:19:12 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:19:18 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:19:24 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:19:29 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:19:35 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
 to initiate sched scan
 feb 29 01:19:41 raspberrypi-meteo wpa_supplicant[597]: wlan0: Trying
 to associate with SSID 'MiFibra-3422'
 feb 29 01:19:42 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 Associated with 60:8d:26:32:34:24
 feb 29 01:19:42 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
 id_str=3D]
 feb 29 01:19:42 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
 feb 29 01:19:42 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DCOUNTRY_IE type=3DCOUNTRY alpha2=3DES
 feb 29 01:20:25 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-DISCONNECTED bssid=3D60:8d:26:32:34:24 reason=3D7
 feb 29 01:20:25 raspberrypi-meteo wpa_supplicant[597]: wlan0: Trying
 to associate with SSID 'MiFibra-3422'
 feb 29 01:20:25 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DCORE type=3DWORLD
 feb 29 01:20:25 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DUSER type=3DCOUNTRY alpha2=3DGB
 feb 29 01:20:26 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 Associated with 60:8d:26:32:34:24
 feb 29 01:20:26 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
 id_str=3D]
 feb 29 01:20:26 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
 feb 29 01:20:26 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DCOUNTRY_IE type=3DCOUNTRY alpha2=3DES
 feb 29 02:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 Associated with 60:8d:26:32:34:24
 feb 29 02:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
 id_str=3D]
 feb 29 02:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
 feb 29 04:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 Associated with 60:8d:26:32:34:24
 feb 29 04:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
 id_str=3D]
 feb 29 04:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
 feb 29 05:00:14 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-DISCONNECTED bssid=3D60:8d:26:32:34:24 reason=3D0
 locally_generated=3D1
 feb 29 05:00:14 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DCORE type=3DWORLD
 feb 29 05:00:14 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DUSER type=3DCOUNTRY alpha2=3DGB
 feb 29 05:00:15 raspberrypi-meteo wpa_supplicant[597]: wlan0: Trying
 to associate with SSID 'MiFibra-3422'
 feb 29 05:00:15 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-ASSOC-REJECT bssid=3D00:00:00:00:00:00 status_code=3D16
 feb 29 05:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0: Trying
 to associate with SSID 'MiFibra-3422'
 feb 29 05:00:17 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 Associated with 60:8d:26:32:34:24
 feb 29 05:00:17 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
 id_str=3D]
 feb 29 05:00:17 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
 feb 29 05:00:17 raspberrypi-meteo wpa_supplicant[597]: wlan0:
 CTRL-EVENT-REGDOM-CHANGE init=3DCOUNTRY_IE type=3DCOUNTRY alpha2=3DES

 Hope this could be useful.

 Regards.
 Ramiro.

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Ramiro Aceves <ea1abz@gmail.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
Date: Thu, 29 Feb 2024 15:13:28 +0000

 > Date: Thu, 29 Feb 2024 13:05:09 +0100
 > From: Ramiro Aceves <ea1abz@gmail.com>
 > 
 > My RaspberryPi Zero W is using now the urtwn driver with a USB WIFI
 > adaptor because of bwfm driver does not function well. I made a
 > service that checks www.google.com and reboots the system if there is
 > no ping,  checking 3 times during  5 min intervals.
 > 
 > With urtwn, network worked fine but router automatic changing channels
 > leaded to complete network loss, needing reboot.

 Can you share the output of these commands when it fails, before you
 reboot?

 ifconfig urtwn0
 wpa_cli status
 dhcpcd -U urtwn0

 The symptom in this PR is that the commands will actually just hang in
 tstile or mutex, and examination with `ps' (either ps(1), or in
 crash(8) or ddb(4)) will show that some process is waiting for
 urtwntsk; the stack trace in crash/ddb will show it is waiting in
 urtwn_stop.

 If your symptom does match, there's a good chance this will resolve
 it.

 If your symptom doesn't match, can you file a new PR for it?

From: Ramiro Aceves <ea1abz@gmail.com>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, campbell+netbsd@mumble.net,
 riastradh@netbsd.org
Cc: 
Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
Date: Thu, 29 Feb 2024 22:40:49 +0100

 El 2/29/24 a las 16:15, Taylor R Campbell escribió:
 > The following reply was made to PR kern/57965; it has been noted by GNATS.
 > 
 > From: Taylor R Campbell <riastradh@NetBSD.org>
 > To: Ramiro Aceves <ea1abz@gmail.com>
 > Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
 > 	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
 > Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
 > Date: Thu, 29 Feb 2024 15:13:28 +0000
 > 
 >   > Date: Thu, 29 Feb 2024 13:05:09 +0100
 >   > From: Ramiro Aceves <ea1abz@gmail.com>
 >   >
 >   > My RaspberryPi Zero W is using now the urtwn driver with a USB WIFI
 >   > adaptor because of bwfm driver does not function well. I made a
 >   > service that checks www.google.com and reboots the system if there is
 >   > no ping,  checking 3 times during  5 min intervals.
 >   >
 >   > With urtwn, network worked fine but router automatic changing channels
 >   > leaded to complete network loss, needing reboot.
 >   
 >   Can you share the output of these commands when it fails, before you
 >   reboot?
 >   
 >   ifconfig urtwn0
 >   wpa_cli status
 >   dhcpcd -U urtwn0
 >   
 >   The symptom in this PR is that the commands will actually just hang in
 >   tstile or mutex, and examination with `ps' (either ps(1), or in
 >   crash(8) or ddb(4)) will show that some process is waiting for
 >   urtwntsk; the stack trace in crash/ddb will show it is waiting in
 >   urtwn_stop.
 >   
 >   If your symptom does match, there's a good chance this will resolve
 >   it.
 >   
 >   If your symptom doesn't match, can you file a new PR for it?
 >   

 Thanks Taylor for the instructions.

 I have reproduced the problem forcing a channel change from the router. 
 I made a simple diagnose.sh script that is called from the automatic 
 reboot script just before issuing "shutdown -r" now command.

 I do not use dhcpcd daemon, I use static IP address.

 a diagnose.log file is recorded:

 netbsd-raspa$ cat /root/diagnose.log
 Thu Feb 29 22:32:36 CET 2024
 # ifconfig urtwn0
 urtwn0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 	ssid MiFibra-3422 nwkey 65536:"",0x433cc28429aedde0f4f22d14c5032697,"",""
 	powersave off
 	bssid 60:8d:26:32:34:24 chan 1
 	address: e4:fa:c4:52:ac:4c
 	media: IEEE802.11 autoselect (OFDM54 mode 11g)
 	status: active
 	inet6 fe80::e6fa:c4ff:fe52:ac4c%urtwn0/64 flags 0 scopeid 0x1
 	inet 192.168.1.230/24 broadcast 192.168.1.255 flags 0
 # wpa_cli status
 Selected interface 'urtwn0'
 22:32:36.067: bssid=60:8d:26:32:34:24
 freq=0
 ssid=MiFibra-3422
 id=0
 mode=station
 pairwise_cipher=CCMP
 group_cipher=CCMP
 key_mgmt=WPA2-PSK
 wpa_state=COMPLETED
 ip_address=192.168.1.230
 address=e4:fa:c4:52:ac:4c

 # dhcpcd -U urtwn0
 dhcpcd is not running
 # ps axu
 USER     PID %CPU %MEM   VSZ   RSS TTY    STAT STARTED    TIME COMMAND
 root       0  0.0  1.7     0  8008 ?      DKl  10:01PM 0:03.66 [system]
 root       1  0.0  0.3  5836  1464 ?      Is   10:01PM 0:00.06 init
 root     214  0.0  0.3  5840  1512 ?      S    10:32PM 0:00.02 /bin/sh 
 /root/diagnose.sh
 root     293  0.0  0.7 12184  3252 ?      Ss   10:02PM 0:00.07 
 /usr/sbin/wpa_supplicant -B -B -s -i urtwn0 -D bsd -c /etc/wpa_suppl
 root     346  0.0  0.2  5640   768 ?      Is   10:02PM 0:00.00 
 /sbin/devpubd
 root     456  0.0  0.5 11124  2164 ?      Ss   10:02PM 0:00.12 
 /usr/sbin/syslogd -s
 root     563  0.0  2.2 11256 10312 ?      Ss   10:02PM 0:00.58 
 /usr/sbin/ntpd -p /var/run/ntpd.pid -g
 root     793  0.0  0.6 15128  2768 ?      Is   10:02PM 0:00.03 sshd: 
 /usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
 root     809  0.0  1.1 74620  5088 ?      Ss   10:02PM 0:00.06 
 /usr/pkg/sbin/minidlnad -f /usr/pkg/etc/minidlna.conf
 postfix  810  0.0  0.8 14948  3836 ?      S    10:02PM 0:00.11 pickup -l 
 -t unix -u
 postfix  811  0.0  0.8 15000  3852 ?      S    10:02PM 0:00.11 qmgr -l 
 -t unix -u
 root     823  0.0  0.3  5908  1324 ?      O    10:32PM 0:00.01 ps -axuw
 root     844  0.0  0.5 15360  2488 ?      Ss   10:02PM 0:00.09 
 /usr/libexec/postfix/master -w
 root     860  0.0  0.3  5952  1452 ?      Is   10:02PM 0:00.03 
 /usr/sbin/cron
 ramiro   919  0.0  1.1 16120  5100 ?      I    10:08PM 0:02.82 sshd: 
 ramiro@pts/0 (sshd)
 root     950  0.0  0.5 11392  2232 ?      S    10:21PM 0:00.05 ntpd: 
 asynchronous dns resolver
 root     961  0.0  1.5 15980  6692 ?      Is   10:07PM 0:00.37 sshd: 
 ramiro [priv]
 root    1048  0.0  0.3  6160  1284 ?      Is   10:02PM 0:00.01 
 /usr/sbin/inetd -l
 ramiro   847  0.0  0.4  6192  1868 pts/0  Is   10:08PM 0:00.09 -sh
 root    1141  0.0  0.4  6192  1880 pts/0  I+   10:13PM 0:00.26 sh
 root     560  0.0  0.3  5888  1556 ttyE0- S    10:02PM 0:00.03 /bin/sh 
 /root/nettest
 root     963  0.0  0.3  5616  1396 ttyE0  Is+  10:02PM 0:00.02 
 /usr/libexec/getty default constty
 root    1041  0.0  0.3  5612  1392 ttyE1  Is+  10:02PM 0:00.02 
 /usr/libexec/getty Pc ttyE1
 root     967  0.0  0.3  5616  1396 ttyE2  Is+  10:02PM 0:00.02 
 /usr/libexec/getty Pc ttyE2
 root    1112  0.0  0.3  5616  1400 ttyE3  Is+  10:02PM 0:00.02 
 /usr/libexec/getty Pc ttyE3
 netbsd-raspa$

 Commands do not hang, I think...

 Regards.
 Ramiro






State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 01 Mar 2024 15:15:25 +0000
State-Changed-Why:
pullup-10 #616
pullup-9 #1810
pullup-8 #1942


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57965 CVS commit: [netbsd-10] src/sys/dev/usb
Date: Sun, 10 Mar 2024 18:57:11 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Mar 10 18:57:11 UTC 2024

 Modified Files:
 	src/sys/dev/usb [netbsd-10]: if_urtwn.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #616):

 	sys/dev/usb/if_urtwn.c: revision 1.109

 urtwn(4): Ditch old queued commands on overflow.
 Don't increment ring->queued past what the task will decrement.

 This is a stop-gap measure; really, we should just have one task for
 each operation that is deferred to the task thread.

 PR kern/57965


 To generate a diff of this commit:
 cvs rdiff -u -r1.105 -r1.105.4.1 src/sys/dev/usb/if_urtwn.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57965 CVS commit: [netbsd-9] src/sys/dev/usb
Date: Sun, 10 Mar 2024 18:59:19 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Mar 10 18:59:19 UTC 2024

 Modified Files:
 	src/sys/dev/usb [netbsd-9]: if_urtwn.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1810):

 	sys/dev/usb/if_urtwn.c: revision 1.109 (patch)

 urtwn(4): Ditch old queued commands on overflow.
 Don't increment ring->queued past what the task will decrement.

 This is a stop-gap measure; really, we should just have one task for
 each operation that is deferred to the task thread.

 PR kern/57965


 To generate a diff of this commit:
 cvs rdiff -u -r1.71.2.6 -r1.71.2.7 src/sys/dev/usb/if_urtwn.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57965 CVS commit: [netbsd-8] src/sys/dev/usb
Date: Sun, 10 Mar 2024 19:00:27 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Sun Mar 10 19:00:27 UTC 2024

 Modified Files:
 	src/sys/dev/usb [netbsd-8]: if_urtwn.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1942):

 	sys/dev/usb/if_urtwn.c: revision 1.109 (patch)

 urtwn(4): Ditch old queued commands on overflow.
 Don't increment ring->queued past what the task will decrement.

 This is a stop-gap measure; really, we should just have one task for
 each operation that is deferred to the task thread.

 PR kern/57965


 To generate a diff of this commit:
 cvs rdiff -u -r1.53.2.6 -r1.53.2.7 src/sys/dev/usb/if_urtwn.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 11 Mar 2024 22:24:04 +0000
State-Changed-Why:
deadlock fixed and pulled up
(underlying queue mechanism to be dealt with in wifi renovation)


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.