NetBSD Problem Report #57965
From www@netbsd.org Mon Feb 26 19:48:03 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id DF0331A9239
for <gnats-bugs@gnats.NetBSD.org>; Mon, 26 Feb 2024 19:48:03 +0000 (UTC)
Message-Id: <20240226194732.C9E531A923A@mollari.NetBSD.org>
Date: Mon, 26 Feb 2024 19:47:32 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: urtwn(4) command queue can overflow and deadlock
X-Send-Pr-Version: www-1.0
>Number: 57965
>Category: kern
>Synopsis: urtwn(4) command queue can overflow and deadlock
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Feb 26 19:50:00 +0000 2024
>Closed-Date: Mon Mar 11 22:24:04 +0000 2024
>Last-Modified: Mon Mar 11 22:24:04 +0000 2024
>Originator: Taylor R Campbell
>Release: current, 10, 9, 8
>Organization:
The NetURTWN Foundation
>Environment:
>Description:
urtwn_do_async does nothing to prevent the command queue from overflowing, which leads to ring->queued exceeding the number that the task will ever decrement back to zero:
mutex_spin_enter(&sc->sc_task_mtx);
cmd = &ring->cmd[ring->cur];
cmd->cb = cb;
KASSERT(len <= sizeof(cmd->data));
memcpy(cmd->data, arg, len);
ring->cur = (ring->cur + 1) % URTWN_HOST_CMD_RING_COUNT;
/* If there is no pending command already, schedule a task. */
if (!sc->sc_dying && ++ring->queued == 1) {
mutex_spin_exit(&sc->sc_task_mtx);
usb_add_task(sc->sc_udev, &sc->sc_task, USB_TASKQ_DRIVER);
} else
mutex_spin_exit(&sc->sc_task_mtx);
splx(s);
When this happens, urtwn_wait_async hangs forever, so you can't bring the interface down or do anything else with the interface once you make the mistake of trying.
>How-To-Repeat:
exist in a wifi environment that provokes a bunch of state changes (and whatever else) faster than the kernel can process them
>Fix:
(a) stop-gap measure: if ring->queued is already URTWN_HOST_CMD_RING_COUNT, drop the oldest queued command and don't increment ring->queued (but do print a message about overflow)
(b) real fix: stop using a command ring and just use three separate tasks, since two commands of the same type are redundant and can be compressed into one; that way there's no overflow condition
>Release-Note:
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: netbsd-bugs@NetBSD.org
Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
Date: Mon, 26 Feb 2024 20:24:18 +0000
This is a multi-part message in MIME format.
--=_QJyEfqlAfncoM8fB56hLDQdX5vxg3caH
The attached patch is a candidate workaround, not yet tested.
--=_QJyEfqlAfncoM8fB56hLDQdX5vxg3caH
Content-Type: text/plain; charset="ISO-8859-1"; name="pr57965-urtwncmdqoverflow"
Content-Transfer-Encoding: quoted-printable
Content-Disposition: attachment; filename="pr57965-urtwncmdqoverflow.patch"
From 491269b43e8cdbe70b174550cbd70fe8369baacf Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Mon, 26 Feb 2024 19:51:13 +0000
Subject: [PATCH] urtwn(4): Ditch old queued commands on overflow.
Don't increment ring->queued past what the task will decrement.
This is a stop-gap measure; really, we should just have one task for
each operation that is deferred to the task thread.
PR kern/57965
---
sys/dev/usb/if_urtwn.c | 47 ++++++++++++++++++++++++++++++++++++------
1 file changed, 41 insertions(+), 6 deletions(-)
diff --git a/sys/dev/usb/if_urtwn.c b/sys/dev/usb/if_urtwn.c
index 7e7c6c176d6d..f9fd12143fca 100644
--- a/sys/dev/usb/if_urtwn.c
+++ b/sys/dev/usb/if_urtwn.c
@@ -871,6 +871,24 @@ urtwn_tx_beacon(struct urtwn_softc *sc, struct mbuf *m,
return urtwn_tx(sc, m, ni, data);
}
=20
+static void
+urtwn_cmdq_invariants(struct urtwn_softc *sc)
+{
+ struct urtwn_host_cmd_ring *const ring =3D &sc->cmdq;
+
+ KASSERT(mutex_owned(&sc->sc_task_mtx));
+ KASSERTMSG((ring->cur >=3D 0 && ring->cur < URTWN_HOST_CMD_RING_COUNT),
+ "%s: cur=3D%d next=3D%d queued=3D%d",
+ device_xname(sc->sc_dev), ring->cur, ring->next, ring->queued);
+ KASSERTMSG((ring->next >=3D 0 && ring->next < URTWN_HOST_CMD_RING_COUNT),
+ "%s: cur=3D%d next=3D%d queued=3D%d",
+ device_xname(sc->sc_dev), ring->cur, ring->next, ring->queued);
+ KASSERTMSG((ring->queued >=3D 0 &&
+ ring->queued <=3D URTWN_HOST_CMD_RING_COUNT),
+ "%s: %d commands queued",
+ device_xname(sc->sc_dev), ring->queued);
+}
+
static void
urtwn_task(void *arg)
{
@@ -903,7 +921,11 @@ urtwn_task(void *arg)
/* Process host commands. */
s =3D splusb();
mutex_spin_enter(&sc->sc_task_mtx);
+ urtwn_cmdq_invariants(sc);
while (ring->next !=3D ring->cur) {
+ KASSERTMSG(ring->queued > 0, "%s: cur=3D%d next=3D%d queued=3D%d",
+ device_xname(sc->sc_dev),
+ ring->cur, ring->next, ring->queued);
cmd =3D &ring->cmd[ring->next];
mutex_spin_exit(&sc->sc_task_mtx);
splx(s);
@@ -911,6 +933,10 @@ urtwn_task(void *arg)
cmd->cb(sc, cmd->data);
s =3D splusb();
mutex_spin_enter(&sc->sc_task_mtx);
+ urtwn_cmdq_invariants(sc);
+ KASSERTMSG(ring->queued > 0, "%s: cur=3D%d next=3D%d queued=3D%d",
+ device_xname(sc->sc_dev),
+ ring->cur, ring->next, ring->queued);
ring->queued--;
ring->next =3D (ring->next + 1) % URTWN_HOST_CMD_RING_COUNT;
}
@@ -925,6 +951,7 @@ urtwn_do_async(struct urtwn_softc *sc, void (*cb)(struc=
t urtwn_softc *, void *),
{
struct urtwn_host_cmd_ring *ring =3D &sc->cmdq;
struct urtwn_host_cmd *cmd;
+ bool schedtask =3D false;
int s;
=20
URTWNHIST_FUNC();
@@ -933,19 +960,27 @@ urtwn_do_async(struct urtwn_softc *sc, void (*cb)(str=
uct urtwn_softc *, void *),
=20
s =3D splusb();
mutex_spin_enter(&sc->sc_task_mtx);
+ urtwn_cmdq_invariants(sc);
cmd =3D &ring->cmd[ring->cur];
cmd->cb =3D cb;
KASSERT(len <=3D sizeof(cmd->data));
memcpy(cmd->data, arg, len);
ring->cur =3D (ring->cur + 1) % URTWN_HOST_CMD_RING_COUNT;
=20
- /* If there is no pending command already, schedule a task. */
- if (!sc->sc_dying && ++ring->queued =3D=3D 1) {
- mutex_spin_exit(&sc->sc_task_mtx);
- usb_add_task(sc->sc_udev, &sc->sc_task, USB_TASKQ_DRIVER);
- } else
- mutex_spin_exit(&sc->sc_task_mtx);
+ /*
+ * Schedule a task to process the command if need be.
+ */
+ if (!sc->sc_dying) {
+ if (ring->queued =3D=3D URTWN_HOST_CMD_RING_COUNT)
+ device_printf(sc->sc_dev, "command queue overflow\n");
+ else if (ring->queued++ =3D=3D 0)
+ schedtask =3D true;
+ }
+ mutex_spin_exit(&sc->sc_task_mtx);
splx(s);
+
+ if (schedtask)
+ usb_add_task(sc->sc_udev, &sc->sc_task, USB_TASKQ_DRIVER);
}
=20
static void
--=_QJyEfqlAfncoM8fB56hLDQdX5vxg3caH--
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57965 CVS commit: src/sys/dev/usb
Date: Wed, 28 Feb 2024 20:18:13 +0000
Module Name: src
Committed By: riastradh
Date: Wed Feb 28 20:18:13 UTC 2024
Modified Files:
src/sys/dev/usb: if_urtwn.c
Log Message:
urtwn(4): Ditch old queued commands on overflow.
Don't increment ring->queued past what the task will decrement.
This is a stop-gap measure; really, we should just have one task for
each operation that is deferred to the task thread.
PR kern/57965
To generate a diff of this commit:
cvs rdiff -u -r1.108 -r1.109 src/sys/dev/usb/if_urtwn.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 28 Feb 2024 21:02:03 +0000
State-Changed-Why:
needs pullup-10, pullup-9, pullup-8
From: Ramiro Aceves <ea1abz@gmail.com>
To: gnats-bugs@netbsd.org, Taylor R Campbell <riastradh@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
Date: Thu, 29 Feb 2024 13:05:09 +0100
El lun, 26 feb 2024 a las 20:50, <campbell+netbsd@mumble.net> escribi=C3=B3=
:
>
> >Number: 57965
> >Category: kern
> >Synopsis: urtwn(4) command queue can overflow and deadlock
> >Confidential: no
> >Severity: serious
> >Priority: medium
> >Responsible: kern-bug-people
> >State: open
> >Class: sw-bug
> >Submitter-Id: net
> >Arrival-Date: Mon Feb 26 19:50:00 +0000 2024
> >Originator: Taylor R Campbell
> >Release: current, 10, 9, 8
> >Organization:
> The NetURTWN Foundation
> >Environment:
> >Description:
> urtwn_do_async does nothing to prevent the command queue from overflowing=
, which leads to ring->queued exceeding the number that the task will ever =
decrement back to zero:
>
> mutex_spin_enter(&sc->sc_task_mtx);
> cmd =3D &ring->cmd[ring->cur];
> cmd->cb =3D cb;
> KASSERT(len <=3D sizeof(cmd->data));
> memcpy(cmd->data, arg, len);
> ring->cur =3D (ring->cur + 1) % URTWN_HOST_CMD_RING_COUNT;
>
> /* If there is no pending command already, schedule a task. */
> if (!sc->sc_dying && ++ring->queued =3D=3D 1) {
> mutex_spin_exit(&sc->sc_task_mtx);
> usb_add_task(sc->sc_udev, &sc->sc_task, USB_TASKQ_DRIVER)=
;
> } else
> mutex_spin_exit(&sc->sc_task_mtx);
> splx(s);
>
> When this happens, urtwn_wait_async hangs forever, so you can't bring the=
interface down or do anything else with the interface once you make the mi=
stake of trying.
> >How-To-Repeat:
> exist in a wifi environment that provokes a bunch of state changes (and w=
hatever else) faster than the kernel can process them
> >Fix:
> (a) stop-gap measure: if ring->queued is already URTWN_HOST_CMD_RING_COUN=
T, drop the oldest queued command and don't increment ring->queued (but do =
print a message about overflow)
>
> (b) real fix: stop using a command ring and just use three separate tasks=
, since two commands of the same type are redundant and can be compressed i=
nto one; that way there's no overflow condition
>
Hello Taylor,
I do not know if this could be related to the problem you describe
with urtwn driver but just in case it is useful I tell you my
findings.
My RaspberryPi Zero W is using now the urtwn driver with a USB WIFI
adaptor because of bwfm driver does not function well. I made a
service that checks www.google.com and reboots the system if there is
no ping, checking 3 times during 5 min intervals.
With urtwn, network worked fine but router automatic changing channels
leaded to complete network loss, needing reboot.
I then changed the router settings to fixed 11th channel and the
system was running very fine for 12 days, but I let the rebooting
script for recovering the network if everything failed.
Tonight and after 12 days and 17:42 hours, the script restarted the system:
Thu Feb 29 01:32:11 CET 2024
1:32AM up 12 days, 17:42, 0 users, load averages: 0.05, 0.01, 0.00
After running 46 minutes, restarted again:
reboot at:
Thu Feb 29 02:18:00 CET 2024
2:18AM up 46 mins, 0 users, load averages: 0.00, 0.00, 0.00
After running 2:01 hours, restarted again:
reboot at:
Thu Feb 29 04:19:08 CET 2024
4:19AM up 2:01, 0 users, load averages: 0.99, 0.56, 0.25
After running 1:01 hours, restarted again:
Thu Feb 29 05:20:08 CET 2024
5:20AM up 1:01, 0 users, load averages: 0.08, 0.05, 0.02
Now it is running fine for 7 hours:
netbsd-raspa# uptime
12:55PM up 7:35, 2 users, load averages: 0.00, 0.00, 0.00
netbsd-raspa# uname -a
NetBSD netbsd-raspa 10.0_RC3 NetBSD 10.0_RC3 (RPI) #0: Tue Jan 16
08:28:51 UTC 2024
mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/RPI evbarm
netbsd-raspa#
There is nothing useful in NetBSD /var/log/messages and /var
directory, but I have a Debian GNU/Linux Raspberry Zero W also running
and recorded this interesting wpa_supplicant events and the times of
the automatic script reboots:
feb 24 03:10:03 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DCORE type=3DWORLD
feb 24 03:10:03 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DUSER type=3DCOUNTRY alpha2=3DGB
feb 24 03:10:04 raspberrypi-meteo wpa_supplicant[597]: wlan0:
Associated with 60:8d:26:32:34:24
feb 24 03:10:04 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
id_str=3D]
feb 24 03:10:04 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
feb 24 03:10:04 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DCOUNTRY_IE type=3DCOUNTRY alpha2=3DES
feb 29 01:16:06 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-DISCONNECTED bssid=3D60:8d:26:32:34:24 reason=3D0
locally_generated=3D1
feb 29 01:16:06 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DCORE type=3DWORLD
feb 29 01:16:06 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DUSER type=3DCOUNTRY alpha2=3DGB
feb 29 01:16:18 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:16:23 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:16:29 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:16:35 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:16:40 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:16:46 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:16:52 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:16:57 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:03 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:09 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:14 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:20 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:25 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:31 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:37 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:42 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:48 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:54 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:17:59 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:05 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:10 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:16 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:22 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:27 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:33 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:39 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:44 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:50 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:18:55 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:19:01 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:19:07 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:19:12 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:19:18 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:19:24 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:19:29 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:19:35 raspberrypi-meteo wpa_supplicant[597]: wlan0: Failed
to initiate sched scan
feb 29 01:19:41 raspberrypi-meteo wpa_supplicant[597]: wlan0: Trying
to associate with SSID 'MiFibra-3422'
feb 29 01:19:42 raspberrypi-meteo wpa_supplicant[597]: wlan0:
Associated with 60:8d:26:32:34:24
feb 29 01:19:42 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
id_str=3D]
feb 29 01:19:42 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
feb 29 01:19:42 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DCOUNTRY_IE type=3DCOUNTRY alpha2=3DES
feb 29 01:20:25 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-DISCONNECTED bssid=3D60:8d:26:32:34:24 reason=3D7
feb 29 01:20:25 raspberrypi-meteo wpa_supplicant[597]: wlan0: Trying
to associate with SSID 'MiFibra-3422'
feb 29 01:20:25 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DCORE type=3DWORLD
feb 29 01:20:25 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DUSER type=3DCOUNTRY alpha2=3DGB
feb 29 01:20:26 raspberrypi-meteo wpa_supplicant[597]: wlan0:
Associated with 60:8d:26:32:34:24
feb 29 01:20:26 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
id_str=3D]
feb 29 01:20:26 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
feb 29 01:20:26 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DCOUNTRY_IE type=3DCOUNTRY alpha2=3DES
feb 29 02:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
Associated with 60:8d:26:32:34:24
feb 29 02:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
id_str=3D]
feb 29 02:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
feb 29 04:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
Associated with 60:8d:26:32:34:24
feb 29 04:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
id_str=3D]
feb 29 04:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
feb 29 05:00:14 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-DISCONNECTED bssid=3D60:8d:26:32:34:24 reason=3D0
locally_generated=3D1
feb 29 05:00:14 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DCORE type=3DWORLD
feb 29 05:00:14 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DUSER type=3DCOUNTRY alpha2=3DGB
feb 29 05:00:15 raspberrypi-meteo wpa_supplicant[597]: wlan0: Trying
to associate with SSID 'MiFibra-3422'
feb 29 05:00:15 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-ASSOC-REJECT bssid=3D00:00:00:00:00:00 status_code=3D16
feb 29 05:00:16 raspberrypi-meteo wpa_supplicant[597]: wlan0: Trying
to associate with SSID 'MiFibra-3422'
feb 29 05:00:17 raspberrypi-meteo wpa_supplicant[597]: wlan0:
Associated with 60:8d:26:32:34:24
feb 29 05:00:17 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-CONNECTED - Connection to 60:8d:26:32:34:24 completed [id=3D0
id_str=3D]
feb 29 05:00:17 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-SUBNET-STATUS-UPDATE status=3D0
feb 29 05:00:17 raspberrypi-meteo wpa_supplicant[597]: wlan0:
CTRL-EVENT-REGDOM-CHANGE init=3DCOUNTRY_IE type=3DCOUNTRY alpha2=3DES
Hope this could be useful.
Regards.
Ramiro.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Ramiro Aceves <ea1abz@gmail.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
Date: Thu, 29 Feb 2024 15:13:28 +0000
> Date: Thu, 29 Feb 2024 13:05:09 +0100
> From: Ramiro Aceves <ea1abz@gmail.com>
>
> My RaspberryPi Zero W is using now the urtwn driver with a USB WIFI
> adaptor because of bwfm driver does not function well. I made a
> service that checks www.google.com and reboots the system if there is
> no ping, checking 3 times during 5 min intervals.
>
> With urtwn, network worked fine but router automatic changing channels
> leaded to complete network loss, needing reboot.
Can you share the output of these commands when it fails, before you
reboot?
ifconfig urtwn0
wpa_cli status
dhcpcd -U urtwn0
The symptom in this PR is that the commands will actually just hang in
tstile or mutex, and examination with `ps' (either ps(1), or in
crash(8) or ddb(4)) will show that some process is waiting for
urtwntsk; the stack trace in crash/ddb will show it is waiting in
urtwn_stop.
If your symptom does match, there's a good chance this will resolve
it.
If your symptom doesn't match, can you file a new PR for it?
From: Ramiro Aceves <ea1abz@gmail.com>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, campbell+netbsd@mumble.net,
riastradh@netbsd.org
Cc:
Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
Date: Thu, 29 Feb 2024 22:40:49 +0100
El 2/29/24 a las 16:15, Taylor R Campbell escribió:
> The following reply was made to PR kern/57965; it has been noted by GNATS.
>
> From: Taylor R Campbell <riastradh@NetBSD.org>
> To: Ramiro Aceves <ea1abz@gmail.com>
> Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org,
> gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
> Subject: Re: kern/57965: urtwn(4) command queue can overflow and deadlock
> Date: Thu, 29 Feb 2024 15:13:28 +0000
>
> > Date: Thu, 29 Feb 2024 13:05:09 +0100
> > From: Ramiro Aceves <ea1abz@gmail.com>
> >
> > My RaspberryPi Zero W is using now the urtwn driver with a USB WIFI
> > adaptor because of bwfm driver does not function well. I made a
> > service that checks www.google.com and reboots the system if there is
> > no ping, checking 3 times during 5 min intervals.
> >
> > With urtwn, network worked fine but router automatic changing channels
> > leaded to complete network loss, needing reboot.
>
> Can you share the output of these commands when it fails, before you
> reboot?
>
> ifconfig urtwn0
> wpa_cli status
> dhcpcd -U urtwn0
>
> The symptom in this PR is that the commands will actually just hang in
> tstile or mutex, and examination with `ps' (either ps(1), or in
> crash(8) or ddb(4)) will show that some process is waiting for
> urtwntsk; the stack trace in crash/ddb will show it is waiting in
> urtwn_stop.
>
> If your symptom does match, there's a good chance this will resolve
> it.
>
> If your symptom doesn't match, can you file a new PR for it?
>
Thanks Taylor for the instructions.
I have reproduced the problem forcing a channel change from the router.
I made a simple diagnose.sh script that is called from the automatic
reboot script just before issuing "shutdown -r" now command.
I do not use dhcpcd daemon, I use static IP address.
a diagnose.log file is recorded:
netbsd-raspa$ cat /root/diagnose.log
Thu Feb 29 22:32:36 CET 2024
# ifconfig urtwn0
urtwn0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ssid MiFibra-3422 nwkey 65536:"",0x433cc28429aedde0f4f22d14c5032697,"",""
powersave off
bssid 60:8d:26:32:34:24 chan 1
address: e4:fa:c4:52:ac:4c
media: IEEE802.11 autoselect (OFDM54 mode 11g)
status: active
inet6 fe80::e6fa:c4ff:fe52:ac4c%urtwn0/64 flags 0 scopeid 0x1
inet 192.168.1.230/24 broadcast 192.168.1.255 flags 0
# wpa_cli status
Selected interface 'urtwn0'
22:32:36.067: bssid=60:8d:26:32:34:24
freq=0
ssid=MiFibra-3422
id=0
mode=station
pairwise_cipher=CCMP
group_cipher=CCMP
key_mgmt=WPA2-PSK
wpa_state=COMPLETED
ip_address=192.168.1.230
address=e4:fa:c4:52:ac:4c
# dhcpcd -U urtwn0
dhcpcd is not running
# ps axu
USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND
root 0 0.0 1.7 0 8008 ? DKl 10:01PM 0:03.66 [system]
root 1 0.0 0.3 5836 1464 ? Is 10:01PM 0:00.06 init
root 214 0.0 0.3 5840 1512 ? S 10:32PM 0:00.02 /bin/sh
/root/diagnose.sh
root 293 0.0 0.7 12184 3252 ? Ss 10:02PM 0:00.07
/usr/sbin/wpa_supplicant -B -B -s -i urtwn0 -D bsd -c /etc/wpa_suppl
root 346 0.0 0.2 5640 768 ? Is 10:02PM 0:00.00
/sbin/devpubd
root 456 0.0 0.5 11124 2164 ? Ss 10:02PM 0:00.12
/usr/sbin/syslogd -s
root 563 0.0 2.2 11256 10312 ? Ss 10:02PM 0:00.58
/usr/sbin/ntpd -p /var/run/ntpd.pid -g
root 793 0.0 0.6 15128 2768 ? Is 10:02PM 0:00.03 sshd:
/usr/sbin/sshd [listener] 0 of 10-100 startups (sshd)
root 809 0.0 1.1 74620 5088 ? Ss 10:02PM 0:00.06
/usr/pkg/sbin/minidlnad -f /usr/pkg/etc/minidlna.conf
postfix 810 0.0 0.8 14948 3836 ? S 10:02PM 0:00.11 pickup -l
-t unix -u
postfix 811 0.0 0.8 15000 3852 ? S 10:02PM 0:00.11 qmgr -l
-t unix -u
root 823 0.0 0.3 5908 1324 ? O 10:32PM 0:00.01 ps -axuw
root 844 0.0 0.5 15360 2488 ? Ss 10:02PM 0:00.09
/usr/libexec/postfix/master -w
root 860 0.0 0.3 5952 1452 ? Is 10:02PM 0:00.03
/usr/sbin/cron
ramiro 919 0.0 1.1 16120 5100 ? I 10:08PM 0:02.82 sshd:
ramiro@pts/0 (sshd)
root 950 0.0 0.5 11392 2232 ? S 10:21PM 0:00.05 ntpd:
asynchronous dns resolver
root 961 0.0 1.5 15980 6692 ? Is 10:07PM 0:00.37 sshd:
ramiro [priv]
root 1048 0.0 0.3 6160 1284 ? Is 10:02PM 0:00.01
/usr/sbin/inetd -l
ramiro 847 0.0 0.4 6192 1868 pts/0 Is 10:08PM 0:00.09 -sh
root 1141 0.0 0.4 6192 1880 pts/0 I+ 10:13PM 0:00.26 sh
root 560 0.0 0.3 5888 1556 ttyE0- S 10:02PM 0:00.03 /bin/sh
/root/nettest
root 963 0.0 0.3 5616 1396 ttyE0 Is+ 10:02PM 0:00.02
/usr/libexec/getty default constty
root 1041 0.0 0.3 5612 1392 ttyE1 Is+ 10:02PM 0:00.02
/usr/libexec/getty Pc ttyE1
root 967 0.0 0.3 5616 1396 ttyE2 Is+ 10:02PM 0:00.02
/usr/libexec/getty Pc ttyE2
root 1112 0.0 0.3 5616 1400 ttyE3 Is+ 10:02PM 0:00.02
/usr/libexec/getty Pc ttyE3
netbsd-raspa$
Commands do not hang, I think...
Regards.
Ramiro
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Fri, 01 Mar 2024 15:15:25 +0000
State-Changed-Why:
pullup-10 #616
pullup-9 #1810
pullup-8 #1942
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57965 CVS commit: [netbsd-10] src/sys/dev/usb
Date: Sun, 10 Mar 2024 18:57:11 +0000
Module Name: src
Committed By: martin
Date: Sun Mar 10 18:57:11 UTC 2024
Modified Files:
src/sys/dev/usb [netbsd-10]: if_urtwn.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #616):
sys/dev/usb/if_urtwn.c: revision 1.109
urtwn(4): Ditch old queued commands on overflow.
Don't increment ring->queued past what the task will decrement.
This is a stop-gap measure; really, we should just have one task for
each operation that is deferred to the task thread.
PR kern/57965
To generate a diff of this commit:
cvs rdiff -u -r1.105 -r1.105.4.1 src/sys/dev/usb/if_urtwn.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57965 CVS commit: [netbsd-9] src/sys/dev/usb
Date: Sun, 10 Mar 2024 18:59:19 +0000
Module Name: src
Committed By: martin
Date: Sun Mar 10 18:59:19 UTC 2024
Modified Files:
src/sys/dev/usb [netbsd-9]: if_urtwn.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1810):
sys/dev/usb/if_urtwn.c: revision 1.109 (patch)
urtwn(4): Ditch old queued commands on overflow.
Don't increment ring->queued past what the task will decrement.
This is a stop-gap measure; really, we should just have one task for
each operation that is deferred to the task thread.
PR kern/57965
To generate a diff of this commit:
cvs rdiff -u -r1.71.2.6 -r1.71.2.7 src/sys/dev/usb/if_urtwn.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57965 CVS commit: [netbsd-8] src/sys/dev/usb
Date: Sun, 10 Mar 2024 19:00:27 +0000
Module Name: src
Committed By: martin
Date: Sun Mar 10 19:00:27 UTC 2024
Modified Files:
src/sys/dev/usb [netbsd-8]: if_urtwn.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1942):
sys/dev/usb/if_urtwn.c: revision 1.109 (patch)
urtwn(4): Ditch old queued commands on overflow.
Don't increment ring->queued past what the task will decrement.
This is a stop-gap measure; really, we should just have one task for
each operation that is deferred to the task thread.
PR kern/57965
To generate a diff of this commit:
cvs rdiff -u -r1.53.2.6 -r1.53.2.7 src/sys/dev/usb/if_urtwn.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 11 Mar 2024 22:24:04 +0000
State-Changed-Why:
deadlock fixed and pulled up
(underlying queue mechanism to be dealt with in wifi renovation)
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.