NetBSD Problem Report #56988
From www@netbsd.org Tue Aug 30 10:57:58 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 599E61A921F
for <gnats-bugs@gnats.NetBSD.org>; Tue, 30 Aug 2022 10:57:58 +0000 (UTC)
Message-Id: <20220830105756.A601A1A923A@mollari.NetBSD.org>
Date: Tue, 30 Aug 2022 10:57:56 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: Bluetooth stack initializes bt_lock too late
X-Send-Pr-Version: www-1.0
>Number: 56988
>Category: kern
>Synopsis: Bluetooth stack initializes bt_lock too late
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Aug 30 11:00:00 +0000 2022
>Last-Modified: Mon Aug 07 13:35:01 +0000 2023
>Originator: Taylor R Campbell
>Release: current
>Organization:
The WetBSD Fountation
>Environment:
>Description:
The Bluetooth stack initializes bt_lock as part of the Bluetooth socket domain initialization routine.
However, the Bluetooth stack also uses bt_lock when attaching Bluetooth HCI devices, which can be detected by autoconf before domaininint. This leads to a null pointer dereference.
>How-To-Repeat:
Boot a machine with ubt(4) and enough of a delay in lwp0 during autoconf that ubt(4) attaches before domaininit in init_main.c.
>Fix:
The following patch creates a driver-class module `netbt' whose initialization routine initializes bt_lock. This happens before configure().
Maybe kinda grody for netbt to be a driver-class module; maybe this should live somewhere in sys/dev/bluetooth.
From 23dfe7b917889c5421f39e4651e1018c8f073d89 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Mon, 29 Aug 2022 17:46:49 +0000
Subject: [PATCH] WIP: netbt(4): Initialize bt_lock earlier.
Use a driver-class module modcmd init function, instead of a socket
domain init function; the socket-domain ones don't run until after
configure, but we need this to be initialized before configure so
that Bluetooth HCI drivers like ubt(4) can use it.
---
sys/netbt/bt_proto.c | 18 +++++++++++++++++-
1 file changed, 17 insertions(+), 1 deletion(-)
diff --git a/sys/netbt/bt_proto.c b/sys/netbt/bt_proto.c
index 15e2c99d7411..d2ab7ee45161 100644
--- a/sys/netbt/bt_proto.c
+++ b/sys/netbt/bt_proto.c
@@ -36,6 +36,7 @@ __KERNEL_RCSID(0, "$NetBSD: bt_proto.c,v 1.16 2016/01/21 15:41:30 riastradh Exp
#include <sys/param.h>
#include <sys/domain.h>
#include <sys/kernel.h>
+#include <sys/module.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/systm.h>
@@ -112,7 +113,22 @@ kmutex_t *bt_lock;
static void
bt_init(void)
+{
+}
+
+MODULE(MODULE_CLASS_DRIVER, netbt, NULL);
+
+static int
+netbt_modcmd(modcmd_t cmd, void *aux)
{
- bt_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
+ switch (cmd) {
+ case MODULE_CMD_INIT:
+ bt_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
+ return 0;
+ case MODULE_CMD_FINI:
+ return EBUSY; /* XXX */
+ default:
+ return ENOTTY;
+ }
}
>Audit-Trail:
From: Iain Hibbert <plunky@ogmig.net>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56988 (Bluetooth stack initializes bt_lock too late)
Date: Sun, 1 Jan 2023 13:24:39 +0000 (GMT)
Hi
I agree this is a bit grody, as the Bluetooth stack is not a module
I guess pcmcia devices could also be affected (bt3c and btbc) though I no
longer have PCMCIA slots available
devices under dev/bluetooth are pseudo and attached via the stack so
nothing there can access this before domain init.
It seems that perhaps the domain init routines should be called before the
interfaces attach, but the other uses of .dom_init don't seem to have
actual devices attaching to the base of the domain and it hasn't been an
issue for net/ as ifinit1() is listed especially in main() before
configure gets called.
so, simpler solution could be to wrap the alloc in RUN_ONCE and also call
bt_init() from hci_attach_pcb() to ensure the mutex is ready; this avoids
entangling with module subsystem, does that sound ok?
iain
Index: bluetooth.h
===================================================================
RCS file: /cvsroot/src/sys/netbt/bluetooth.h,v
retrieving revision 1.12
diff -u -p -r1.12 bluetooth.h
--- bluetooth.h 18 May 2014 14:46:16 -0000 1.12
+++ bluetooth.h 1 Jan 2023 13:01:36 -0000
@@ -128,6 +128,7 @@ extern const struct pr_usrreqs l2cap_usr
extern const struct pr_usrreqs rfcomm_usrreqs;
extern kmutex_t *bt_lock;
+void bt_init(void);
/*
* Debugging stuff
Index: bt_proto.c
===================================================================
RCS file: /cvsroot/src/sys/netbt/bt_proto.c,v
retrieving revision 1.16
diff -u -p -r1.16 bt_proto.c
--- bt_proto.c 21 Jan 2016 15:41:30 -0000 1.16
+++ bt_proto.c 1 Jan 2023 13:01:36 -0000
@@ -36,6 +36,7 @@ __KERNEL_RCSID(0, "$NetBSD: bt_proto.c,v
#include <sys/param.h>
#include <sys/domain.h>
#include <sys/kernel.h>
+#include <sys/once.h>
#include <sys/protosw.h>
#include <sys/socket.h>
#include <sys/systm.h>
@@ -50,8 +51,6 @@ __KERNEL_RCSID(0, "$NetBSD: bt_proto.c,v
DOMAIN_DEFINE(btdomain); /* forward declare and add to link set */
-static void bt_init(void);
-
PR_WRAP_CTLOUTPUT(hci_ctloutput)
PR_WRAP_CTLOUTPUT(sco_ctloutput)
PR_WRAP_CTLOUTPUT(l2cap_ctloutput)
@@ -110,9 +109,19 @@ struct domain btdomain = {
kmutex_t *bt_lock;
-static void
-bt_init(void)
+static int
+bt_init_once(void)
{
bt_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
+
+ return 0;
+}
+
+void
+bt_init(void)
+{
+ static ONCE_DECL(control);
+
+ RUN_ONCE(&control, bt_init_once);
}
Index: hci_unit.c
===================================================================
RCS file: /cvsroot/src/sys/netbt/hci_unit.c,v
retrieving revision 1.16
diff -u -p -r1.16 hci_unit.c
--- hci_unit.c 7 Aug 2021 16:19:18 -0000 1.16
+++ hci_unit.c 1 Jan 2023 13:01:36 -0000
@@ -93,6 +93,9 @@ hci_attach_pcb(const struct hci_if *hci_
KASSERT(hci_if->output_sco != NULL);
KASSERT(hci_if->get_stats != NULL);
+ /* we can be reached via autoconf before the stack is initialized */
+ bt_init();
+
unit = malloc(sizeof(struct hci_unit), M_BLUETOOTH, M_ZERO | M_WAITOK);
KASSERT(unit != NULL);
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Iain Hibbert <plunky@ogmig.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56988 (Bluetooth stack initializes bt_lock too late)
Date: Sun, 1 Jan 2023 19:03:00 +0000
> Date: Sun, 1 Jan 2023 13:24:39 +0000 (GMT)
> From: Iain Hibbert <plunky@ogmig.net>
>=20
> I agree this is a bit grody, as the Bluetooth stack is not a module
The stack doesn't have to be _loadable_ as a module to take advantage
of module initialization. (See, e.g., dev/cons.c.)
The only grody part about what I suggested, in my opinion, is making
it a _driver-class_ module, since it's not actually an autoconf device
driver -- it just needs its initialization to run before configure
runs.
Using modules can also help to formalize dependencies between
subsystems -- dependencies which are historically only informally
implied by the topological sort encoded in main.
> It seems that perhaps the domain init routines should be called before th=
e=20
> interfaces attach, but the other uses of .dom_init don't seem to have=20
> actual devices attaching to the base of the domain and it hasn't been an=
=20
> issue for net/ as ifinit1() is listed especially in main() before=20
> configure gets called.
Calling domaininit earlier in main sounds plausible to me. That might
obviate the need for the weird `XXX must be after domaininit()'
function ifinit_post. Unclear if ifinit/domaininit have any ordering
requirements, or if that XXX is even still applicable.
> so, simpler solution could be to wrap the alloc in RUN_ONCE and also call=
=20
> bt_init() from hci_attach_pcb() to ensure the mutex is ready; this avoids=
=20
> entangling with module subsystem, does that sound ok?
I'd like to avoid RUN_ONCE, except as a last resort; it generally
indicates an incoherent design, and we have too many of those already.
However, this should be fine for pullup to netbsd-10, to avoid any
complicated fallout from changing module initialization order or
domaininit/ifinit ordering in main.
So here's a straw proposal, subject to actual testing:
1. Pull up the RUN_ONCE approach to netbsd-10.
Test: Insert kpause(10*hz) in bt_init_once, verify a machine with
ubt(4) boots without crashing on the mutex.
2. Try changing main from
ifinit1
...
configure
...
ifinit
domaininit
ifinit_post
to
ifinit1
domaininit
...
configure
...
ifinit
ifinit_post
and then merge ifinit and ifinit_post. I skimmed all the .dom_init
functions, and they look likely to be safe to use early on. I also
found a probably-unnecessary RUN_ONCE in key_init (arising from the
time when percpu(9) couldn't be used that early).
After that, I'm tempted to move everything from ifinit1 and
ifinit_post into a single ifinit that happens before domaininit --
what's left in ifinit and ifinit_post is...not much!
Sound reasonable?
From: Iain Hibbert <plunky@ogmig.net>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56988 (Bluetooth stack initializes bt_lock too late)
Date: Fri, 6 Jan 2023 20:58:12 +0000 (GMT)
On Sun, 1 Jan 2023, Taylor R Campbell wrote:
> > Date: Sun, 1 Jan 2023 13:24:39 +0000 (GMT)
> > From: Iain Hibbert <plunky@ogmig.net>
> >
> > I agree this is a bit grody, as the Bluetooth stack is not a module
>
> The stack doesn't have to be _loadable_ as a module to take advantage
> of module initialization. (See, e.g., dev/cons.c.)
Well ok, but I think that might also be grody :)
> Using modules can also help to formalize dependencies between
> subsystems -- dependencies which are historically only informally
> implied by the topological sort encoded in main.
Its true, and I have thought that as a subsytem netbt/ may be more suited
to modularity than many others. However, it is not one as yet and I
thought of something before which could be difficult though I can't recall
it just now.
> > so, simpler solution could be to wrap the alloc in RUN_ONCE and also call
> > bt_init() from hci_attach_pcb() to ensure the mutex is ready; this avoids
> > entangling with module subsystem, does that sound ok?
>
> I'd like to avoid RUN_ONCE, except as a last resort; it generally
> indicates an incoherent design, and we have too many of those already.
Ok, and I note that RUN_ONCE is a little racy as it doesn't hold
another caller back, and this function should be quick but it does
alloc(PR_WAITOK)
> However, this should be fine for pullup to netbsd-10, to avoid any
> complicated fallout from changing module initialization order or
> domaininit/ifinit ordering in main.
>
> So here's a straw proposal, subject to actual testing:
>
> 1. Pull up the RUN_ONCE approach to netbsd-10.
>
> Test: Insert kpause(10*hz) in bt_init_once, verify a machine with
> ubt(4) boots without crashing on the mutex.
I can do that, but I'm not sure its useful? Thinking it through if kpause
was before the alloc the second caller would crash taking the mutex, and
if after the alloc it would never crash just delay the boot whichever got
there first.
The alloc should not sleep this early in boot, but if it did then a second
caller would also crash.
In fact, I'm coming around to the module_init method. The mutex should be
ready before anything can look at it. To be clear, I don't like using
module init, its a cheat to attach an init function to something like that
just because it happens at the right time. But, using RUN_ONCE doesn't
seem correct either; it is not for synchronization.
iain
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Iain Hibbert <plunky@ogmig.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56988 (Bluetooth stack initializes bt_lock too late)
Date: Fri, 6 Jan 2023 22:19:47 +0000
> Date: Fri, 6 Jan 2023 20:58:12 +0000 (GMT)
> From: Iain Hibbert <plunky@ogmig.net>
>
> Ok, and I note that RUN_ONCE is a little racy as it doesn't hold
> another caller back, and this function should be quick but it does
> alloc(PR_WAITOK)
RUN_ONCE isn't racy -- there is an underlying lock and condvar, which
are initialized very early in main. But it scatters large-scale
subsystem dependencies deep inside the logic of the code, instead of
at a higher level like the module declarations.
It may be nice for modules to be loadable and unloadable, but we can
also take advantage of the concept for organizing monolithic kernels
even if the components can't be loaded or unloaded.
> > 1. Pull up the RUN_ONCE approach to netbsd-10.
> >
> > Test: Insert kpause(10*hz) in bt_init_once, verify a machine with
> > ubt(4) boots without crashing on the mutex.
>
> I can do that, but I'm not sure its useful? Thinking it through if kpause
> was before the alloc the second caller would crash taking the mutex, and
> if after the alloc it would never crash just delay the boot whichever got
> there first.
That's exactly the idea -- do it before initializing bt_lock.
- If this approach works, it will prevent the race -- anyone trying to
use bt_lock will wait until the first RUN_ONCE call has completed.
- If this approach doesn't work, it will likely crash again -- there's
still a path to using bt_lock that doesn't wait for the RUN_ONCE.
We don't keep the kpause(10*hz) committed, of course -- just use it to
test, and then commit without the kpause.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56988 CVS commit: src/sys/netbt
Date: Mon, 7 Aug 2023 13:31:54 +0000
Module Name: src
Committed By: riastradh
Date: Mon Aug 7 13:31:54 UTC 2023
Modified Files:
src/sys/netbt: bt_proto.c
Log Message:
netbt(4): Initialize bt_lock earlier.
Use a driver-class module modcmd init function, instead of a socket
domain init function; the socket-domain ones don't run until after
configure, but we need this to be initialized before configure so
that Bluetooth HCI drivers like ubt(4) can use it.
This is suboptimal but it's the least intrusive way I've thought of
to get this working, even if it's a little grody to make netbt a
`driver-class' (builtin) module. Note that this doesn't mean netbt
becomes dynamically loadable or unloadable; we're just using a module
for initialization ordering.
PR kern/56988
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.16 -r1.17 src/sys/netbt/bt_proto.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.