NetBSD Problem Report #56988

From www@netbsd.org  Tue Aug 30 10:57:58 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 599E61A921F
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 30 Aug 2022 10:57:58 +0000 (UTC)
Message-Id: <20220830105756.A601A1A923A@mollari.NetBSD.org>
Date: Tue, 30 Aug 2022 10:57:56 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: Bluetooth stack initializes bt_lock too late
X-Send-Pr-Version: www-1.0

>Number:         56988
>Category:       kern
>Synopsis:       Bluetooth stack initializes bt_lock too late
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 30 11:00:00 +0000 2022
>Last-Modified:  Mon Aug 07 13:35:01 +0000 2023
>Originator:     Taylor R Campbell
>Release:        current
>Organization:
The WetBSD Fountation
>Environment:
>Description:
The Bluetooth stack initializes bt_lock as part of the Bluetooth socket domain initialization routine.

However, the Bluetooth stack also uses bt_lock when attaching Bluetooth HCI devices, which can be detected by autoconf before domaininint.  This leads to a null pointer dereference.
>How-To-Repeat:
Boot a machine with ubt(4) and enough of a delay in lwp0 during autoconf that ubt(4) attaches before domaininit in init_main.c.
>Fix:
The following patch creates a driver-class module `netbt' whose initialization routine initializes bt_lock.  This happens before configure().

Maybe kinda grody for netbt to be a driver-class module; maybe this should live somewhere in sys/dev/bluetooth.

From 23dfe7b917889c5421f39e4651e1018c8f073d89 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Mon, 29 Aug 2022 17:46:49 +0000
Subject: [PATCH] WIP: netbt(4): Initialize bt_lock earlier.

Use a driver-class module modcmd init function, instead of a socket
domain init function; the socket-domain ones don't run until after
configure, but we need this to be initialized before configure so
that Bluetooth HCI drivers like ubt(4) can use it.
---
 sys/netbt/bt_proto.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)

diff --git a/sys/netbt/bt_proto.c b/sys/netbt/bt_proto.c
index 15e2c99d7411..d2ab7ee45161 100644
--- a/sys/netbt/bt_proto.c
+++ b/sys/netbt/bt_proto.c
@@ -36,6 +36,7 @@ __KERNEL_RCSID(0, "$NetBSD: bt_proto.c,v 1.16 2016/01/21 15:41:30 riastradh Exp
 #include <sys/param.h>
 #include <sys/domain.h>
 #include <sys/kernel.h>
+#include <sys/module.h>
 #include <sys/protosw.h>
 #include <sys/socket.h>
 #include <sys/systm.h>
@@ -112,7 +113,22 @@ kmutex_t *bt_lock;

 static void
 bt_init(void)
+{
+}
+
+MODULE(MODULE_CLASS_DRIVER, netbt, NULL);
+
+static int
+netbt_modcmd(modcmd_t cmd, void *aux)
 {

-	bt_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
+	switch (cmd) {
+	case MODULE_CMD_INIT:
+		bt_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
+		return 0;
+	case MODULE_CMD_FINI:
+		return EBUSY;	/* XXX */
+	default:
+		return ENOTTY;
+	}
 }

>Audit-Trail:
From: Iain Hibbert <plunky@ogmig.net>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56988 (Bluetooth stack initializes bt_lock too late)
Date: Sun, 1 Jan 2023 13:24:39 +0000 (GMT)

 Hi

 I agree this is a bit grody, as the Bluetooth stack is not a module

 I guess pcmcia devices could also be affected (bt3c and btbc) though I no 
 longer have PCMCIA slots available

 devices under dev/bluetooth are pseudo and attached via the stack so 
 nothing there can access this before domain init.

 It seems that perhaps the domain init routines should be called before the 
 interfaces attach, but the other uses of .dom_init don't seem to have 
 actual devices attaching to the base of the domain and it hasn't been an 
 issue for net/ as ifinit1() is listed especially in main() before 
 configure gets called.

 so, simpler solution could be to wrap the alloc in RUN_ONCE and also call 
 bt_init() from hci_attach_pcb() to ensure the mutex is ready; this avoids 
 entangling with module subsystem, does that sound ok?

 iain

 Index: bluetooth.h
 ===================================================================
 RCS file: /cvsroot/src/sys/netbt/bluetooth.h,v
 retrieving revision 1.12
 diff -u -p -r1.12 bluetooth.h
 --- bluetooth.h	18 May 2014 14:46:16 -0000	1.12
 +++ bluetooth.h	1 Jan 2023 13:01:36 -0000
 @@ -128,6 +128,7 @@ extern const struct pr_usrreqs l2cap_usr
  extern const struct pr_usrreqs rfcomm_usrreqs;

  extern kmutex_t *bt_lock;
 +void bt_init(void);

  /*
   * Debugging stuff
 Index: bt_proto.c
 ===================================================================
 RCS file: /cvsroot/src/sys/netbt/bt_proto.c,v
 retrieving revision 1.16
 diff -u -p -r1.16 bt_proto.c
 --- bt_proto.c	21 Jan 2016 15:41:30 -0000	1.16
 +++ bt_proto.c	1 Jan 2023 13:01:36 -0000
 @@ -36,6 +36,7 @@ __KERNEL_RCSID(0, "$NetBSD: bt_proto.c,v
  #include <sys/param.h>
  #include <sys/domain.h>
  #include <sys/kernel.h>
 +#include <sys/once.h>
  #include <sys/protosw.h>
  #include <sys/socket.h>
  #include <sys/systm.h>
 @@ -50,8 +51,6 @@ __KERNEL_RCSID(0, "$NetBSD: bt_proto.c,v

  DOMAIN_DEFINE(btdomain);	/* forward declare and add to link set */

 -static void	bt_init(void);
 -
  PR_WRAP_CTLOUTPUT(hci_ctloutput)
  PR_WRAP_CTLOUTPUT(sco_ctloutput)
  PR_WRAP_CTLOUTPUT(l2cap_ctloutput)
 @@ -110,9 +109,19 @@ struct domain btdomain = {

  kmutex_t *bt_lock;

 -static void
 -bt_init(void)
 +static int
 +bt_init_once(void)
  {

  	bt_lock = mutex_obj_alloc(MUTEX_DEFAULT, IPL_NONE);
 +
 +	return 0;
 +}
 +
 +void
 +bt_init(void)
 +{
 +	static ONCE_DECL(control);
 +
 +	RUN_ONCE(&control, bt_init_once);	
  }
 Index: hci_unit.c
 ===================================================================
 RCS file: /cvsroot/src/sys/netbt/hci_unit.c,v
 retrieving revision 1.16
 diff -u -p -r1.16 hci_unit.c
 --- hci_unit.c	7 Aug 2021 16:19:18 -0000	1.16
 +++ hci_unit.c	1 Jan 2023 13:01:36 -0000
 @@ -93,6 +93,9 @@ hci_attach_pcb(const struct hci_if *hci_
  	KASSERT(hci_if->output_sco != NULL);
  	KASSERT(hci_if->get_stats != NULL);

 +	/* we can be reached via autoconf before the stack is initialized */
 +	bt_init();
 +
  	unit = malloc(sizeof(struct hci_unit), M_BLUETOOTH, M_ZERO | M_WAITOK);
  	KASSERT(unit != NULL);


From: Taylor R Campbell <riastradh@NetBSD.org>
To: Iain Hibbert <plunky@ogmig.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56988 (Bluetooth stack initializes bt_lock too late)
Date: Sun, 1 Jan 2023 19:03:00 +0000

 > Date: Sun, 1 Jan 2023 13:24:39 +0000 (GMT)
 > From: Iain Hibbert <plunky@ogmig.net>
 >=20
 > I agree this is a bit grody, as the Bluetooth stack is not a module

 The stack doesn't have to be _loadable_ as a module to take advantage
 of module initialization.  (See, e.g., dev/cons.c.)

 The only grody part about what I suggested, in my opinion, is making
 it a _driver-class_ module, since it's not actually an autoconf device
 driver -- it just needs its initialization to run before configure
 runs.

 Using modules can also help to formalize dependencies between
 subsystems -- dependencies which are historically only informally
 implied by the topological sort encoded in main.

 > It seems that perhaps the domain init routines should be called before th=
 e=20
 > interfaces attach, but the other uses of .dom_init don't seem to have=20
 > actual devices attaching to the base of the domain and it hasn't been an=
 =20
 > issue for net/ as ifinit1() is listed especially in main() before=20
 > configure gets called.

 Calling domaininit earlier in main sounds plausible to me.  That might
 obviate the need for the weird `XXX must be after domaininit()'
 function ifinit_post.  Unclear if ifinit/domaininit have any ordering
 requirements, or if that XXX is even still applicable.

 > so, simpler solution could be to wrap the alloc in RUN_ONCE and also call=
 =20
 > bt_init() from hci_attach_pcb() to ensure the mutex is ready; this avoids=
 =20
 > entangling with module subsystem, does that sound ok?

 I'd like to avoid RUN_ONCE, except as a last resort; it generally
 indicates an incoherent design, and we have too many of those already.
 However, this should be fine for pullup to netbsd-10, to avoid any
 complicated fallout from changing module initialization order or
 domaininit/ifinit ordering in main.

 So here's a straw proposal, subject to actual testing:

 1. Pull up the RUN_ONCE approach to netbsd-10.

    Test: Insert kpause(10*hz) in bt_init_once, verify a machine with
    ubt(4) boots without crashing on the mutex.

 2. Try changing main from

         ifinit1
         ...
         configure
         ...
         ifinit
         domaininit
         ifinit_post

    to

         ifinit1
         domaininit
         ...
         configure
         ...
         ifinit
         ifinit_post

    and then merge ifinit and ifinit_post.  I skimmed all the .dom_init
    functions, and they look likely to be safe to use early on.  I also
    found a probably-unnecessary RUN_ONCE in key_init (arising from the
    time when percpu(9) couldn't be used that early).

 After that, I'm tempted to move everything from ifinit1 and
 ifinit_post into a single ifinit that happens before domaininit --
 what's left in ifinit and ifinit_post is...not much!

 Sound reasonable?

From: Iain Hibbert <plunky@ogmig.net>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56988 (Bluetooth stack initializes bt_lock too late)
Date: Fri, 6 Jan 2023 20:58:12 +0000 (GMT)

 On Sun, 1 Jan 2023, Taylor R Campbell wrote:

 > > Date: Sun, 1 Jan 2023 13:24:39 +0000 (GMT)
 > > From: Iain Hibbert <plunky@ogmig.net>
 > > 
 > > I agree this is a bit grody, as the Bluetooth stack is not a module
 > 
 > The stack doesn't have to be _loadable_ as a module to take advantage
 > of module initialization.  (See, e.g., dev/cons.c.)

 Well ok, but I think that might also be grody :)

 > Using modules can also help to formalize dependencies between
 > subsystems -- dependencies which are historically only informally
 > implied by the topological sort encoded in main.

 Its true, and I have thought that as a subsytem netbt/ may be more suited 
 to modularity than many others. However, it is not one as yet and I 
 thought of something before which could be difficult though I can't recall 
 it just now.

 > > so, simpler solution could be to wrap the alloc in RUN_ONCE and also call 
 > > bt_init() from hci_attach_pcb() to ensure the mutex is ready; this avoids 
 > > entangling with module subsystem, does that sound ok?
 > 
 > I'd like to avoid RUN_ONCE, except as a last resort; it generally
 > indicates an incoherent design, and we have too many of those already.

 Ok, and I note that RUN_ONCE is a little racy as it doesn't hold 
 another caller back, and this function should be quick but it does 
 alloc(PR_WAITOK)

 > However, this should be fine for pullup to netbsd-10, to avoid any
 > complicated fallout from changing module initialization order or
 > domaininit/ifinit ordering in main.
 > 
 > So here's a straw proposal, subject to actual testing:
 > 
 > 1. Pull up the RUN_ONCE approach to netbsd-10.
 > 
 >    Test: Insert kpause(10*hz) in bt_init_once, verify a machine with
 >    ubt(4) boots without crashing on the mutex.

 I can do that, but I'm not sure its useful? Thinking it through if kpause 
 was before the alloc the second caller would crash taking the mutex, and 
 if after the alloc it would never crash just delay the boot whichever got 
 there first.

 The alloc should not sleep this early in boot, but if it did then a second 
 caller would also crash.

 In fact, I'm coming around to the module_init method. The mutex should be 
 ready before anything can look at it. To be clear, I don't like using 
 module init, its a cheat to attach an init function to something like that 
 just because it happens at the right time. But, using RUN_ONCE doesn't 
 seem correct either; it is not for synchronization.

 iain

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Iain Hibbert <plunky@ogmig.net>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56988 (Bluetooth stack initializes bt_lock too late)
Date: Fri, 6 Jan 2023 22:19:47 +0000

 > Date: Fri, 6 Jan 2023 20:58:12 +0000 (GMT)
 > From: Iain Hibbert <plunky@ogmig.net>
 > 
 > Ok, and I note that RUN_ONCE is a little racy as it doesn't hold 
 > another caller back, and this function should be quick but it does 
 > alloc(PR_WAITOK)

 RUN_ONCE isn't racy -- there is an underlying lock and condvar, which
 are initialized very early in main.  But it scatters large-scale
 subsystem dependencies deep inside the logic of the code, instead of
 at a higher level like the module declarations.

 It may be nice for modules to be loadable and unloadable, but we can
 also take advantage of the concept for organizing monolithic kernels
 even if the components can't be loaded or unloaded.

 > > 1. Pull up the RUN_ONCE approach to netbsd-10.
 > > 
 > >    Test: Insert kpause(10*hz) in bt_init_once, verify a machine with
 > >    ubt(4) boots without crashing on the mutex.
 > 
 > I can do that, but I'm not sure its useful? Thinking it through if kpause 
 > was before the alloc the second caller would crash taking the mutex, and 
 > if after the alloc it would never crash just delay the boot whichever got 
 > there first.

 That's exactly the idea -- do it before initializing bt_lock.

 - If this approach works, it will prevent the race -- anyone trying to
   use bt_lock will wait until the first RUN_ONCE call has completed.

 - If this approach doesn't work, it will likely crash again -- there's
   still a path to using bt_lock that doesn't wait for the RUN_ONCE.

 We don't keep the kpause(10*hz) committed, of course -- just use it to
 test, and then commit without the kpause.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56988 CVS commit: src/sys/netbt
Date: Mon, 7 Aug 2023 13:31:54 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Aug  7 13:31:54 UTC 2023

 Modified Files:
 	src/sys/netbt: bt_proto.c

 Log Message:
 netbt(4): Initialize bt_lock earlier.

 Use a driver-class module modcmd init function, instead of a socket
 domain init function; the socket-domain ones don't run until after
 configure, but we need this to be initialized before configure so
 that Bluetooth HCI drivers like ubt(4) can use it.

 This is suboptimal but it's the least intrusive way I've thought of
 to get this working, even if it's a little grody to make netbt a
 `driver-class' (builtin) module.  Note that this doesn't mean netbt
 becomes dynamically loadable or unloadable; we're just using a module
 for initialization ordering.

 PR kern/56988

 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.16 -r1.17 src/sys/netbt/bt_proto.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.