NetBSD Problem Report #56438

From brad@anduin.eldar.org  Sun Oct  3 22:50:48 2021
Return-Path: <brad@anduin.eldar.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E85CB1A921F
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  3 Oct 2021 22:50:47 +0000 (UTC)
Message-Id: <202110032250.193MogQC005797@anduin.eldar.org>
Date: Sun, 3 Oct 2021 18:50:42 -0400 (EDT)
From: brad@anduin.eldar.org
Reply-To: brad@anduin.eldar.org
To: gnats-bugs@NetBSD.org
Subject: panic when trying to use gpioiic in -current 9.99.90
X-Send-Pr-Version: 3.95

>Number:         56438
>Category:       kern
>Synopsis:       panic when trying to use gpioiic in -current 9.99.90
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 03 22:55:00 +0000 2021
>Last-Modified:  Fri Oct 08 09:55:01 +0000 2021
>Originator:     brad@anduin.eldar.org
>Release:        NetBSD-current 9.99.90
>Organization:
	eldar.org
>Environment:
RPI 3 evbarm6hf
Architecture: evbarm
Machine: evbarm6hf
>Description:

I have been using gpioiic for many releases of NetBSD on various types
of RPI devices.  I attempted to use it on -current pulled from source
2021-09-26 or so and a panic can be produced when /etc/rc.d/gpio runs.

From a boot:

Creating a.out runtime link editor directory cache.
Checking quotas: done.
Configuring GPIO.
[  13.2358565] panic: kernel diagnostic assertion "KERNEL_LOCKED_P()" failed: file "../../../../kern/subr_autoconf.c", line 1053
[  13.2487462] cpu0: Begin traceback...
[  13.2487462] 0xd1cf1b5c: netbsd:db_panic+0x14
[  13.2562360] 0xd1cf1b74: netbsd:vpanic+0x148
[  13.2562360] 0xd1cf1b8c: netbsd:__aeabi_uldivmod
[  13.2562360] 0xd1cf1bc4: netbsd:config_match+0x88
[  13.2680444] 0xd1cf1be4: netbsd:mapply+0x4c
[  13.2680444] 0xd1cf1c34: netbsd:config_search_internal+0x1a8
[  13.2771142] 0xd1cf1c5c: netbsd:config_search+0x90
[  13.2771142] 0xd1cf1cf4: netbsd:gpioioctl+0x3b0
[  13.2875069] 0xd1cf1d1c: netbsd:spec_ioctl+0xe8
[  13.2875069] 0xd1cf1d4c: netbsd:VOP_IOCTL+0x50
[  13.2971192] 0xd1cf1e24: netbsd:vn_ioctl+0xd8
[  13.2971192] 0xd1cf1eec: netbsd:sys_ioctl+0x468
[  13.3075899] 0xd1cf1fac: netbsd:syscall+0x188
[  13.3075899] cpu0: End traceback...
Stopped in pid 681.681 (gpioctl) at     netbsd:cpu_Debugger+0x4:        bx      r14

In this case, modules=YES and gpio=YES were set in /etc/rc.conf.  In
/etc/modules.conf, gpioiic and sht4xtemp were included.  /etc/gpio.conf
contained a single line:

gpio0 attach gpioiic 5 0x03 0x0

which is the usual way I have done this.  The kernel is an evbarm6hf
build without MULTIPROCESSOR to work around kern/56433, where the RPI3
won't boot in MP.

>How-To-Repeat:

This can be repeated without actually having physical hardware by
using gpiosim.  The following will produce the panic:

In /etc/rc.conf, use modules=YES and gpio=YES

In /etc/modules.conf include:
gpiosim
gpioiic

In /etc/gpio.conf put the following:

gpio0 attach gpioiic 5 0x03 0x0

Change gpio0 to whatever gpiosim attaches to.  If you do not have any
physical gpio devices, it will likely be gpio0; if you do, it will be
one beyond the highest physical unit.  On an RPI, for example, it would
end up as gpio1.

Boot the system and you will see the panic when /etc/rc.d/gpio runs.
The test for this was performed on a RPI3 with a kernel without
MULTIPROCESSOR enabled to work around kern/56433.

>Fix:

Don't know.  The attachment framework is a bit of a black box to me.

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56438: panic when trying to use gpioiic in -current 9.99.90
Date: Mon, 4 Oct 2021 05:42:14 -0000 (UTC)

 brad@anduin.eldar.org writes:

 >[  13.2358565] panic: kernel diagnostic assertion "KERNEL_LOCKED_P()" failed: file "../../../../kern/subr_autoconf.c", line 1053
 >[  13.2487462] cpu0: Begin traceback...
 >[  13.2487462] 0xd1cf1b5c: netbsd:db_panic+0x14
 >[  13.2562360] 0xd1cf1b74: netbsd:vpanic+0x148
 >[  13.2562360] 0xd1cf1b8c: netbsd:__aeabi_uldivmod
 >[  13.2562360] 0xd1cf1bc4: netbsd:config_match+0x88
 >[  13.2680444] 0xd1cf1be4: netbsd:mapply+0x4c
 >[  13.2680444] 0xd1cf1c34: netbsd:config_search_internal+0x1a8
 >[  13.2771142] 0xd1cf1c5c: netbsd:config_search+0x90
 >[  13.2771142] 0xd1cf1cf4: netbsd:gpioioctl+0x3b0

 The attachment framework nowadays enforces that it is called with
 the kernel lock held. Since gpio(4) is tagged as 'mpsafe' it has
 to care itself about taking the kernel lock.
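
The rule described above can be illustrated with a minimal userland
sketch in C.  This is not NetBSD kernel code: every model_* name is
hypothetical, the "kernel lock" is modelled as a plain hold count, and
the stand-in for config_search() reports a missing lock by returning
false where the real kernel would fail the KERNEL_LOCKED_P() assertion
and panic.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel lock: a simple hold count. */
static int model_biglock_depth = 0;

/* Models KERNEL_LOCK(1, NULL): take one hold on the lock. */
void model_kernel_lock(void)
{
	model_biglock_depth++;
}

/* Models KERNEL_UNLOCK_ONE(NULL): drop one hold on the lock. */
void model_kernel_unlock_one(void)
{
	assert(model_biglock_depth > 0);
	model_biglock_depth--;
}

/* Models KERNEL_LOCKED_P(): is the lock currently held? */
bool model_kernel_locked_p(void)
{
	return model_biglock_depth > 0;
}

/*
 * Stand-in for an autoconf entry point such as config_search().
 * The real kernel asserts that the lock is held; this model returns
 * false instead so that the failure can be observed without aborting.
 */
bool model_config_search(void)
{
	return model_kernel_locked_p();
}

/* An mpsafe-style handler that calls into autoconf without the lock. */
bool model_ioctl_unlocked(void)
{
	return model_config_search();
}

/* The same handler, taking the lock itself around the autoconf call. */
bool model_ioctl_locked(void)
{
	bool ok;

	model_kernel_lock();
	ok = model_config_search();
	model_kernel_unlock_one();
	return ok;
}
```

In this model, model_ioctl_unlocked() corresponds to the failing path
in the traceback, and model_ioctl_locked() corresponds to a driver that
brackets the call with KERNEL_LOCK and KERNEL_UNLOCK_ONE.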

From: Brad Spencer <brad@anduin.eldar.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/56438: panic when trying to use gpioiic in -current 9.99.90
Date: Mon, 04 Oct 2021 10:04:23 -0400

 mlelstv@serpens.de (Michael van Elst) writes:

 > The following reply was made to PR kern/56438; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/56438: panic when trying to use gpioiic in -current 9.99.90
 > Date: Mon, 4 Oct 2021 05:42:14 -0000 (UTC)
 >
 >  brad@anduin.eldar.org writes:
 >  
 >  >[  13.2358565] panic: kernel diagnostic assertion "KERNEL_LOCKED_P()" failed: file "../../../../kern/subr_autoconf.c", line 1053
 >  >[  13.2487462] cpu0: Begin traceback...
 >  >[  13.2487462] 0xd1cf1b5c: netbsd:db_panic+0x14
 >  >[  13.2562360] 0xd1cf1b74: netbsd:vpanic+0x148
 >  >[  13.2562360] 0xd1cf1b8c: netbsd:__aeabi_uldivmod
 >  >[  13.2562360] 0xd1cf1bc4: netbsd:config_match+0x88
 >  >[  13.2680444] 0xd1cf1be4: netbsd:mapply+0x4c
 >  >[  13.2680444] 0xd1cf1c34: netbsd:config_search_internal+0x1a8
 >  >[  13.2771142] 0xd1cf1c5c: netbsd:config_search+0x90
 >  >[  13.2771142] 0xd1cf1cf4: netbsd:gpioioctl+0x3b0
 >  
 >  The attachment framework nowadays enforces that it is called with
 >  the kernel lock held. Since gpio(4) is tagged as 'mpsafe' it has
 >  to care itself about taking the kernel lock.
 >  


 So, something like the following.  I made this change and tested it, and
 the panic appears to be gone and devices attach as expected again, but
 as I mentioned, I am not that familiar with the autoconfig framework.  I
 can commit this if there aren't any objections.

 --- gpio.c.ORIG 2021-09-26 13:41:58.259633657 -0400
 +++ gpio.c      2021-10-04 08:02:22.584845545 -0400
 @@ -849,6 +849,7 @@
                 locs[GPIOCF_MASK] = ga.ga_mask;
                 locs[GPIOCF_FLAG] = ga.ga_flags;

 +               KERNEL_LOCK(1, NULL);
                 cf = config_search(sc->sc_dev, &ga,
                     CFARGS(.locators = locs));
                 if (cf != NULL) {
 @@ -869,6 +870,7 @@
  #endif
                 } else
                         error = EINVAL;
 +               KERNEL_UNLOCK_ONE(NULL);
                 mutex_enter(&sc->sc_mtx);
                 sc->sc_attach_busy = 0;
                 cv_signal(&sc->sc_attach);





 -- 
 Brad Spencer - brad@anduin.eldar.org - KC8VKS - http://anduin.eldar.org

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56438: panic when trying to use gpioiic in -current 9.99.90
Date: Mon, 4 Oct 2021 17:20:02 -0000 (UTC)

 brad@anduin.eldar.org (Brad Spencer) writes:

 >mlelstv@serpens.de (Michael van Elst) writes:

 >So, something like the following..  I made this change and tested it and
 >the panic appears to be gone and devices attach as expected again,

 Yes, but there are a few more cases. And the compat50 code misses any kind
 of locking.

From: Brad Spencer <brad@anduin.eldar.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/56438: panic when trying to use gpioiic in -current 9.99.90
Date: Fri, 08 Oct 2021 05:51:48 -0400

 mlelstv@serpens.de (Michael van Elst) writes:

 > The following reply was made to PR kern/56438; it has been noted by GNATS.
 >
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/56438: panic when trying to use gpioiic in -current 9.99.90
 > Date: Mon, 4 Oct 2021 17:20:02 -0000 (UTC)
 >
 >  brad@anduin.eldar.org (Brad Spencer) writes:
 >  
 >  >mlelstv@serpens.de (Michael van Elst) writes:
 >  
 >  >So, something like the following..  I made this change and tested it and
 >  >the panic appears to be gone and devices attach as expected again,
 >  
 >  Yes, but there are a few more cases. And the compat50 code misses any kind
 >  of locking.
 >  

 After some back-and-forth email with Michael van Elst, who provided
 more explanation, a more complete patch is included below.  I tested
 attaching and detaching on real hardware with real devices.  I can't
 test the 5.0 path, as I don't have anything that old.  I am also not
 sure that rescan at the gpio device level will do much, because the
 busses lower down, in particular gpioiic and gpioow, do not support
 the concept of rescan, and I think that they would have to.  In any
 case, the rescan case doesn't panic either.  This probably should be
 committed, as it improves the situation with -current.

 --- src/sys/dev/gpio/gpio.c.ORIG	2021-09-26 13:41:58.259633657 -0400
 +++ src/sys/dev/gpio/gpio.c	2021-10-05 07:01:37.664529748 -0400
 @@ -172,12 +172,14 @@
  	if (error)
  		return;

 +	KERNEL_LOCK(1, NULL);
  	LIST_FOREACH(gdev, &sc->sc_devs, sc_next)
  		if (gdev->sc_dev == child) {
  			LIST_REMOVE(gdev, sc_next);
  			kmem_free(gdev, sizeof(struct gpio_dev));
  			break;
  		}
 +	KERNEL_UNLOCK_ONE(NULL);

  	mutex_enter(&sc->sc_mtx);
  	sc->sc_attach_busy = 0;
 @@ -190,8 +192,10 @@
  gpio_rescan(device_t self, const char *ifattr, const int *locators)
  {

 +	KERNEL_LOCK(1, NULL);
  	config_search(self, NULL,
  	    CFARGS(.search = gpio_search));
 +	KERNEL_UNLOCK_ONE(NULL);

  	return 0;
  }
 @@ -849,6 +853,7 @@
  		locs[GPIOCF_MASK] = ga.ga_mask;
  		locs[GPIOCF_FLAG] = ga.ga_flags;

 +		KERNEL_LOCK(1, NULL);
  		cf = config_search(sc->sc_dev, &ga,
  		    CFARGS(.locators = locs));
  		if (cf != NULL) {
 @@ -869,6 +874,8 @@
  #endif
  		} else
  			error = EINVAL;
 +		KERNEL_UNLOCK_ONE(NULL);
 +
  		mutex_enter(&sc->sc_mtx);
  		sc->sc_attach_busy = 0;
  		cv_signal(&sc->sc_attach);
 @@ -1106,6 +1113,7 @@
  		if (error)
  			return EBUSY;

 +		KERNEL_LOCK(1, NULL);
  		attach = data;
  		LIST_FOREACH(gdev, &sc->sc_devs, sc_next) {
  			if (strcmp(device_xname(gdev->sc_dev),
 @@ -1115,11 +1123,15 @@
  				cv_signal(&sc->sc_attach);
  				mutex_exit(&sc->sc_mtx);

 -				if (config_detach(gdev->sc_dev, 0) == 0)
 +				if (config_detach(gdev->sc_dev, 0) == 0) {
 +					KERNEL_UNLOCK_ONE(NULL);
  					return 0;
 +				}
  				break;
  			}
  		}
 +		KERNEL_UNLOCK_ONE(NULL);
 +
  		if (gdev == NULL) {
  			mutex_enter(&sc->sc_mtx);
  			sc->sc_attach_busy = 0;
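
The last hunk above releases the lock on two separate exit paths: the
early return after a successful config_detach(), and the fall-through
after the loop.  The following userland sketch in C models only that
control flow.  It is an illustration under stated assumptions, not the
driver code: model_detach_lock_depth and model_detach_by_name() are
hypothetical names, the lock is a plain hold count, and a name match
stands in for a successful detach.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the kernel lock hold count. */
static int model_detach_lock_depth = 0;

/*
 * Search names[0..n-1] for want while "holding the lock".
 * Returns 0 if the name was found (modelling a successful detach),
 * or -1 if it was not.  Both exit paths drop the lock exactly once.
 */
int model_detach_by_name(const char *const *names, size_t n,
    const char *want)
{
	size_t i;

	model_detach_lock_depth++;	/* models KERNEL_LOCK(1, NULL) */
	for (i = 0; i < n; i++) {
		if (strcmp(names[i], want) == 0) {
			/* Early return: release before leaving. */
			model_detach_lock_depth--;
			return 0;
		}
	}
	/* Fall-through path: release after the loop. */
	model_detach_lock_depth--;
	return -1;
}
```

Whichever path is taken, the hold count is back at its starting value
when the function returns, which is the property the added
KERNEL_UNLOCK_ONE calls are there to preserve.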



 -- 
 Brad Spencer - brad@anduin.eldar.org - KC8VKS - http://anduin.eldar.org
