NetBSD Problem Report #58933

From dogcow@babymeat.com  Tue Dec 24 22:17:58 2024
Return-Path: <dogcow@babymeat.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 39F571A9238
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 24 Dec 2024 22:17:58 +0000 (UTC)
Message-Id: <20241224221419.8FAD111D58A@chungus.babymeat.com>
Date: Tue, 24 Dec 2024 14:17:56 -0800
From: "T K Spindler (moof)" <dogcow@babymeat.com>
Reply-To:
To: gnats-bugs@netbsd.org
Subject: panic in athn after wifi router rebooted,  diagnostic assertion "ci->ci_mtx_count == -1" failed
X-Send-Pr-Version: 3.95

>Number:         58933
>Notify-List:    riastradh@NetBSD.org
>Category:       kern
>Synopsis:       panic in athn after wifi router rebooted,  diagnostic assertion "ci->ci_mtx_count == -1" failed
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Dec 24 22:20:00 +0000 2024
>Last-Modified:  Tue Dec 24 22:50:01 +0000 2024
>Originator:     "T K Spindler (moof)" <dogcow@babymeat.com>
>Release:        NetBSD 10.1
>Organization:

>Environment:
System: NetBSD chungus.babymeat.com 10.1 NetBSD 10.1 (CHUNGUS) #0: Tue Dec 17 11:44:53 CST 2024 dogcow@chungus.babymeat.com:/crap/obj/10obj/usr/src/sys/arch/amd64/compile/CHUNGUS amd64
Architecture: x86_64
Machine: amd64
>Description:
After I rebooted my wifi router, the machine panicked:
[ 313028.349718] panic: kernel diagnostic assertion "ci->ci_mtx_count == -1" failed: file "/usr/src/sys/kern/kern_synch.c", line 726 mi_switch: cpu1: ci_mtx_count (-2) != -1 (block with spin-mutex held)

backtrace:
(gdb) bt
#0  0xffffffff80233c25 in cpu_reboot (howto=howto@entry=260,
    bootstr=bootstr@entry=0x0) at /usr/src/sys/arch/amd64/amd64/machdep.c:717
#1  0xffffffff805381e4 in kern_reboot (howto=howto@entry=260,
    bootstr=bootstr@entry=0x0) at /usr/src/sys/kern/kern_reboot.c:73
#2  0xffffffff8057fe45 in vpanic (
    fmt=0xffffffff80945db8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d %s: cpu%u: ci_mtx_count (%d) != -1 (block with spin-mutex held)",
    ap=ap@entry=0xffffd3070916a958) at /usr/src/sys/kern/subr_prf.c:291
#3  0xffffffff806e3e57 in kern_assert (
    fmt=fmt@entry=0xffffffff80945db8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d %s: cpu%u: ci_mtx_count (%d) != -1 (block with spin-mutex held)")
    at /usr/src/sys/lib/libkern/kern_assert.c:51
#4  0xffffffff8054a50b in mi_switch (l=l@entry=0xfffff3c2f10551c0)
    at /usr/src/sys/sys/cpu.h:108
#5  0xffffffff805463fb in sleepq_block (timo=timo@entry=0,
    catch_p=catch_p@entry=false,
    syncobj=syncobj@entry=0xffffffff80c17740 <cv_syncobj>)
    at /usr/src/sys/kern/kern_sleepq.c:351
#6  0xffffffff80502238 in cv_wait (cv=0xffffd30030947cc0,
    mtx=0xffffd30030947cd0) at /usr/src/sys/kern/kern_condvar.c:174
#7  0xffffffff802e1d67 in athn_usb_wait_async (usc=0xffffd30030945000)
    at /usr/src/sys/dev/usb/if_athn_usb.c:907
#8  athn_usb_stop_locked (ifp=ifp@entry=0xffffd30030945d40)
    at /usr/src/sys/dev/usb/if_athn_usb.c:2943
#9  0xffffffff802e27df in athn_usb_stop (disable=0, ifp=0xffffd30030945d40)
    at /usr/src/sys/dev/usb/if_athn_usb.c:2924
#10 athn_usb_media_change (ifp=0xffffd30030945d40)
    at /usr/src/sys/dev/usb/if_athn_usb.c:1404
#11 athn_usb_media_change (ifp=0xffffd30030945d40)
    at /usr/src/sys/dev/usb/if_athn_usb.c:1391
#12 0xffffffff806406f7 in ifmedia_change (ifm=ifm@entry=0xffffd30030945a30,
    ifp=ifp@entry=0xffffd30030945d40) at /usr/src/sys/net/if_media.c:244
#13 0xffffffff80640c14 in ifmedia_ioctl (ifp=0xffffd30030945d40,
    ifr=<optimized out>, ifm=0xffffd30030945a30, cmd=<optimized out>)
    at /usr/src/sys/net/if_media.c:479
#14 0xffffffff802e3db1 in athn_usb_ioctl (ifp=0xffffd30030945d40,
    cmd=3230689591, data=0xfffff3c4a80db300)
    at /usr/src/sys/dev/usb/if_athn_usb.c:2726
#15 0xffffffff806340c4 in doifioctl (so=0xfffff3c2f114d038,
    cmd=<optimized out>, data=0xfffff3c4a80db300, l=0xfffff3c2f10551c0)
    at /usr/src/sys/net/if.c:3582
#16 0xffffffff8059270f in sys_ioctl (l=<optimized out>,
    uap=0xffffd3070916af00, retval=<optimized out>)
    at /usr/src/sys/kern/sys_generic.c:675
#17 0xffffffff8031f9a4 in sy_call (rval=0xffffd3070916aeb0,
    uap=0xffffd3070916af00, l=0xfffff3c2f10551c0,
    sy=0xffffffff80c14bf0 <sysent+1296>) at /usr/src/sys/sys/syscallvar.h:65
#18 sy_invoke (code=54, rval=0xffffd3070916aeb0, uap=0xffffd3070916af00,
    l=0xfffff3c2f10551c0, sy=0xffffffff80c14bf0 <sysent+1296>)
    at /usr/src/sys/sys/syscallvar.h:94
#19 syscall (frame=0xffffd3070916af00)
    at /usr/src/sys/arch/x86/x86/syscall.c:138
#20 0xffffffff8020b68d in handle_syscall ()
#21 0x0000000000000004 in ?? ()
#22 0x00000000c0906937 in ?? ()
#23 0x00007f7fff019750 in ?? ()
#24 0x00007f7fff01975f in ?? ()
#25 0x00007b2eb049a108 in ?? ()
#26 0x00007b2eb05194a8 in ?? ()
#27 0x0000000000000000 in ?? ()


>How-To-Repeat:
power-cycle wifi router again

>Fix:





>Release-Note:

>Audit-Trail:

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Jason Thorpe <thorpej@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org,
	"T K Spindler (moof)" <dogcow@babymeat.com>
Subject: Re: kern/58933: panic in athn after wifi router rebooted,  diagnostic assertion "ci->ci_mtx_count == -1" failed
Date: Tue, 24 Dec 2024 22:46:01 +0000

 > #7  0xffffffff802e1d67 in athn_usb_wait_async (usc=0xffffd30030945000)
 >     at /usr/src/sys/dev/usb/if_athn_usb.c:907
 > ...
 > #11 athn_usb_media_change (ifp=0xffffd30030945d40)
 >     at /usr/src/sys/dev/usb/if_athn_usb.c:1391
 > #12 0xffffffff806406f7 in ifmedia_change (ifm=ifm@entry=0xffffd30030945a30,
 >     ifp=ifp@entry=0xffffd30030945d40) at /usr/src/sys/net/if_media.c:244

 This is happening because athn(4) still uses legacy ifmedia locking,
 which holds a spin lock.

 Fixing this properly requires teaching athn(4) to coordinate locking
 with ifmedia and ifmedia_init_with_lock, and that's going to be a pain
 to merge with the wifi branch (but we need to do it eventually,
 whether or not the merge happens).

 But, athn-specific issues aside, I wonder whether ifmedia's approach
 to legacy drivers is right here.  Maybe for legacy drivers it should
 be the kernel lock (which is released across sleep, instead of
 forbidden to hold across sleep), like we usually do for legacy
 MP-unsafe subsystems.

 But I don't know enough about the ifmedia locking to know whether the
 lock has to serialize entire transactions (in which case the kernel
 lock is not good enough), or just has to make logic appear
 single-threaded.

 What do you think, thorpej?

 I think changing legacy ifmedia drivers from spin lock to kernel lock,
 if it's the right thing, should be pretty narrowly scoped and safe to
 pull up to release branches too.
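
The failure mode described above can be illustrated with a small userland
model. This is only a sketch under stated assumptions, not kernel code: every
toy_* name is hypothetical, the counter merely imitates ci_mtx_count (assumed
to drop by one per spin mutex held), and the single spin lock taken on the
sleep path stands in for whatever scheduler-side lock is held when mi_switch
runs. The condvar interlock is left out because cv_wait releases it before
blocking.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for per-CPU state; NOT the kernel's struct cpu_info. */
struct toy_cpu {
	int mtx_count;	/* 0 with no spin mutex held, -1 per mutex held */
};

/* Hypothetical helpers that imitate taking and dropping a spin mutex. */
static void
toy_spin_enter(struct toy_cpu *ci)
{
	ci->mtx_count--;
}

static void
toy_spin_exit(struct toy_cpu *ci)
{
	ci->mtx_count++;
}

/*
 * Imitates the assertion from the panic message: at switch time
 * exactly one spin lock (the sleep path's own) may be held.
 */
static bool
toy_switch_allowed(const struct toy_cpu *ci)
{
	return ci->mtx_count == -1;
}

/* Imitates a blocking wait; returns whether the switch check passes. */
static bool
toy_cv_wait(struct toy_cpu *ci)
{
	bool ok;

	toy_spin_enter(ci);		/* lock assumed held across the switch */
	ok = toy_switch_allowed(ci);
	toy_spin_exit(ci);
	return ok;
}

/* Legacy path: a spin lock wraps the media-change callback, which sleeps. */
static bool
toy_media_change_legacy(struct toy_cpu *ci)
{
	bool ok;

	toy_spin_enter(ci);		/* stands in for the legacy ifmedia lock */
	ok = toy_cv_wait(ci);		/* count is -2 here, so the check fails */
	toy_spin_exit(ci);
	return ok;
}

/* Alternative: no spin lock is held when the callback sleeps. */
static bool
toy_media_change_sleepable(struct toy_cpu *ci)
{
	return toy_cv_wait(ci);		/* count is -1 here, so the check passes */
}
```

In this model toy_media_change_legacy fails the check with a count of -2,
matching the "(-2) != -1" in the panic, while toy_media_change_sleepable
passes. Whether the real fix is a driver-supplied lock or the kernel lock is
the open question in the message above; the model only shows why a spin lock
cannot be held across the wait.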

>Unformatted:
