NetBSD Problem Report #57181

From brad@anduin.eldar.org  Wed Jan 11 00:50:15 2023
Return-Path: <brad@anduin.eldar.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 0CD8C1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 11 Jan 2023 00:50:15 +0000 (UTC)
Message-Id: <202301110050.30B0oAU5009193@anduin.eldar.org>
Date: Tue, 10 Jan 2023 19:50:10 -0500 (EST)
From: brad@anduin.eldar.org
Reply-To: brad@anduin.eldar.org
To: gnats-bugs@NetBSD.org
Subject: LOCKDEBUG panic with npf
X-Send-Pr-Version: 3.95

>Number:         57181
>Category:       kern
>Synopsis:       LOCKDEBUG panic with npf
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 11 00:55:00 +0000 2023
>Closed-Date:    Tue Nov 28 02:12:04 +0000 2023
>Last-Modified:  Tue Nov 28 02:12:04 +0000 2023
>Originator:     Brad Spencer
>Release:        NetBSD 10.0_BETA
>Organization:
	eldar.org
>Environment:


System: NetBSD testcurrent.nat.eldar.org 10.0_BETA NetBSD 10.0_BETA (GENERIC_LOCKDEBUG) #0: Tue Jan 10 16:55:28 EST 2023  brad@samwise.nat.eldar.org:/usr/src/sys/arch/amd64/compile/GENERIC_LOCKDEBUG amd64
Architecture: x86_64
Machine: amd64
>Description:

A 10.0_BETA kernel built with LOCKDEBUG from sources pulled on
2023-01-10, running in a Xen PVH domU with a single processor and 8GB
of memory.  Neither the processor count nor the memory size is likely
to matter much.

With some setup, the following panic can be triggered fairly easily:

[ 118.6825506] Mutex error: rw_vector_enter,309: spin lock held

[ 118.6825506] lock address : ffff8f3f683a75b8
[ 118.6825506] type         : spin
[ 118.6825506] initialized  : netbsd:npf_table_create+0x99
[ 118.6825506] shared holds :                  0 exclusive:                  1
[ 118.6825506] shares wanted:                  0 exclusive:                  0
[ 118.6825506] relevant cpu :                  0 last held:                  0
[ 118.6825506] relevant lwp : 0xffff8f3f672a6300 last held: 0xffff8f3f672a6300
[ 118.6825506] last locked* : netbsd:npf_table_list+0x34
[ 118.6825506] unlocked     : netbsd:npf_table_list+0x62
[ 118.6825506] owner field  : 0x0000000000010600 wait/spin:                0/1

[ 118.6825506] panic: LOCKDEBUG: Mutex error: rw_vector_enter,309: spin lock held
[ 118.6825506] cpu0: Begin traceback...
[ 118.6825506] vpanic() at netbsd:vpanic+0x183
[ 118.6825506] panic() at netbsd:panic+0x3c
[ 118.6825506] lockdebug_abort1() at netbsd:lockdebug_abort1+0xe6
[ 118.6825506] rw_enter() at netbsd:rw_enter+0x43b
[ 118.6825506] uvm_fault_internal() at netbsd:uvm_fault_internal+0x111
[ 118.6825506] trap() at netbsd:trap+0x47d
[ 118.6825506] --- trap (number 6) ---
[ 118.6825506] copyout() at netbsd:copyout+0x33
[ 118.6825506] npf_table_list() at netbsd:npf_table_list+0x57
[ 118.6825506] npfctl_table() at netbsd:npfctl_table+0xf7
[ 118.6825506] cdev_ioctl() at netbsd:cdev_ioctl+0x99
[ 118.6825506] spec_ioctl() at netbsd:spec_ioctl+0x58
[ 118.6825506] VOP_IOCTL() at netbsd:VOP_IOCTL+0x47
[ 118.6825506] vn_ioctl() at netbsd:vn_ioctl+0xaf
[ 118.6825506] sys_ioctl() at netbsd:sys_ioctl+0x56d
[ 118.6825506] syscall() at netbsd:syscall+0x196
[ 118.6825506] --- syscall (number 54) ---
[ 118.6825506] netbsd:syscall+0x196:
[ 118.6825506] cpu0: End traceback...
[ 118.6825506] fatal breakpoint trap in supervisor mode
[ 118.6825506] trap type 1 code 0 rip 0xffffffff80235315 cs 0x8 rflags 0x202 cr2 0x724c29d9e180 ilevel 0x8 rsp 0xffffdb8240a7b5e0
[ 118.6825506] curlwp 0xffff8f3f672a6300 pid 1195.1195 lowest kstack 0xffffdb8240a772c0

>How-To-Repeat:

Given an /etc/npf.conf file that contains this:

table <blocklist> type ipset

procedure "log" {
          log: npflog0
}

group default {
      pass in all
      pass out all
}


Given a shell script that does this:

#!/bin/sh

for a in `cat /etc/blocklist`
do
    /sbin/npfctl table blocklist add $a > /dev/null 2>&1
done

The file /etc/blocklist contains a list of IP addresses.  The exact
number probably does not matter much, but it needs to be large enough
that the script runs for a while (depending on how you run the test).
I typically use a list of almost 200,000 addresses.  I also tried one
with 1000 addresses, and the panic happens with that many as well.
However, leaving the table empty or adding just one address did not
trigger the panic.
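
For anyone who needs a test list, the following is a minimal sketch of
one way to generate it.  The gen_blocklist helper and the choice of the
10.0.0.0/8 range are illustrative assumptions and not part of the
original setup; any list of distinct addresses should serve the same
purpose.

```sh
#!/bin/sh

# gen_blocklist N: print N distinct IPv4 addresses, one per line.
# Hypothetical helper; the 10.0.0.0/8 range is an arbitrary choice.
gen_blocklist() {
    n=$1
    i=1
    while [ "$i" -le "$n" ]; do
        # Spread the counter across the last three octets, so the
        # output runs 10.0.0.1, 10.0.0.2, ..., 10.0.1.0, and so on.
        echo "10.$((i / 65536 % 256)).$((i / 256 % 256)).$((i % 256))"
        i=$((i + 1))
    done
}
```

Running "gen_blocklist 1000 > /etc/blocklist" as root would produce a
list of the smaller size mentioned above.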

Set up the system as described above, and run the script to load the
table.  The script is needed because, when a large number of addresses
is present, NPF has a problem loading the table with npfctl.  So
adding the addresses one at a time is a cheat to work around that
problem with large tables.  Note that the cheat isn't needed if you
use 1000 addresses.

Now, either while the load is still in progress or after it has
finished, run the following:

npfctl table blocklist list | wc

The system will panic with the LOCKDEBUG panic shown above.

(I will also mention that if I load the 1000 addresses with
/etc/npf.conf, break into ddb after the system is up, and do a
"show locks", it does not show anything unusual.  So something is
managing to hold the lock initialized in npf_table_create even though
the table is already there, and only when npfctl list is performed.)

>Fix:

I am VERY hopeful that someone can see what the fix is.  I also
suspect that this is the root cause of kern/57136, but that is a guess
on my part.  Further, as NPF is supposed to be the firewall of choice
now, this should probably be looked into.

>Release-Note:

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57181 CVS commit: src/sys/net/npf
Date: Mon, 23 Jan 2023 13:40:05 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jan 23 13:40:05 UTC 2023

 Modified Files:
 	src/sys/net/npf: npf_tableset.c

 Log Message:
 npf(9): Drop table lock around copyout.

 It is forbidden to hold a spin lock around copyout, and t_lock is a
 spin lock.

 We need t_lock in order to iterate over the list of entries.
 However, during copyout itself, we only need to ensure that the
 object we're copying out isn't freed by npf_table_remove or
 npf_table_gc.

 Fortunately, the only caller of npf_table_list, npf_table_remove, and
 npf_table_gc is npfctl_table, and it serializes all of them by the
 npf config lock.  So we can safely drop t_lock across copyout.

 PR kern/57136
 PR kern/57181


 To generate a diff of this commit:
 cvs rdiff -u -r1.40 -r1.41 src/sys/net/npf/npf_tableset.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Konrad Schroder" <perseant@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57181
Date: Mon, 27 Mar 2023 15:21:05 -0700

 src/sys/net/npf/npf_tableset.c rev 1.41 fixes the same issue for me under
 NetBSD 9.3, but the PR is still open and the patch has not been pulled up
 to the -9 or -10 branches.  Does the patch fix the issue for Brad?
 Assuming it does, can we please pull up to the release branches?

 Thanks,
 -Konrad

State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 20 Aug 2023 18:26:51 +0000
State-Changed-Why:
fix committed, need pullup to 10 and 9, maybe 8


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57181 CVS commit: [netbsd-10] src/sys/net/npf
Date: Mon, 21 Aug 2023 12:18:17 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Aug 21 12:18:17 UTC 2023

 Modified Files:
 	src/sys/net/npf [netbsd-10]: npf_tableset.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #332):

 	sys/net/npf/npf_tableset.c: revision 1.41

 npf(9): Drop table lock around copyout.

 It is forbidden to hold a spin lock around copyout, and t_lock is a
 spin lock.

 We need t_lock in order to iterate over the list of entries.
 However, during copyout itself, we only need to ensure that the
 object we're copying out isn't freed by npf_table_remove or
 npf_table_gc.

 Fortunately, the only caller of npf_table_list, npf_table_remove, and
 npf_table_gc is npfctl_table, and it serializes all of them by the
 npf config lock.  So we can safely drop t_lock across copyout.

 PR kern/57136
 PR kern/57181


 To generate a diff of this commit:
 cvs rdiff -u -r1.38 -r1.38.4.1 src/sys/net/npf/npf_tableset.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57181 CVS commit: [netbsd-9] src/sys/net/npf
Date: Mon, 21 Aug 2023 12:20:07 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Aug 21 12:20:07 UTC 2023

 Modified Files:
 	src/sys/net/npf [netbsd-9]: npf_tableset.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1718):

 	sys/net/npf/npf_tableset.c: revision 1.41

 npf(9): Drop table lock around copyout.

 It is forbidden to hold a spin lock around copyout, and t_lock is a
 spin lock.

 We need t_lock in order to iterate over the list of entries.
 However, during copyout itself, we only need to ensure that the
 object we're copying out isn't freed by npf_table_remove or
 npf_table_gc.

 Fortunately, the only caller of npf_table_list, npf_table_remove, and
 npf_table_gc is npfctl_table, and it serializes all of them by the
 npf config lock.  So we can safely drop t_lock across copyout.

 PR kern/57136
 PR kern/57181


 To generate a diff of this commit:
 cvs rdiff -u -r1.33.2.2 -r1.33.2.3 src/sys/net/npf/npf_tableset.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 28 Nov 2023 02:12:04 +0000
State-Changed-Why:
fixed and pulled up to 10 and 9, not needed in 8


>Unformatted:
