NetBSD Problem Report #57136

From brad@anduin.eldar.org  Fri Dec 23 20:15:45 2022
Return-Path: <brad@anduin.eldar.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D63A31A921F
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 23 Dec 2022 20:15:44 +0000 (UTC)
Message-Id: <202212232015.2BNKFeDB019643@anduin.eldar.org>
Date: Fri, 23 Dec 2022 15:15:40 -0500 (EST)
From: brad@anduin.eldar.org
Reply-To: brad@anduin.eldar.org
To: gnats-bugs@NetBSD.org
Subject: NPF panic probably on a NPF table list call
X-Send-Pr-Version: 3.95

>Number:         57136
>Category:       kern
>Synopsis:       NPF panic probably on a NPF table list call
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 23 20:20:00 +0000 2022
>Closed-Date:    Tue Nov 28 02:11:05 +0000 2023
>Last-Modified:  Tue Nov 28 02:11:05 +0000 2023
>Originator:     brad@anduin.eldar.org
>Release:        NetBSD 10.0_BETA
>Organization:
	eldar.org
>Environment:
System: NetBSD anduin.eldar.org 10.0_BETA NetBSD 10.0_BETA (ANDUIN) #0: Thu Dec 22 11:07:33 EST 2022 brad@samwise.nat.eldar.org:/usr/src/sys/arch/amd64/compile/ANDUIN
Architecture: x86_64
Machine: amd64
>Description:

Sorry for the lack of detail.  This is probably a KASSERT() in the npf
code for a table list.  I have had it happen a couple of times in the
last couple of days.

The following was copied from an image, as I could not get a kernel
dump or otherwise save the panic.  I also don't know exactly which
assert (if that is what it was) fired, as the screen scrolled off
before I could see it and it wasn't saved anywhere:

breakpoint() at netbsd:breakpoint+0x5
vpanic() at netbsd:vpanic+0x177
kern_assert() at netbsd:kern_assert+0x4b
mi_switch() at netbsd:mi_switch+0x7e2
sleepq_block() at netbsd:sleepq_block+0x13a
mtsleep() at netbsd:mtsleep+0x17f
uvmfault_promote() at netbsd:uvmfault_promote+0x4b2
uvm_fault_internal() at netbsd:uvm_fault_internal+0x1488
trap() at netbsd:trap+0x46a
--- trap (number 6) ---
copyout() at netbsd:copyout+0x33
npf_table_list() at netbsd:npf_table_list+0x57
npfctl_table() at netbsd:npfctl_table+0xba
spec_ioctl() at netbsd:spec_ioctl+0x58
VOP_IOCTL() at netbsd:VOP_IOCTL+0x41
vn_ioctl() at netbsd:vn_ioctl+0xad
sys_ioctl() at netbsd:sys_ioctl+0x555
syscall() at netbsd:syscall+0x9c
--- syscall (number 54) ---

This panic occurred when the system was a PVH DOMU with pvshim enabled;
the system has since been switched to a pure PVH guest for other reasons.

There is a cron job that runs pretty often on this system that pulls
the output from a particular npf table using npfctl, something like
"npfctl table badguys list > output_file" and compares this output to
a current list of badguys.  Changes to the table are then made with
"npfctl table badguys add ...." and remove.  After the changes have
been made, another "npfctl table badguys list" is done comparing that
output to the new list to make sure that they are the same.  From the
logs, it seems that the panic happened on this second list attempt.  I
can say with pretty good certainty that nothing actually changed in
the table when this panicked.  So, this would have reduced to a table
list, a very short delay, and then another table list.

>How-To-Repeat:

I don't know what the situation is that triggers this.  The system is
pretty busy doing a LOT of other stuff all of the time (router, NFS
server, rabbitmq server, LDAP server, kerberos slave, etc...), and the
only unusual thing the last time was a copy of a bunch of big files
(well, a block-attach'ed thumb drive from DOM0).  The previous times
did not have anything unusual going on that I know of.

>Fix:

Don't know...

>Release-Note:

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57136 CVS commit: src/sys/net/npf
Date: Mon, 23 Jan 2023 13:40:05 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Mon Jan 23 13:40:05 UTC 2023

 Modified Files:
 	src/sys/net/npf: npf_tableset.c

 Log Message:
 npf(9): Drop table lock around copyout.

 It is forbidden to hold a spin lock around copyout, and t_lock is a
 spin lock.

 We need t_lock in order to iterate over the list of entries.
 However, during copyout itself, we only need to ensure that the
 object we're copying out isn't freed by npf_table_remove or
 npf_table_gc.

 Fortunately, the only caller of npf_table_list, npf_table_remove, and
 npf_table_gc is npfctl_table, and it serializes all of them by the
 npf config lock.  So we can safely drop t_lock across copyout.

 PR kern/57136
 PR kern/57181


 To generate a diff of this commit:
 cvs rdiff -u -r1.40 -r1.41 src/sys/net/npf/npf_tableset.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
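
The locking pattern described in the log message above can be sketched
in userland C.  This is only an analogy under stated assumptions, not
the committed diff: a pthread mutex stands in for the t_lock spin lock,
memcpy stands in for copyout, and every name (struct table,
table_list, fake_copyout, nlocks_held) is hypothetical.  The per-thread
counter and its assert model the rule that a copy which may sleep must
not be entered with the lock held.  The caller is assumed to serialize
listing against removal, which is the role the log message gives to
the npf config lock.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not taken from npf_tableset.c. */
struct entry {
	struct entry *next;
	int addr;
};

struct table {
	pthread_mutex_t lock;	/* plays the role of t_lock */
	struct entry *head;
};

/* Number of table locks the current thread holds (illustration only). */
static _Thread_local int nlocks_held;

static void
table_lock(struct table *t)
{
	pthread_mutex_lock(&t->lock);
	nlocks_held++;
}

static void
table_unlock(struct table *t)
{
	nlocks_held--;
	pthread_mutex_unlock(&t->lock);
}

/*
 * Stand-in for copyout(): the real one can fault and sleep, so it must
 * not be entered with a spin lock held.  The assert models that rule.
 */
static int
fake_copyout(const void *src, void *dst, size_t len)
{
	assert(nlocks_held == 0);
	memcpy(dst, src, len);
	return 0;
}

/*
 * Copy every entry's addr into ubuf.  The caller is assumed to
 * serialize this against removal (the role of the npf config lock),
 * so e stays valid while the table lock is dropped.
 */
static int
table_list(struct table *t, int *ubuf, size_t nslots, size_t *ncopied)
{
	struct entry *e;
	size_t n = 0;
	int error = 0;

	table_lock(t);
	for (e = t->head; e != NULL; e = e->next) {
		if (n == nslots) {
			error = ENOMEM;	/* arbitrary choice for this sketch */
			break;
		}
		table_unlock(t);	/* drop the lock across the copy */
		error = fake_copyout(&e->addr, &ubuf[n], sizeof(e->addr));
		table_lock(t);		/* retake it to continue iterating */
		if (error)
			break;
		n++;
	}
	table_unlock(t);
	*ncopied = n;
	return error;
}
```

Calling table_list() on a two-entry table with a large enough buffer
copies both entries and returns 0.  Moving the fake_copyout() call
inside the locked region would trip the assert, which is the userland
counterpart of the failure this PR reports.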

State-Changed-From-To: open->needs-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 20 Aug 2023 18:27:15 +0000
State-Changed-Why:
fix committed, need pullup to 10 and 9, maybe 8


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57136 CVS commit: [netbsd-10] src/sys/net/npf
Date: Mon, 21 Aug 2023 12:18:17 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Aug 21 12:18:17 UTC 2023

 Modified Files:
 	src/sys/net/npf [netbsd-10]: npf_tableset.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #332):

 	sys/net/npf/npf_tableset.c: revision 1.41

 npf(9): Drop table lock around copyout.

 It is forbidden to hold a spin lock around copyout, and t_lock is a
 spin lock.

 We need t_lock in order to iterate over the list of entries.
 However, during copyout itself, we only need to ensure that the
 object we're copying out isn't freed by npf_table_remove or
 npf_table_gc.

 Fortunately, the only caller of npf_table_list, npf_table_remove, and
 npf_table_gc is npfctl_table, and it serializes all of them by the
 npf config lock.  So we can safely drop t_lock across copyout.

 PR kern/57136
 PR kern/57181


 To generate a diff of this commit:
 cvs rdiff -u -r1.38 -r1.38.4.1 src/sys/net/npf/npf_tableset.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57136 CVS commit: [netbsd-9] src/sys/net/npf
Date: Mon, 21 Aug 2023 12:20:07 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Aug 21 12:20:07 UTC 2023

 Modified Files:
 	src/sys/net/npf [netbsd-9]: npf_tableset.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1718):

 	sys/net/npf/npf_tableset.c: revision 1.41

 npf(9): Drop table lock around copyout.

 It is forbidden to hold a spin lock around copyout, and t_lock is a
 spin lock.

 We need t_lock in order to iterate over the list of entries.
 However, during copyout itself, we only need to ensure that the
 object we're copying out isn't freed by npf_table_remove or
 npf_table_gc.

 Fortunately, the only caller of npf_table_list, npf_table_remove, and
 npf_table_gc is npfctl_table, and it serializes all of them by the
 npf config lock.  So we can safely drop t_lock across copyout.

 PR kern/57136
 PR kern/57181


 To generate a diff of this commit:
 cvs rdiff -u -r1.33.2.2 -r1.33.2.3 src/sys/net/npf/npf_tableset.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 28 Nov 2023 02:11:05 +0000
State-Changed-Why:
pulled up to 10 and 9, not needed for 8


>Unformatted:
