NetBSD Problem Report #56705

From hauke@Espresso.Rhein-Neckar.DE  Sat Feb 12 17:08:33 2022
Return-Path: <hauke@Espresso.Rhein-Neckar.DE>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D81491A923C
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 12 Feb 2022 17:08:33 +0000 (UTC)
Message-Id: <202202121707.21CH7wMv006526@pizza.causeuse.org>
Date: Sat, 12 Feb 2022 18:07:58 +0100 (CET)
From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Reply-To: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Subject: wapbl lockdebug panic during tcpdump run
X-Send-Pr-Version: 3.95

>Number:         56705
>Category:       kern
>Synopsis:       wapbl lockdebug panic during tcpdump run
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Feb 12 17:10:00 +0000 2022
>Last-Modified:  Tue Feb 15 21:55:01 +0000 2022
>Originator:     Hauke Fath
>Release:        NetBSD 9.99.93
>Organization:
Falling Raindrops
>Environment:


System: NetBSD pizza 9.99.93 NetBSD 9.99.93 (BLACKBOX-$Revision: 1.85 $) #5: Fri Feb 11 21:11:10 CET 2022 hauke@pizza:/var/obj/netbsd-build-objects/developer/amd64/sys/arch/amd64/compile/BLACKBOX amd64
Architecture: x86_64
Machine: amd64
>Description:

	I am attempting to debug a client machine's hang during TCP
	transfers (here: an FTP session) by running tcpdump on the
	target (FTP server) machine. Unfortunately, tcpdump
	frequently aborts with a 'no permission' error, and every few
	attempts the machine panics with a lockdebug error in a wapbl
	write:

	<ftp://ftp.causeuse.org/pub/NetBSD/tcpdump-panic.gif>

	The USB console keyboard is dead at that point, and the
	machine swaps to a raidframe mirror, which cannot be dumped to.

	Strangely enough, the directory that tcpdump writes the pcap
	file to is on ZFS, and so is the directory the FTP transfer
	writes to.



>How-To-Repeat:

	Run tcpdump on an incoming FTP transfer on the FTP server side.
	Watch the machine panic on (roughly) every fifth attempt.


>Fix:
	Yes, please.



>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
Date: Sat, 12 Feb 2022 17:16:44 +0000

 [resending to cc gnats-bugs]

 I bet wapbl is a red herring.

 Can you do `show panic'?  If it says `kernel lock spinout', that means
 that something else was hogging the kernel lock and the attempt to
 acquire it in bdev_strategy via wapbl happens to be the one that got
 bored of waiting and panicked.

 Can you show `ps' output, and then `bt/a ffff...' for all of the lines
 with `>' on them?

 Can you get a crash dump?

From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
Date: Sat, 12 Feb 2022 18:28:14 +0100

 On Sat, 12 Feb 2022 17:20:01 +0000 (UTC), Taylor R Campbell wrote:
 >  Can you show [...]

 As mentioned, the USB console keyboard is dead at that point. I have
 found and attached a PS/2 keyboard (fortunately, the board is that
 traditional), and am working on reproducing the panic.

 >  Can you get a crash dump?

 swap on raid0b, so no.

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
Date: Sat, 12 Feb 2022 19:11:34 +0100

 On Sat, Feb 12, 2022 at 05:30:02PM +0000, Hauke Fath wrote:
 >  >  Can you get a crash dump?
 >  
 >  swap on raid0b, so no.

 Why "so no"? Have you tried?

 Martin

From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org
Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
Date: Sat, 12 Feb 2022 22:21:43 +0100

 On Sat, 12 Feb 2022 18:15:01 +0000 (UTC), Martin Husemann wrote:
 >  On Sat, Feb 12, 2022 at 05:30:02PM +0000, Hauke Fath wrote:
 >  >  >  Can you get a crash dump?
 >  >  
 >  >  swap on raid0b, so no.
 >  
 >  Why "so no"? Have you tried?

 Well, the machine tried (and failed)... and a quick search comes up 
 with <https://marc.info/?l=netbsd-port-i386&m=109042024503576>.

From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: Greg Oster <oster@netbsd.org>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
Date: Sat, 12 Feb 2022 22:55:47 +0100

 On Sat, 12 Feb 2022 15:35:16 -0600, Greg Oster wrote:
 > kernel core dumps to swap on RAID 1 sets should be working.  If they
 > aren't, that's a bug.  (initial implementation was in 2007, with some
 > fixes in 2016... but perhaps crash dumps to RAID 1 swap aren't as
 > widely advertised as they might otherwise be...)

 Good to know, thanks. 

 The panic appears to leave the machine in bad shape, though; one time I
 got recurring panics during network setup (an re(4) with several VLANs
 on it), until I power-cycled the machine.

From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
Date: Sat, 12 Feb 2022 23:36:08 +0100

 On Sat, Feb 12, 2022 at 05:30:02PM +0000, Hauke Fath wrote:
 > The following reply was made to PR kern/56705; it has been noted by GNATS.
 > 
 > From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
 > To: gnats-bugs@netbsd.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
 > Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
 > Date: Sat, 12 Feb 2022 18:28:14 +0100
 > 
 >  On Sat, 12 Feb 2022 17:20:01 +0000 (UTC), Taylor R Campbell wrote:
 >  >  Can you show [...]
 >  
 >  As mentioned, the USB console keyboard is dead at that point. I have
 >  found and attached a PS/2 keyboard (fortunately, the board is that
 >  traditional), and am working on reproducing the panic.

 ddb.commandonenter can be used if you can reproduce it.

 Joerg

From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
 Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
Cc: 
Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
Date: Sat, 12 Feb 2022 15:35:16 -0600

 On 2022-02-12 15:25, Hauke Fath wrote:
 > The following reply was made to PR kern/56705; it has been noted by GNATS.
 > 
 > From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org
 > Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
 > Date: Sat, 12 Feb 2022 22:21:43 +0100
 > 
 >   On Sat, 12 Feb 2022 18:15:01 +0000 (UTC), Martin Husemann wrote:
 >   >  On Sat, Feb 12, 2022 at 05:30:02PM +0000, Hauke Fath wrote:
 >   >  >  >  Can you get a crash dump?
 >   >  >
 >   >  >  swap on raid0b, so no.
 >   >
 >   >  Why "so no"? Have you tried?
 >   
 >   Well, the machine tried (and failed)... and a quick search comes up
 >   with <https://marc.info/?l=netbsd-port-i386&m=109042024503576>.
 >   

 kernel core dumps to swap on RAID 1 sets should be working.  If they
 aren't, that's a bug.  (initial implementation was in 2007, with some
 fixes in 2016... but perhaps crash dumps to RAID 1 swap aren't as widely
 advertised as they might otherwise be...)

 Later...

 Greg Oster

From: Hauke Fath <hauke@Espresso.Rhein-Neckar.DE>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org
Subject: Re: kern/56705: wapbl lockdebug panic during tcpdump run
Date: Tue, 15 Feb 2022 22:49:03 +0100

 Another panic, finally... a different one.

 At 17:20 +0000 on 12.02.2022, Taylor R Campbell wrote:
 > Can you do `show panic'?  If it says `kernel lock spinout', [...]

 It did.

 > Can you show `ps' output, and then `bt/a ffff...' for all of the lines
 > with `>' on them?

 Screenshots at <ftp://ftp.causeuse.org/pub/NetBSD/kern-56705/>. I should
 really set up the machine for a serial console.

 > Can you get a crash dump?

 A 'reboot 0x100' resulted in 'bad dumpdev', so apparently not.

 Cheerio,
 Hauke


 --
 "It's never straight up and down"     (DEVO)


>Unformatted:
