NetBSD Problem Report #58387
From john@ziaspace.com Mon Jul 1 19:56:58 2024
Message-Id: <202407011956.461JunVW000353@anath.zia.io>
Date: Mon, 1 Jul 2024 19:56:49 GMT
From: john@ziaspace.com
Reply-To: john@ziaspace.com
To: gnats-bugs@NetBSD.org
Subject: Adding vlan to bridge causes lockup
X-Send-Pr-Version: 3.95
>Number: 58387
>Category: kern
>Synopsis: Adding vlan to bridge causes lockup
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jul 01 20:00:05 +0000 2024
>Last-Modified: Mon Jul 01 20:50:02 +0000 2024
>Originator: John Klos
>Release: NetBSD 10.0_STABLE
>Organization:
>Environment:
System: NetBSD cat.zia.io 10.0_STABLE NetBSD 10.0_STABLE (GENERIC) #0: Thu Jun 27 05:26:35 UTC 2024 john@cat.zia.io:/usr/obj-amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
When a vlan interface is added to a bridge, things work as expected for a minute or two.
After that, the kernel locks up completely. I can't get into the kernel
debugger because USB stops working.
>How-To-Repeat:
Contents of /etc/ifconfig.vlan0:

    create
    vlan 108 vlanif bge1
    up

Then:

    ifconfig bridge0 create
    ifconfig bridge0 up
    ifconfig tap0 create
    ifconfig tap0 up
    brconfig bridge0 add vlan0 add tap0
The setup works for a very short time (perhaps a minute) before the machine locks up.
After the reboot, savecore reported:
Checking for core dump...
savecore: msgbuf magic incorrect (f000eef3f000eef3 != 63061)
savecore: reboot after panic: lock error: Mutex: mutex_vector_enter,548: locking against myself: lock 0xffffe77938557080 cpu 0 lwp 0xffffe779374eb480
savecore: system went down at Mon Jul 1 19:10:43 2024
savecore: /var/crash/bounds: No such file or directory
savecore: writing compressed core to /var/crash/netbsd.0.core.gz
savecore: writing compressed kernel to /var/crash/netbsd.0.gz
Uncompressed, netbsd.0 is only 2420857 bytes, which is clearly wrong (/netbsd is 29513568
bytes). Also, dmesg cannot read the message buffer from the core with either kernel:

    dmesg -M netbsd.0.core -N netbsd.0
    dmesg: kvm_nlist: bad namelist
    dmesg -M netbsd.0.core -N /netbsd
    dmesg: magic number incorrect
>Fix:
>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/58387: Adding vlan to bridge causes lockup
Date: Mon, 1 Jul 2024 22:32:58 +0200
On Mon, Jul 01, 2024 at 08:00:05PM +0000, john@ziaspace.com wrote:
> How-To-Repeat:
> ifconfig.vlan0:
> create
> vlan 108 vlanif bge1
> up
>
> Then:
> ifconfig bridge0 create
> ifconfig bridge0 up
> ifconfig tap0 create
> ifconfig tap0 up
> brconfig bridge0 add vlan0 add tap0
I'm not sure it's related to vlan; I'm using vlans with bridges here
(on wm or ixl interfaces) without issue.
It may be related to bge or tap instead.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: John Klos <john@klos.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/58387: Adding vlan to bridge causes lockup
Date: Mon, 1 Jul 2024 20:47:33 +0000 (UTC)
> I'm not sure it's related to vlan; I'm using vlans with bridges here
> (on wm or ixl interfaces) without issue.
> It may be related to bge, or tap
That's true; it may be related to something other than vlan.
In that case, I should add that this system, like others, runs
bridges with multiple taps, also bridged to bge*, for various qemu VMs and
for tinc.
This came up when I tried to bridge a vlan (on bge) to the tap used by tinc.
That tinc setup had previously run for months with the tap bridged directly
to bge and never had issues.
Out-of-band serial console access can be arranged if anyone wants to take a
look. The problem seems quick and easy to reproduce.
Thanks,
John
>Unformatted: