NetBSD Problem Report #51877
From hf@spg.tu-darmstadt.de Sat Jan 14 15:40:36 2017
Return-Path: <hf@spg.tu-darmstadt.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 864817A111
for <gnats-bugs@gnats.NetBSD.org>; Sat, 14 Jan 2017 15:40:36 +0000 (UTC)
Message-Id: <201701141540.v0EFeWVv001108@Zinnenwand.nt.e-technik.tu-darmstadt.de>
Date: Sat, 14 Jan 2017 16:40:32 +0100 (CET)
From: Hauke Fath <hf@spg.tu-darmstadt.de>
Reply-To: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: carp related panic during shutdown
X-Send-Pr-Version: 3.95
>Number: 51877
>Category: kern
>Synopsis: carp related panic during shutdown
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 14 15:45:00 +0000 2017
>Last-Modified: Thu Jan 19 08:30:00 +0000 2017
>Originator: Hauke Fath
>Release: NetBSD 7.99.58
>Organization:
Technische Universitaet Darmstadt
>Environment:
System: NetBSD Zinnenwand 7.99.58 NetBSD 7.99.58 (FIFI-$Revision$) #4: Fri Jan 13 13:20:31 CET 2017 hf@Hochstuhl:/var/obj/netbsd-builds/developer/amd64/sys/arch/amd64/compile/NFIFI amd64
Architecture: x86_64
Machine: amd64
>Description:
A router set up with carp(4) for redundancy panics
reproducibly during shutdown:
[...]
igphy3: detached
wm3: detached
igphy2: detached
wm2: detached
igphy1: detached
wm1: detached
igphy0: detached
carp0: state transition from: MASTER -> to: INIT
uvm_fault(0xfffffe821bbfde70, 0x10000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff8040783d cs 8 rflags 10206 cr2 101e7 ilevel 6 rsp fffffe810f979650
curlwp 0xfffffe821d3d48c0 pid 178.1 lowest kstack 0xfffffe810f9762c0
Skipping crash dump on recursive panic
panic: trap
cpu3: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
trap() at netbsd:trap+0xb86
--- trap (number 6) ---
mutex_oncpu() at netbsd:mutex_oncpu+0x27
mutex_vector_enter() at netbsd:mutex_vector_enter+0xad
rt_update_wait() at netbsd:rt_update_wait+0x10
_rt_free() at netbsd:_rt_free+0x11
rtrequest1() at netbsd:rtrequest1+0x5ef
rtrequest() at netbsd:rtrequest+0x3e
carp_setroute() at netbsd:carp_setroute+0xd3
carp_setrun() at netbsd:carp_setrun+0x56
carp_carpdev_state() at netbsd:carp_carpdev_state+0x6c
if_down() at netbsd:if_down+0x17d
if_detach() at netbsd:if_detach+0x1d8
wm_detach() at netbsd:wm_detach+0xc1
config_detach() at netbsd:config_detach+0xf8
config_detach_all() at netbsd:config_detach_all+0x97
cpu_reboot() at netbsd:cpu_reboot+0x176
sys_reboot() at netbsd:sys_reboot+0x75
syscall() at netbsd:syscall+0x1df
--- syscall (number 208) ---
6fcc3443e43a:
cpu3: End traceback...
rebooting...
>How-To-Repeat:
Set up a pair of NetBSD-current machines with multiple carp(4)
interfaces. Reboot one, and watch it panic during shutdown.
>Fix:
Yes, please.
>Audit-Trail:
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Sat, 14 Jan 2017 17:04:49 +0100
On Sat, 14 Jan 2017 15:45:00 +0000 (UTC), Hauke Fath wrote:
> A router set up with carp(4) for redundancy panics
> reproducibly during shutdown:
FWIW, the panic "works" with both pf(4) and npf(4).
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Mon, 16 Jan 2017 15:58:02 +0900
On Sun, Jan 15, 2017 at 1:10 AM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
> The following reply was made to PR kern/51877; it has been noted by GNATS.
>
> From: Hauke Fath <hf@spg.tu-darmstadt.de>
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
> Subject: Re: kern/51877: carp related panic during shutdown
> Date: Sat, 14 Jan 2017 17:04:49 +0100
>
> On Sat, 14 Jan 2017 15:45:00 +0000 (UTC), Hauke Fath wrote:
> > A router set up with carp(4) for redundancy panics
> > reproducibly during shutdown:
>
> FWIW, the panic "works" with both pf(4) and npf(4).
>
Can you try with DEBUG && LOCKDEBUG if not enabled?
And can you show me states of carp0 and routes just before shutdown?
(ifconfig carp0 and netstat -nr -f inet)
Thanks,
ozaki-r
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Mon, 16 Jan 2017 13:14:38 +0100
On 01/16/17 07:58, Ryota Ozaki wrote:
> Can you try with DEBUG && LOCKDEBUG if not enabled?
>
> And can you show me states of carp0 and routes just before shutdown?
> (ifconfig carp0 and netstat -nr -f inet)
Booting a 7.99.59 pf DEBUG/LOCKDEBUG/DIAGNOSTIC kernel from today's
sources on the carp(4) secondary machine, dmesg has:
[...]
IPv6 mode: router
Configuring network interfaces: wm0 ixg0 wm4wm4: link state DOWN (was
UNKNOWN)
vlan2 vlan3 vlan7 vlan8 vlan9 vlan10 vlan11 vlan12 carp0ifconfig:
SIOCAIFADDR_IN6: Can'tcarp2: state transition from: I
assign requested address
carp3: state transition from: INIT -> to: BACKUP
carp2 carp3 carp7carp7: state transition from: INIT -> to: BACKUP
carp8carp8: state transition from: INIT -> to: BACKUP
carp9carp9: state transition from: INIT -> to: BACKUP
carp10carp10: state transition from: INIT -> to: BACKUP
carp11carp11: state transition from: INIT -> to: BACKUP
carp12carp12: state transition from: INIT -> to: BACKUP
pfsync0.
[...]
- note the mangled "Can't assign requested address" message - the -7
kernel doesn't have that.
# ifconfig carp0
carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
enabled=0
carp: MASTER carpdev wm0 vhid 1 advbase 1 advskew 192
address: 00:00:5e:00:01:01
inet 130.83.42.73 netmask 0xfffffff8 broadcast 130.83.42.79
# netstat -nr -f inet
Routing tables
Internet:
Destination       Gateway       Flags  Refs  Use  Mtu    Interface
default           130.83.42.78  UGS    -     -    -L     wm0
10.0.49/24        link#14       UC     -     -    -      vlan10
10.0.49.252       link#14       UHL    -     -    -      lo0
127/8             127.0.0.1     UGRS   -     -    33624  lo0
127.0.0.1         lo0           UH     -     -    33624  lo0
130.83.18.0/26    link#15       UC     -     -    -      vlan11
130.83.18.60      link#15       UHL    -     -    -      lo0
130.83.18.64/26   link#16       UC     -     -    -      vlan12
130.83.18.124     link#16       UHL    -     -    -      lo0
130.83.18.128/26  link#13       UC     -     -    -      vlan9
130.83.18.188     link#13       UHL    -     -    -      lo0
130.83.18.192/26  link#12       UC     -     -    -      vlan8
130.83.18.252     link#12       UHL    -     -    -      lo0
130.83.42.72/29   link#3        UC     -     -    -      wm0
130.83.42.73      130.83.42.73  UH     -     -    -      carp0
130.83.42.75      link#3        UHL    -     -    -      lo0
130.83.197.0/28   link#10       UC     -     -    -      vlan3
130.83.197.0/27   link#18       UC     -     -    -      carp2
130.83.197.11     link#10       UHL    -     -    -      lo0
130.83.197.16/28  link#9        UC     -     -    -      vlan2
130.83.197.28     link#9        UHL    -     -    -      lo0
130.83.228.0/26   link#11       UC     -     -    -      vlan7
130.83.228.60     link#11       UHL    -     -    -      lo0
192.168.27.0/28   link#7        UC     -     -    -      wm4
192.168.27.12     link#7        UHL    -     -    -      lo0
# shutdown -r now
Shutdown NOW!
[...]
Done running shutdown hooks.
Jan 16 12:55:32 Zinnenwand syslogd[433]: Exiting on signal 15
carp0: incorrect hash from 130.83.42.74
carp0: incorrect hash from 130.83.42.74
carp0: incorrect hash from 130.83.42.74
syncing disks... done
[...]
igphy3: detached
wm3: detached
igphy2: detached
wm2: detached
igphy1: detached
wm1: detached
igphy0: detached
carp0: state transition from: MASTER -> to: INIT
Mutex error: lockdebug_barrier: spin lock held
lock address : 0xfffffe821e74f400 type : spin
initialized : 0xffffffff80426c5a
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 0
current cpu : 2 last held: 2
current lwp : 0xfffffe810fc42000 last held: 0xfffffe810fc42000
last locked* : 0xffffffff8044b97a unlocked : 0xffffffff8046cfc4
owner field : 0x0000000000010700 wait/spin: 0/1
Skipping crash dump on recursive panic
panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
cpu2: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
rw_enter() at netbsd:rw_enter+0x5fe
uvm_fault_internal() at netbsd:uvm_fault_internal+0x161
trap() at netbsd:trap+0x30a
--- trap (number 6) ---
mutex_tryenter() at netbsd:mutex_tryenter+0x12
lwp_trylock() at netbsd:lwp_trylock+0x17
turnstile_block() at netbsd:turnstile_block+0x238
mutex_enter() at netbsd:mutex_enter+0x36c
rt_update_wait() at netbsd:rt_update_wait+0x10
_rt_free() at netbsd:_rt_free+0x11
rtrequest1() at netbsd:rtrequest1+0x5ef
rtrequest() at netbsd:rtrequest+0x3e
carp_setroute() at netbsd:carp_setroute+0xd3
carp_setrun() at netbsd:carp_setrun+0x56
carp_carpdev_state() at netbsd:carp_carpdev_state+0x6c
if_down() at netbsd:if_down+0x17d
if_detach() at netbsd:if_detach+0x1d8
wm_detach() at netbsd:wm_detach+0xc1
config_detach() at netbsd:config_detach+0xf8
config_detach_all() at netbsd:config_detach_all+0x97
cpu_reboot() at netbsd:cpu_reboot+0x176
sys_reboot() at netbsd:sys_reboot+0x75
syscall() at netbsd:syscall+0x1e8
--- syscall (number 208) ---
7413ea83c99a:
cpu2: End traceback...
rebooting...
-- the primary carp(4) machine was still running -7 at the time. Maybe
the "carp0: incorrect hash from 130.83.42.74" messages (which the -7
kernel does not show) were due to this version mismatch.
OTOH, there is no similar message from the other carp* interfaces, but
then they are all on vlans; carp0 is the only carp on a physical interface.
HTH,
hauke
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Mon, 16 Jan 2017 13:35:59 +0100
On Mon, 16 Jan 2017 12:15:01 +0000 (UTC), Hauke Fath wrote:
> -- the primary carp(4) machine was still running -7 at the time. Maybe
> the "carp0: incorrect hash from 130.83.42.74" messages (which the -7
> kernel does not show) were due to this version mismatch.
The primary machine logs "incorrect hash" messages for all carped
interfaces; presumably the kernels need to be identical.
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 13:41:01 +0900
On Mon, Jan 16, 2017 at 9:14 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
> On 01/16/17 07:58, Ryota Ozaki wrote:
>>
>> Can you try with DEBUG && LOCKDEBUG if not enabled?
>>
>> And can you show me states of carp0 and routes just before shutdown?
>> (ifconfig carp0 and netstat -nr -f inet)
>
>
> Booting a 7.99.59 pf DEBUG/LOCKDEBUG/DIAGNOSTIC kernel from today's sources
> on the carp(4) secondary machine, dmesg has:
>
> [...]
> IPv6 mode: router
> Configuring network interfaces: wm0 ixg0 wm4wm4: link state DOWN (was
> UNKNOWN)
> vlan2 vlan3 vlan7 vlan8 vlan9 vlan10 vlan11 vlan12 carp0ifconfig:
> SIOCAIFADDR_IN6: Can'tcarp2: state transition from: I
> assign requested address
> carp3: state transition from: INIT -> to: BACKUP
> carp2 carp3 carp7carp7: state transition from: INIT -> to: BACKUP
> carp8carp8: state transition from: INIT -> to: BACKUP
> carp9carp9: state transition from: INIT -> to: BACKUP
> carp10carp10: state transition from: INIT -> to: BACKUP
> carp11carp11: state transition from: INIT -> to: BACKUP
> carp12carp12: state transition from: INIT -> to: BACKUP
> pfsync0.
> [...]
>
> - note the mangled "Can't assign requested address" message - the -7 kernel
> doesn't have that.
>
> # ifconfig carp0
> carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
> capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
> capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
> capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
> enabled=0
> carp: MASTER carpdev wm0 vhid 1 advbase 1 advskew 192
> address: 00:00:5e:00:01:01
> inet 130.83.42.73 netmask 0xfffffff8 broadcast 130.83.42.79
> # netstat -nr -f inet
> Routing tables
>
> Internet:
> Destination       Gateway       Flags  Refs  Use  Mtu    Interface
> default           130.83.42.78  UGS    -     -    -L     wm0
> 10.0.49/24        link#14       UC     -     -    -      vlan10
> 10.0.49.252       link#14       UHL    -     -    -      lo0
> 127/8             127.0.0.1     UGRS   -     -    33624  lo0
> 127.0.0.1         lo0           UH     -     -    33624  lo0
> 130.83.18.0/26    link#15       UC     -     -    -      vlan11
> 130.83.18.60      link#15       UHL    -     -    -      lo0
> 130.83.18.64/26   link#16       UC     -     -    -      vlan12
> 130.83.18.124     link#16       UHL    -     -    -      lo0
> 130.83.18.128/26  link#13       UC     -     -    -      vlan9
> 130.83.18.188     link#13       UHL    -     -    -      lo0
> 130.83.18.192/26  link#12       UC     -     -    -      vlan8
> 130.83.18.252     link#12       UHL    -     -    -      lo0
> 130.83.42.72/29   link#3        UC     -     -    -      wm0
> 130.83.42.73      130.83.42.73  UH     -     -    -      carp0
> 130.83.42.75      link#3        UHL    -     -    -      lo0
> 130.83.197.0/28   link#10       UC     -     -    -      vlan3
> 130.83.197.0/27   link#18       UC     -     -    -      carp2
> 130.83.197.11     link#10       UHL    -     -    -      lo0
> 130.83.197.16/28  link#9        UC     -     -    -      vlan2
> 130.83.197.28     link#9        UHL    -     -    -      lo0
> 130.83.228.0/26   link#11       UC     -     -    -      vlan7
> 130.83.228.60     link#11       UHL    -     -    -      lo0
> 192.168.27.0/28   link#7        UC     -     -    -      wm4
> 192.168.27.12     link#7        UHL    -     -    -      lo0
> # shutdown -r now
> Shutdown NOW!
>
> [...]
>
> Done running shutdown hooks.
> Jan 16 12:55:32 Zinnenwand syslogd[433]: Exiting on signal 15
> carp0: incorrect hash from 130.83.42.74
> carp0: incorrect hash from 130.83.42.74
> carp0: incorrect hash from 130.83.42.74
> syncing disks... done
>
> [...]
>
> igphy3: detached
> wm3: detached
> igphy2: detached
> wm2: detached
> igphy1: detached
> wm1: detached
> igphy0: detached
> carp0: state transition from: MASTER -> to: INIT
> Mutex error: lockdebug_barrier: spin lock held
>
> lock address : 0xfffffe821e74f400 type : spin
> initialized : 0xffffffff80426c5a
> shared holds : 0 exclusive: 1
> shares wanted: 0 exclusive: 0
> current cpu : 2 last held: 2
> current lwp : 0xfffffe810fc42000 last held: 0xfffffe810fc42000
> last locked* : 0xffffffff8044b97a unlocked : 0xffffffff8046cfc4
> owner field : 0x0000000000010700 wait/spin: 0/1
>
> Skipping crash dump on recursive panic
> panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
The mutex error happened because uvm_fault_internal tries to take
a rwlock while holding a spin mutex. Can you identify the spin mutex
by disassembling the kernel? The addresses above, such as "last locked",
will help to narrow it down.
That said, I guess the spin mutex is held after the fault below.
(If a spin mutex were held before the fault, the mutex_enter below
should fail with the same mutex error.)
> cpu2: Begin traceback...
> vpanic() at netbsd:vpanic+0x140
> snprintf() at netbsd:snprintf
> lockdebug_more() at netbsd:lockdebug_more
> rw_enter() at netbsd:rw_enter+0x5fe
> uvm_fault_internal() at netbsd:uvm_fault_internal+0x161
> trap() at netbsd:trap+0x30a
> --- trap (number 6) ---
> mutex_tryenter() at netbsd:mutex_tryenter+0x12
> lwp_trylock() at netbsd:lwp_trylock+0x17
> turnstile_block() at netbsd:turnstile_block+0x238
> mutex_enter() at netbsd:mutex_enter+0x36c
I don't know why a fault happens inside mutex_enter. It's a global
adaptive mutex that is never destroyed and is stable. And the fault
happened at a different place than the fault in the first report.
Is something broken around the mutex...?
Just in case, could you clean-build the tools and the kernel and try again?
If that doesn't help, could you comment out rt_update_wait in _rt_free
and try? Actually, rt_update_wait isn't needed if !NET_MPSAFE for now.
Thanks,
ozaki-r
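A small sketch of the comparison made above, in Python: it lines up the two tracebacks from this PR and reports where they diverge. The frame lists are copied from the reports and trimmed to the frames around the fault; the helper names (parse_frames, split_common_callers) are made up for this illustration and are not part of any NetBSD tool.

```python
import re

# Frames copied from the two tracebacks in this PR (innermost first),
# trimmed to the part around the fault.
FIRST_REPORT = """\
mutex_oncpu() at netbsd:mutex_oncpu+0x27
mutex_vector_enter() at netbsd:mutex_vector_enter+0xad
rt_update_wait() at netbsd:rt_update_wait+0x10
_rt_free() at netbsd:_rt_free+0x11
rtrequest1() at netbsd:rtrequest1+0x5ef
rtrequest() at netbsd:rtrequest+0x3e
carp_setroute() at netbsd:carp_setroute+0xd3
"""

LOCKDEBUG_REPORT = """\
mutex_tryenter() at netbsd:mutex_tryenter+0x12
lwp_trylock() at netbsd:lwp_trylock+0x17
turnstile_block() at netbsd:turnstile_block+0x238
mutex_enter() at netbsd:mutex_enter+0x36c
rt_update_wait() at netbsd:rt_update_wait+0x10
_rt_free() at netbsd:_rt_free+0x11
rtrequest1() at netbsd:rtrequest1+0x5ef
rtrequest() at netbsd:rtrequest+0x3e
carp_setroute() at netbsd:carp_setroute+0xd3
"""

FRAME_RE = re.compile(r"netbsd:(\w+)(?:\+0x([0-9a-f]+))?")


def parse_frames(text):
    """Return [(symbol, offset), ...] for each traceback line, innermost first."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)
        if m:
            frames.append((m.group(1), int(m.group(2) or "0", 16)))
    return frames


def split_common_callers(a, b):
    """Split two traces into (only in a, only in b, shared outer frames)."""
    n = 0
    while n < min(len(a), len(b)) and a[-1 - n] == b[-1 - n]:
        n += 1
    return a[:len(a) - n], b[:len(b) - n], a[len(a) - n:]


only_first, only_lockdebug, shared = split_common_callers(
    parse_frames(FIRST_REPORT), parse_frames(LOCKDEBUG_REPORT))
print("shared callers:       ", [s for s, _ in shared])
print("first report only:    ", [s for s, _ in only_first])
print("LOCKDEBUG kernel only:", [s for s, _ in only_lockdebug])
```

Both traces agree from rt_update_wait+0x10 outwards; they differ only in the frames inside the mutex code, which matches the observation that the fault moved between the two kernels.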
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 09:00:26 +0100
On Tue, 17 Jan 2017 13:41:01 +0900, Ryota Ozaki wrote:
>> Skipping crash dump on recursive panic
>> panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
>
> The mutex error happened because uvm_fault_internal tries to take
> a rwlock while holding a spin mutex. Can you identify the spin mutex
> by disassembling the kernel? The addresses above, such as "last locked",
> will help to narrow it down.
Before I am off to work - that would be gdb? ddb? Do I need
makeoptions DEBUG="-g"
for that?
[...]
>> mutex_enter() at netbsd:mutex_enter+0x36c
>
> I don't know why a fault happens inside mutex_enter. It's a global
> adaptive mutex that is never destroyed and stable. And the fault
> happened on a different place from the fault of the first report.
> Something broken around the mutex...?
>
> Just in case could you clean-build tools and the kernel and try again?
Sure, will do.
> If it doesn't help could you comment out rt_update_wait in _rt_free
> and try?
Before I try and err... which source file are we talking about?
> Actually rt_update_wait isn't needed if !NET_MPSAFE for now.
Cheerio,
hauke
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 17:44:04 +0900
On Tue, Jan 17, 2017 at 5:00 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
> On Tue, 17 Jan 2017 13:41:01 +0900, Ryota Ozaki wrote:
>>> Skipping crash dump on recursive panic
>>> panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
>>
>> The mutex error happened because uvm_fault_internal tries to take
>> a rwlock while holding a spin mutex. Can you identify the spin mutex
>> by disassembling the kernel? The addresses above, such as "last locked",
>> will help to narrow it down.
>
> Before I am off to work - that would be gdb? ddb? Do I need
>
> makeoptions DEBUG="-g"
>
> for that?
The option isn't required.
Without that, you can explore by something like this:
objdump -d netbsd |grep -30 ffffffff8044b97a
If you want to see with the source code:
./build.sh ... kernel.gdb=NFIFI # without -u
objdump -S -d netbsd.gdb |grep -30 ffffffff8044b97a # 30 may not be enough
>
> [...]
>>> mutex_enter() at netbsd:mutex_enter+0x36c
>>
>> I don't know why a fault happens inside mutex_enter. It's a global
>> adaptive mutex that is never destroyed and stable. And the fault
>> happened on a different place from the fault of the first report.
>> Something broken around the mutex...?
>>
>> Just in case could you clean-build tools and the kernel and try again?
>
> Sure, will do.
>
>> If it doesn't help could you comment out rt_update_wait in _rt_free
>> and try?
>
> Before I try and err... which source file are we talking about?
It is sys/net/route.c.
Thanks,
ozaki-r
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 10:59:46 +0100
This is a multi-part message in MIME format.
--Multipart_20170117105946307115
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
On Tue, 17 Jan 2017 13:41:01 +0900, Ryota Ozaki wrote:
> The mutex error happened because uvm_fault_internal tries to take
> a rwlock while holding a spin mutex. Can you identify the spin mutex
> by disassembling the kernel?
Attached.
--Multipart_20170117105946307115
Content-Type: application/octet-stream; name=pr51877_objdump.lst
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=pr51877_objdump.lst
ffffffff8044b8b1:	be 07 00 00 00	mov $0x7,%esi
ffffffff8044b8b6:	bf 02 00 00 00	mov $0x2,%edi
ffffffff8044b8bb:	e8 63 b3 fd ff	callq ffffffff80426c23 <mutex_obj_alloc>
ffffffff8044b8c0:	48 89 43 f8	mov %rax,-0x8(%rbx)
ffffffff8044b8c4:	48 83 c3 10	add $0x10,%rbx
ffffffff8044b8c8:	48 81 fb 88 a3 7c 80	cmp $0xffffffff807ca388,%rbx
ffffffff8044b8cf:	75 d9	jne ffffffff8044b8aa <turnstile_init+0x10>
ffffffff8044b8d1:	48 c7 44 24 18 00 00	movq $0x0,0x18(%rsp)
ffffffff8044b8d8:	00 00
ffffffff8044b8da:	48 c7 44 24 10 00 00	movq $0x0,0x10(%rsp)
ffffffff8044b8e1:	00 00
ffffffff8044b8e3:	48 c7 44 24 08 20 b8	movq $0xffffffff8044b820,0x8(%rsp)
ffffffff8044b8ea:	44 80
ffffffff8044b8ec:	c7 04 24 00 00 00 00	movl $0x0,(%rsp)
ffffffff8044b8f3:	45 31 c9	xor %r9d,%r9d
ffffffff8044b8f6:	49 c7 c0 13 13 6e 80	mov $0xffffffff806e1313,%r8
ffffffff8044b8fd:	31 c9	xor %ecx,%ecx
ffffffff8044b8ff:	31 d2	xor %edx,%edx
ffffffff8044b901:	31 f6	xor %esi,%esi
ffffffff8044b903:	bf 60 00 00 00	mov $0x60,%edi
ffffffff8044b908:	e8 72 88 01 00	callq ffffffff8046417f <pool_cache_init>
ffffffff8044b90d:	48 89 05 24 01 38 00	mov %rax,0x380124(%rip) # ffffffff807cba38 <turnstile_cache>
ffffffff8044b914:	48 85 c0	test %rax,%rax
ffffffff8044b917:	74 16	je ffffffff8044b92f <turnstile_init+0x95>
ffffffff8044b919:	31 d2	xor %edx,%edx
ffffffff8044b91b:	48 c7 c6 80 b1 89 80	mov $0xffffffff8089b180,%rsi
ffffffff8044b922:	31 ff	xor %edi,%edi
ffffffff8044b924:	48 83 c4 28	add $0x28,%rsp
ffffffff8044b928:	5b	pop %rbx
ffffffff8044b929:	5d	pop %rbp
ffffffff8044b92a:	e9 f1 fe ff ff	jmpq ffffffff8044b820 <turnstile_ctor>
ffffffff8044b92f:	41 b8 66 00 00 00	mov $0x66,%r8d
ffffffff8044b935:	48 c7 c1 b0 14 6e 80	mov $0xffffffff806e14b0,%rcx
ffffffff8044b93c:	48 c7 c2 1c 13 6e 80	mov $0xffffffff806e131c,%rdx
ffffffff8044b943:	48 c7 c6 f7 92 68 80	mov $0xffffffff806892f7,%rsi
ffffffff8044b94a:	48 c7 c7 b0 92 68 80	mov $0xffffffff806892b0,%rdi
ffffffff8044b951:	e8 07 68 11 00	callq ffffffff8056215d <kern_assert>
ffffffff8044b956:	eb c1	jmp ffffffff8044b919 <turnstile_init+0x7f>
ffffffff8044b958 <turnstile_lookup>:
ffffffff8044b958:	55	push %rbp
ffffffff8044b959:	48 89 e5	mov %rsp,%rbp
ffffffff8044b95c:	41 54	push %r12
ffffffff8044b95e:	53	push %rbx
ffffffff8044b95f:	48 89 fb	mov %rdi,%rbx
ffffffff8044b962:	4c 8d 24 3f	lea (%rdi,%rdi,1),%r12
ffffffff8044b966:	41 81 e4 f0 03 00 00	and $0x3f0,%r12d
ffffffff8044b96d:	49 8b bc 24 80 9f 7c	mov -0x7f836080(%r12),%rdi
ffffffff8044b974:	80
ffffffff8044b975:	e8 79 a6 fd ff	callq ffffffff80425ff3 <mutex_enter>
ffffffff8044b97a:	49 8b 84 24 88 9f 7c	mov -0x7f836078(%r12),%rax
ffffffff8044b981:	80
ffffffff8044b982:	48 85 c0	test %rax,%rax
ffffffff8044b985:	75 0a	jne ffffffff8044b991 <turnstile_lookup+0x39>
ffffffff8044b987:	eb 13	jmp ffffffff8044b99c <turnstile_lookup+0x44>
ffffffff8044b989:	48 8b 00	mov (%rax),%rax
ffffffff8044b98c:	48 85 c0	test %rax,%rax
ffffffff8044b98f:	74 06	je ffffffff8044b997 <turnstile_lookup+0x3f>
ffffffff8044b991:	48 3b 58 18	cmp 0x18(%rax),%rbx
ffffffff8044b995:	75 f2	jne ffffffff8044b989 <turnstile_lookup+0x31>
ffffffff8044b997:	5b	pop %rbx
ffffffff8044b998:	41 5c	pop %r12
ffffffff8044b99a:	5d	pop %rbp
ffffffff8044b99b:	c3	retq
ffffffff8044b99c:	31 c0	xor %eax,%eax
ffffffff8044b99e:	66 90	xchg %ax,%ax
ffffffff8044b9a0:	eb f5	jmp ffffffff8044b997 <turnstile_lookup+0x3f>
ffffffff8044b9a2 <turnstile_exit>:
ffffffff8044b9a2:	55	push %rbp
ffffffff8044b9a3:	48 89 e5	mov %rsp,%rbp
ffffffff8044b9a6:	48 01 ff	add %rdi,%rdi
ffffffff8044b9a9:	81 e7 f0 03 00 00	and $0x3f0,%edi
ffffffff8044b9af:	48 8b bf 80 9f 7c 80	mov -0x7f836080(%rdi),%rdi
ffffffff8044b9b6:	5d	pop %rbp
ffffffff8044b9b7:	e9 6b ad fd ff	jmpq ffffffff80426727 <mutex_exit>
ffffffff8044b9bc <turnstile_block>:
ffffffff8044b9bc:	55	push %rbp
ffffffff8044b9bd:	48 89 e5	mov %rsp,%rbp
ffffffff8044b9c0:	41 57	push %r15
ffffffff8044b9c2:	41 56	push %r14
ffffffff8044b9c4:	41 55	push %r13
ffffffff8044b9c6:	41 54	push %r12
ffffffff8044b9c8:	53	push %rbx
ffffffff8044b9c9:	48 83 ec 28	sub $0x28,%rsp
ffffffff8044b9cd:	49 89 ff	mov %rdi,%r15
ffffffff8044b9d0:	4c 63 f6	movslq %esi,%r14
ffffffff8044b9d3:	49 89 d5	mov %rdx,%r13
ffffffff8044b9d6:	48 89 4d c0	mov %rcx,-0x40(%rbp)
ffffffff8044b9da:	65 48 8b 1c 25 e8 01	mov %gs:0x1e8,%rbx
ffffffff8044b9e1:	00 00
ffffffff8044b9e3:	49 89 d4	mov %rdx,%r12
ffffffff8044b9e6:	49 c1 ec 03	shr $0x3,%r12
ffffffff8044b9ea:	41 83 e4 3f	and $0x3f,%r12d
ffffffff8044b9ee:	41 83 fe 01	cmp $0x1,%r14d
ffffffff8044b9f2:	0f 87 81 06 00 00	ja ffffffff8044c079 <turnstile_block+0x6bd>
ffffffff8044b9f8:	4c 89 e0	mov %r12,%rax
ffffffff8044b9fb:	48 c1 e0 04	shl $0x4,%rax
ffffffff8044b9ff:	48 8b b8 80 9f 7c 80	mov -0x7f836080(%rax),%rdi
ffffffff8044ba06:	e8 b9 ae fd ff	callq ffffffff804268c4 <mutex_owned>
--Multipart_20170117105946307115--
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 19:15:36 +0900
On Tue, Jan 17, 2017 at 6:59 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
> On Tue, 17 Jan 2017 13:41:01 +0900, Ryota Ozaki wrote:
>> The mutex error happened because uvm_fault_internal tries to take
>> a rwlock while holding a spin mutex. Can you identify the spin mutex
>> by disassembling the kernel?
>
> Attached.
Hmm, turnstile_lookup. I didn't know a spin mutex is used in
an adaptive mutex...
Anyway one more thing. Could you check where mutex_tryenter+0x12 is?
(objdump -d netbsd |grep -A 30 'mutex_tryenter>:' or something)
Thanks,
ozaki-r
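For readers following along, here is a minimal Python sketch of how the "last locked" address from the LOCKDEBUG report maps to turnstile_lookup. The function start addresses are copied from the attached listing (pr51877_objdump.lst); the resolve() helper is hypothetical and only knows the three symbols listed, so any address past the last one would be attributed to turnstile_block.

```python
from bisect import bisect_right

# Function start addresses as they appear in the objdump listing
# attached to this PR (pr51877_objdump.lst), sorted by address.
SYMBOLS = [
    (0xffffffff8044b958, "turnstile_lookup"),
    (0xffffffff8044b9a2, "turnstile_exit"),
    (0xffffffff8044b9bc, "turnstile_block"),
]


def resolve(addr, symbols=SYMBOLS):
    """Map an address to 'symbol+0xoffset' using the nearest preceding symbol."""
    starts = [start for start, _ in symbols]
    i = bisect_right(starts, addr) - 1
    if i < 0:
        return None  # address lies before the first known symbol
    start, name = symbols[i]
    return f"{name}+{addr - start:#x}"


# "last locked" address from the LOCKDEBUG report
print(resolve(0xffffffff8044b97a))  # turnstile_lookup+0x22
```

In the listing, ffffffff8044b97a is the instruction right after the callq to mutex_enter inside turnstile_lookup, i.e. the return address of that call.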
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 11:24:06 +0100
This is a multi-part message in MIME format.
--Multipart_20170117112406695268
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
On Tue, 17 Jan 2017 19:15:36 +0900, Ryota Ozaki wrote:
> Could you check where mutex_tryenter+0x12 is?
> (objdump -d netbsd |grep -A 30 'mutex_tryenter>:' or something)
Attached.
--Multipart_20170117112406695268
Content-Type: application/octet-stream; name=pr51877_objdump_tryenter.lst
Content-Transfer-Encoding: 7bit
Content-Disposition: attachment; filename=pr51877_objdump_tryenter.lst
ffffffff80426901 <mutex_tryenter>:
ffffffff80426901:	55	push %rbp
ffffffff80426902:	48 89 e5	mov %rsp,%rbp
ffffffff80426905:	41 56	push %r14
ffffffff80426907:	41 55	push %r13
ffffffff80426909:	41 54	push %r12
ffffffff8042690b:	53	push %rbx
ffffffff8042690c:	48 83 ec 10	sub $0x10,%rsp
ffffffff80426910:	48 89 fb	mov %rdi,%rbx
ffffffff80426913:	48 8b 07	mov (%rdi),%rax
ffffffff80426916:	a8 01	test $0x1,%al
ffffffff80426918:	0f 85 85 00 00 00	jne ffffffff804269a3 <mutex_tryenter+0xa2>
ffffffff8042691e:	65 4c 8b 2c 25 e8 01	mov %gs:0x1e8,%r13
ffffffff80426925:	00 00
ffffffff80426927:	4d 85 ed	test %r13,%r13
ffffffff8042692a:	0f 84 dc 00 00 00	je ffffffff80426a0c <mutex_tryenter+0x10b>
ffffffff80426930:	4c 8b 23	mov (%rbx),%r12
ffffffff80426933:	41 83 e4 04	and $0x4,%r12d
ffffffff80426937:	4c 89 ea	mov %r13,%rdx
ffffffff8042693a:	4c 09 e2	or %r12,%rdx
ffffffff8042693d:	4c 89 e6	mov %r12,%rsi
ffffffff80426940:	48 89 df	mov %rbx,%rdi
ffffffff80426943:	e8 a8 8a 13 00	callq ffffffff8055f3f0 <_atomic_cas_64>
ffffffff80426948:	49 89 c6	mov %rax,%r14
ffffffff8042694b:	e8 c0 8a 13 00	callq ffffffff8055f410 <_membar_consumer>
ffffffff80426950:	4d 39 f4	cmp %r14,%r12
ffffffff80426953:	0f 85 a4 00 00 00	jne ffffffff804269fd <mutex_tryenter+0xfc>
ffffffff80426959:	48 8b 03	mov (%rbx),%rax
ffffffff8042695c:	a8 04	test $0x4,%al
ffffffff8042695e:	0f 84 01 01 00 00	je ffffffff80426a65 <mutex_tryenter+0x164>
ffffffff80426964:	48 8b 03	mov (%rbx),%rax
--Multipart_20170117112406695268--
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 18:35:40 +0000
On Tue, Jan 17, 2017 at 10:20:01AM +0000, Ryota Ozaki wrote:
> Hmm, turnstile_lookup. I didn't know a spin mutex is used in
> an adaptive mutex...
"adaptive mutex" means "it spins for a while, then goes to sleep".
fwiw.
--
David A. Holland
dholland@netbsd.org
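As a rough illustration of "spins for a while, then goes to sleep", here is a toy sketch in Python. It is not how the NetBSD kernel implements mutex(9): the class name AdaptiveLock and the fixed SPIN_LIMIT are assumptions made up for this example, and a real adaptive mutex decides how long to spin based on whether the owner is currently running on a CPU (compare mutex_oncpu in the first traceback).

```python
import threading


class AdaptiveLock:
    """Toy spin-then-sleep lock; SPIN_LIMIT is an arbitrary illustrative value."""

    SPIN_LIMIT = 1000

    def __init__(self):
        self._lock = threading.Lock()
        self.slept = 0  # number of acquisitions that had to block

    def acquire(self):
        # Phase 1: spin, retrying a non-blocking acquire.
        for _ in range(self.SPIN_LIMIT):
            if self._lock.acquire(blocking=False):
                return
        # Phase 2: give up spinning and sleep until the lock is released.
        self.slept += 1
        self._lock.acquire()

    def release(self):
        self._lock.release()


lock = AdaptiveLock()
lock.acquire()   # uncontended: succeeds on the first spin iteration
lock.release()
print("blocking acquisitions so far:", lock.slept)
```

The point relevant to this PR is the second phase: once spinning is abandoned, the waiter has to be queued somewhere, which is where the turnstile code and its own lock come in.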
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 10:36:23 +0900
On Wed, Jan 18, 2017 at 3:40 AM, David Holland <dholland-bugs@netbsd.org> wrote:
> The following reply was made to PR kern/51877; it has been noted by GNATS.
>
> From: David Holland <dholland-bugs@netbsd.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/51877: carp related panic during shutdown
> Date: Tue, 17 Jan 2017 18:35:40 +0000
>
> On Tue, Jan 17, 2017 at 10:20:01AM +0000, Ryota Ozaki wrote:
> > Hmm, turnstile_lookup. I didn't know a spin mutex is used in
> > an adaptive mutex...
>
> "adaptive mutex" means "it spins for a while, then goes to sleep".
Yes I know. I meant I didn't know that an adaptive mutex uses
another mutex (for lwp in turnstile). The fault happened on that
mutex.
ozaki-r
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 16:00:00 +0900
On Tue, Jan 17, 2017 at 7:24 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
> On Tue, 17 Jan 2017 19:15:36 +0900, Ryota Ozaki wrote:
>> Could you check where mutex_tryenter+0x12 is?
>> (objdump -d netbsd |grep -A 30 'mutex_tryenter>:' or something)
>
> Attached.
Thanks.
The fault happened at MUTEX_SPIN_P(mtx) in mutex_tryenter, perhaps
because mtx is an invalid pointer (address). The mtx comes from
this:
l = curlwp; // or l = owner below
owner = (*l->l_syncobj->sobj_owner)(l->l_wchan);
lwp_trylock(owner);
mutex_tryenter(owner->l_mutex);
IIUC, owner->l_mutex can be invalid if the adaptive mutex in question
has been destroyed, or if the owner of the mutex has disappeared while
holding the mutex for some reason (or if the data of the mutex is
corrupted somehow). The former doesn't happen in this case, as I said,
and the latter is also unlikely to happen.
I don't have any ideas :-/ I hope a clean build solves the issue.
ozaki-r
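The arithmetic behind that conclusion can be sketched as follows in Python. The start address of mutex_tryenter and the three instructions are copied from the attached listing (pr51877_objdump_tryenter.lst); the dictionary is only an excerpt made up for this illustration.

```python
# Start of mutex_tryenter and a few instructions, copied from the
# attached listing (pr51877_objdump_tryenter.lst).
MUTEX_TRYENTER = 0xffffffff80426901
LISTING = {
    0xffffffff80426910: "mov %rdi,%rbx",
    0xffffffff80426913: "mov (%rdi),%rax",
    0xffffffff80426916: "test $0x1,%al",
}

# Frame "mutex_tryenter+0x12" from the LOCKDEBUG traceback
fault_pc = MUTEX_TRYENTER + 0x12
print(hex(fault_pc), LISTING[fault_pc])  # 0xffffffff80426913 mov (%rdi),%rax
```

On amd64 the first function argument is passed in %rdi, so the faulting instruction is the first load through the mtx argument, and the test that follows it checks a flag bit in the loaded word. That is consistent with the fault being in MUTEX_SPIN_P(mtx) on a bad mtx pointer.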
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Hauke Fath <hf@spg.tu-darmstadt.de>
Cc:
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 07:39:48 -0500
On Jan 18, 7:05am, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
-- Subject: Re: kern/51877: carp related panic during shutdown
| The following reply was made to PR kern/51877; it has been noted by GNATS.
|
| From: Ryota Ozaki <ozaki-r@netbsd.org>
| To: Hauke Fath <hf@spg.tu-darmstadt.de>
| Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
| gnats-admin@netbsd.org
| Subject: Re: kern/51877: carp related panic during shutdown
| Date: Wed, 18 Jan 2017 16:00:00 +0900
|
| On Tue, Jan 17, 2017 at 7:24 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
| > On Tue, 17 Jan 2017 19:15:36 +0900, Ryota Ozaki wrote:
| >> Could you check where mutex_tryenter+0x12 is?
| >> (objdump -d netbsd |grep -A 30 'mutex_tryenter>:' or something)
| >
| > Attached.
|
| Thanks.
|
| The fault happened at MUTEX_SPIN_P(mtx) in mutex_tryenter, probably
| because mtx is an invalid pointer. The mtx comes from
| this:
| l = curlwp; // or l = owner below
| owner = (*l->l_syncobj->sobj_owner)(l->l_wchan);
| lwp_trylock(owner);
| mutex_tryenter(owner->l_mutex);
|
| IIUC, owner->l_mutex can be invalid if the adaptive mutex in question
| has been destroyed, or if the owner of the mutex has disappeared while
| holding it for some reason (or if the mutex data is corrupted somehow).
| As I said, the former doesn't happen in this case, and the latter
| is also unlikely.
|
| I don't have any ideas :-/ I hope a clean build solves the issue.
| ozaki-r
Can you put a printf in carp_detach, carp_ifdetach, and carp_clone_destroy
and see when they are called during shutdown?
christos
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Cc:
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 15:04:03 +0100
On 01/18/17 08:05, Ryota Ozaki wrote:
> I don't have any ideas :-/ I hope a clean build solves the issue.
I removed the objdir tree, 'cvs update'd, and built tools and kernel
from scratch:
[...]
igphy3: detached
wm3: detached
igphy2: detached
wm2: detached
igphy1: detached
wm1: detached
igphy0: detached
carp0: state transition from: MASTER -> to: INIT
Mutex error: lockdebug_barrier: spin lock held
lock address : 0xfffffe821e74f400 type : spin
initialized : 0xffffffff80426f7a
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xfffffe821ae8b9e0 last held: 0xfffffe821ae8b9e0
last locked* : 0xffffffff8044bc9a unlocked : 0xffffffff8040a72c
owner field : 0x0000000000010700 wait/spin: 0/1
Skipping crash dump on recursive panic
panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
lockdebug_more() at netbsd:lockdebug_more
rw_enter() at netbsd:rw_enter+0x5fe
uvm_fault_internal() at netbsd:uvm_fault_internal+0x161
trap() at netbsd:trap+0x30a
--- trap (number 6) ---
mutex_tryenter() at netbsd:mutex_tryenter+0x12
lwp_trylock() at netbsd:lwp_trylock+0x17
turnstile_block() at netbsd:turnstile_block+0x238
mutex_enter() at netbsd:mutex_enter+0x36c
rt_update_wait() at netbsd:rt_update_wait+0x10
_rt_free() at netbsd:_rt_free+0x11
rtrequest1() at netbsd:rtrequest1+0x5ef
rtrequest() at netbsd:rtrequest+0x3e
carp_setroute() at netbsd:carp_setroute+0xd3
carp_setrun() at netbsd:carp_setrun+0x56
carp_carpdev_state() at netbsd:carp_carpdev_state+0x6c
if_down() at netbsd:if_down+0x17d
if_detach() at netbsd:if_detach+0x1d8
wm_detach() at netbsd:wm_detach+0xc1
config_detach() at netbsd:config_detach+0xf8
config_detach_all() at netbsd:config_detach_all+0x97
cpu_reboot() at netbsd:cpu_reboot+0x176
sys_reboot() at netbsd:sys_reboot+0x75
syscall() at netbsd:syscall+0x1db
--- syscall (number 208) ---
741aebe3c99a:
cpu0: End traceback...
rebooting...
-- that is, same old.
Cheerio,
hauke
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-21344
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Hauke Fath <hf@spg.tu-darmstadt.de>
Cc:
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 09:19:16 -0500
On Jan 18, 2:05pm, hf@spg.tu-darmstadt.de (Hauke Fath) wrote:
-- Subject: Re: kern/51877: carp related panic during shutdown
| On 01/18/17 08:05, Ryota Ozaki wrote:
| > I don't have any ideas :-/ I hope a clean build solves the issue.
|
| I removed the objdir tree, 'cvs update'd, and built tools and kernel
| from scratch:
|
| [...]
| igphy3: detached
| wm3: detached
| igphy2: detached
| wm2: detached
| igphy1: detached
| wm1: detached
| igphy0: detached
| carp0: state transition from: MASTER -> to: INIT
| Mutex error: lockdebug_barrier: spin lock held
I think that carp should detach before the physical interfaces?
christos
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Cc:
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 16:25:57 +0100
On 01/18/17 13:40, Christos Zoulas wrote:
> Can you put a printf in carp_detach, carp_ifdetach, and carp_clone_destroy
> and see when they are called during shutdown?
Tried that
% strings /netbsd8 | grep '@christos'
@christos: carp_ifdetach()
@christos: carpdetach() detaching %s
@christos: carp_clone_destroy() %s
%
but none show up - the kernel panics earlier.
> | wm1: detached
> | igphy0: detached
> | carp0: state transition from: MASTER -> to: INIT
> | Mutex error: lockdebug_barrier: spin lock held
>
> I think that carp should detach before the physical interfaces?
There's an idea... but
% fgrep carpdev /etc/ifconfig.carp0
carpdev wm0
%
which is still attached.
Cheerio,
hauke
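The tagged printf approach can be sketched in user space as follows. The tag and format string are the ones visible in the strings(1) output above; trace_format() and carpdetach_model() are hypothetical helpers written for this illustration, and a kernel would presumably call printf(9) directly rather than format into a buffer.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* A fixed tag makes the trace lines easy to grep for in console output. */
#define TRACE_TAG "@christos: "

/*
 * Writes TRACE_TAG followed by the formatted message into buf.
 * Returns the untruncated line length, or -1 if buf cannot hold the tag.
 */
static int trace_format(char *buf, size_t len, const char *fmt, ...)
{
    const size_t taglen = strlen(TRACE_TAG);
    va_list ap;
    int n;

    assert(buf != NULL && fmt != NULL);
    if (len <= taglen)
        return -1;
    memcpy(buf, TRACE_TAG, taglen);
    va_start(ap, fmt);
    n = vsnprintf(buf + taglen, len - taglen, fmt, ap);
    va_end(ap);
    return n < 0 ? n : (int)taglen + n;
}

/* Hypothetical stand-in for an instrumented detach path. */
static void carpdetach_model(const char *ifname, char *log, size_t len)
{
    /* Trace on entry so the call order is visible during shutdown. */
    trace_format(log, len, "carpdetach() detaching %s\n", ifname);
    /* ... the real teardown would follow here ... */
}
```

Calling carpdetach_model("carp0", log, sizeof log) leaves a line of the same shape as the console output later in this thread. Tracing on function entry matters here: a line that never appears means the function was not reached before the panic.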
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org
Cc:
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 16:38:08 +0100
On 01/17/17 09:45, Ryota Ozaki wrote:
> >> If it doesn't help could you comment out rt_update_wait in _rt_free
> >> and try?
> >
> > Before I try and err... which source file are we talking about?
>
> sys/net/route.c is.
Looks better:
[...]
igphy3: detached
wm3: detached
igphy2: detached
wm2: detached
igphy1: detached
wm1: detached
igphy0: detached
carp0: state transition from: MASTER -> to: INIT
@christos: carp_ifdetach()
@christos: carpdetach() detaching carp0
wm0: detached
pci5: detached
pci4: detached
ppb4: detached
ppb3: detached
uhub3: detached
sysbeep0: detached
pci3: detached
pcppi0: detached
com1: detached
makphy1: detached
wm5: detached
makphy0: detached
wm4: detached
ppb2: detached
carp12: state transition from: BACKUP -> to: INIT
carp11: state transition from: BACKUP -> to: INIT
carp10: state transition from: BACKUP -> to: INIT
carp9: state transition from: BACKUP -> to: INIT
carp8: state transition from: BACKUP -> to: INIT
carp7: state transition from: BACKUP -> to: INIT
carp3: state transition from: BACKUP -> to: INIT
carp2: state transition from: BACKUP -> to: INIT
ixg0: link state DOWN (was UP)
ixg0: detached
pci8: detached
pci7: detached
pci6: detached
pci2: detached
pci1: detached
ppb7: detached
ppb6: detached
ppb5: detached
ppb1: detached
ppb0: detached
pchb0: detached
attimer1: detached
unmounting 0xfffffe821d6f2008 / (/dev/raid0a)...
forcefully unmounting / (/dev/raid0a)...
raid0: detached
wd1: detached
wd0: detached
atabus1: detached
atabus0: detached
rebooting...
From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/51877 CVS commit: src/sys/net
Date: Thu, 19 Jan 2017 06:58:55 +0000
Module Name: src
Committed By: ozaki-r
Date: Thu Jan 19 06:58:55 UTC 2017
Modified Files:
src/sys/net: route.c rtsock.c
Log Message:
Disable rt_update mechanism by default
This is a workaround for PR kern/51877. Enable again once the issue
is fixed.
To generate a diff of this commit:
cvs rdiff -u -r1.187 -r1.188 src/sys/net/route.c
cvs rdiff -u -r1.199 -r1.200 src/sys/net/rtsock.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Thu, 19 Jan 2017 17:24:54 +0900
On Thu, Jan 19, 2017 at 12:40 AM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
> On 01/17/17 09:45, Ryota Ozaki wrote:
> > >> If it doesn't help could you comment out rt_update_wait in _rt_free
> > >> and try?
> > >
> > > Before I try and err... which source file are we talking about?
> >
> > sys/net/route.c is.
>
> Looks better:
Thanks! So I disabled the feature by default (!NET_MPSAFE).
I still need to investigate and fix the underlying issue, and I think
I need to reproduce the panic on my machine to do so. Could you
please send me the full network configuration of your machine
(by private email if you prefer)?
Thanks,
ozaki-r
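One common way to disable a code path by default is to compile it only when an option is defined. The sketch below assumes that shape for the workaround; the actual diff is not shown in this PR, and every name with a _model suffix, as well as the rt_update_waits counter, is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical route entry: just a reference count and a freed flag. */
struct rtentry_model {
    int refcnt;
    bool freed;
};

/* Counts how often the wait path ran, so the effect is observable. */
static int rt_update_waits;

#ifdef NET_MPSAFE
/* Only built when NET_MPSAFE is defined (for example with -DNET_MPSAFE). */
static void rt_update_wait_model(void) { rt_update_waits++; }
#else
/* Default build: the wait is compiled out and the call is a no-op. */
static void rt_update_wait_model(void) { }
#endif

/* Drops one reference; the wait only has an effect in NET_MPSAFE builds. */
static void rt_free_model(struct rtentry_model *rt)
{
    assert(rt != NULL && rt->refcnt > 0);
    rt_update_wait_model();
    if (--rt->refcnt == 0)
        rt->freed = true;
}
```

Built without -DNET_MPSAFE, rt_free_model() never touches the counter, which mirrors the effect of commenting out rt_update_wait in _rt_free as tested earlier in this thread.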
>Unformatted: