NetBSD Problem Report #51877

From hf@spg.tu-darmstadt.de  Sat Jan 14 15:40:36 2017
Return-Path: <hf@spg.tu-darmstadt.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 864817A111
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 14 Jan 2017 15:40:36 +0000 (UTC)
Message-Id: <201701141540.v0EFeWVv001108@Zinnenwand.nt.e-technik.tu-darmstadt.de>
Date: Sat, 14 Jan 2017 16:40:32 +0100 (CET)
From: Hauke Fath <hf@spg.tu-darmstadt.de>
Reply-To: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: Hauke Fath <hf@spg.tu-darmstadt.de>
Subject: carp related panic during shutdown
X-Send-Pr-Version: 3.95

>Number:         51877
>Category:       kern
>Synopsis:       carp related panic during shutdown
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 14 15:45:00 +0000 2017
>Last-Modified:  Thu Jan 19 08:30:00 +0000 2017
>Originator:     Hauke Fath
>Release:        NetBSD 7.99.58
>Organization:
Technische Universitaet Darmstadt
>Environment:


System: NetBSD Zinnenwand 7.99.58 NetBSD 7.99.58 (FIFI-$Revision$) #4: Fri Jan 13 13:20:31 CET 2017 hf@Hochstuhl:/var/obj/netbsd-builds/developer/amd64/sys/arch/amd64/compile/NFIFI amd64
Architecture: x86_64
Machine: amd64
>Description:

	A router set up with carp(4) for redundancy panics
	reproducibly during shutdown:

[...]
igphy3: detached
wm3: detached
igphy2: detached
wm2: detached
igphy1: detached
wm1: detached
igphy0: detached
carp0: state transition from: MASTER -> to: INIT
uvm_fault(0xfffffe821bbfde70, 0x10000, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff8040783d cs 8 rflags 10206 cr2 101e7 ilevel 6 rsp fffffe810f979650
curlwp 0xfffffe821d3d48c0 pid 178.1 lowest kstack 0xfffffe810f9762c0
Skipping crash dump on recursive panic
panic: trap
cpu3: Begin traceback...
vpanic() at netbsd:vpanic+0x140
snprintf() at netbsd:snprintf
trap() at netbsd:trap+0xb86
--- trap (number 6) ---
mutex_oncpu() at netbsd:mutex_oncpu+0x27
mutex_vector_enter() at netbsd:mutex_vector_enter+0xad
rt_update_wait() at netbsd:rt_update_wait+0x10
_rt_free() at netbsd:_rt_free+0x11
rtrequest1() at netbsd:rtrequest1+0x5ef
rtrequest() at netbsd:rtrequest+0x3e
carp_setroute() at netbsd:carp_setroute+0xd3
carp_setrun() at netbsd:carp_setrun+0x56
carp_carpdev_state() at netbsd:carp_carpdev_state+0x6c
if_down() at netbsd:if_down+0x17d
if_detach() at netbsd:if_detach+0x1d8
wm_detach() at netbsd:wm_detach+0xc1
config_detach() at netbsd:config_detach+0xf8
config_detach_all() at netbsd:config_detach_all+0x97
cpu_reboot() at netbsd:cpu_reboot+0x176
sys_reboot() at netbsd:sys_reboot+0x75
syscall() at netbsd:syscall+0x1df
--- syscall (number 208) ---
6fcc3443e43a:
cpu3: End traceback...
rebooting...



>How-To-Repeat:

	Set up a pair of NetBSD-current machines with multiple carp(4)
	interfaces. Reboot one, and watch it panic during shutdown.


>Fix:

	Yes, please.


>Audit-Trail:
From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Sat, 14 Jan 2017 17:04:49 +0100

 On Sat, 14 Jan 2017 15:45:00 +0000 (UTC), Hauke Fath wrote:
 > 	A router set up with carp(4) for redundancy panics
 > 	reproducibly during shutdown:

 FWIW, the panic "works" with both pf(4) and npf(4).

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Mon, 16 Jan 2017 15:58:02 +0900

 On Sun, Jan 15, 2017 at 1:10 AM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
 > The following reply was made to PR kern/51877; it has been noted by GNATS.
 >
 > From: Hauke Fath <hf@spg.tu-darmstadt.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
 > Subject: Re: kern/51877: carp related panic during shutdown
 > Date: Sat, 14 Jan 2017 17:04:49 +0100
 >
 >  On Sat, 14 Jan 2017 15:45:00 +0000 (UTC), Hauke Fath wrote:
 >  >      A router set up with carp(4) for redundancy panics
 >  >      reproducibly during shutdown:
 >
 >  FWIW, the panic "works" with both pf(4) and npf(4).
 >

 Can you try with DEBUG && LOCKDEBUG if not enabled?

 And can you show me states of carp0 and routes just before shutdown?
 (ifconfig carp0 and netstat -nr -f inet)

 Thanks,
   ozaki-r

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Mon, 16 Jan 2017 13:14:38 +0100

 On 01/16/17 07:58, Ryota Ozaki wrote:
 > Can you try with DEBUG && LOCKDEBUG if not enabled?
 >
 > And can you show me states of carp0 and routes just before shutdown?
 > (ifconfig carp0 and netstat -nr -f inet)

 Booting a 7.99.59 pf DEBUG/LOCKDEBUG/DIAGNOSTIC kernel from today's 
 sources on the carp(4) secondary machine, dmesg has:

 [...]
 IPv6 mode: router
 Configuring network interfaces: wm0 ixg0 wm4wm4: link state DOWN (was 
 UNKNOWN)
   vlan2 vlan3 vlan7 vlan8 vlan9 vlan10 vlan11 vlan12 carp0ifconfig: 
 SIOCAIFADDR_IN6: Can'tcarp2: state transition from: I
   assign requested address
 carp3: state transition from: INIT -> to: BACKUP
   carp2 carp3 carp7carp7: state transition from: INIT -> to: BACKUP
   carp8carp8: state transition from: INIT -> to: BACKUP
   carp9carp9: state transition from: INIT -> to: BACKUP
   carp10carp10: state transition from: INIT -> to: BACKUP
   carp11carp11: state transition from: INIT -> to: BACKUP
   carp12carp12: state transition from: INIT -> to: BACKUP
   pfsync0.
 [...]

 - note the mangled "Can't assign requested address" message - the -7 
 kernel doesn't have that.

 # ifconfig  carp0
 carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
          capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
          capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
          capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
          enabled=0
          carp: MASTER carpdev wm0 vhid 1 advbase 1 advskew 192
          address: 00:00:5e:00:01:01
          inet 130.83.42.73 netmask 0xfffffff8 broadcast 130.83.42.79
 # netstat -nr -f inet
 Routing tables

 Internet:
 Destination        Gateway            Flags    Refs      Use    Mtu 
 Interface
 default            130.83.42.78       UGS         -        -      -L wm0
 10.0.49/24         link#14            UC          -        -      -  vlan10
 10.0.49.252        link#14            UHL         -        -      -  lo0
 127/8              127.0.0.1          UGRS        -        -  33624  lo0
 127.0.0.1          lo0                UH          -        -  33624  lo0
 130.83.18.0/26     link#15            UC          -        -      -  vlan11
 130.83.18.60       link#15            UHL         -        -      -  lo0
 130.83.18.64/26    link#16            UC          -        -      -  vlan12
 130.83.18.124      link#16            UHL         -        -      -  lo0
 130.83.18.128/26   link#13            UC          -        -      -  vlan9
 130.83.18.188      link#13            UHL         -        -      -  lo0
 130.83.18.192/26   link#12            UC          -        -      -  vlan8
 130.83.18.252      link#12            UHL         -        -      -  lo0
 130.83.42.72/29    link#3             UC          -        -      -  wm0
 130.83.42.73       130.83.42.73       UH          -        -      -  carp0
 130.83.42.75       link#3             UHL         -        -      -  lo0
 130.83.197.0/28    link#10            UC          -        -      -  vlan3
 130.83.197.0/27    link#18            UC          -        -      -  carp2
 130.83.197.11      link#10            UHL         -        -      -  lo0
 130.83.197.16/28   link#9             UC          -        -      -  vlan2
 130.83.197.28      link#9             UHL         -        -      -  lo0
 130.83.228.0/26    link#11            UC          -        -      -  vlan7
 130.83.228.60      link#11            UHL         -        -      -  lo0
 192.168.27.0/28    link#7             UC          -        -      -  wm4
 192.168.27.12      link#7             UHL         -        -      -  lo0
 # shutdown -r now
 Shutdown  NOW!

 [...]

 Done running shutdown hooks.
 Jan 16 12:55:32 Zinnenwand syslogd[433]: Exiting on signal 15
 carp0: incorrect hash from 130.83.42.74
 carp0: incorrect hash from 130.83.42.74
 carp0: incorrect hash from 130.83.42.74
 syncing disks... done

 [...]

 igphy3: detached
 wm3: detached
 igphy2: detached
 wm2: detached
 igphy1: detached
 wm1: detached
 igphy0: detached
 carp0: state transition from: MASTER -> to: INIT
 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0xfffffe821e74f400 type     :               spin
 initialized  : 0xffffffff80426c5a
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  2 last held:                  2
 current lwp  : 0xfffffe810fc42000 last held: 0xfffffe810fc42000
 last locked* : 0xffffffff8044b97a unlocked : 0xffffffff8046cfc4
 owner field  : 0x0000000000010700 wait/spin:                0/1

 Skipping crash dump on recursive panic
 panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
 cpu2: Begin traceback...
 vpanic() at netbsd:vpanic+0x140
 snprintf() at netbsd:snprintf
 lockdebug_more() at netbsd:lockdebug_more
 rw_enter() at netbsd:rw_enter+0x5fe
 uvm_fault_internal() at netbsd:uvm_fault_internal+0x161
 trap() at netbsd:trap+0x30a
 --- trap (number 6) ---
 mutex_tryenter() at netbsd:mutex_tryenter+0x12
 lwp_trylock() at netbsd:lwp_trylock+0x17
 turnstile_block() at netbsd:turnstile_block+0x238
 mutex_enter() at netbsd:mutex_enter+0x36c
 rt_update_wait() at netbsd:rt_update_wait+0x10
 _rt_free() at netbsd:_rt_free+0x11
 rtrequest1() at netbsd:rtrequest1+0x5ef
 rtrequest() at netbsd:rtrequest+0x3e
 carp_setroute() at netbsd:carp_setroute+0xd3
 carp_setrun() at netbsd:carp_setrun+0x56
 carp_carpdev_state() at netbsd:carp_carpdev_state+0x6c
 if_down() at netbsd:if_down+0x17d
 if_detach() at netbsd:if_detach+0x1d8
 wm_detach() at netbsd:wm_detach+0xc1
 config_detach() at netbsd:config_detach+0xf8
 config_detach_all() at netbsd:config_detach_all+0x97
 cpu_reboot() at netbsd:cpu_reboot+0x176
 sys_reboot() at netbsd:sys_reboot+0x75
 syscall() at netbsd:syscall+0x1e8
 --- syscall (number 208) ---
 7413ea83c99a:
 cpu2: End traceback...
 rebooting...


 -- the primary carp(4) machine was still running -7 at the time. Maybe
 the "carp0: incorrect hash from 130.83.42.74" messages (which the -7
 kernel does not show) were due to this version mismatch.

 OTOH, there is no similar message from the other carp* interfaces, but 
 then they are all on vlans; carp0 is the only carp on a physical interface.

 HTH,
 hauke

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Mon, 16 Jan 2017 13:35:59 +0100

 On Mon, 16 Jan 2017 12:15:01 +0000 (UTC), Hauke Fath wrote:
 >  -- the primary carp(4) machine was still running -7 at the time. Maybe
 >  the "carp0: incorrect hash from 130.83.42.74" messages (which the -7
 >  kernel does not show) were due to this version mismatch.

 The primary machine logs "incorrect hash" messages for all carped
 interfaces; presumably the kernels need to be identical.

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 13:41:01 +0900

 On Mon, Jan 16, 2017 at 9:14 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
 > On 01/16/17 07:58, Ryota Ozaki wrote:
 >>
 >> Can you try with DEBUG && LOCKDEBUG if not enabled?
 >>
 >> And can you show me states of carp0 and routes just before shutdown?
 >> (ifconfig carp0 and netstat -nr -f inet)
 >
 >
 > Booting a 7.99.59 pf DEBUG/LOCKDEBUG/DIAGNOSTIC kernel from today's sources
 > on the carp(4) secondary machine, dmesg has:
 >
 > [...]
 > IPv6 mode: router
 > Configuring network interfaces: wm0 ixg0 wm4wm4: link state DOWN (was
 > UNKNOWN)
 >  vlan2 vlan3 vlan7 vlan8 vlan9 vlan10 vlan11 vlan12 carp0ifconfig:
 > SIOCAIFADDR_IN6: Can'tcarp2: state transition from: I
 >  assign requested address
 > carp3: state transition from: INIT -> to: BACKUP
 >  carp2 carp3 carp7carp7: state transition from: INIT -> to: BACKUP
 >  carp8carp8: state transition from: INIT -> to: BACKUP
 >  carp9carp9: state transition from: INIT -> to: BACKUP
 >  carp10carp10: state transition from: INIT -> to: BACKUP
 >  carp11carp11: state transition from: INIT -> to: BACKUP
 >  carp12carp12: state transition from: INIT -> to: BACKUP
 >  pfsync0.
 > [...]
 >
 > - note the mangled "Can't assign requested address" message - the -7 kernel
 > doesn't have that.
 >
 > # ifconfig  carp0
 > carp0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
 >         capabilities=7ff80<TSO4,IP4CSUM_Rx,IP4CSUM_Tx,TCP4CSUM_Rx>
 >         capabilities=7ff80<TCP4CSUM_Tx,UDP4CSUM_Rx,UDP4CSUM_Tx,TCP6CSUM_Rx>
 >         capabilities=7ff80<TCP6CSUM_Tx,UDP6CSUM_Rx,UDP6CSUM_Tx,TSO6>
 >         enabled=0
 >         carp: MASTER carpdev wm0 vhid 1 advbase 1 advskew 192
 >         address: 00:00:5e:00:01:01
 >         inet 130.83.42.73 netmask 0xfffffff8 broadcast 130.83.42.79
 > # netstat -nr -f inet
 > Routing tables
 >
 > Internet:
 > Destination        Gateway            Flags    Refs      Use    Mtu
 > Interface
 > default            130.83.42.78       UGS         -        -      -L wm0
 > 10.0.49/24         link#14            UC          -        -      -  vlan10
 > 10.0.49.252        link#14            UHL         -        -      -  lo0
 > 127/8              127.0.0.1          UGRS        -        -  33624  lo0
 > 127.0.0.1          lo0                UH          -        -  33624  lo0
 > 130.83.18.0/26     link#15            UC          -        -      -  vlan11
 > 130.83.18.60       link#15            UHL         -        -      -  lo0
 > 130.83.18.64/26    link#16            UC          -        -      -  vlan12
 > 130.83.18.124      link#16            UHL         -        -      -  lo0
 > 130.83.18.128/26   link#13            UC          -        -      -  vlan9
 > 130.83.18.188      link#13            UHL         -        -      -  lo0
 > 130.83.18.192/26   link#12            UC          -        -      -  vlan8
 > 130.83.18.252      link#12            UHL         -        -      -  lo0
 > 130.83.42.72/29    link#3             UC          -        -      -  wm0
 > 130.83.42.73       130.83.42.73       UH          -        -      -  carp0
 > 130.83.42.75       link#3             UHL         -        -      -  lo0
 > 130.83.197.0/28    link#10            UC          -        -      -  vlan3
 > 130.83.197.0/27    link#18            UC          -        -      -  carp2
 > 130.83.197.11      link#10            UHL         -        -      -  lo0
 > 130.83.197.16/28   link#9             UC          -        -      -  vlan2
 > 130.83.197.28      link#9             UHL         -        -      -  lo0
 > 130.83.228.0/26    link#11            UC          -        -      -  vlan7
 > 130.83.228.60      link#11            UHL         -        -      -  lo0
 > 192.168.27.0/28    link#7             UC          -        -      -  wm4
 > 192.168.27.12      link#7             UHL         -        -      -  lo0
 > # shutdown -r now
 > Shutdown  NOW!
 >
 > [...]
 >
 > Done running shutdown hooks.
 > Jan 16 12:55:32 Zinnenwand syslogd[433]: Exiting on signal 15
 > carp0: incorrect hash from 130.83.42.74
 > carp0: incorrect hash from 130.83.42.74
 > carp0: incorrect hash from 130.83.42.74
 > syncing disks... done
 >
 > [...]
 >
 > igphy3: detached
 > wm3: detached
 > igphy2: detached
 > wm2: detached
 > igphy1: detached
 > wm1: detached
 > igphy0: detached
 > carp0: state transition from: MASTER -> to: INIT
 > Mutex error: lockdebug_barrier: spin lock held
 >
 > lock address : 0xfffffe821e74f400 type     :               spin
 > initialized  : 0xffffffff80426c5a
 > shared holds :                  0 exclusive:                  1
 > shares wanted:                  0 exclusive:                  0
 > current cpu  :                  2 last held:                  2
 > current lwp  : 0xfffffe810fc42000 last held: 0xfffffe810fc42000
 > last locked* : 0xffffffff8044b97a unlocked : 0xffffffff8046cfc4
 > owner field  : 0x0000000000010700 wait/spin:                0/1
 >
 > Skipping crash dump on recursive panic
 > panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held

 The mutex error happened because uvm_fault_internal tries to hold
 a rwlock while holding a spin mutex. Can you identify the spin mutex
 by disassembling the kernel? The addresses above such as "last locked"
 will help to explore.

 That said, I guess the spin mutex is held after the fault below.
 (If a spin mutex is held before the fault, the below mutex_enter
 should fail with the same mutex error.)
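
 The reasoning above can be modelled as a toy. This is only a sketch in
 Python, not kernel code: the class and method names are made up for
 illustration, and the single rule it encodes (taking a lock that may
 sleep while a spin lock is held is an error) is an assumption drawn
 from the panic message.

```python
class ToyLockDebug:
    """Toy model of a "no spin lock held" barrier; all names are hypothetical."""

    def __init__(self):
        self.spin_held = 0  # number of spin locks currently held

    def spin_enter(self):
        self.spin_held += 1

    def spin_exit(self):
        self.spin_held -= 1

    def sleepable_enter(self, what):
        # A lock that may sleep must not be taken with a spin lock held.
        if self.spin_held:
            raise RuntimeError(f"{what}: spin lock held")


def fault_while_spinning():
    ld = ToyLockDebug()
    ld.sleepable_enter("mutex_enter")   # passes: no spin lock held yet
    ld.spin_enter()                     # a spin lock is taken afterwards
    try:
        ld.sleepable_enter("rw_enter")  # the fault handler wants a rwlock
    except RuntimeError as err:
        return str(err)
    return "no error"


print(fault_while_spinning())
```

 In this model the first sleepable_enter() succeeds and only the second
 one trips, which is the ordering guessed at above.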

 > cpu2: Begin traceback...
 > vpanic() at netbsd:vpanic+0x140
 > snprintf() at netbsd:snprintf
 > lockdebug_more() at netbsd:lockdebug_more
 > rw_enter() at netbsd:rw_enter+0x5fe
 > uvm_fault_internal() at netbsd:uvm_fault_internal+0x161
 > trap() at netbsd:trap+0x30a
 > --- trap (number 6) ---
 > mutex_tryenter() at netbsd:mutex_tryenter+0x12
 > lwp_trylock() at netbsd:lwp_trylock+0x17
 > turnstile_block() at netbsd:turnstile_block+0x238
 > mutex_enter() at netbsd:mutex_enter+0x36c

 I don't know why a fault happens inside mutex_enter. It's a global
 adaptive mutex that is never destroyed and stable. And the fault
 happened at a different place than in the first report.
 Something broken around the mutex...?

 Just in case could you clean-build tools and the kernel and try again?
 If it doesn't help could you comment out rt_update_wait in _rt_free
 and try? Actually rt_update_wait isn't needed if !NET_MPSAFE for now.

 Thanks,
   ozaki-r

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
        kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 09:00:26 +0100

 On Tue, 17 Jan 2017 13:41:01 +0900, Ryota Ozaki wrote:
 >> Skipping crash dump on recursive panic
 >> panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
 > 
 > The mutex error happened because uvm_fault_internal tries to hold
 > a rwlock while holding a spin mutex. Can you identify the spin mutex
 > by disassembling the kernel? The addresses above such as "last locked"
 > will help to explore.

 Before I am off to work - that would be gdb? ddb? Do I need 

 makeoptions    DEBUG="-g"

 for that?

 [...]
 >> mutex_enter() at netbsd:mutex_enter+0x36c
 > 
 > I don't know why a fault happens inside mutex_enter. It's a global
 > adaptive mutex that is never destroyed and stable. And the fault
 > happened at a different place than in the first report.
 > Something broken around the mutex...?
 > 
 > Just in case could you clean-build tools and the kernel and try again?

 Sure, will do.

 > If it doesn't help could you comment out rt_update_wait in _rt_free
 > and try?

 Before I try and err... which source file are we talking about?

 > Actually rt_update_wait isn't needed if !NET_MPSAFE for now.

 Cheerio,
 hauke

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 17:44:04 +0900

 On Tue, Jan 17, 2017 at 5:00 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
 > On Tue, 17 Jan 2017 13:41:01 +0900, Ryota Ozaki wrote:
 >>> Skipping crash dump on recursive panic
 >>> panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
 >>
 >> The mutex error happened because uvm_fault_internal tries to hold
 >> a rwlock while holding a spin mutex. Can you identify the spin mutex
 >> by disassembling the kernel? The addresses above such as "last locked"
 >> will help to explore.
 >
 > Before I am off to work - that would be gdb? ddb? Do I need
 >
 > makeoptions    DEBUG="-g"
 >
 > for that?

 The option isn't required.

 Without it, you can explore with something like this:
   objdump -d netbsd |grep -30 ffffffff8044b97a

 If you want to see it with the source code:
   ./build.sh ... kernel.gdb=NFIFI  # without -u
   objdump -S -d netbsd.gdb |grep -30 ffffffff8044b97a  # 30 may not be enough

 >
 > [...]
 >>> mutex_enter() at netbsd:mutex_enter+0x36c
 >>
 >> I don't know why a fault happens inside mutex_enter. It's a global
 >> adaptive mutex that is never destroyed and stable. And the fault
 >> happened at a different place than in the first report.
 >> Something broken around the mutex...?
 >>
 >> Just in case could you clean-build tools and the kernel and try again?
 >
 > Sure, will do.
 >
 >> If it doesn't help could you comment out rt_update_wait in _rt_free
 >> and try?
 >
 > Before I try and err... which source file are we talking about?

 It is sys/net/route.c.

 Thanks,
   ozaki-r

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
        kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 10:59:46 +0100

 This is a multi-part message in MIME format.

 --Multipart_20170117105946307115
 Content-Type: text/plain; charset=us-ascii
 Content-Transfer-Encoding: 7bit

 On Tue, 17 Jan 2017 13:41:01 +0900, Ryota Ozaki wrote:
 > The mutex error happened because uvm_fault_internal tries to hold
 > a rwlock while holding a spin mutex. Can you identify the spin mutex
 > by disassembling the kernel?

 Attached.


 --Multipart_20170117105946307115
 Content-Type: application/octet-stream; name=pr51877_objdump.lst
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename=pr51877_objdump.lst

 ffffffff8044b8b1:	be 07 00 00 00       	mov    $0x7,%esi
 ffffffff8044b8b6:	bf 02 00 00 00       	mov    $0x2,%edi
 ffffffff8044b8bb:	e8 63 b3 fd ff       	callq  ffffffff80426c23 <mutex_obj_alloc>
 ffffffff8044b8c0:	48 89 43 f8          	mov    %rax,-0x8(%rbx)
 ffffffff8044b8c4:	48 83 c3 10          	add    $0x10,%rbx
 ffffffff8044b8c8:	48 81 fb 88 a3 7c 80 	cmp    $0xffffffff807ca388,%rbx
 ffffffff8044b8cf:	75 d9                	jne    ffffffff8044b8aa <turnstile_init+0x10>
 ffffffff8044b8d1:	48 c7 44 24 18 00 00 	movq   $0x0,0x18(%rsp)
 ffffffff8044b8d8:	00 00
 ffffffff8044b8da:	48 c7 44 24 10 00 00 	movq   $0x0,0x10(%rsp)
 ffffffff8044b8e1:	00 00
 ffffffff8044b8e3:	48 c7 44 24 08 20 b8 	movq   $0xffffffff8044b820,0x8(%rsp)
 ffffffff8044b8ea:	44 80
 ffffffff8044b8ec:	c7 04 24 00 00 00 00 	movl   $0x0,(%rsp)
 ffffffff8044b8f3:	45 31 c9             	xor    %r9d,%r9d
 ffffffff8044b8f6:	49 c7 c0 13 13 6e 80 	mov    $0xffffffff806e1313,%r8
 ffffffff8044b8fd:	31 c9                	xor    %ecx,%ecx
 ffffffff8044b8ff:	31 d2                	xor    %edx,%edx
 ffffffff8044b901:	31 f6                	xor    %esi,%esi
 ffffffff8044b903:	bf 60 00 00 00       	mov    $0x60,%edi
 ffffffff8044b908:	e8 72 88 01 00       	callq  ffffffff8046417f <pool_cache_init>
 ffffffff8044b90d:	48 89 05 24 01 38 00 	mov    %rax,0x380124(%rip)        # ffffffff807cba38 <turnstile_cache>
 ffffffff8044b914:	48 85 c0             	test   %rax,%rax
 ffffffff8044b917:	74 16                	je     ffffffff8044b92f <turnstile_init+0x95>
 ffffffff8044b919:	31 d2                	xor    %edx,%edx
 ffffffff8044b91b:	48 c7 c6 80 b1 89 80 	mov    $0xffffffff8089b180,%rsi
 ffffffff8044b922:	31 ff                	xor    %edi,%edi
 ffffffff8044b924:	48 83 c4 28          	add    $0x28,%rsp
 ffffffff8044b928:	5b                   	pop    %rbx
 ffffffff8044b929:	5d                   	pop    %rbp
 ffffffff8044b92a:	e9 f1 fe ff ff       	jmpq   ffffffff8044b820 <turnstile_ctor>
 ffffffff8044b92f:	41 b8 66 00 00 00    	mov    $0x66,%r8d
 ffffffff8044b935:	48 c7 c1 b0 14 6e 80 	mov    $0xffffffff806e14b0,%rcx
 ffffffff8044b93c:	48 c7 c2 1c 13 6e 80 	mov    $0xffffffff806e131c,%rdx
 ffffffff8044b943:	48 c7 c6 f7 92 68 80 	mov    $0xffffffff806892f7,%rsi
 ffffffff8044b94a:	48 c7 c7 b0 92 68 80 	mov    $0xffffffff806892b0,%rdi
 ffffffff8044b951:	e8 07 68 11 00       	callq  ffffffff8056215d <kern_assert>
 ffffffff8044b956:	eb c1                	jmp    ffffffff8044b919 <turnstile_init+0x7f>

 ffffffff8044b958 <turnstile_lookup>:
 ffffffff8044b958:	55                   	push   %rbp
 ffffffff8044b959:	48 89 e5             	mov    %rsp,%rbp
 ffffffff8044b95c:	41 54                	push   %r12
 ffffffff8044b95e:	53                   	push   %rbx
 ffffffff8044b95f:	48 89 fb             	mov    %rdi,%rbx
 ffffffff8044b962:	4c 8d 24 3f          	lea    (%rdi,%rdi,1),%r12
 ffffffff8044b966:	41 81 e4 f0 03 00 00 	and    $0x3f0,%r12d
 ffffffff8044b96d:	49 8b bc 24 80 9f 7c 	mov    -0x7f836080(%r12),%rdi
 ffffffff8044b974:	80
 ffffffff8044b975:	e8 79 a6 fd ff       	callq  ffffffff80425ff3 <mutex_enter>
 ffffffff8044b97a:	49 8b 84 24 88 9f 7c 	mov    -0x7f836078(%r12),%rax
 ffffffff8044b981:	80
 ffffffff8044b982:	48 85 c0             	test   %rax,%rax
 ffffffff8044b985:	75 0a                	jne    ffffffff8044b991 <turnstile_lookup+0x39>
 ffffffff8044b987:	eb 13                	jmp    ffffffff8044b99c <turnstile_lookup+0x44>
 ffffffff8044b989:	48 8b 00             	mov    (%rax),%rax
 ffffffff8044b98c:	48 85 c0             	test   %rax,%rax
 ffffffff8044b98f:	74 06                	je     ffffffff8044b997 <turnstile_lookup+0x3f>
 ffffffff8044b991:	48 3b 58 18          	cmp    0x18(%rax),%rbx
 ffffffff8044b995:	75 f2                	jne    ffffffff8044b989 <turnstile_lookup+0x31>
 ffffffff8044b997:	5b                   	pop    %rbx
 ffffffff8044b998:	41 5c                	pop    %r12
 ffffffff8044b99a:	5d                   	pop    %rbp
 ffffffff8044b99b:	c3                   	retq
 ffffffff8044b99c:	31 c0                	xor    %eax,%eax
 ffffffff8044b99e:	66 90                	xchg   %ax,%ax
 ffffffff8044b9a0:	eb f5                	jmp    ffffffff8044b997 <turnstile_lookup+0x3f>

 ffffffff8044b9a2 <turnstile_exit>:
 ffffffff8044b9a2:	55                   	push   %rbp
 ffffffff8044b9a3:	48 89 e5             	mov    %rsp,%rbp
 ffffffff8044b9a6:	48 01 ff             	add    %rdi,%rdi
 ffffffff8044b9a9:	81 e7 f0 03 00 00    	and    $0x3f0,%edi
 ffffffff8044b9af:	48 8b bf 80 9f 7c 80 	mov    -0x7f836080(%rdi),%rdi
 ffffffff8044b9b6:	5d                   	pop    %rbp
 ffffffff8044b9b7:	e9 6b ad fd ff       	jmpq   ffffffff80426727 <mutex_exit>

 ffffffff8044b9bc <turnstile_block>:
 ffffffff8044b9bc:	55                   	push   %rbp
 ffffffff8044b9bd:	48 89 e5             	mov    %rsp,%rbp
 ffffffff8044b9c0:	41 57                	push   %r15
 ffffffff8044b9c2:	41 56                	push   %r14
 ffffffff8044b9c4:	41 55                	push   %r13
 ffffffff8044b9c6:	41 54                	push   %r12
 ffffffff8044b9c8:	53                   	push   %rbx
 ffffffff8044b9c9:	48 83 ec 28          	sub    $0x28,%rsp
 ffffffff8044b9cd:	49 89 ff             	mov    %rdi,%r15
 ffffffff8044b9d0:	4c 63 f6             	movslq %esi,%r14
 ffffffff8044b9d3:	49 89 d5             	mov    %rdx,%r13
 ffffffff8044b9d6:	48 89 4d c0          	mov    %rcx,-0x40(%rbp)
 ffffffff8044b9da:	65 48 8b 1c 25 e8 01 	mov    %gs:0x1e8,%rbx
 ffffffff8044b9e1:	00 00
 ffffffff8044b9e3:	49 89 d4             	mov    %rdx,%r12
 ffffffff8044b9e6:	49 c1 ec 03          	shr    $0x3,%r12
 ffffffff8044b9ea:	41 83 e4 3f          	and    $0x3f,%r12d
 ffffffff8044b9ee:	41 83 fe 01          	cmp    $0x1,%r14d
 ffffffff8044b9f2:	0f 87 81 06 00 00    	ja     ffffffff8044c079 <turnstile_block+0x6bd>
 ffffffff8044b9f8:	4c 89 e0             	mov    %r12,%rax
 ffffffff8044b9fb:	48 c1 e0 04          	shl    $0x4,%rax
 ffffffff8044b9ff:	48 8b b8 80 9f 7c 80 	mov    -0x7f836080(%rax),%rdi
 ffffffff8044ba06:	e8 b9 ae fd ff       	callq  ffffffff804268c4 <mutex_owned>

 --Multipart_20170117105946307115--
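
 For reading the listing: the "last locked" address 0xffffffff8044b97a
 from the lockdebug output falls inside turnstile_lookup, directly
 after the call to mutex_enter. A minimal sketch of that lookup in
 Python, with the symbol start addresses copied by hand from the
 listing (the table and the function name are illustrative only, and
 nothing here reads the kernel image):

```python
from bisect import bisect_right

# Symbol start addresses as they appear in the attached listing.
SYMBOLS = [
    (0xffffffff8044b958, "turnstile_lookup"),
    (0xffffffff8044b9a2, "turnstile_exit"),
    (0xffffffff8044b9bc, "turnstile_block"),
]


def symbolize(addr):
    """Return 'symbol+0xoffset' for the closest symbol at or below addr."""
    starts = [start for start, _ in SYMBOLS]
    i = bisect_right(starts, addr) - 1
    if i < 0:
        raise ValueError("address precedes every known symbol")
    start, name = SYMBOLS[i]
    return f"{name}+{addr - start:#x}"


LAST_LOCKED = 0xffffffff8044b97a       # "last locked*" in the lockdebug dump
CALL_ADDR, CALL_LEN = 0xffffffff8044b975, 5  # callq mutex_enter, 5 bytes (e8 + rel32)

print(symbolize(LAST_LOCKED))
print(CALL_ADDR + CALL_LEN == LAST_LOCKED)
```

 The second line checks that the recorded address is the return address
 of the mutex_enter call, that is, the lock was taken from
 turnstile_lookup.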

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 19:15:36 +0900

 On Tue, Jan 17, 2017 at 6:59 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
 > On Tue, 17 Jan 2017 13:41:01 +0900, Ryota Ozaki wrote:
 >> The mutex error happened because uvm_fault_internal tries to hold
 >> a rwlock while holding a spin mutex. Can you identify the spin mutex
 >> by disassembling the kernel?
 >
 > Attached.

 Hmm, turnstile_lookup. I didn't know a spin mutex is used in
 an adaptive mutex...

 Anyway one more thing. Could you check where mutex_tryenter+0x12 is?
 (objdump -d netbsd |grep -A 30 'mutex_tryenter>:' or something)

 Thanks,
   ozaki-r

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: Ryota Ozaki <ozaki-r@netbsd.org>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>,
        kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 11:24:06 +0100

 This is a multi-part message in MIME format.

 --Multipart_20170117112406695268
 Content-Type: text/plain; charset=us-ascii
 Content-Transfer-Encoding: 7bit

 On Tue, 17 Jan 2017 19:15:36 +0900, Ryota Ozaki wrote:
 > Could you check where mutex_tryenter+0x12 is?
 > (objdump -d netbsd |grep -A 30 'mutex_tryenter>:' or something)

 Attached.


 --Multipart_20170117112406695268
 Content-Type: application/octet-stream; name=pr51877_objdump_tryenter.lst
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename=pr51877_objdump_tryenter.lst

 ffffffff80426901 <mutex_tryenter>:
 ffffffff80426901:	55                   	push   %rbp
 ffffffff80426902:	48 89 e5             	mov    %rsp,%rbp
 ffffffff80426905:	41 56                	push   %r14
 ffffffff80426907:	41 55                	push   %r13
 ffffffff80426909:	41 54                	push   %r12
 ffffffff8042690b:	53                   	push   %rbx
 ffffffff8042690c:	48 83 ec 10          	sub    $0x10,%rsp
 ffffffff80426910:	48 89 fb             	mov    %rdi,%rbx
 ffffffff80426913:	48 8b 07             	mov    (%rdi),%rax
 ffffffff80426916:	a8 01                	test   $0x1,%al
 ffffffff80426918:	0f 85 85 00 00 00    	jne    ffffffff804269a3 <mutex_tryenter+0xa2>
 ffffffff8042691e:	65 4c 8b 2c 25 e8 01 	mov    %gs:0x1e8,%r13
 ffffffff80426925:	00 00
 ffffffff80426927:	4d 85 ed             	test   %r13,%r13
 ffffffff8042692a:	0f 84 dc 00 00 00    	je     ffffffff80426a0c <mutex_tryenter+0x10b>
 ffffffff80426930:	4c 8b 23             	mov    (%rbx),%r12
 ffffffff80426933:	41 83 e4 04          	and    $0x4,%r12d
 ffffffff80426937:	4c 89 ea             	mov    %r13,%rdx
 ffffffff8042693a:	4c 09 e2             	or     %r12,%rdx
 ffffffff8042693d:	4c 89 e6             	mov    %r12,%rsi
 ffffffff80426940:	48 89 df             	mov    %rbx,%rdi
 ffffffff80426943:	e8 a8 8a 13 00       	callq  ffffffff8055f3f0 <_atomic_cas_64>
 ffffffff80426948:	49 89 c6             	mov    %rax,%r14
 ffffffff8042694b:	e8 c0 8a 13 00       	callq  ffffffff8055f410 <_membar_consumer>
 ffffffff80426950:	4d 39 f4             	cmp    %r14,%r12
 ffffffff80426953:	0f 85 a4 00 00 00    	jne    ffffffff804269fd <mutex_tryenter+0xfc>
 ffffffff80426959:	48 8b 03             	mov    (%rbx),%rax
 ffffffff8042695c:	a8 04                	test   $0x4,%al
 ffffffff8042695e:	0f 84 01 01 00 00    	je     ffffffff80426a65 <mutex_tryenter+0x164>
 ffffffff80426964:	48 8b 03             	mov    (%rbx),%rax
 --Multipart_20170117112406695268--

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51877: carp related panic during shutdown
Date: Tue, 17 Jan 2017 18:35:40 +0000

 On Tue, Jan 17, 2017 at 10:20:01AM +0000, Ryota Ozaki wrote:
  >  Hmm, turnstile_lookup. I didn't know a spin mutex is used in
  >  an adaptive mutex...

 "adaptive mutex" means "it spins for a while, then goes to sleep".

 fwiw.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 10:36:23 +0900

 On Wed, Jan 18, 2017 at 3:40 AM, David Holland <dholland-bugs@netbsd.org> wrote:
 > The following reply was made to PR kern/51877; it has been noted by GNATS.
 >
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc:
 > Subject: Re: kern/51877: carp related panic during shutdown
 > Date: Tue, 17 Jan 2017 18:35:40 +0000
 >
 >  On Tue, Jan 17, 2017 at 10:20:01AM +0000, Ryota Ozaki wrote:
 >   >  Hmm, turnstile_lookup. I didn't know a spin mutex is used in
 >   >  an adaptive mutex...
 >
 >  "adaptive mutex" means "it spins for a while, then goes to sleep".

 Yes I know. I meant I didn't know that an adaptive mutex uses
 another mutex (for lwp in turnstile). The fault happened on that
 mutex.

   ozaki-r

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 16:00:00 +0900

 On Tue, Jan 17, 2017 at 7:24 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
 > On Tue, 17 Jan 2017 19:15:36 +0900, Ryota Ozaki wrote:
 >> Could you check where mutex_tryenter+0x12 is?
 >> (objdump -d netbsd |grep -A 30 'mutex_tryenter>:' or something)
 >
 > Attached.

 Thanks.

 The fault happened at MUTEX_SPIN_P(mtx) in mutex_tryenter perhaps
 because mtx is an invalid pointer (address). The mtx comes from
 this:
   l = curlwp;  // or l = owner below
   owner = (*l->l_syncobj->sobj_owner)(l->l_wchan);
   lwp_trylock(owner);
   mutex_tryenter(owner->l_mutex);

 IIUC, owner->l_mutex can be invalid if the adaptive mutex in question
 has been destroyed, or if the owner of the mutex has disappeared while
 still holding it for some reason (or if the mutex data has been
 corrupted somehow). The former doesn't happen in this case, as I said,
 and the latter is also unlikely.

 I don't have any ideas :-/ I hope clean build solves the issue.
   ozaki-r

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: 
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 07:39:48 -0500

 On Jan 18,  7:05am, ozaki-r@netbsd.org (Ryota Ozaki) wrote:
 -- Subject: Re: kern/51877: carp related panic during shutdown

 | The following reply was made to PR kern/51877; it has been noted by GNATS.
 | 
 | From: Ryota Ozaki <ozaki-r@netbsd.org>
 | To: Hauke Fath <hf@spg.tu-darmstadt.de>
 | Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
 | 	gnats-admin@netbsd.org
 | Subject: Re: kern/51877: carp related panic during shutdown
 | Date: Wed, 18 Jan 2017 16:00:00 +0900
 | 
 |  On Tue, Jan 17, 2017 at 7:24 PM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
 |  > On Tue, 17 Jan 2017 19:15:36 +0900, Ryota Ozaki wrote:
 |  >> Could you check where mutex_tryenter+0x12 is?
 |  >> (objdump -d netbsd |grep -A 30 'mutex_tryenter>:' or something)
 |  >
 |  > Attached.
 |  
 |  Thanks.
 |  
 |  The fault happened at MUTEX_SPIN_P(mtx) in mutex_tryenter perhaps
 |  because mtx is an invalid pointer (address). The mtx comes from
 |  this:
 |    l = curlwp;  // or l = owner below
 |    owner = (*l->l_syncobj->sobj_owner)(l->l_wchan);
 |    lwp_trylock(owner);
 |    mutex_tryenter(owner->l_mutex);
 |  
 |  IIUC, owner->l_mutex can be invalid if the adaptive mutex in question
 |  has been destroyed, or if the owner of the mutex has disappeared while
 |  still holding it for some reason (or if the mutex data has been
 |  corrupted somehow). The former doesn't happen in this case, as I said,
 |  and the latter is also unlikely.
 |  
 |  I don't have any ideas :-/ I hope clean build solves the issue.
 |    ozaki-r


 Can you put a printf in carp_detach carp_ifdetach and carp_clone_destroy
 and see when they are called during shutdown?

 christos

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Cc: 
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 15:04:03 +0100

 On 01/18/17 08:05, Ryota Ozaki wrote:
 > I don't have any ideas :-/ I hope clean build solves the issue.

 I removed the objdir tree, 'cvs update'd, and built tools and kernel 
 from scratch:

 [...]
 igphy3: detached
 wm3: detached
 igphy2: detached
 wm2: detached
 igphy1: detached
 wm1: detached
 igphy0: detached
 carp0: state transition from: MASTER -> to: INIT
 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0xfffffe821e74f400 type     :               spin
 initialized  : 0xffffffff80426f7a
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  0 last held:                  0
 current lwp  : 0xfffffe821ae8b9e0 last held: 0xfffffe821ae8b9e0
 last locked* : 0xffffffff8044bc9a unlocked : 0xffffffff8040a72c
 owner field  : 0x0000000000010700 wait/spin:                0/1

 Skipping crash dump on recursive panic
 panic: LOCKDEBUG: Mutex error: lockdebug_barrier: spin lock held
 cpu0: Begin traceback...
 vpanic() at netbsd:vpanic+0x140
 snprintf() at netbsd:snprintf
 lockdebug_more() at netbsd:lockdebug_more
 rw_enter() at netbsd:rw_enter+0x5fe
 uvm_fault_internal() at netbsd:uvm_fault_internal+0x161
 trap() aaaaat netbsd:trap+0x30a
 --- trap (number 6) ---
 mutex_tryenter() at netbsd:mutex_tryenter+0x12
 lwp_trylock() at netbsd:lwp_trylock+0x17
 turnstile_block() at netbsd:turnstile_block+0x238
 mutex_enter() at netbsd:mutex_enter+0x36c
 rt_update_wait() at netbsd:rt_update_wait+0x10
 _rt_free() at netbsd:_rt_free+0x11
 rtrequest1() at netbsd:rtrequest1+0x5ef
 rtrequest() at netbsd:rtrequest+0x3e
 carp_setroute() at netbsd:carp_setroute+0xd3
 carp_setrun() at netbsd:carp_setrun+0x56
 carp_carpdev_state() at netbsd:carp_carpdev_state+0x6c
 if_down() at netbsd:if_down+0x17d
 if_detach() at netbsd:if_detach+0x1d8
 wm_detach() at netbsd:wm_detach+0xc1
 config_detach() at netbsd:config_detach+0xf8
 config_detach_all() at netbsd:config_detach_all+0x97
 cpu_reboot() at netbsd:cpu_reboot+0x176
 sys_reboot() at netbsd:sys_reboot+0x75
 syscall() at netbsd:syscall+0x1db
 --- syscall (number 208) ---
 741aebe3c99a:
 cpu0: End traceback...
 rebooting...

 -- that is, same old.

 Cheerio,
 hauke

 -- 
       The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email	        Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
       Respect for open standards              Ruf +49-6151-16-21344

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: 
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 09:19:16 -0500

 On Jan 18,  2:05pm, hf@spg.tu-darmstadt.de (Hauke Fath) wrote:
 -- Subject: Re: kern/51877: carp related panic during shutdown

 | The following reply was made to PR kern/51877; it has been noted by GNATS.
 | 
 | From: Hauke Fath <hf@spg.tu-darmstadt.de>
 | To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org
 | Cc: 
 | Subject: Re: kern/51877: carp related panic during shutdown
 | Date: Wed, 18 Jan 2017 15:04:03 +0100
 | 
 |  On 01/18/17 08:05, Ryota Ozaki wrote:
 |  > I don't have any ideas :-/ I hope clean build solves the issue.
 |  
 |  I removed the objdir tree, 'cvs update'd, and built tools and kernel 
 |  from scratch:
 |  
 |  [...]
 |  igphy3: detached
 |  wm3: detached
 |  igphy2: detached
 |  wm2: detached
 |  igphy1: detached
 |  wm1: detached
 |  igphy0: detached
 |  carp0: state transition from: MASTER -> to: INIT
 |  Mutex error: lockdebug_barrier: spin lock held

 I think that carp should detach before the physical interfaces?

 christos

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org
Cc: 
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 16:25:57 +0100

 On 01/18/17 13:40, Christos Zoulas wrote:
 >  Can you put a printf in carp_detach carp_ifdetach and carp_clone_destroy
 >  and see when they are called during shutdown?

 Tried that

 % strings /netbsd8 | grep '@christos'
 @christos: carp_ifdetach()
 @christos: carpdetach() detaching %s
 @christos: carp_clone_destroy() %s
 %

 but none show up - the kernel panics earlier.

 > |  wm1: detached
 > |  igphy0: detached
 > |  carp0: state transition from: MASTER -> to: INIT
 > |  Mutex error: lockdebug_barrier: spin lock held
 >
 > I think that carp should detach before the physical interfaces?

 There's an idea... but

 % fgrep carpdev /etc/ifconfig.carp0
 carpdev wm0
 %

 which is still attached.

 Cheerio,
 hauke

From: Hauke Fath <hf@spg.tu-darmstadt.de>
To: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org
Cc: 
Subject: Re: kern/51877: carp related panic during shutdown
Date: Wed, 18 Jan 2017 16:38:08 +0100

 On 01/17/17 09:45, Ryota Ozaki wrote:
 >  >> If it doesn't help could you comment out rt_update_wait in _rt_free
 >  >> and try?
 >  >
 >  > Before I try and err... which source file are we talking about?
 >
 >  sys/net/route.c is.

 Looks better:

 [...]
 igphy3: detached
 wm3: detached
 igphy2: detached
 wm2: detached
 igphy1: detached
 wm1: detached
 igphy0: detached
 carp0: state transition from: MASTER -> to: INIT
 @christos: carp_ifdetach()
 @christos: carpdetach() detaching carp0
 wm0: detached
 pci5: detached
 pci4: detached
 ppb4: detached
 ppb3: detached
 uhub3: detached
 sysbeep0: detached
 pci3: detached
 pcppi0: detached
 com1: detached
 makphy1: detached
 wm5: detached
 makphy0: detached
 wm4: detached
 ppb2: detached
 carp12: state transition from: BACKUP -> to: INIT
 carp11: state transition from: BACKUP -> to: INIT
 carp10: state transition from: BACKUP -> to: INIT
 carp9: state transition from: BACKUP -> to: INIT
 carp8: state transition from: BACKUP -> to: INIT
 carp7: state transition from: BACKUP -> to: INIT
 carp3: state transition from: BACKUP -> to: INIT
 carp2: state transition from: BACKUP -> to: INIT
 ixg0: link state DOWN (was UP)
 ixg0: detached
 pci8: detached
 pci7: detached
 pci6: detached
 pci2: detached
 pci1: detached
 ppb7: detached
 ppb6: detached
 ppb5: detached
 ppb1: detached
 ppb0: detached
 pchb0: detached
 attimer1: detached
 unmounting 0xfffffe821d6f2008 / (/dev/raid0a)...
 forcefully unmounting / (/dev/raid0a)...
 raid0: detached
 wd1: detached
 wd0: detached
 atabus1: detached
 atabus0: detached
 rebooting...


From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51877 CVS commit: src/sys/net
Date: Thu, 19 Jan 2017 06:58:55 +0000

 Module Name:	src
 Committed By:	ozaki-r
 Date:		Thu Jan 19 06:58:55 UTC 2017

 Modified Files:
 	src/sys/net: route.c rtsock.c

 Log Message:
 Disable rt_update mechanism by default

 This is a workaround for PR kern/51877. Enable again once the issue
 is fixed.


 To generate a diff of this commit:
 cvs rdiff -u -r1.187 -r1.188 src/sys/net/route.c
 cvs rdiff -u -r1.199 -r1.200 src/sys/net/rtsock.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Ryota Ozaki <ozaki-r@netbsd.org>
To: Hauke Fath <hf@spg.tu-darmstadt.de>
Cc: "gnats-bugs@NetBSD.org" <gnats-bugs@netbsd.org>, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51877: carp related panic during shutdown
Date: Thu, 19 Jan 2017 17:24:54 +0900

 On Thu, Jan 19, 2017 at 12:40 AM, Hauke Fath <hf@spg.tu-darmstadt.de> wrote:
 > The following reply was made to PR kern/51877; it has been noted by GNATS.
 >
 > From: Hauke Fath <hf@spg.tu-darmstadt.de>
 > To: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org
 > Cc:
 > Subject: Re: kern/51877: carp related panic during shutdown
 > Date: Wed, 18 Jan 2017 16:38:08 +0100
 >
 >  On 01/17/17 09:45, Ryota Ozaki wrote:
 >  >  >> If it doesn't help could you comment out rt_update_wait in _rt_free
 >  >  >> and try?
 >  >  >
 >  >  > Before I try and err... which source file are we talking about?
 >  >
 >  >  sys/net/route.c is.
 >
 >  Looks better:

 Thanks! So I disabled the feature by default (!NET_MPSAFE).

 Nevertheless, I still need to investigate and fix the issue. To do
 that I think I need to reproduce the panic on my machine, so could
 you please send me the full network configuration of your machine
 (by private email if you prefer)?

 Thanks,
   ozaki-r

>Unformatted:
