NetBSD Problem Report #38273

From woods@once.weird.com  Fri Mar 21 20:55:23 2008
Return-Path: <woods@once.weird.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 1A61F63B863
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 Mar 2008 20:55:23 +0000 (UTC)
Message-Id: <m1JcoGk-0018LzC@once.weird.com>
Date: Fri, 21 Mar 2008 16:55:18 -0400 (EDT)
From: "Greg A. Woods" <woods@planix.com>
Sender: "Greg A. Woods" <woods@once.weird.com>	
Reply-To: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@gnats.NetBSD.org>
Subject: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
X-Send-Pr-Version: 3.95

>Number:         38273
>Category:       kern
>Synopsis:       panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Mar 21 21:00:00 +0000 2008
>Last-Modified:  Wed Mar 17 19:50:02 +0000 2010
>Originator:     Greg A. Woods
>Release:        NetBSD 4.99.55 2008/03/20
>Organization:
Planix, Inc.; Toronto, Ontario; Canada
>Environment:
System: NetBSD 4.99.55 GENERIC (with LOCKDEBUG)
Architecture: i386
Machine: i386
>Description:

	In hopes of debugging the ataraid problem on my Asus
	PSCH-SR/SATA based server I built and booted a -current kernel
	with LOCKDEBUG, and immediately on the first attempt to access
	the logical drive (ld0a with newfs), the following panic
	occurred:

Mutex error: lockdebug_barrier: spin lock held

lock address : 0x00000000c381a6e0 type     :               spin
shared holds :                  0 exclusive:                  1
shares wanted:                  0 exclusive:                  0
current cpu  :                  0 last held:                  0
current lwp  : 0x00000000cdf1ac20 last held: 0x00000000cdf1ac20
last locked  : 0x00000000c01f05cd unlocked : 0x00000000c01f0734
initialized  : 0x00000000c01eef81
owner field  : 0x0000000000010500 wait/spin:                0/1

panic: LOCKDEBUG
Stopped in pid 871.1 (newfs) at netbsd:breakpoint+0x4:  popl    %ebp
db{0}> trace
breakpoint(cdefe818,5,cdefe84c,c0583c8b,c0a2de0f) at netbsd:breakpoint+0x4
cpu_Debugger(c0a2de0f,cdefe858,c0c9b5c0,c0584b3b,5) at netbsd:cpu_Debugger+0xb
panic(c0a2cfd0,c0584ada,c0a2cdee,c0a2ce00,0) at netbsd:panic+0x164
lockdebug_abort1(cda38700,c0d63720,c0a2cdee,c0a2ce00,1) at netbsd:lockdebug_abod
lockdebug_barrier(c0d56b80,1,0,c0c9b5c0,5) at netbsd:lockdebug_barrier+0x103
mutex_vector_enter(cc875b88,c3188750,1,1749efff,0) at netbsd:mutex_vector_enter1
ld_ataraid_start_raid0(c381a600,c3188750,cdefea0c,c01ef350,cdefea05) at netbsd:a
ldstart(c381a600,c3188750,0,c054e4ee,0) at netbsd:ldstart+0x98
ldstrategy(c3188750,0,200,1,cdefea7c) at netbsd:ldstrategy+0x256
physio(c01f0335,0,4500,0,c01f0f58) at netbsd:physio+0x3c2
ldwrite(4500,cdefec4c,10,c054e4ee,0) at netbsd:ldwrite+0x38
cdev_write(4500,cdefec4c,10,c054e25d,2) at netbsd:cdev_write+0x63
spec_write(cdefebe0,0,c0d636e0,cdf1ac20,cdef6500) at netbsd:spec_write+0xd0
ufsspec_write(cdefebe0,1864c00,cdefebfc,c05da286,cdc0d5ec) at netbsd:ufsspec_wr2
VOP_WRITE(cdc0d5ec,cdefec4c,10,cc864c00,cdc0d5ec) at netbsd:VOP_WRITE+0x70
vn_write(cde906c0,cdefecb4,cdefec4c,cc864c00,0) at netbsd:vn_write+0x131
dofilewrite(4,cde906c0,bbbb5000,200,cdefecb4) at netbsd:dofilewrite+0x8c
sys_pwrite(cdf1ac20,cdefed04,cdefecfc,c082eda1,cdefed07) at netbsd:sys_pwrite+0c
syscall(cdefed48,b3,ab,1f,1f) at netbsd:syscall+0x182
db{0}> 

	This system now has a serial console connection so more
	debugging is easy to do and access can be made available if
	necessary to anyone who can help fix the problem.

>How-To-Repeat:

>Fix:


>Audit-Trail:
From: Andrew Doran <ad@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
Date: Fri, 21 Mar 2008 22:10:23 +0000

 On Fri, Mar 21, 2008 at 09:00:00PM +0000, Greg A. Woods wrote:

 > Mutex error: lockdebug_barrier: spin lock held
 > 
 > lock address : 0x00000000c381a6e0 type     :               spin
 > shared holds :                  0 exclusive:                  1
 > shares wanted:                  0 exclusive:                  0
 > current cpu  :                  0 last held:                  0
 > current lwp  : 0x00000000cdf1ac20 last held: 0x00000000cdf1ac20
 > last locked  : 0x00000000c01f05cd unlocked : 0x00000000c01f0734
 > initialized  : 0x00000000c01eef81
 > owner field  : 0x0000000000010500 wait/spin:                0/1

 What are:

 x/I 0x00000000c01f05cd
 x/I 0x00000000c01eef81

 You can also find them by poking about with 'nm'.

 Thanks,
 Andrew

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Andrew Doran <ad@netbsd.org>
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
Date: Fri, 21 Mar 2008 19:25:39 -0400

 At Fri, 21 Mar 2008 22:15:05 +0000 (UTC), Andrew Doran wrote:
 Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
 >
 >  What are:
 >
 >  x/I 0x00000000c01f05cd
 >  x/I 0x00000000c01eef81

 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0x00000000c381a6e0 type     :               spin
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  0 last held:                  0
 current lwp  : 0x00000000cdaee0a0 last held: 0x00000000cdaee0a0
 last locked  : 0x00000000c01f05cd unlocked : 0x00000000c01f0734
 initialized  : 0x00000000c01eef81
 owner field  : 0x0000000000010500 wait/spin:                0/1

 panic: LOCKDEBUG
 Stopped in pid 858.1 (newfs) at netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> x/I 0x00000000c01f05cd
 netbsd:ldstart+0x16:    cmpl    $0,0xc(%ebp)
 db{0}> x/I 0x00000000c01eef81
 netbsd:ldattach+0x29:   movl    0x8(%ebp),%eax
 db{0}> 

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
Date: Thu, 27 Mar 2008 13:48:43 -0400

 It doesn't look like there's been any change in this with 4.99.58 as of
 yesterday's sources (2008/03/26):

 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0x00000000c381cce0 type     :               spin
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  0 last held:                  0
 current lwp  : 0x00000000cdf08680 last held: 0x00000000cdf08680
 last locked  : 0x00000000c01f0b41 unlocked : 0x00000000c01f0ca8
 initialized  : 0x00000000c01ef4f5
 owner field  : 0x0000000000010500 wait/spin:                0/1

 panic: LOCKDEBUG
 Stopped in pid 1428.1 (newfs) at        netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> x/I 0x00000000c01f0b41
 netbsd:ldstart+0x16:    cmpl    $0,0xc(%ebp)
 db{0}> x/I 0x00000000c01ef4f5
 netbsd:ldattach+0x29:   movl    0x8(%ebp),%eax
 db{0}> x/I 0x00000000c381cce0
 0xc381cce0:     addl    $-0x1ffffefb,%eax
 db{0}> whatis 0x00000000c381cce0
 0xc381cce0 is 0xc316c000+7015648 from VMMAP 0xc0d346e0
 0xc381cce0 is 0xc3749000+867552 from VMMAP 0xc0d35e60
 db{0}> trace
 breakpoint(c0a2d30f,cdf4b848,c0c9b5c0,c05855cb,5) at netbsd:breakpoint+0x5
 panic(c0a2c4d0,c058556a,c0a2c2ee,c0a2c300,0) at netbsd:panic+0x164
 lockdebug_abort1(cda389c0,c0d63660,c0a2c2ee,c0a2c300,1) at netbsd:lockdebug_abort1+0x8d
 lockdebug_barrier(c0d56a00,1,0,c0c9b5c0,5) at netbsd:lockdebug_barrier+0x103
 mutex_vector_enter(cc875b88,c31881d4,1,1749efff,0) at netbsd:mutex_vector_enter+0x2d1
 ld_ataraid_start_raid0(c381cc00,c31881d4,cdf4b9fc,c01ef8c4,cdf4b905) at netbsd:ld_ataraid_start_raid0+0x50a
 ldstart(c381cc00,c31881d4,0,c054edde,0) at netbsd:ldstart+0x98
 ldstrategy(c31881d4,0,200,1,cdf4ba6c) at netbsd:ldstrategy+0x256
 physio(c01f08a9,0,4500,0,c01f14cc) at netbsd:physio+0x3c2
 ldwrite(4500,cdf4bc50,10,c054edde,0) at netbsd:ldwrite+0x38
 cdev_write(4500,cdf4bc50,10,c054eb4d,2) at netbsd:cdev_write+0x63
 spec_write(cdf4bbd0,0,0,cdf61180,cdeb6530) at netbsd:spec_write+0xd0
 ufsspec_write(cdf4bbd0,1864840,cdf4bbec,c05da0b9,cdc0f5ec) at netbsd:ufsspec_write+0x62
 VOP_WRITE(cdc0f5ec,cdf4bc50,10,cc864840,3) at netbsd:VOP_WRITE+0x70
 vn_write(cdea2580,cdf4bcbc,cdf4bc50,cc864840,0) at netbsd:vn_write+0x10b
 dofilewrite(4,cdea2580,bbbb5000,200,cdf4bcbc) at netbsd:dofilewrite+0x86
 sys_pwrite(cdf08680,cdf4bd04,cdf4bcfc,c082fc7d,cdf4bd07) at netbsd:sys_pwrite+0x109
 syscall(cdf4bd48,b3,ab,1f,1f) at netbsd:syscall+0x17b
 db{0}> ps
  PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 >1428          1256     1428          0 2  0x4000    1            newfs
  1256           624     1256          0 2  0x4000    1              ksh     tty
  624            614      624       1000 2  0x4000    1              ksh   pause
  614            619      614       1000 2  0x4000    1            xterm  select
  619            261      261          0 2  0x4100    1             rshd  select
  285              1      285          0 2  0x4000    1            getty     tty
  280              1      280          0 2  0x4000    1            getty     tty
  287              1      287          0 2  0x4000    1            getty     tty
  268              1      268          0 2  0x4000    1            getty     tty
  275              1      275          0 2  0x4000    1            getty     tty
  292              1      292          0 2  0x4000    1            getty     tty
  277              1      277          0 2  0x4000    1            getty     tty
  283              1      283          0 2  0x4000    1            getty     tty
  286              1      286          0 2  0x4000    1            getty     tty
  282              1      282          0 2       0    1             cron nanoslp
  261              1      261          0 2       0    1            inetd  kqueue
  217              1      217         15 2   0x100    1             ntpd   pause
  156              1      156          0 2       0    1        mount_mfs  mfsidl
  114              1      114          0 2       0    1          syslogd  kqueue
  1                0        1          0 2  0x4001    1             init    wait
  0               -1        0          0 2 0x20002   36           system       *
 db{0}> reboot
 syncing disks... done
 rebooting...

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: NetBSD Kernel Bug People <kern-bug-people@netbsd.org>
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
Date: Mon, 07 Apr 2008 08:29:43 -0400

 I think this may be another view of the same bug, but I'm not 100% sure.

 This panic occurs reliably many seconds into reading the raw ld0 device
 with "dd if=/dev/rld0d of=/dev/null bs=20m" (up until the panic, speeds
 of over 57MB/s are observed):

 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0x00000000c37b6ce0 type     :               spin
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  1 last held:                  1
 current lwp  : 0x00000000cdaf0720 last held: 0x00000000cdaf0720
 last locked  : 0x00000000c01f0a99 unlocked : 0x00000000c01f0c00
 initialized  : 0x00000000c01ef44d
 owner field  : 0x0000000000010500 wait/spin:                0/1

 panic: LOCKDEBUG
 Stopped in pid 716.1 (dd) at    netbsd:breakpoint+0x4:  popl    %ebp
 db{1}> x/I 0x00000000cdaf0720
 0xcdaf0720:     addb    %al,0(%eax)
 db{1}> whatis 0x00000000cdaf0720
 0xcdaf0720 is 0xcdaf0000+1824 in POOL 'kvakernel' (allocated)
 0xcdaf0720 is 0xcdaf0720+0 in POOL 'lwppl' (allocated)
 0xcdaf0720 is 0xcdad7000+104224 from VMMAP 0xc0d3f7e0
 db{1}> x/I 0x00000000c01f0a99
 netbsd:ldstart+0x16:    cmpl    $0,0xc(%ebp)
 db{1}> x/I 0x00000000c01ef44d
 netbsd:ldattach+0x29:   movl    0x8(%ebp),%eax
 db{1}> x/I 0x00000000c01f0c00
 netbsd:ldstart+0x17d:   leave
 db{1}> trace
 breakpoint(cdb25478,5,cdb254ac,c058601b,c0a31b47) at netbsd:breakpoint+0x4
 cpu_Debugger(c0a31b47,cdb254b8,c36c0800,c0586ecb,5) at netbsd:cpu_Debugger+0xb
 panic(c0a30d74,c0586e6a,c0a30b92,c0a30ba4,0) at netbsd:panic+0x164
 lockdebug_abort1(cda4cb00,c0d6e760,c0a30b92,c0a30ba4,1) at netbsd:lockdebug_abort1+0x8d
 lockdebug_barrier(c0d61b00,1,0,100,c0d64b00) at netbsd:lockdebug_barrier+0x103
 mutex_vector_enter(c0d466f0,c36c0800,ffffffff,c36c0800,5) at netbsd:mutex_vector_enter+0x2d1
 pool_cache_invalidate(c0d46600,cdb255d4,cdb255d4,c057e9c5,c0d6e760) at netbsd:pool_cache_invalidate+0x16
 pool_reclaim(c0d46600,c36c0800,cdb2563c,c0550179,c0d40ffe) at netbsd:pool_reclaim+0x81
 pool_reclaim_callback(c0d466d8,c0d46600,0,c36c0800,5) at netbsd:pool_reclaim_callback+0x59
 callback_runone(c0d40ffc,0,cdb2567c,22,34) at netbsd:callback_runone+0xdb
 callback_run_roundrobin(c0d40ffc,0,cdb2569c,c04e0cfc,c0d40f6c) at netbsd:callback_run_roundrobin+0x3c
 uvm_km_va_drain(c0d40f60,e01727,cdb256ec,c04e1d57,c0d40f60) at netbsd:uvm_km_va_drain+0x2c
 vm_map_drain(c0d40f60,e01727,20000,cdb256f8,0) at netbsd:vm_map_drain+0x24
 uvm_map_prepare(c0d40f60,c3177000,20000,0,ffffffff) at netbsd:uvm_map_prepare+0x1fa
 uvm_map(c0d40f60,cdb25790,20000,0,ffffffff) at netbsd:uvm_map+0x180
 km_vacache_alloc(c0d41020,2,0,c36c0800,c0d41110) at netbsd:km_vacache_alloc+0x7f
 pool_allocator_alloc(c0d41020,2,0,c36c0800,ffffffff) at netbsd:pool_allocator_alloc+0x23
 pool_grow(c0d41020,2,0,c057e56c,5) at netbsd:pool_grow+0x2d
 pool_get(c0d41020,2,cdb2586c,c057f2c9,c0d6e760) at netbsd:pool_get+0x272
 uvm_km_alloc_poolpage_cache(c0d40f60,0,0,0,1) at netbsd:uvm_km_alloc_poolpage_cache+0x71
 pool_page_alloc(c0d47600,2,0,c36c0800,c0ca4380) at netbsd:pool_page_alloc+0x25
 pool_allocator_alloc(c0d47600,2,0,c36c0800,ffffffff) at netbsd:pool_allocator_alloc+0x23
 pool_grow(c0d47600,2,8,0,0) at netbsd:pool_grow+0x2d
 pool_get(c0d47600,2,0,0,49f000) at netbsd:pool_get+0x272
 ld_ataraid_make_cbuf(c37b6c00,c389ea60,0,49f000,0) at netbsd:ld_ataraid_make_cbuf+0x26
 ld_ataraid_start_raid0(c37b6c00,c389ea60,cdb25a2c,c01ef81c,cdb25a05) at netbsd:ld_ataraid_start_raid0+0x367
 ldstart(c37b6c00,c389ea60,0,c05502af,0) at netbsd:ldstart+0x98
 ldstrategy(c389ea60,0,10000,2,c0d6e760) at netbsd:ldstrategy+0x256
 physio(c01f0801,0,4503,100000,c01f1424) at netbsd:physio+0x3c2
 ldread(4503,cdb25c6c,0,c05502af,0) at netbsd:ldread+0x38
 cdev_read(4503,cdb25c6c,0,c057e542,1) at netbsd:cdev_read+0x5f
 spec_read(cdb25bf0,c0a393c0,cdc10400,10001,0) at netbsd:spec_read+0xe5
 ufsspec_read(cdb25bf0,10001,1,c0a38e80,cdc10400) at netbsd:ufsspec_read+0x60
 VOP_READ(cdc10400,cdb25c6c,0,cc864a80,cdb25c38) at netbsd:VOP_READ+0x67
 vn_read(cdee75c0,cdee75c0,cdb25c6c,cc864a80,1) at netbsd:vn_read+0xc0
 dofileread(3,cdee75c0,8065000,200000,cdee75c0) at netbsd:dofileread+0x8b
 sys_read(cdaf0720,cdb25d04,cdb25cfc,2,164ea4b) at netbsd:sys_read+0x89
 syscall(cdb25d48,b3,ab,1f,1f) at netbsd:syscall+0x17b
 db{1}> ps
  PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
 >716            725      716          0 2  0x4000    1               dd
  725            281      725          0 2  0x4000    1              ksh   pause
  281            285      281       1000 2  0x4000    1              ksh   pause
  285            291      285       1000 2  0x4000    1            xterm  select
  291            272      272          0 2  0x4100    1             rshd  select
  287              1      287          0 2  0x4000    1            getty     tty
  299              1      299          0 2  0x4000    1            getty     tty
  293              1      293          0 2  0x4000    1            getty     tty
  305              1      305          0 2  0x4000    1            getty     tty
  297              1      297          0 2  0x4000    1            getty     tty
  280              1      280          0 2  0x4000    1            getty     tty
  294              1      294          0 2  0x4000    1            getty     tty
  302              1      302          0 2  0x4000    1            getty     tty
  296              1      296          0 2  0x4000    1            getty     tty
  286              1      286          0 2       0    1             cron nanoslp
  272              1      272          0 2       0    1            inetd  kqueue
  231              1      231         15 2   0x100    1             ntpd   pause
  156              1      156          0 2       0    1        mount_mfs  mfsidl
  114              1      114          0 2       0    1          syslogd  kqueue
  1                0        1          0 2  0x4001    1             init    wait
  0               -1        0          0 2 0x20002   36           system       *
 db{1}> 

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: NetBSD Kernel Bug People <kern-bug-people@netbsd.org>
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
Date: Fri, 25 Apr 2008 13:13:42 -0400

 I've been trying my hand at looking deeper at this problem but I'm
 having a difficult time figuring out which lock is which, and at this
 point I'm not even sure if the mutex_vector_enter() in the stack
 backtrace is the same as mutex_enter() in the source or not.

 The first line in ldstart() is:

 	mutex_enter(&sc->sc_mutex);

 Then a little bit later, before any mutex_exit(&sc->sc_mutex) there's a
 call, through the sc_start function pointer, to the ld_ataraid_start_raid0()
 routine.

 The only locking I can see that ld_ataraid_start_raid0() does is:

 			mutex_enter(&cbp->cb_buf.b_vp->v_interlock);

 Is that the same lock as is used in ldstart(), i.e. the sc_mutex?

 Interestingly I see that before and after calling biodone(), ldstart()
 releases and then re-acquires the sc_mutex (if I'm interpreting this
 right):

 				mutex_exit(&sc->sc_mutex);
 				biodone(bp);
 				mutex_enter(&sc->sc_mutex);

 Should the same be done before calling the sc_start function?

 Or should ld_ataraid_start_raid0() not be doing any locking at all?
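
 The questions above can be modelled in user space. What follows is a
 minimal sketch, assuming that the held spin lock is sc_mutex (the ddb
 output earlier in this report resolves its "last locked" address to
 ldstart and its "initialized" address to ldattach) and that
 v_interlock is a separate, adaptive mutex. Every model_* name is
 hypothetical and not a NetBSD API; the check only imitates what
 lockdebug_barrier appears to enforce.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in NetBSD. */
enum model_lock_type { MODEL_LOCK_SPIN, MODEL_LOCK_ADAPTIVE };

struct model_lock {
	enum model_lock_type type;
	bool held;
};

/* Spin locks currently held by the single modelled CPU. */
static int model_spin_held;

/*
 * Acquire a lock.  Returns false, without acquiring, where a
 * LOCKDEBUG-style barrier would fire: taking a lock that may sleep
 * while a spin lock is still held.
 */
static bool
model_enter(struct model_lock *l)
{
	if (l->type == MODEL_LOCK_ADAPTIVE && model_spin_held > 0)
		return false;
	if (l->type == MODEL_LOCK_SPIN)
		model_spin_held++;
	l->held = true;
	return true;
}

static void
model_exit(struct model_lock *l)
{
	assert(l->held);
	if (l->type == MODEL_LOCK_SPIN)
		model_spin_held--;
	l->held = false;
}

/* Reported path: the start routine runs with the softc lock held. */
static bool
model_start_locked(struct model_lock *sc, struct model_lock *interlock)
{
	bool ok;

	model_enter(sc);
	ok = model_enter(interlock);
	if (ok)
		model_exit(interlock);
	model_exit(sc);
	return ok;
}

/* Alternative: drop the softc lock around the call, then retake it. */
static bool
model_start_unlocked(struct model_lock *sc, struct model_lock *interlock)
{
	bool ok;

	model_enter(sc);
	model_exit(sc);		/* released before calling out */
	ok = model_enter(interlock);
	if (ok)
		model_exit(interlock);
	model_enter(sc);	/* re-acquired afterwards */
	model_exit(sc);
	return ok;
}
```

 In this model model_start_locked() trips the barrier and
 model_start_unlocked() does not. Whether dropping sc_mutex around the
 sc_start call is actually safe for ld(4) is not something the model
 can show.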

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: NetBSD GNATS Administrator <gnats-admin@NetBSD.org>,
    NetBSD Kernel Technical Discussion List <tech-kern@NetBSD.org>
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
Date: Sat, 23 Aug 2008 19:02:11 -0400

 At Fri, 25 Apr 2008 17:15:04 +0000 (UTC), Me-planix.com wrote:
 Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
 >
 >  I've been trying my hand at looking deeper at this problem but I'm
 >  having a difficult time figuring out which lock is which, and at this
 >  point I'm not even sure if the mutex_vector_enter() in the stack
 >  backtrace is the same as mutex_enter() in the source or not.
 >
 >  The first line in ldstart() is:
 >
 >  	mutex_enter(&sc->sc_mutex);
 >
 >  Then a little bit later, before any mutex_exit(&sc->sc_mutex) there's a
 >  call, through the sc_start function pointer, to the ld_ataraid_start_raid0()
 >  routine.
 >
 >  The only locking I can see that ld_ataraid_start_raid0() does is:
 >
 >  			mutex_enter(&cbp->cb_buf.b_vp->v_interlock);
 >
 >  Is that the same lock as is used in ldstart(), i.e. the sc_mutex?
 >
 >  Interestingly I see that before and after calling biodone(), ldstart()
 >  releases and then re-acquires the sc_mutex (if I'm interpreting this
 >  right):
 >
 >  				mutex_exit(&sc->sc_mutex);
 >  				biodone(bp);
 >  				mutex_enter(&sc->sc_mutex);
 >
 >  Should the same be done before calling the sc_start function?
 >
 >  Or should ld_ataraid_start_raid0() not be doing any locking at all?

 As far as I can tell I haven't seen any reply to this yet.

 It's still happening.  I hadn't even got this far until today when
 Juergen Hannken-Illjes suggested a working fix for my PR# 38636.

 Now I'm back to this one.  I've CC'ed tech-kern once again to see if
 fresh eyes might help spot something obvious.

 FYI, here's what the crash looks like today:

 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0x00000000d185d7ac type     :               spin
 initialized  : 0x00000000c01f430c
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  0 last held:                  0
 current lwp  : 0x00000000d1e57380 last held: 0x00000000d1e57380
 last locked  : 0x00000000c01f3cee unlocked : 0x00000000c01f3d6b
 owner field  : 0x0000000000010600 wait/spin:                0/1

 panic: LOCKDEBUG
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ac52c cs 8 eflags 246 cr2 bbbfb000 ilevel 6
 Stopped in pid 857.1 (newfs) at netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> trace
 breakpoint(c0afbae3,d1c3d8c8,c0b29800,c04e351f,6,1,0,0,d1c3d8c8,8) at netbsd:breakpoint+0x4
 panic(c0a9eddc,c0a9a5f7,c087af90,c0a9edf5,0,1000001,6,0,0,d1823b80) at netbsd:panic+0x1b8
 lockdebug_abort1(c0a9edf5,1,0,0,c0aa38ce,d185d6cc,d1c3d92c,c049a1ca,c31f7e60,c0b25fa4) at netbsd:lockdebug_abort1+0xbb
 mutex_vector_enter(d1823b80,0,cc4c0000,200,6,0,c01f3cee,c32c5f44,0,efff1749) at netbsd:mutex_vector_enter+0x437
 ld_ataraid_start_raid0(d185d6cc,c31e860c,d1c3da4c,200,c32cda00,d185d7ac,d185d750,0,c31e860c,d185d6cc) at netbsd:ld_ataraid_start_raid0+0x2e2
 ldstart(6,c31e860c,0,0,c04b358b,101,0,d1818830,0,c32cda00) at netbsd:ldstart+0x6e
 ldstrategy(c31e860c,200,200,1,0,d181881c,d1818830,d1818834,bbbb5000,d1e57380) at netbsd:ldstrategy+0x171
 physio(c01f4770,0,4500,0,c01f3500,d1c3dc5c,d1c3db4c,c04d64b0,4500,d1c3dc5c) at netbsd:physio+0x251
 ldwrite(4500,d1c3dc5c,10,8,d1b09720,d1c3dc5c,6,d1e57380,d1c3dbe4,d1b09680) at netbsd:ldwrite+0x35
 cdev_write(4500,d1c3dc5c,10,2,d1b09720,d17fd000,d1c3db8c,c0522bf7,d1b09720,1) at netbsd:cdev_write+0x70
 spec_write(d1c3dbe4,bbbf8000,c087c740,d1b09680,2,20002,d1c3dbfc,c052e058,c087c240,d1b09680) at netbsd:spec_write+0xa0
 VOP_WRITE(d1b09680,d1c3dc5c,10,cc4a6a80,0,0,2,16,200,bbbb5000) at netbsd:VOP_WRITE+0x6c
 vn_write(d1e1c980,d1c3dcc4,d1c3dc5c,cc4a6a80,0,ffffffff,d1c3dc8c,c053632c,d1c3dc6c,d1e1c900) at netbsd:vn_write+0xb1
 dofilewrite(4,d1e1c980,bbbb5000,200,d1c3dcc4,0,d1c3dd28,c05b5b7f,0,0) at netbsd:dofilewrite+0x75
 sys_pwrite(d1e57380,d1c3dd00,d1c3dd28,bbbfb000,bbbfb000,d1ea2dd8,2,4,bbbb5000,200) at netbsd:sys_pwrite+0xc7
 syscall(d1c3dd48,b3,ab,1f,1f,0,1749efff,bfbfc8b8,0,0) at netbsd:syscall+0xab
 db{0}> x/I 0x00000000c01f3cee
 netbsd:ldstart+0x1e:    testl   %esi,%esi
 db{0}> x/I 0x00000000d1e57380
 0xd1e57380:     addb    %al,0(%eax)
 db{0}> x/I 0x00000000c01f3d6b
 netbsd:ldstart+0x9b:    addl    $0x1c,%esp
 db{0}> x/I 0x00000000c01f430c
 netbsd:ldattach+0x2c:   testb   $0x1,0x128(%edi)
 db{0}> call simple_lock_dump
 Symbol not found
 db{0}> 

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

From: Juan RP <xtraeme@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/38273 panic: LOCKDEBUG,
 "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Wed, 10 Sep 2008 18:11:03 +0200

 Hi,

 I don't know whether the mutex really needs to run at IPL_VM,
 but changing it to IPL_NONE seems to do the right thing at least
 on ataraid(4).

 Someone with more knowledge should verify that this change is
 correct, but I'm using it as a workaround for now... and I was able
 to copy/remove a few gigabytes with a DEBUG/DIAGNOSTIC/LOCKDEBUG
 kernel without any issue on an Intel MatrixRAID controller (RAID1).

 Index: ld.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ld.c,v
 retrieving revision 1.63
 diff -b -u -p -r1.63 ld.c
 --- ld.c        9 Sep 2008 12:45:39 -0000       1.63
 +++ ld.c        10 Sep 2008 16:06:25 -0000
 @@ -99,7 +99,7 @@ ldattach(struct ld_softc *sc)
  {
         char tbuf[9];

 -       mutex_init(&sc->sc_mutex, MUTEX_DEFAULT, IPL_VM);
 +       mutex_init(&sc->sc_mutex, MUTEX_DEFAULT, IPL_NONE);

         if ((sc->sc_flags & LDF_ENABLED) == 0) {
                 aprint_normal_dev(sc->sc_dv, "disabled\n");
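
 For context on the one-line change above: per mutex(9), MUTEX_DEFAULT
 yields an adaptive mutex when the IPL is IPL_NONE or one of the soft
 interrupt levels, and a spin mutex otherwise; adaptive mutexes may
 not be acquired from a hardware interrupt handler. A small sketch of
 that rule follows. The model_* names and the reduced set of levels are
 hypothetical (the real IPL_* values are machine-dependent), so this is
 an illustration and not the kernel's code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a few interrupt priority levels. */
enum model_ipl { MODEL_IPL_NONE, MODEL_IPL_SOFTBIO, MODEL_IPL_VM };

enum model_kind { MODEL_KIND_ADAPTIVE, MODEL_KIND_SPIN };

/*
 * Modelled MUTEX_DEFAULT rule: adaptive at IPL_NONE or a soft
 * interrupt level, spin at any hardware interrupt level.
 */
static enum model_kind
model_default_kind(enum model_ipl ipl)
{
	if (ipl == MODEL_IPL_NONE || ipl == MODEL_IPL_SOFTBIO)
		return MODEL_KIND_ADAPTIVE;
	return MODEL_KIND_SPIN;
}

/*
 * An adaptive mutex may put the caller to sleep, so the model treats
 * only spin mutexes as usable from a hardware interrupt handler.
 */
static bool
model_usable_in_hardintr(enum model_kind kind)
{
	return kind == MODEL_KIND_SPIN;
}
```

 Under this model the patched mutex_init() call turns sc_mutex from a
 spin mutex into an adaptive one, which is only acceptable if no
 caller ever takes it from a hardware interrupt handler.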

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/38273: "lockdebug_barrier: spin lock held" from ld_ataraid_start_raid0()
Date: Wed, 10 Sep 2008 15:43:48 -0400

 Well, now today I get a kernel that just hangs hard with newfs of ld(4)
 on ataraid(4):

 NetBSD 4.99.72 (GENERIC) #4: Wed Sep 10 15:05:13 EDT 2008
 [[....]]
 ASUSTeK COMPUTER INC. PSCHSR-A (1.XX)
 [[....]]
 wd1 at atabus2 drive 0: <WDC WD2000JD-00HBB0>
 wd1: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
 rnd: wd1 attached as an entropy source (collecting)
 wd2 at atabus3 drive 0: <WDC WD2000JD-00HBB0>
 wd2: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
 [[....]]
 ataraid0: found 1 RAID volume
 ld0 at ataraid0 vendtype 1 unit 0: Adaptec ATA RAID-1 array
 ld0: ld_ataraid_attach(): ld unit 0 (ld->sc_dv = 0xd180de10)
 ld0: ldattach(): unit 0
 ld0: 186 GB, 24321 cyl, 255 head, 63 sec, 512 bytes/sect x 390721536 sectors


 [halt sent]
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ae0dc cs 8 eflags 202 cr2 8089000 ilevel 8
 Stopped in pid 0.4 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> trace
 breakpoint(0,3f8,5,6,c0bf03a0,d0e80bcc,d0d0cf6c,c32c400e,c32c5000,7f9) at netbsd:breakpoint+0x4
 comintr(d0e80ac0,d087cb10,2f0fe20f,70ff10b,ef00f8f,2f0e0d8f,f0fb20f,50ff20f,ff00f0d,f0e850f) at netbsd:comintr+0x575
 DDB lost frame for netbsd:Xintr_ioapic_edge10+0xa9, trying 0xd0d0cf74
 Xintr_ioapic_edge10() at netbsd:Xintr_ioapic_edge10+0xa9
 --- interrupt ---
 --- switch to interrupt stack ---
 Xspllower(6,c7088644,c4,c0848419,1000000,d49a48c0,c0bd8514,c0847ee6,d49a4d40,d49a4d80) at netbsd:Xspllower+0xf
 pool_put(c0bd84a0,c7088644,cc504000,0,6,0,c01f3f6a,c32d4f44,c04b528f,cc4c7720) at netbsd:pool_put+0x60
 ld_ataraid_start_raid0(d180de10,c31e8158,0,c049bd03,c31f7e18,d180de70,d180de14,c31f7dc8,10000,c32d4f00) at netbsd:ld_ataraid_start_raid0+0x231
 ldstart(d180de10,c3446ecc,3,d1843ac8,0,1,d1843ac8,c04b245a,c3446ecc,d180de10) at netbsd:ldstart+0x6d
 ld_ataraid_iodone_raid0(c31f7dc8,0,0,3,c31f7dc8,c0b37940,d087cd40,c051d97a,3,3) at netbsd:ld_ataraid_iodone_raid0+0x1bd
 biodone2(3,3,3,3,cc4be29c,cc4be004,d087cd80,c04c3af7,0,0) at netbsd:biodone2+0x99
 biointr(0,0,0,0,0,0,0,3,0,0) at netbsd:biointr+0x3a
 softint_dispatch(cc4c7be0,3,0,0,0,0,d087cd90,d087cba8,d087cc00,28) at netbsd:softint_dispatch+0xb7
 DDB lost frame for netbsd:Xsoftintr+0x3d, trying 0xd087cd88
 Xsoftintr() at netbsd:Xsoftintr+0x3d
 --- interrupt ---
 fatal page fault in supervisor mode
 trap type 6 code 0 eip c05b055f cs 8 eflags 10202 cr2 3b ilevel 8
 kernel: supervisor trap page fault, code=0
 Faulted in DDB; continuing...
 db{0}> cont
 [halt sent]
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ae0dc cs 8 eflags 202 cr2 3b ilevel 8
 Stopped in pid 0.4 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> trace
 breakpoint(0,3f8,5,6,c0bf03a0,d0e80bcc,d0d0cf6c,c32c4010,c32c5000,7ff) at netbsd:breakpoint+0x4
 comintr(d0e80ac0,d087caf0,2f0fe20f,70ff10b,ef00f8f,2f0e0d8f,f0fb20f,50ff20f,ff00f0d,f0e850f) at netbsd:comintr+0x575
 DDB lost frame for netbsd:Xintr_ioapic_edge10+0xa9, trying 0xd0d0cf74
 Xintr_ioapic_edge10() at netbsd:Xintr_ioapic_edge10+0xa9
 --- interrupt ---
 --- switch to interrupt stack ---
 Xspllower(6,c04e127c,0,c86af70c,c86affe0,d087cbe0,d087cbf0,c04e127c,c0bd8514,c86af70c) at netbsd:Xspllower+0xf
 mutex_vector_exit(c0bd8514,c86af70c,c4,d3f69fc0,0,d3f69f40,c0bd8514,c0848419,1009e80,c86af7a4) at netbsd:mutex_vector_exit+0x145
 pool_put(c0bd84a0,c86af70c,cc504000,0,6,0,c01f3f0b,c32d4f44,c04b528f,cc4c7720) at netbsd:pool_put+0x1ac
 ld_ataraid_start_raid0(d180de10,c31e8158,0,c049bd03,c31f7c90,d180de70,d180de14,c31f7c40,10000,c32d4f00) at netbsd:ld_ataraid_start_raid0+0x231
 ldstart(d180de10,c31e8000,3,d1843ac8,0,1,d1843ac8,c04b245a,c31e8000,d180de10) at netbsd:ldstart+0x6d
 ld_ataraid_iodone_raid0(c31f7c40,0,0,3,c31f7c40,c0b37940,d087cd40,c051d97a,3,3) at netbsd:ld_ataraid_iodone_raid0+0x1bd
 biodone2(3,3,3,3,cc4be29c,cc4be004,d087cd80,c04c3af7,0,0) at netbsd:biodone2+0x99
 biointr(0,0,0,0,0,0,0,3,0,0) at netbsd:biointr+0x3a
 softint_dispatch(cc4c7be0,3,0,0,0,0,d087cd90,d087cce4,d087cd00,28) at netbsd:softint_dispatch+0xb7
 DDB lost frame for netbsd:Xsoftintr+0x3d, trying 0xd087cd88
 Xsoftintr() at netbsd:Xsoftintr+0x3d
 --- interrupt ---
 fatal page fault in supervisor mode
 trap type 6 code 0 eip c05b055f cs 8 eflags 10202 cr2 3b ilevel 8
 kernel: supervisor trap page fault, code=0
 Faulted in DDB; continuing...
 db{0}> 
 db{0}> ps
  PID           PPID     PGRP        UID S   FLAGS LWPS          COMMAND    WAIT
  891            721      891          0 2  0x4000    1            newfs  physio
 [[....]]

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

From: Matthias Scheler <tron@NetBSD.org>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/38273
Date: Thu, 11 Sep 2008 17:30:31 +0100

 	Hello,

 Juan's proposed fix is unfortunately not correct because ld(4) can
 be called in interrupt context. Please have a look at this
 discussion on "tech-kern" for the details:

 http://mail-index.netbsd.org/tech-kern/2008/09/11/msg002699.html

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

From: Juan RP <xtraeme@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/38273 panic: LOCKDEBUG,
 "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Wed, 17 Sep 2008 16:58:17 +0200

 Hi,

 please try this patch. I've built a NetBSD/amd64 full release with
 src on RAID0 and obj/destdir/tools on RAID1 with ataraid(4)
 successfully with all debugging options enabled.

 $ dmesg|grep 'ld[0-9]' 
 ld0 at ataraid0 vendtype 5 unit 0: Intel MatrixRAID ATA RAID-0 array
 ld0: 233 GB, 30416 cyl, 255 head, 63 sec, 512 bytes/sect x 488636416 sectors
 ld1 at ataraid0 vendtype 5 unit 1: Intel MatrixRAID ATA RAID-1 array
 ld1: 117 GB, 15306 cyl, 255 head, 63 sec, 512 bytes/sect x 245905408 sectors
 $

 /dev/ld0a         226G       1.0G       214G   0% /mnt/raid0
 /dev/ld1a         114G       949M       107G   0% /mnt/raid1

 Index: ld_ataraid.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ata/ld_ataraid.c,v
 retrieving revision 1.32
 diff -b -u -p -r1.32 ld_ataraid.c
 --- ld_ataraid.c	16 Sep 2008 11:45:30 -0000	1.32
 +++ ld_ataraid.c	17 Sep 2008 13:13:08 -0000
 @@ -90,6 +90,12 @@ struct ld_ataraid_softc {
  	struct vnode *sc_vnodes[ATA_RAID_MAX_DISKS];

  	void	(*sc_iodone)(struct buf *);
 +
 +	pool_cache_t sc_cbufpool;
 +
 +	SIMPLEQ_HEAD(, cbuf) sc_cbufq;
 +
 +	void	*sc_sih_cookie;
  };

  static int	ld_ataraid_match(struct device *, struct cfdata *, void *);
 @@ -97,6 +103,10 @@ static void	ld_ataraid_attach(struct dev

  static int	ld_ataraid_dump(struct ld_softc *, void *, int, int);

 +static int	cbufpool_ctor(void *, void *, int);
 +static void	cbufpool_dtor(void *, void *);
 +
 +static void	ld_ataraid_start_vstrategy(void *);
  static int	ld_ataraid_start_span(struct ld_softc *, struct buf *);

  static int	ld_ataraid_start_raid0(struct ld_softc *, struct buf *);
 @@ -113,9 +123,6 @@ static int	ld_ataraid_biodisk(struct ld_
  CFATTACH_DECL_NEW(ld_ataraid, sizeof(struct ld_ataraid_softc),
      ld_ataraid_match, ld_ataraid_attach, NULL, NULL);

 -static int ld_ataraid_initialized;
 -static struct pool ld_ataraid_cbufpl;
 -
  struct cbuf {
  	struct buf	cb_buf;		/* new I/O buf */
  	struct buf	*cb_obp;	/* ptr. to original I/O buf */
 @@ -127,8 +134,8 @@ struct cbuf {
  #define	CBUF_IODONE	0x00000001	/* I/O is already successfully done */
  };

 -#define	CBUF_GET()	pool_get(&ld_ataraid_cbufpl, PR_NOWAIT);
 -#define	CBUF_PUT(cbp)	pool_put(&ld_ataraid_cbufpl, (cbp))
 +#define	CBUF_GET()	pool_cache_get(sc->sc_cbufpool, PR_NOWAIT);
 +#define	CBUF_PUT(cbp)	pool_cache_put(sc->sc_cbufpool, (cbp))

  static int
  ld_ataraid_match(device_t parent, cfdata_t match, void *aux)
 @@ -151,11 +158,10 @@ ld_ataraid_attach(device_t parent, devic

  	ld->sc_dv = self;

 -	if (ld_ataraid_initialized == 0) {
 -		ld_ataraid_initialized = 1;
 -		pool_init(&ld_ataraid_cbufpl, sizeof(struct cbuf), 0,
 -		    0, 0, "ldcbuf", NULL, IPL_BIO);
 -	}
 +	sc->sc_cbufpool	= pool_cache_init(sizeof(struct cbuf), 0,
 +	    0, 0, "ldcbuf", NULL, IPL_BIO, cbufpool_ctor, cbufpool_dtor, sc);
 +	sc->sc_sih_cookie = softint_establish(SOFTINT_BIO,
 +	    ld_ataraid_start_vstrategy, sc);

  	sc->sc_aai = aai;	/* this data persists */

 @@ -246,9 +252,33 @@ ld_ataraid_attach(device_t parent, devic
  		panic("%s: bioctl registration failed\n",
  		    device_xname(ld->sc_dv));
  #endif
 +	SIMPLEQ_INIT(&sc->sc_cbufq);
  	ldattach(ld);
  }

 +static int
 +cbufpool_ctor(void *arg, void *obj, int flags)
 +{
 +	struct ld_ataraid_softc *sc = arg;
 +	struct ld_softc *ld = &sc->sc_ld;
 +	struct cbuf *cbp = obj;
 +
 +	/* We release/reacquire the spinlock before calling buf_init() */
 +	mutex_exit(&ld->sc_mutex);
 +	buf_init(&cbp->cb_buf);
 +	mutex_enter(&ld->sc_mutex);
 +
 +	return 0;
 +}
 +
 +static void
 +cbufpool_dtor(void *arg, void *obj)
 +{
 +	struct cbuf *cbp = obj;
 +
 +	buf_destroy(&cbp->cb_buf);
 +}
 +
  static struct cbuf *
  ld_ataraid_make_cbuf(struct ld_ataraid_softc *sc, struct buf *bp,
      u_int comp, daddr_t bn, void *addr, long bcount)
 @@ -257,8 +287,7 @@ ld_ataraid_make_cbuf(struct ld_ataraid_s

  	cbp = CBUF_GET();
  	if (cbp == NULL)
 -		return (NULL);
 -	buf_init(&cbp->cb_buf);
 +		return NULL;
  	cbp->cb_buf.b_flags = bp->b_flags;
  	cbp->cb_buf.b_oflags = bp->b_oflags;
  	cbp->cb_buf.b_cflags = bp->b_cflags;
 @@ -277,7 +306,24 @@ ld_ataraid_make_cbuf(struct ld_ataraid_s
  	cbp->cb_other = NULL;
  	cbp->cb_flags = 0;

 -	return (cbp);
 +	return cbp;
 +}
 +
 +static void
 +ld_ataraid_start_vstrategy(void *arg)
 +{
 +	struct ld_ataraid_softc *sc = arg;
 +	struct cbuf *cbp;
 +
 +	while ((cbp = SIMPLEQ_FIRST(&sc->sc_cbufq)) != NULL) {
 +		SIMPLEQ_REMOVE_HEAD(&sc->sc_cbufq, cb_q);
 +		if ((cbp->cb_buf.b_flags & B_READ) == 0) {
 +			mutex_enter(&cbp->cb_buf.b_vp->v_interlock);
 +			cbp->cb_buf.b_vp->v_numoutput++;
 +			mutex_exit(&cbp->cb_buf.b_vp->v_interlock);
 +		}
 +		VOP_STRATEGY(cbp->cb_buf.b_vp, &cbp->cb_buf);
 +	}
  }

  static int
 @@ -286,7 +332,6 @@ ld_ataraid_start_span(struct ld_softc *l
  	struct ld_ataraid_softc *sc = (void *) ld;
  	struct ataraid_array_info *aai = sc->sc_aai;
  	struct ataraid_disk_info *adi;
 -	SIMPLEQ_HEAD(, cbuf) cbufq;
  	struct cbuf *cbp;
  	char *addr;
  	daddr_t bn;
 @@ -294,7 +339,6 @@ ld_ataraid_start_span(struct ld_softc *l
  	u_int comp;

  	/* Allocate component buffers. */
 -	SIMPLEQ_INIT(&cbufq);
  	addr = bp->b_data;

  	/* Find the first component. */
 @@ -316,12 +360,11 @@ ld_ataraid_start_span(struct ld_softc *l
  		cbp = ld_ataraid_make_cbuf(sc, bp, comp, bn, addr, rcount);
  		if (cbp == NULL) {
  			/* Free the already allocated component buffers. */
 -			while ((cbp = SIMPLEQ_FIRST(&cbufq)) != NULL) {
 -				SIMPLEQ_REMOVE_HEAD(&cbufq, cb_q);
 -				buf_destroy(&cbp->cb_buf);
 +			while ((cbp = SIMPLEQ_FIRST(&sc->sc_cbufq)) != NULL) {
 +				SIMPLEQ_REMOVE_HEAD(&sc->sc_cbufq, cb_q);
  				CBUF_PUT(cbp);
  			}
 -			return (EAGAIN);
 +			return EAGAIN;
  		}

  		/*
 @@ -331,31 +374,22 @@ ld_ataraid_start_span(struct ld_softc *l
  		adi = &aai->aai_disks[++comp];
  		bn = 0;

 -		SIMPLEQ_INSERT_TAIL(&cbufq, cbp, cb_q);
 +		SIMPLEQ_INSERT_TAIL(&sc->sc_cbufq, cbp, cb_q);
  		addr += rcount;
  	}

  	/* Now fire off the requests. */
 -	while ((cbp = SIMPLEQ_FIRST(&cbufq)) != NULL) {
 -		SIMPLEQ_REMOVE_HEAD(&cbufq, cb_q);
 -		if ((cbp->cb_buf.b_flags & B_READ) == 0) {
 -			mutex_enter(&cbp->cb_buf.b_vp->v_interlock);
 -			cbp->cb_buf.b_vp->v_numoutput++;
 -			mutex_exit(&cbp->cb_buf.b_vp->v_interlock);
 -		}
 -		VOP_STRATEGY(cbp->cb_buf.b_vp, &cbp->cb_buf);
 -	}
 +	softint_schedule(sc->sc_sih_cookie);

 -	return (0);
 +	return 0;
  }

  static int
  ld_ataraid_start_raid0(struct ld_softc *ld, struct buf *bp)
  {
 -	struct ld_ataraid_softc *sc = (void *) ld;
 +	struct ld_ataraid_softc *sc = (void *)ld;
  	struct ataraid_array_info *aai = sc->sc_aai;
  	struct ataraid_disk_info *adi;
 -	SIMPLEQ_HEAD(, cbuf) cbufq;
  	struct cbuf *cbp, *other_cbp;
  	char *addr;
  	daddr_t bn, cbn, tbn, off;
 @@ -363,10 +397,9 @@ ld_ataraid_start_raid0(struct ld_softc *
  	u_int comp;
  	const int read = bp->b_flags & B_READ;
  	const int mirror = aai->aai_level & AAI_L_RAID1;
 -	int error;
 +	int error = 0;

  	/* Allocate component buffers. */
 -	SIMPLEQ_INIT(&cbufq);
  	addr = bp->b_data;
  	bn = bp->b_rawblkno;

 @@ -417,14 +450,13 @@ resource_shortage:
  			error = EAGAIN;
  free_and_exit:
  			/* Free the already allocated component buffers. */
 -			while ((cbp = SIMPLEQ_FIRST(&cbufq)) != NULL) {
 -				SIMPLEQ_REMOVE_HEAD(&cbufq, cb_q);
 -				buf_destroy(&cbp->cb_buf);
 +			while ((cbp = SIMPLEQ_FIRST(&sc->sc_cbufq)) != NULL) {
 +				SIMPLEQ_REMOVE_HEAD(&sc->sc_cbufq, cb_q);
  				CBUF_PUT(cbp);
  			}
 -			return (error);
 +			return error;
  		}
 -		SIMPLEQ_INSERT_TAIL(&cbufq, cbp, cb_q);
 +		SIMPLEQ_INSERT_TAIL(&sc->sc_cbufq, cbp, cb_q);
  		if (mirror && !read && comp < aai->aai_width) {
  			comp += aai->aai_width;
  			adi = &aai->aai_disks[comp];
 @@ -433,7 +465,8 @@ free_and_exit:
  				    comp, cbn, addr, rcount);
  				if (other_cbp == NULL)
  					goto resource_shortage;
 -				SIMPLEQ_INSERT_TAIL(&cbufq, other_cbp, cb_q);
 +				SIMPLEQ_INSERT_TAIL(&sc->sc_cbufq,
 +				    other_cbp, cb_q);
  				other_cbp->cb_other = cbp;
  				cbp->cb_other = other_cbp;
  			}
 @@ -443,17 +476,9 @@ free_and_exit:
  	}

  	/* Now fire off the requests. */
 -	while ((cbp = SIMPLEQ_FIRST(&cbufq)) != NULL) {
 -		SIMPLEQ_REMOVE_HEAD(&cbufq, cb_q);
 -		if ((cbp->cb_buf.b_flags & B_READ) == 0) {
 -			mutex_enter(&cbp->cb_buf.b_vp->v_interlock);
 -			cbp->cb_buf.b_vp->v_numoutput++;
 -			mutex_exit(&cbp->cb_buf.b_vp->v_interlock);
 -		}
 -		VOP_STRATEGY(cbp->cb_buf.b_vp, &cbp->cb_buf);
 -	}
 +	softint_schedule(sc->sc_sih_cookie);

 -	return (0);
 +	return error;
  }

  /*
 @@ -530,7 +555,6 @@ ld_ataraid_iodone_raid0(struct buf *vbp)
  			other_cbp->cb_flags |= CBUF_IODONE;
  	}
  	count = cbp->cb_buf.b_bcount;
 -	buf_destroy(&cbp->cb_buf);
  	CBUF_PUT(cbp);

  	if (other_cbp != NULL)
 ---- END ----
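
The control flow this patch introduces (the start routines only append to
sc_cbufq and call softint_schedule(), and ld_ataraid_start_vstrategy() drains
the queue later) can be modelled in userland. The following is a minimal
sketch under stated assumptions, not driver code: every toy_* name is
hypothetical, the softint is reduced to a pending flag that the caller
services by hand, and VOP_STRATEGY() is reduced to setting a flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct cbuf: queue link, id, dispatch flag. */
struct toy_cbuf {
	struct toy_cbuf *next;
	int id;
	int dispatched;
};

/* Hypothetical stand-in for the softc: a FIFO plus a "softint pending" flag. */
struct toy_softc {
	struct toy_cbuf *head;
	struct toy_cbuf **tailp;
	int softint_pending;
	int ndispatched;
};

void
toy_init(struct toy_softc *sc)
{
	sc->head = NULL;
	sc->tailp = &sc->head;
	sc->softint_pending = 0;
	sc->ndispatched = 0;
}

/* Start path: only queues the work and requests the deferred handler. */
void
toy_start(struct toy_softc *sc, struct toy_cbuf *cbp)
{
	assert(cbp != NULL);
	cbp->next = NULL;
	cbp->dispatched = 0;
	*sc->tailp = cbp;
	sc->tailp = &cbp->next;
	sc->softint_pending = 1;	/* models softint_schedule() */
}

/* Deferred handler: drains the queue, as ld_ataraid_start_vstrategy() does. */
void
toy_softint(struct toy_softc *sc)
{
	struct toy_cbuf *cbp;

	sc->softint_pending = 0;
	while ((cbp = sc->head) != NULL) {
		sc->head = cbp->next;
		if (sc->head == NULL)
			sc->tailp = &sc->head;
		cbp->dispatched = 1;	/* models VOP_STRATEGY() */
		sc->ndispatched++;
	}
}
```

In the sketch nothing is dispatched until toy_softint() runs, which appears to
be the intent of the change: the code that takes v_interlock and calls
VOP_STRATEGY() no longer runs in the caller's context.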

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Juan RP <xtraeme@gmail.com>
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Wed, 17 Sep 2008 14:38:05 -0400

 --pgp-sign-Multipart_Wed_Sep_17_14:38:04_2008-1
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable

 At Wed, 17 Sep 2008 15:00:08 +0000 (UTC), Juan RP wrote:
 Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
 >
 >  please try this patch. I've built a NetBSD/amd64 full release with
 >  src on RAID0 and obj/destdir/tools on RAID1 with ataraid(4)
 >  successfully with all debugging options enabled.

 Well now at least it immediately dumps during boot:

 ataraid0: found 1 RAID volume
 ld0 at ataraid0 vendtype 1 unit 0: Adaptec ATA RAID-1 array
 ld0: ld_ataraid_attach(): ld unit 0 (ld->sc_dv = 0xd187064c)
 ld0: ldattach(): unit 0
 ld0: 186 GB, 24321 cyl, 255 head, 63 sec, 512 bytes/sect x 390721536 sectors
 rnd: ld0 attached as an entropy source (collecting)
 Mutex error: mutex_vector_exit: exiting unheld spin mutex

 lock address : 0x00000000d18706ac type     :               spin
 initialized  : 0x00000000c01f46f9
 shared holds :                  0 exclusive:                  0
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  1 last held:                  1
 current lwp  : 0x00000000c0b349e0 last held: 000000000000000000
 last locked  : 0x00000000c01f411b unlocked : 0x00000000c01f4160
 owner field  : 0x0000000000000600 wait/spin:                0/1

 panic: LOCKDEBUG
 Begin traceback...
 copyright(d1832202,0,0,c0bf4350,d1832200,d1832198,d18706ac,c0b34724,0,c01f411b) at 0xc0aa6f11
 ?(d18321c1,0,0,c0b34c2c,d1832280,d1832218,d187068c,c0b34730,c0b349e0,c01f4382) at 0
 End traceback...
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ae38c cs 8 eflags 246 cr2 0 ilevel 6
 Stopped in pid 0.1 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
 db{1}>
 db{1}> trace
 breakpoint(c0aac84e,1,fffe,c0b09d65,c04e4e50,1,0,0,c0d296f4,8) at netbsd:breakpoint+0x4
 panic(c0aab734,c0aa6f2b,c0886a15,c0aa6f11,c0aa6f11,1886a15,6,d18706ac,0,cc4cc000) at netbsd:panic+0x1b8
 lockdebug_abort1(c0aa6f11,1,c0886a15,c0aa6f11,cc4cc0ec,c31f7e8c,c0d29758,c04b2754,d18706ac,c0886a15) at netbsd:lockdebug_abort1+0xbb
 mutex_abort(d18706ac,c0886a15,c0aa6f11,c04b25c2,d18706ac,0,c0d29768,c05dff35,d18706ac,c31f7e8c) at netbsd:mutex_abort+0x2e
 mutex_vector_exit(d18706ac,c31f7e8c,c0d297a8,c04e1a48,d187064c,c31f7e8c,2,0,d0e80328,c0bd1e00) at netbsd:mutex_vector_exit+0x1c4
 cbufpool_ctor(d187064c,c31f7e8c,2,0,d0e80328,c0bd1e00,c0d297a8,c0d297d8,6,cc4daf00) at netbsd:cbufpool_ctor+0x15
 pool_cache_get_slow(0,2,c0d297d8,0,1,0,6,c084a799,0,0) at netbsd:pool_cache_get_slow+0x218
 pool_cache_get_paddr(cc4cc000,2,0,0,0,0,0,0,200,0) at netbsd:pool_cache_get_paddr+0x180
 ld_ataraid_make_cbuf(0,0,d078e000,200,8,cc4c3df4,d1870744,c32d4f64,d18321c0,d18706ac) at netbsd:ld_ataraid_make_cbuf+0x38
 ld_ataraid_start_raid0(d187064c,c31e7a14,c0d298d8,c0848696,c0bf85c0,d18706ac,d1870650,0,c31e7a14,d187064c) at netbsd:ld_ataraid_start_raid0+0x1be
 ldstart(6,0,c0d29918,c04ddfda,8,1,0,c051cd51,0,c32dca00) at netbsd:ldstart+0x6d
 ldstrategy(c31e7a14,c0b349e0,c0d29958,c0848696,0,c0d29a24,c0d29968,c05b1579,c01f4b70,c32dca00) at netbsd:ldstrategy+0x165
 disk_read_sectors(c01f4b70,c32dca00,c31e7a14,0,1,c32dca00,c0d299f8,c05b15bb,c051cd7d,c0b349e0) at netbsd:disk_read_sectors+0x4b
 read_sector(c051cd7d,c0b349e0,0,c0bfc320,0,400,c0bfc320,c04b264a,8,c05b1d20) at netbsd:read_sector+0x29
 scan_mbr(400,0,c04d848d,0,0,0,c0d29a38,0,d0874340,c32dca00) at netbsd:scan_mbr+0x2b
 readdisklabel(3,c01f4b70,c32dca00,c32dc800,d187068c,3,0,c086d800,1303,c0b349e0) at netbsd:readdisklabel+0xdf
 ldopen(1303,1,6000,c0b349e0,6000,1,6,d1843a10,d0ce8e20,0) at netbsd:ldopen+0x15a
 bdev_open(1303,1,6000,c0b349e0,0,0,c0bfc460,c04b264a,c0b349e0,1303) at netbsd:bdev_open+0x99
 spec_open(c0d29b24,20002,c0d29b38,c052fcb8,d1843a10,c0887fa0,d1843a10,1,ffffffff,0) at netbsd:spec_open+0x2c8
 VOP_OPEN(d1843a10,1,ffffffff,c04b607d,d180de10,0,1303,d1843a10,0,0) at netbsd:VOP_OPEN+0x6c
 dkwedge_discover(d1870650,c01f4b50,1,c01f4af0,ff,3f,200,1749f000,0,2) at netbsd:dkwedge_discover+0xfc
 ldattach(d187064c,c05df630,0,d187064c,c0ac6514,0,6,c05dff20,c05dff10,d187064c) at netbsd:ldattach+0x333
 ld_ataraid_attach(cc4bd6b4,d180de10,c32d4f00,c0d29c74,c32d4f00,d180dc20,d180de10,c32d4f00,c0d29c74,cc4bd6b4) at netbsd:ld_ataraid_attach+0x25f
 config_attach_loc(cc4bd6b4,c0b20ec8,c0d29c74,c32d4f00,c05dcba0,c04d4180,c0d29c7a,615872de,cc4bd6b4,1) at netbsd:config_attach_loc+0x173
 ataraid_attach(0,cc4bd6b4,0,d0e80938,0,d0e80938,c0d29cc8,c05dce3c,c0b38dac,c0b38d80) at netbsd:ataraid_attach+0x87
 config_attach_pseudo(c0b38dac,c0b38d80,0,28,2a,c0490920,c0d29ce8,c04d4759,0,c0bd3eb0) at netbsd:config_attach_pseudo+0x35
 ata_raid_finalize(0,c0bd3eb0,0,0,0,c0490920,c0d29d38,c0490fb1,0,0) at netbsd:ata_raid_finalize+0x4c
 config_finalize(0,0,14,0,0,c0490920,0,0,c0bf13c4,0) at netbsd:config_finalize+0x99
 main(0,c01002cd,0,0,0,0,0,0,0,0) at netbsd:main+0x271
 db{1}>
 db{1}> x/I 0x00000000c01f411b
 netbsd:ldstart+0x1b:    testl   %esi,%esi
 db{1}> x/I 0x00000000c01f4160
 netbsd:ldstart+0x60:    movl    %esi,0x4(%esp)
 db{1}>

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

 --pgp-sign-Multipart_Wed_Sep_17_14:38:04_2008-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit

 -----BEGIN PGP SIGNATURE-----
 Version: PGPfreeware 5.0i for non-commercial use
 MessageID: JsfNEbL2eM6MzzgvB30VL6IL0/wHwt3j

 iQA/AwUBSNFOjGZ9cbd4v/R/EQJfSACgtqJ2tXPxFHU9UK2LYNSaMJjKHh0AoMc0
 C3CuTldkbucaSIbOfb/z6cAp
 =9cdV
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Wed_Sep_17_14:38:04_2008-1--

From: "Greg A. Woods; Planix, Inc." <woods@planix.ca>
To: gnats-bugs@NetBSD.org
Cc: Juan RP <xtraeme@gmail.com>
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Wed, 17 Sep 2008 14:58:54 -0400

 This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
 --Apple-Mail-15-705082021
 Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
 Content-Transfer-Encoding: 7bit


 On 17-Sep-08, at 2:40 PM, Greg A. Woods wrote:
 > Well now at least it immediately dumps during boot:
 >
 > ataraid0: found 1 RAID volume
 > ld0 at ataraid0 vendtype 1 unit 0: Adaptec ATA RAID-1 array
 > ld0: ld_ataraid_attach(): ld unit 0 (ld->sc_dv = 0xd187064c)
 > ld0: ldattach(): unit 0
 > ld0: 186 GB, 24321 cyl, 255 head, 63 sec, 512 bytes/sect x 390721536 sectors
 > rnd: ld0 attached as an entropy source (collecting)
 > Mutex error: mutex_vector_exit: exiting unheld spin mutex
 >
 > lock address : 0x00000000d18706ac type     :               spin
 > initialized  : 0x00000000c01f46f9
 > shared holds :                  0 exclusive:                  0
 > shares wanted:                  0 exclusive:                  0
 > current cpu  :                  1 last held:                  1
 > current lwp  : 0x00000000c0b349e0 last held: 000000000000000000
 > last locked  : 0x00000000c01f411b unlocked : 0x00000000c01f4160
 > owner field  : 0x0000000000000600 wait/spin:                0/1
 >
 > panic: LOCKDEBUG


 FYI, "current lwp" there appears to be the swapper thread, according  
 to "ps /l"

 -- 
 					Greg A. Woods; Planix, Inc.
 					<woods@planix.ca>


 --Apple-Mail-15-705082021
 content-type: application/pgp-signature; x-mac-type=70674453;
 	name=PGP.sig
 content-description: This is a digitally signed message part
 content-disposition: inline; filename=PGP.sig
 content-transfer-encoding: 7bit

 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.8 (Darwin)

 iD8DBQFI0VNuZn1xt3i/9H8RApUKAJ9oivGHLmCQN2uPEe1wGZVydeGAZgCgrm9z
 TTiXzlu088N/hEaoVDns3nA=
 =dYjg
 -----END PGP SIGNATURE-----

 --Apple-Mail-15-705082021--

From: "Juan Romero Pardines" <xtraeme@gmail.com>
To: "NetBSD GNATS" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Wed, 17 Sep 2008 21:22:38 +0200

 2008/9/17 Greg A. Woods; Planix, Inc. <woods@planix.ca>:
 >
 > On 17-Sep-08, at 2:40 PM, Greg A. Woods wrote:
 >>
 >> Well now at least it immediately dumps during boot:
 >>
 >> ataraid0: found 1 RAID volume
 >> ld0 at ataraid0 vendtype 1 unit 0: Adaptec ATA RAID-1 array
 >> ld0: ld_ataraid_attach(): ld unit 0 (ld->sc_dv = 0xd187064c)
 >> ld0: ldattach(): unit 0
 >> ld0: 186 GB, 24321 cyl, 255 head, 63 sec, 512 bytes/sect x 390721536 sectors
 >> rnd: ld0 attached as an entropy source (collecting)
 >> Mutex error: mutex_vector_exit: exiting unheld spin mutex
 >>
 >> lock address : 0x00000000d18706ac type     :               spin
 >> initialized  : 0x00000000c01f46f9
 >> shared holds :                  0 exclusive:                  0
 >> shares wanted:                  0 exclusive:                  0
 >> current cpu  :                  1 last held:                  1
 >> current lwp  : 0x00000000c0b349e0 last held: 000000000000000000
 >> last locked  : 0x00000000c01f411b unlocked : 0x00000000c01f4160
 >> owner field  : 0x0000000000000600 wait/spin:                0/1
 >>
 >> panic: LOCKDEBUG
 >
 >
 > FYI, "current lwp" there appears to be the swapper thread, according to
 > "ps /l"

 Do you have any local changes in ld.c? I'd like to know why the mutex hasn't
 been acquired in ldstart().

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Juan Romero Pardines <xtraeme@gmail.com>
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Wed, 17 Sep 2008 15:59:14 -0400

 --pgp-sign-Multipart_Wed_Sep_17_15:59:14_2008-1
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable

 At Wed, 17 Sep 2008 19:25:02 +0000 (UTC), Juan Romero Pardines wrote:
 Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
 >
 >  Do you have any local changes in ld.c? I'd like to know why the mutex hasn't
 >  been acquired in ldstart().

 Indeed I do.  I have the following changes, the meat of which were
 suggested on August 25 by Juergen Hannken-Illjes in response to this PR,
 but in private mail.

 As I recall they changed the behaviour from a panic to then just causing
 the newfs process to hang, so they sort of improved things, but not really.

 I'll try reverting them now and see what happens, but to my naive eyes
 they do look logical and proper.


 Index: sys/dev/ld.c
 ===================================================================
 RCS file: /cvs/master/m-NetBSD/main/src/sys/dev/ld.c,v
 retrieving revision 1.63
 diff -u -r1.63 ld.c
 --- sys/dev/ld.c	9 Sep 2008 12:45:39 -0000	1.63
 +++ sys/dev/ld.c	10 Sep 2008 19:00:20 -0000
 @@ -106,6 +106,10 @@
  		return;
  	}
  
 +#ifdef DIAGNOSTIC
 +	aprint_normal_dev(sc->sc_dv, "ldattach(): unit %d\n", device_unit(sc->sc_dv));
 +#endif
 +
  	/* Initialise and attach the disk structure. */
  	disk_init(&sc->sc_dk, device_xname(sc->sc_dv), &lddkdriver);
  	disk_attach(&sc->sc_dk);
 @@ -650,19 +654,16 @@
  
  	while (sc->sc_queuecnt < sc->sc_maxqueuecnt) {
  		/* See if there is work to do. */
 -		if ((bp = BUFQ_PEEK(sc->sc_bufq)) == NULL)
 +		if ((bp = BUFQ_GET(sc->sc_bufq)) == NULL)
  			break;
  
  		disk_busy(&sc->sc_dk);
  		sc->sc_queuecnt++;
  
 -		if (__predict_true((error = (*sc->sc_start)(sc, bp)) == 0)) {
 -			/*
 -			 * The back-end is running the job; remove it from
 -			 * the queue.
 -			 */
 -			(void) BUFQ_GET(sc->sc_bufq);
 -		} else  {
 +		mutex_exit(&sc->sc_mutex);
 +		error = (*sc->sc_start)(sc, bp);
 +		mutex_enter(&sc->sc_mutex);
 +		if (__predict_false(error != 0)) {
  			disk_unbusy(&sc->sc_dk, 0, (bp->b_flags & B_READ));
  			sc->sc_queuecnt--;
  			if (error == EAGAIN) {
 @@ -674,9 +675,9 @@
  				 * XXX We might consider a watchdog timer
  				 * XXX to make sure we are kicked into action.
  				 */
 +				BUFQ_PUT(sc->sc_bufq, bp);
  				break;
  			} else {
 -				(void) BUFQ_GET(sc->sc_bufq);
  				bp->b_error = error;
  				bp->b_resid = bp->b_bcount;
  				mutex_exit(&sc->sc_mutex);
 @@ -918,5 +919,6 @@
  ld_config_interrupts (struct device *d)
  {
  	struct ld_softc *sc = device_private(d);
 +
  	dkwedge_discover(&sc->sc_dk);
  }
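
One behavioural difference in this change is the order of operations on the
queue: the stock code peeks and removes the buf only once the back-end has
accepted it, while the modified code removes it first and puts it back on
EAGAIN. A rough sketch with a plain FIFO follows, under the assumption that a
put appends at the tail (the real BUFQ_PUT() ordering depends on the bufq
strategy in use, and all toy_* names are hypothetical):

```c
#include <assert.h>
#include <stddef.h>

#define TOY_QLEN 8

/* Toy FIFO of request ids; a stand-in for a bufq, not the real API. */
struct toy_bufq {
	int item[TOY_QLEN];
	size_t head, count;
};

int
toy_put(struct toy_bufq *q, int id)	/* append at the tail */
{
	assert(id >= 0);		/* -1 is reserved for "empty" */
	if (q->count == TOY_QLEN)
		return -1;
	q->item[(q->head + q->count) % TOY_QLEN] = id;
	q->count++;
	return 0;
}

int
toy_peek(const struct toy_bufq *q)	/* head of the queue, or -1 if empty */
{
	return q->count == 0 ? -1 : q->item[q->head];
}

int
toy_get(struct toy_bufq *q)		/* remove the head, or -1 if empty */
{
	int id = toy_peek(q);

	if (id != -1) {
		q->head = (q->head + 1) % TOY_QLEN;
		q->count--;
	}
	return id;
}

/* Peek-first: a failed start leaves the request at the head. */
int
toy_retry_peek(struct toy_bufq *q, int start_ok)
{
	int id = toy_peek(q);

	if (id != -1 && start_ok)
		(void)toy_get(q);
	return toy_peek(q);		/* what the next attempt will see */
}

/* Get-first: a failed start re-queues the request at the tail. */
int
toy_retry_get(struct toy_bufq *q, int start_ok)
{
	int id = toy_get(q);

	if (id != -1 && !start_ok)
		(void)toy_put(q, id);
	return toy_peek(q);		/* what the next attempt will see */
}
```

In this model a failed start with the get-first variant moves the request
behind any others that were already queued.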


 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

 --pgp-sign-Multipart_Wed_Sep_17_15:59:14_2008-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit

 -----BEGIN PGP SIGNATURE-----
 Version: PGPfreeware 5.0i for non-commercial use
 MessageID: 1c0CBXcRxn01uJjHfGHshesz4zeQ7Le+

 iQA/AwUBSNFhkmZ9cbd4v/R/EQKbyQCglIeoBb8hVhZENclWvxLZK/rZQO4AnRlz
 slWd7z7OI7HdS8LIEbH/t5Mn
 =tBwh
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Wed_Sep_17_15:59:14_2008-1--

From: "Juan Romero Pardines" <xtraeme@gmail.com>
To: "NetBSD GNATS" <gnats-bugs@netbsd.org>
Cc: 
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Wed, 17 Sep 2008 22:01:36 +0200

 2008/9/17 Greg A. Woods <woods@planix.com>:
 > At Wed, 17 Sep 2008 19:25:02 +0000 (UTC), Juan Romero Pardines wrote:
 > Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
 >>
 >>  Do you have any local changes in ld.c? I'd like to know why the mutex hasn't
 >>  been acquired in ldstart().
 >
 > Indeed I do.  I have the following changes, the meat of which were
 > suggested on August 25 by Juergen Hannken-Illjes in response to this PR,
 > but in private mail.
 >
 > As I recall they changed the behaviour from a panic to then just causing
 > the newfs process to hang, so they sort of improved things, but not really.
 >
 > I'll try reverting them now and see what happens, but to my naive eyes
 > they do look logical and proper.

 That explains why my patch caused this panic. Try it without those changes
 and let me know if you have any problems.
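
The interaction described above can be reproduced in a small userland model.
This is a sketch under assumptions, not kernel code: the toy_* names are
hypothetical, the mutex only records whether it is held, and errors are
returned where LOCKDEBUG would panic. cbufpool_ctor() in the patch releases
and reacquires ld->sc_mutex unconditionally, so it relies on ldstart()
holding that mutex across the sc_start call; the locally modified ldstart()
drops it first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy spin mutex that only tracks whether it is held (no real locking). */
struct toy_mutex {
	bool held;
};

/* Returns 0 on success, -1 where the model treats the call as an error. */
int
toy_mutex_enter(struct toy_mutex *m)
{
	if (m->held)
		return -1;	/* recursive acquire */
	m->held = true;
	return 0;
}

int
toy_mutex_exit(struct toy_mutex *m)
{
	if (!m->held)
		return -1;	/* models "exiting unheld spin mutex" */
	m->held = false;
	return 0;
}

/* Models cbufpool_ctor(): assumes the caller holds the lock. */
int
toy_ctor(struct toy_mutex *m)
{
	if (toy_mutex_exit(m) != 0)
		return -1;
	/* buf_init() would run here with the lock dropped. */
	return toy_mutex_enter(m);
}

/* Models ldstart(); drop_lock_first selects the locally modified variant. */
int
toy_ldstart(struct toy_mutex *m, bool drop_lock_first)
{
	int error;

	assert(m != NULL);
	if (toy_mutex_enter(m) != 0)
		return -1;
	if (drop_lock_first)
		(void)toy_mutex_exit(m);
	error = toy_ctor(m);		/* stands in for the sc_start call */
	if (drop_lock_first)
		(void)toy_mutex_enter(m);
	(void)toy_mutex_exit(m);
	return error;
}
```

With drop_lock_first set, toy_ctor() fails on its first statement, which is
the same place the boot-time trace shows (mutex_vector_exit called from
cbufpool_ctor+0x15).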

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Juan Romero Pardines <xtraeme@gmail.com>
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Wed, 17 Sep 2008 16:15:13 -0400

 --pgp-sign-Multipart_Wed_Sep_17_16:15:13_2008-1
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable

 At Wed, 17 Sep 2008 19:25:02 +0000 (UTC), Juan Romero Pardines wrote:
 Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
 >
 >  Do you have any local changes in ld.c? I'd like to know why the mutex hasn't
 >  been acquired in ldstart().

 OK, after reverting Juergen's changes to ld.c and back to having just
 your changes to ld_ataraid.c, I'm back to booting but then getting a
 panic (the same one?) while running "newfs /dev/rld0a":

 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0x00000000d18706ac type     :               spin
 initialized  : 0x00000000c01f46d9
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  0 last held:                  0
 current lwp  : 0x00000000d1f00d20 last held: 0x00000000d1f00d20
 last locked  : 0x00000000c05dff28 unlocked : 0x00000000c05dff15
 owner field  : 0x0000000000010600 wait/spin:                0/1

 panic: LOCKDEBUG
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ae36c cs 8 eflags 246 cr2 d1c19000 ilevel 6
 Stopped in pid 862.1 (newfs) at netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> trace
 breakpoint(c0b097af,d1cfb538,c0b37940,c04e50ff,6,1,0,0,d1cfb538,8) at netbsd:breakpoint+0x4
 panic(c0aab714,c0aa6f0b,c0886e50,c0aab72d,b5cc,1000001,6,0,0,c0bd71b4) at netbsd:panic+0x1b8
 lockdebug_abort1(c0aab72d,1,0,0,c0bf4080,c0bf408c,d1cfb59c,c0848676,c0bf4080,c0bd4a54) at netbsd:lockdebug_abort1+0xbb
 mutex_vector_enter(c0bd71b4,c04e395c,0,c0bd70c0,c0bd70c0,20,d1cfb64c,c04e380c,c0bd70c0,0) at netbsd:mutex_vector_enter+0x437
 pool_cache_invalidate(c0bd70c0,0,d1cfb64c,c04de965,59,bd1928,c0bd4a54,0,c04d769f,d1f00d20) at netbsd:pool_cache_invalidate+0x20
 pool_reclaim(c0bd70c0,c04d769f,0,0,c0bd719c,c0bd1928,d1cfb69c,c04d76c8,c0bd719c,c0bd70c0) at netbsd:pool_reclaim+0x4c
 pool_reclaim_callback(c0bd719c,c0bd70c0,0,c04b25a2,c0bd188e,34,0,0,c90a8260,c0bd1880) at netbsd:pool_reclaim_callback+0x25
 callback_run_roundrobin(c0bd1928,0,20000,d1cfb6f8,0,ffffffff,ffffffff,20000,e01727,2) at netbsd:callback_run_roundrobin+0x48
 uvm_map_prepare(c0bd1880,c31d9000,20000,0,ffffffff,ffffffff,20000,e01727,d1cfb730,c0bd19c4) at netbsd:uvm_map_prepare+0x19b
 uvm_map(c0bd1880,d1cfb794,20000,0,ffffffff,ffffffff,20000,e01727,c0bd19c6,c04e0d8c) at netbsd:uvm_map+0xbd
 km_vacache_alloc(c0bd1950,2,d1cfb7cc,c0848676,2,cc4cc074,d1cfb7dc,c0b37a58,0,c0bd19c4) at netbsd:km_vacache_alloc+0x64
 pool_grow(c0bd19c4,cc4cc074,d1cfb80c,c0848676,d1831d80,c0bd19c4,6,c0b37a58,cc4cc074,0) at netbsd:pool_grow+0x2b
 pool_get(c0bd1950,2,6,cc4cc074,0,cc4cc000,cc4cc074,c04b25a2,cc4cc076,c04e0d8c) at netbsd:pool_get+0x5b
 uvm_km_alloc_poolpage_cache(c0bd1880,0,d1cfb88c,c0848676,2,cc4cc0f4,d1cfb8ac,c0b37a58,cc4cc0f4,cc4cc074) at netbsd:uvm_km_alloc_poolpage_cache+0x4c
 pool_grow(cc4cc074,d1f00d20,6,cc4cc0f4,0,cc4cc074,cc4cc0f4,c04b25a2,cc4cc0f6,c04e19e3) at netbsd:pool_grow+0x2b
 pool_get(cc4cc000,2,0,0,cc4cc180,6,d1cfb94c,d1cfb94c,6,cc4cc180) at netbsd:pool_get+0x5b
 pool_cache_get_slow(0,2,d1cf0010,0,1,0,6,c084a779,0,0) at netbsd:pool_cache_get_slow+0x1ed
 pool_cache_get_paddr(cc4cc000,2,0,0,9c9f000,0,0,0,0,0) at netbsd:pool_cache_get_paddr+0x180
 ld_ataraid_make_cbuf(9c9f000,0,cc504000,0,0,efe009c9,d1870744,c32d4f64,d18706ae,0) at netbsd:ld_ataraid_make_cbuf+0x38
 ld_ataraid_start_raid0(d187064c,c31e84b4,d1cfba4c,10000,c32dca00,d18706ac,d1870650,0,c31e84b4,d187064c) at netbsd:ld_ataraid_start_raid0+0x1be
 ldstart(6,c31e84b4,0,0,c04b510b,101,0,d1870db8,0,c32dca00) at netbsd:ldstart+0x62
 ldstrategy(c31e84b4,10000,10000,1,0,d1870da4,d1870db8,d1870dbc,bbbdd000,d1f00d20) at netbsd:ldstrategy+0x165
 physio(c01f4b50,0,4500,0,c01f3950,d1cfbc5c,d1cfbb4c,c04d8090,4500,d1cfbc5c) at netbsd:physio+0x251
 ldwrite(4500,d1cfbc5c,10,8,d1b0a720,d1cfbc5c,6,d1f00d20,d1cfbbe4,d1b0a680) at netbsd:ldwrite+0x35
 cdev_write(4500,d1cfbc5c,10,2,d1b0a720,d180c000,d1cfbb8c,c0524857,d1b0a720,1) at netbsd:cdev_write+0x70
 spec_write(d1cfbbe4,40000,c0888600,d1b0a680,2,20002,d1cfbbfc,c052fc98,c0888100,d1b0a680) at netbsd:spec_write+0xa0
 VOP_WRITE(d1b0a680,d1cfbc5c,10,cc4c6b40,8,d1f00d20,0,16,40000,bbbbd000) at netbsd:VOP_WRITE+0x6c
 vn_write(d1e1da00,d1cfbcc4,d1cfbc5c,cc4c6b40,0,1,d1cfbc8c,c0537f7c,d1cfbc6c,d1e1db00) at netbsd:vn_write+0xb1
 dofilewrite(4,d1e1da00,bbbbd000,40000,d1cfbcc4,0,d1cfbd28,c04de072,0,0) at netbsd:dofilewrite+0x75
 sys_pwrite(d1f00d20,d1cfbd00,d1cfbd28,d1cfbd40,c059cc5f,d1841120,1,4,bbbbd000,40000) at netbsd:sys_pwrite+0xc7
 syscall(d1cfbd48,b3,ab,45001f,bfbf001f,0,9c9eee0,bfbfc8b8,0,0) at netbsd:syscall+0xab
 db{0}> ps /l
  PID         LID S     FLAGS       STRUCT LWP *               NAME WAIT
 >862       >   1 7         4           d1f00d20              newfs
  721           1 3        84           d1e74380                ksh pause
  295           1 3        84           d1e745e0                ksh pause
  296           1 3        84           d1e74840              xterm select
  303           1 3        84           d1e74aa0               rshd select
  299           1 3        84           d1e74d00              getty ttyraw
  297           1 3        84           d1e63100              getty ttyraw
  291           1 3        84           d1e63360              getty ttyraw
  304           1 3        84           d1e635c0              getty ttyraw
  302           1 3        84           d1e63820              getty ttyraw
  293           1 3        84           d1e63a80              getty ttyraw
  280           1 3        84           d1e63ce0              getty ttyraw
  294           1 3        84           d1847580              getty ttyraw
  287           1 3        84           d18477e0              getty ttyraw
  268           1 3        84           d1931a60               cron nanoslp
  279           1 3        84           d19310e0              inetd kqueue
  237           1 3        84           d1931800               ntpd pause
  156           1 3        84           d1931cc0          mount_mfs mfsidl
  114           1 3        84           d18470c0            syslogd kqueue
  1             1 3        84           cc4d97c0               init wait
  0            45 3       204           d19315a0            physiod physiod
               44 3       204           d1847a40        vmem_rehash vmem_rehash
               43 3       204           d1847ca0           aiodoned aiodoned
               42 3       204           cc4d90a0            ioflush syncer
               41 3       204           cc4d9300           pgdaemon pgdaemon
               40 3       204           cc4d9560          cryptoret crypto_wait
               39 3       204           cc4d8a00          atapibus0 sccomp
               38 3       204           cc4d8c60               usb2 usbevt
               37 3       204           cc4d5060               usb1 usbevt
               36 3       204           cc4d82e0         usbtask-dr usbtsk
               35 3       204           cc4d8080         usbtask-hc usbtsk
               34 3       204           cc4d9c80               usb0 usbevt
               33 3       204           cc4d9a20            acpitz0 acpitz0
               24 3       204           cc4d52c0               iic0 iicintr
               23 3       204           cc4d5520            atabus3 atath
               22 3       204           cc4d5780            atabus2 atath
               21 3       204           cc4d59e0            atabus1 atath
               20 3       204           cc4d5c40            atabus0 atath
               19 3       204           cc4d2040               pms0 pmsreset
               18 3       204           cc4d22a0               apm1 apmev
               17 3       204           cc4d2500            xcall/1 xcall
               16 1       204           cc4d2760          softser/1
               15 7       204           cc4d29c0          softclk/1
               14 1       204           cc4d2c20          softbio/1
               13 1       204           cc4ca020          softnet/1
               12 7       205           cc4ca280             idle/1
               11 3       204           cc4ca4e0             sysmon smtaskq
               10 3       204           cc4ca740           pmfevent pmfevent
                9 3       204           cc4ca9a0            cachegc cachegc
                8 3       204           cc4cac00              vrele vrele
                7 3       204           cc4c7000            xcall/0 xcall
                6 1       204           cc4c7260          softser/0
                5 1       204           cc4c74c0          softclk/0
                4 1       204           cc4c7720          softbio/0
                3 1       204           cc4c7980          softnet/0
                2 1       205           cc4c7be0             idle/0
                1 3       204           c0b349e0            swapper schedule
 db{0}> x /I 0x00000000c05dff28
 netbsd:cbufpool_ctor+0x28:      xorl    %eax,%eax
 db{0}> x /I 0x00000000d1f00d20
 0xd1f00d20:     addb    %al,0(%eax)
 db{0}> x /I 0x00000000c05dff15
 netbsd:cbufpool_ctor+0x15:      movl    0xc(%ebp),%eax
 db{0}>

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

 --pgp-sign-Multipart_Wed_Sep_17_16:15:13_2008-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit

 -----BEGIN PGP SIGNATURE-----
 Version: PGPfreeware 5.0i for non-commercial use
 MessageID: SGfpa7pIspU5Nl55jDLMggx7kV4IcjlC

 iQA/AwUBSNFlUWZ9cbd4v/R/EQINxwCg3ZLhhdVmcqQ0Yg+kBFWaJQnwbwQAn3Kw
 A6xuGpYWrpJyfWxGE6JMDlNR
 =Trrt
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Wed_Sep_17_16:15:13_2008-1--

From: Juan RP <xtraeme@gmail.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/38273 panic: LOCKDEBUG,
 "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Thu, 18 Sep 2008 16:32:57 +0200

 On Wed, 17 Sep 2008 16:15:13 -0400
 "Greg A. Woods" <woods@planix.com> wrote:

 > At Wed, 17 Sep 2008 19:25:02 +0000 (UTC), Juan Romero Pardines wrote:
 > Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
 > > 
 > >  Do you have any local changes in ld.c? I'd like to know why the mutex hasn't
 > >  been acquired in ldstart().
 > 
 > OK, after reverting Juergen's changes to ld.c and back to having just
 > your changes to ld_ataraid.c, I'm back to booting but then getting a
 > panic (the same one?) while running "newfs /dev/rld0a":
 > 
 > Mutex error: lockdebug_barrier: spin lock held
 > 
 > lock address : 0x00000000d18706ac type     :               spin
 > initialized  : 0x00000000c01f46d9
 > shared holds :                  0 exclusive:                  1
 > shares wanted:                  0 exclusive:                  0
 > current cpu  :                  0 last held:                  0
 > current lwp  : 0x00000000d1f00d20 last held: 0x00000000d1f00d20
 > last locked  : 0x00000000c05dff28 unlocked : 0x00000000c05dff15
 > owner field  : 0x0000000000010600 wait/spin:                0/1

 I'd like to know why you are getting this while I am not. I've been stressing
 this code with debugging options turned on all the time on my core2duo and
 still couldn't make it crash... even after copying/removing lots of gigabytes.

 Could you try changing the following in cbufpool_ctor() in ld_ataraid.c:

         /* We release/reacquire the spinlock before calling buf_init() */
         mutex_exit(&ld->sc_mutex);
         buf_init(&cbp->cb_buf);
         mutex_enter(&ld->sc_mutex);

 to

         /* We release/reacquire the spinlock before calling buf_init() */
         mutex_exit(&ld->sc_mutex);
         KERNEL_LOCK(1, NULL);
         buf_init(&cbp->cb_buf);
         KERNEL_UNLOCK_ONE(NULL);
         mutex_enter(&ld->sc_mutex);

 Just to be sure, you don't have more local changes in ld or ataraid?

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Juan RP <xtraeme@gmail.com>
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Thu, 18 Sep 2008 15:51:39 -0400

 --pgp-sign-Multipart_Thu_Sep_18_15:51:38_2008-1
 Content-Type: multipart/mixed;
  boundary="Multipart_Thu_Sep_18_15:51:38_2008-1"

 --Multipart_Thu_Sep_18_15:51:38_2008-1
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable

 At Thu, 18 Sep 2008 14:35:02 +0000 (UTC), Juan RP wrote:
 Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
 >
 >  I'd like to know why you are getting this while I am not. I've been stressing
 >  this code with debugging options turned on all the time on my core2duo and
 >  still couldn't make it crash... even after copying/removing lots of gigabytes.
 >
 >  Could you try changing the following in cbufpool_ctor() in ld_ataraid.c:
 >
 >          /* We release/reacquire the spinlock before calling buf_init() */
 >          mutex_exit(&ld->sc_mutex);
 >          buf_init(&cbp->cb_buf);
 >          mutex_enter(&ld->sc_mutex);
 >
 >  to
 >
 >          /* We release/reacquire the spinlock before calling buf_init() */
 >          mutex_exit(&ld->sc_mutex);
 >          KERNEL_LOCK(1, NULL);
 >          buf_init(&cbp->cb_buf);
 >          KERNEL_UNLOCK_ONE(NULL);
 >          mutex_enter(&ld->sc_mutex);

 Done, with no apparent change in behaviour....  Again this was some ways
 into the "newfs /dev/rld0a"

 >  Just to be sure, you don't have more local changes in ld or ataraid?

 Only the added debug printfs in the _attach() routines as seen in the
 dmesg output below.

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/


 --Multipart_Thu_Sep_18_15:51:38_2008-1
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable

 NetBSD 4.99.72 (GENERIC) #9: Thu Sep 18 15:13:07 EDT 2008
 ...
 wd0 at atabus1 drive 0: <IBM-DTLA-307030>
 wd0: 29314 MB, 59560 cyl, 16 head, 63 sec, 512 bytes/sect x 60036480 sectors
 rnd: wd0 attached as an entropy source (collecting)
 wd1 at atabus2 drive 0: <WDC WD2000JD-00HBB0>
 wd1: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
 rnd: wd1 attached as an entropy source (collecting)
 wd2 at atabus3 drive 0: <WDC WD2000JD-00HBB0>
 wd2: 186 GB, 387621 cyl, 16 head, 63 sec, 512 bytes/sect x 390721968 sectors
 rnd: wd2 attached as an entropy source (collecting)
 ...
 ataraid0: found 1 RAID volume
 ld0 at ataraid0 vendtype 1 unit 0: Adaptec ATA RAID-1 array
 ld0: ld_ataraid_attach(): ld unit 0 (ld->sc_dv = 0xd187064c)
 ld0: ldattach(): unit 0
 ld0: 186 GB, 24321 cyl, 255 head, 63 sec, 512 bytes/sect x 390721536 sectors
 rnd: ld0 attached as an entropy source (collecting)
 ...
 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0x00000000d18706ac type     :               spin
 initialized  : 0x00000000c01f46d9
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  1 last held:                  1
 current lwp  : 0x00000000d1931a60 last held: 0x00000000d1931a60
 last locked  : 0x00000000c05e0016 unlocked : 0x00000000c05dffe5
 owner field  : 0x0000000000010600 wait/spin:                0/1

 panic: LOCKDEBUG
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ae43c cs 8 eflags 246 cr2 8089000 ilevel 6
 Stopped in pid 862.1 (newfs) at netbsd:breakpoint+0x4:  popl    %ebp
 db{1}> x/I 0x00000000c05e0016
 netbsd:cbufpool_ctor+0x46:      addl    $0x14,%esp
 db{1}> x/I 0x00000000c05dffe5
 netbsd:cbufpool_ctor+0x15:      movl    $0x1,0(%esp)
 db{1}> x/I 0x00000000d1931a60
 0xd1931a60:     addb    %al,0(%eax)
 db{1}> ps /l
  PID         LID S     FLAGS       STRUCT LWP *               NAME WAIT
 >862       >   1 7         4           d1931a60              newfs
  721           1 3        84           d1e1b5e0                ksh pause
  295           1 3        84           d1e1b840                ksh pause
  294           1 3        84           d1e1baa0              xterm select
  303           1 3        84           d1e1bd00               rshd select
  299           1 3        84           d1e14100              getty ttyraw
  297           1 3        84           d1e14360              getty ttyraw
  291           1 3        84           d1e145c0              getty ttyraw
  304           1 3        84           d1e14820              getty ttyraw
  296           1 3        84           d1e14a80              getty ttyraw
  280           1 3        84           d1e14ce0              getty ttyraw
  287           1 3        84           d19310e0              getty ttyraw
  293           1 3        84           d1847580              getty ttyraw
  302           1 3        84           d18477e0              getty ttyraw
  268           1 3        84           d1931340               cron nanoslp
  279           1 3        84           cc4d87a0              inetd kqueue
  237           1 3        84           d1931800               ntpd pause
  156           1 3        84           d1931cc0          mount_mfs mfsidl
  114           1 3        84           d18470c0            syslogd kqueue
  1             1 3        84           cc4d97c0               init wait
  0            45 3       204           d19315a0            physiod physiod
               44 3       204           d1847a40        vmem_rehash vmem_rehash
               43 3       204           d1847ca0           aiodoned aiodoned
               42 3       204           cc4d90a0            ioflush syncer
               41 3       204           cc4d9300           pgdaemon pgdaemon
               40 3       204           cc4d9560          cryptoret crypto_wait
               39 3       204           cc4d8a00          atapibus0 sccomp
               38 3       204           cc4d8c60               usb2 usbevt
               37 3       204           cc4d5060               usb1 usbevt
               36 3       204           cc4d82e0         usbtask-dr usbtsk
               35 3       204           cc4d8080         usbtask-hc usbtsk
               34 3       204           cc4d9c80               usb0 usbevt
               33 3       204           cc4d9a20            acpitz0 acpitz0
               24 3       204           cc4d52c0               iic0 iicintr
               23 3       204           cc4d5520            atabus3 atath
               22 3       204           cc4d5780            atabus2 atath
               21 3       204           cc4d59e0            atabus1 atath
               20 3       204           cc4d5c40            atabus0 atath
               19 3       204           cc4d2040               pms0 pmsreset
               18 3       204           cc4d22a0               apm1 apmev
               17 3       204           cc4d2500            xcall/1 xcall
               16 1       204           cc4d2760          softser/1
               15 1       204           cc4d29c0          softclk/1
               14 1       204           cc4d2c20          softbio/1
               13 1       204           cc4ca020          softnet/1
               12 1       205           cc4ca280             idle/1
               11 3       204           cc4ca4e0             sysmon smtaskq
               10 3       204           cc4ca740           pmfevent pmfevent
                9 3       204           cc4ca9a0            cachegc cachegc
                8 3       204           cc4cac00              vrele vrele
                7 3       204           cc4c7000            xcall/0 xcall
                6 1       204           cc4c7260          softser/0
                5 1       204           cc4c74c0          softclk/0
                4 1       204           cc4c7720          softbio/0
                3 1       204           cc4c7980          softnet/0
                2 7       205           cc4c7be0             idle/0
                1 3       204           c0b349e0            swapper schedule
 db{1}> trace
 breakpoint(c0b09c77,d1959538,c3246800,c04e51cf,6,1,0,0,d1959538,8) at netbsd:breakpoint+0x4
 panic(c0aaba3c,c0aa7233,c0887090,c0aaba55,95cc,1000001,6,0,0,c0bd71b4) at netbsd:panic+0x1b8
 lockdebug_abort1(c0aaba55,1,0,0,c0bf4080,c0bf408c,d195959c,c08488b6,c0bf4080,c0bd4a54) at netbsd:lockdebug_abort1+0xbb
 mutex_vector_enter(c0bd71b4,c04e3a2c,0,c0bd70c0,c0bd70c0,20,d195964c,c04e38dc,c0bd70c0,0) at netbsd:mutex_vector_enter+0x437
 pool_cache_invalidate(c0bd70c0,0,d195964c,c04dea35,55,bd1928,c0bd4a54,0,c04d776f,d1931a60) at netbsd:pool_cache_invalidate+0x20
 pool_reclaim(c0bd70c0,c04d776f,0,0,c0bd719c,c0bd1928,d195969c,c04d7798,c0bd719c,c0bd70c0) at netbsd:pool_reclaim+0x4c
 pool_reclaim_callback(c0bd719c,c0bd70c0,0,c04b2672,c0bd188e,34,0,0,c90a8260,c0bd1880) at netbsd:pool_reclaim_callback+0x25
 callback_run_roundrobin(c0bd1928,0,20000,d19596f8,0,ffffffff,ffffffff,20000,e01727,2) at netbsd:callback_run_roundrobin+0x48
 uvm_map_prepare(c0bd1880,c31d9000,20000,0,ffffffff,ffffffff,20000,e01727,d1959730,c0bd19c4) at netbsd:uvm_map_prepare+0x19b
 uvm_map(c0bd1880,d1959794,20000,0,ffffffff,ffffffff,20000,e01727,c0bd19c6,c04e0e5c) at netbsd:uvm_map+0xbd
 km_vacache_alloc(c0bd1950,2,d19597cc,c08488b6,2,cc4cc074,d19597dc,c3246918,0,c0bd19c4) at netbsd:km_vacache_alloc+0x64
 pool_grow(c0bd19c4,cc4cc074,d195980c,c08488b6,d1831d80,c0bd19c4,6,c3246918,cc4cc074,0) at netbsd:pool_grow+0x2b
 pool_get(c0bd1950,2,6,cc4cc074,0,cc4cc000,cc4cc074,c04b2672,cc4cc076,c04e0e5c) at netbsd:pool_get+0x5b
 uvm_km_alloc_poolpage_cache(c0bd1880,0,d195988c,c08488b6,2,cc4cc0f4,d19598ac,c3246918,cc4cc0f4,cc4cc074) at netbsd:uvm_km_alloc_poolpage_cache+0x4c
 pool_grow(cc4cc074,d1931a60,6,cc4cc0f4,0,cc4cc074,cc4cc0f4,c04b2672,cc4cc0f6,c04e1ab3) at netbsd:pool_grow+0x2b
 pool_get(cc4cc000,2,0,0,1,c31e8b6c,d195995c,d195994c,6,cc4daf00) at netbsd:pool_get+0x5b
 pool_cache_get_slow(0,2,d1950010,0,1,0,6,c084a9b9,0,0) at netbsd:pool_cache_get_slow+0x1ed
 pool_cache_get_paddr(cc4cc000,2,0,0,9c9f000,0,0,0,0,0) at netbsd:pool_cache_get_paddr+0x180
 ld_ataraid_make_cbuf(9c9f000,0,cc504000,0,0,efe009c9,d1870744,c32d4f64,d18706ae,0) at netbsd:ld_ataraid_make_cbuf+0x38
 ld_ataraid_start_raid0(d187064c,c31e8b6c,d1959a4c,10000,c32dca00,d18706ac,d1870650,0,c31e8b6c,d187064c) at netbsd:ld_ataraid_start_raid0+0x1be
 ldstart(6,c31e8b6c,0,0,c04b51db,101,0,d1870d68,0,c32dca00) at netbsd:ldstart+0x62
 ldstrategy(c31e8b6c,10000,10000,1,0,d1870d54,d1870d68,d1870d6c,bbbdd000,d1931a60) at netbsd:ldstrategy+0x165
 physio(c01f4b50,0,4500,0,c01f3950,d1959c5c,d1959b4c,c04d8160,4500,d1959c5c) at netbsd:physio+0x251
 ldwrite(4500,d1959c5c,10,8,d1b0a720,d1959c5c,6,d1931a60,d1959be4,d1b0a680) at netbsd:ldwrite+0x35
 cdev_write(4500,d1959c5c,10,2,d1b0a720,d180c000,d1959b8c,c0524927,d1b0a720,1) at netbsd:cdev_write+0x70
 spec_write(d1959be4,40000,c0888840,d1b0a680,2,20002,d1959bfc,c052fd68,c0888340,d1b0a680) at netbsd:spec_write+0xa0
 VOP_WRITE(d1b0a680,d1959c5c,10,cc4c69c0,0,10,0,16,40000,bbbbd000) at netbsd:VOP_WRITE+0x6c
 vn_write(d1e11a80,d1959cc4,d1959c5c,cc4c69c0,0,80ac000,d1959c8c,c053804c,d1959c6c,d1e11b80) at netbsd:vn_write+0xb1
 dofilewrite(4,d1e11a80,bbbbd000,40000,d1959cc4,0,d1959d28,c04de142,0,d1e11b80) at netbsd:dofilewrite+0x75
 sys_pwrite(d1931a60,d1959d00,d1959d28,d1959d40,c059cd2f,d1841ae0,2,4,bbbbd000,40000) at netbsd:sys_pwrite+0xc7
 syscall(d1959d48,b3,ab,1f,1f,0,9c9eee0,bfbfc8b8,0,0) at netbsd:syscall+0xab
 db{1}>


 --Multipart_Thu_Sep_18_15:51:38_2008-1--

 --pgp-sign-Multipart_Thu_Sep_18_15:51:38_2008-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit

 -----BEGIN PGP SIGNATURE-----
 Version: PGPfreeware 5.0i for non-commercial use
 MessageID: i/fTtNMbSgsd/mThbr4qlYT6+HYK9CGh

 iQA/AwUBSNKxSmZ9cbd4v/R/EQIRJgCgvZMbWk/aAhKMcwkhUvDf7bqDHtwAoLpA
 C7kawFNUwo9nIVB/YkVz+Jsb
 =6EQ+
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Thu_Sep_18_15:51:38_2008-1--

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Juan RP <xtraeme@gmail.com>
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Thu, 18 Sep 2008 15:57:39 -0400

 --pgp-sign-Multipart_Thu_Sep_18_15:57:38_2008-1
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable

 FYI, with yesterday's kernel the reboot from DDB hung, and after sending
 a BREAK I got the following backtrace.

 db{0}> reboot
 syncing disks... [-- break #1 sent -- `\z' -- Thu Sep 18 14:53:48 2008]
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ae36c cs 8 eflags 202 cr2 d1c19000 ilevel 8
 Stopped in pid 0.9 (system) at  netbsd:breakpoint+0x4:  popl    %ebp
 db{0}> trace
 breakpoint(0,3f8,5,0,c0bd1e60,d0e80bcc,d0d0cf6c,c32c4000,c32c5000,800) at netbsd:breakpoint+0x4
 comintr(d0e80ac0,d098dc20,2f0fe20f,70ff10b,ef00f8f,2f0e0d8f,f0fb20f,50ff20f,ff00f0d,f0e850f) at netbsd:comintr+0x575
 DDB lost frame for netbsd:Xintr_ioapic_edge10+0xa9, trying 0xd0d0cf74
 Xintr_ioapic_edge10() at netbsd:Xintr_ioapic_edge10+0xa9
 --- interrupt ---
 --- switch to interrupt stack ---
 lockdebug_unlocked(d0879940,c051e57e,0,c051e510,c0b37940,c0a,d098dcbc,c051e57e,d0879940,64) at netbsd:lockdebug_unlocked+0x1e
 mutex_vector_exit(d0879940,64,d098dd0c,c051e9a6,9c3000,0,64,0,0,3) at netbsd:mutex_vector_exit+0x1fa
 cache_unlock_cpus(9c3000,0,64,0,0,3,19800,0,8,246) at netbsd:cache_unlock_cpus+0x2e
 cache_reclaim(0,0,64,cc4c4e40,c051ef10,0,0,c01002e1,cc4ca9a0,0) at netbsd:cache_reclaim+0x276
 cache_thread(cc4ca9a0,0,c01002cd,0,c01002cd,0,0,0,0,0) at netbsd:cache_thread+0x25
 db{0}> reboot
 rebooting...


 Today's kernel, with your KERNEL_LOCK() change, simply hangs, presumably
 with NMIs blocked entirely as the BREAK is being ignored even after
 several minutes:

 db{1}> reboot
 syncing disks... [halt sent]
 [halt sent]
 [halt sent]
 [halt sent]

 I guess I'll have to give it a hard reset on the front panel....

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

 --pgp-sign-Multipart_Thu_Sep_18_15:57:38_2008-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit

 -----BEGIN PGP SIGNATURE-----
 Version: PGPfreeware 5.0i for non-commercial use
 MessageID: w9W8F3ciPkzLEB/fjRaKM6YXXsv51WTf

 iQA/AwUBSNKysmZ9cbd4v/R/EQL06ACfb33nbSspU+gTt5Fd+0aMCTiJtHwAoMDs
 5lyBctrZiK4q/AwkTHdv2WSq
 =cDHc
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Thu_Sep_18_15:57:38_2008-1--

From: Juan RP <xtraeme@gmail.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/38273 panic: LOCKDEBUG,
 "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Thu, 18 Sep 2008 22:07:13 +0200

 On Thu, 18 Sep 2008 15:51:39 -0400
 "Greg A. Woods" <woods@planix.com> wrote:

 > Done, with no apparent change in behaviour....  Again this was some ways
 > into the "newfs /dev/rld0a"

 I'm out of ideas then. No idea how we will fix this if we are calling
 buf_init() with the ld's spin lock held!

 Either ldstart() is wrong and should not acquire the spin lock there
 (AFAIK acquiring it there is the correct way), or the only option is to
 release/reacquire the spin lock as I had done in the patch.
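
The constraint behind both options can be sketched in a small userspace model. This is only an illustration, assuming (as the backtraces in this PR appear to show) that the allocation made under ld_ataraid_make_cbuf() can reach a lock that must not be taken while a spin mutex is held. Every model_* name below is hypothetical and none of this is NetBSD kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU bookkeeping; not a kernel structure. */
struct model_cpu {
	int spin_held;   /* spin locks currently held */
	int violations;  /* blocking operations attempted with one held */
};

static void
model_spin_enter(struct model_cpu *c)
{
	c->spin_held++;
}

static void
model_spin_exit(struct model_cpu *c)
{
	assert(c->spin_held > 0);
	c->spin_held--;
}

/* Stand-in for the LOCKDEBUG check: false if a spin lock is held. */
static bool
model_barrier(struct model_cpu *c)
{
	if (c->spin_held != 0) {
		c->violations++;
		return false;
	}
	return true;
}

/* Stand-in for a constructor that drops the lock around its own work. */
static bool
model_ctor_dropping_lock(struct model_cpu *c)
{
	model_spin_exit(c);
	bool ok = model_barrier(c);	/* e.g. initialising the buffer */
	model_spin_enter(c);
	return ok;
}

/* Stand-in for an allocation that may block before the ctor runs. */
static bool
model_alloc(struct model_cpu *c, bool (*ctor)(struct model_cpu *))
{
	if (!model_barrier(c))		/* growing the pool may block */
		return false;
	return ctor == NULL || ctor(c);
}

/* Caller holds the spin lock across the allocation; only the ctor drops it. */
static bool
model_start_ctor_only(struct model_cpu *c)
{
	model_spin_enter(c);
	bool ok = model_alloc(c, model_ctor_dropping_lock);
	model_spin_exit(c);
	return ok;
}

/* Caller drops the spin lock around the whole allocation. */
static bool
model_start_drop_around_alloc(struct model_cpu *c)
{
	model_spin_enter(c);
	model_spin_exit(c);
	bool ok = model_alloc(c, NULL);
	model_spin_enter(c);
	model_spin_exit(c);
	return ok;
}
```

In this model, releasing the lock only inside the constructor does not help, because the allocator hits a blocking point before the constructor is ever called. That is consistent with the traces in this PR, where mutex_vector_enter() is reached under pool_cache_get_paddr() rather than under cbufpool_ctor().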

 Also as the spin lock is held, acquiring the adaptive mutex from v_interlock
 will also cause another "spin lock held" panic later on (which I addressed
 with the softint(9)).
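
The softint(9) approach mentioned above amounts to deferring the work that needs the adaptive mutex to a context where no spin lock is held. A minimal userspace sketch of that deferral pattern follows. The model_* names are hypothetical, and a real driver would schedule a soft interrupt rather than call a drain function directly.

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_QLEN 8	/* arbitrary capacity for the sketch */

/* Hypothetical queue of completed items awaiting post-processing. */
struct model_deferq {
	int items[MODEL_QLEN];
	size_t count;
};

static int model_processed_sum;	/* what the deferred handler has seen */

/* May be called with a spin lock held: it only records the item. */
static int
model_defer(struct model_deferq *q, int item)
{
	if (q->count == MODEL_QLEN)
		return -1;		/* full: the caller must cope */
	q->items[q->count++] = item;
	return 0;
}

/* Stand-in for work that needs a sleepable (adaptive) lock. */
static void
model_handler(int item)
{
	model_processed_sum += item;
}

/* Runs later, with no spin lock held; drains the queue in FIFO order. */
static size_t
model_run_deferred(struct model_deferq *q, void (*handler)(int))
{
	assert(handler != NULL);
	size_t n = q->count;
	for (size_t i = 0; i < n; i++)
		handler(q->items[i]);
	q->count = 0;
	return n;
}
```

The point of the split is that model_defer() never blocks, so it is safe in the path that holds the spin lock, while everything that might sleep is confined to model_run_deferred().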

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Juan RP <xtraeme@gmail.com>
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Fri, 19 Sep 2008 16:20:17 -0400

 --pgp-sign-Multipart_Fri_Sep_19_16:20:17_2008-1
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable

 At Thu, 18 Sep 2008 20:10:05 +0000 (UTC), Juan RP wrote:
 Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
 >
 >  I'm out of ideas then. No idea how we will fix this if we are calling
 >  buf_init() with the ld's spin lock held!
 >
 >  Either ldstart() is wrong and should not acquire the spin lock there
 >  (AFAIK acquiring it there is the correct way), or the only option is to
 >  release/reacquire the spin lock as I had done in the patch.
 >
 >  Also as the spin lock is held, acquiring the adaptive mutex from v_interlock
 >  will also cause another "spin lock held" panic later on (which I addressed
 >  with the softint(9)).

 Perhaps we should post some further details about these issues on
 tech-kern.

 I'm certainly not knowledgeable enough about the twisty new maze of
 locking necessary for disk drivers, especially not middle-layer drivers
 like this one, to be of much help sorting this out.  (I would like to
 learn, but I'd prefer to do so from a detailed design document, and I
 don't think one currently exists.)

 Perhaps someone else cognizant of all the issues could lend a hand.

 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

 --pgp-sign-Multipart_Fri_Sep_19_16:20:17_2008-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit

 -----BEGIN PGP SIGNATURE-----
 Version: PGPfreeware 5.0i for non-commercial use
 MessageID: Fwt7bzkhML8lgF11HQPAOw/rMdRomRcv

 iQA/AwUBSNQJgWZ9cbd4v/R/EQIJZgCaAtFZ30VWG/W7OLB185RLo/k7IWcAoPxh
 dOIoxSqWMmVQ82FAnXssLuwv
 =xUHP
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Fri_Sep_19_16:20:17_2008-1--

From: "Greg A. Woods" <woods@planix.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: Juan RP <xtraeme@gmail.com>
Subject: Re: kern/38273 panic: LOCKDEBUG, "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Fri, 19 Sep 2008 16:27:39 -0400

 --pgp-sign-Multipart_Fri_Sep_19_16:26:58_2008-1
 Content-Type: text/plain; charset=US-ASCII
 Content-Transfer-Encoding: quoted-printable

 FYI, a straight 'dd if=/dev/rld0a of=/dev/null bs=2m' eventually
 (after about 10-20 seconds) triggers what looks like the same panic:

 Mutex error: lockdebug_barrier: spin lock held

 lock address : 0x00000000d18706ac type     :               spin
 initialized  : 0x00000000c01f46d9
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  0
 current cpu  :                  1 last held:                  1
 current lwp  : 0x00000000d1f01d20 last held: 0x00000000d1f01d20
 last locked  : 0x00000000c05dffe6 unlocked : 0x00000000c05dffb5
 owner field  : 0x0000000000010600 wait/spin:                0/1

 panic: LOCKDEBUG
 fatal breakpoint trap in supervisor mode
 trap type 1 code 0 eip c05ae40c cs 8 eflags 246 cr2 80537c8 ilevel 6
 Stopped in pid 578.1 (dd) at    netbsd:breakpoint+0x4:  popl    %ebp
 db{1}> x/I 0x00000000c05dffe6
 netbsd:cbufpool_ctor+0x46:      addl    $0x14,%esp
 db{1}> x/I 0x00000000c05dffb5
 netbsd:cbufpool_ctor+0x15:      movl    $0x1,0(%esp)
 db{1}> trace
 breakpoint(c0b09eef,d1cfb548,c3247800,c04e51cf,6,1,0,0,d1cfb548,8) at netbsd:breakpoint+0x4
 panic(c0aabc9c,c0aa7493,c08872d0,c0aabcb5,b5dc,1000001,6,0,0,c0bd81b4) at netbsd:panic+0x1b8
 lockdebug_abort1(c0aabcb5,1,0,0,c0bf5080,c0bf508c,d1cfb5ac,c0848b06,c0bf5080,c0bd5a54) at netbsd:lockdebug_abort1+0xbb
 mutex_vector_enter(c0bd81b4,c04e3a2c,0,c0bd80c0,c0bd80c0,20,d1cfb65c,c04e38dc,c0bd80c0,0) at netbsd:mutex_vector_enter+0x437
 pool_cache_invalidate(c0bd80c0,0,d1cfb65c,c04dea35,fc,bd2928,c0bd5a54,0,c04d776f,d1f01d20) at netbsd:pool_cache_invalidate+0x20
 pool_reclaim(c0bd80c0,c04d776f,0,0,c0bd819c,c0bd2928,d1cfb6ac,c04d7798,c0bd819c,c0bd80c0) at netbsd:pool_reclaim+0x4c
 pool_reclaim_callback(c0bd819c,c0bd80c0,0,c04b2672,c0bd288e,34,0,0,c8aca2b4,c0bd2880) at netbsd:pool_reclaim_callback+0x25
 callback_run_roundrobin(c0bd2928,0,20000,d1cfb708,0,ffffffff,ffffffff,20000,e01727,2) at netbsd:callback_run_roundrobin+0x48
 uvm_map_prepare(c0bd2880,c31da000,20000,0,ffffffff,ffffffff,20000,e01727,d1cfb740,c0bd29c4) at netbsd:uvm_map_prepare+0x19b
 uvm_map(c0bd2880,d1cfb7a4,20000,0,ffffffff,ffffffff,20000,e01727,c0bd29c6,c04e0e5c) at netbsd:uvm_map+0xbd
 km_vacache_alloc(c0bd2950,2,d1cfb7dc,c0848b06,2,cc4cc074,d1cfb7ec,c3247918,0,c0bd29c4) at netbsd:km_vacache_alloc+0x64
 pool_grow(c0bd29c4,cc4cc074,d1cfb81c,c0848b06,d1832d80,c0bd29c4,6,c3247918,cc4cc074,0) at netbsd:pool_grow+0x2b
 pool_get(c0bd2950,2,6,cc4cc074,0,cc4cc000,cc4cc074,c04b2672,cc4cc076,c04e0e5c) at netbsd:pool_get+0x5b
 uvm_km_alloc_poolpage_cache(c0bd2880,0,d1cfb89c,c0848b06,2,cc4cc0f4,d1cfb8bc,c3247918,cc4cc0f4,cc4cc074) at netbsd:uvm_km_alloc_poolpage_cache+0x4c
 pool_grow(cc4cc074,d1f01d20,6,cc4cc0f4,0,cc4cc074,cc4cc0f4,c04b2672,cc4cc0f6,c04e1ab3) at netbsd:pool_grow+0x2b
 pool_get(cc4cc000,2,0,0,0,cc4cc000,96,d1cfb95c,6,cc4daf00) at netbsd:pool_get+0x5b
 pool_cache_get_slow(0,2,10,0,1,0,6,c084ac09,0,0) at netbsd:pool_cache_get_slow+0x1ed
 pool_cache_get_paddr(cc4cc000,2,0,0,49f000,0,0,0,0,0) at netbsd:pool_cache_get_paddr+0x180
 ld_ataraid_make_cbuf(49f000,0,cc4e0000,0,0,f0000049,d1870744,c32d5f64,d18706ae,0) at netbsd:ld_ataraid_make_cbuf+0x38
 ld_ataraid_start_raid0(d187064c,c3443564,d1cfba5c,10000,c32dda00,d18706ac,d1870650,0,c3443564,d187064c) at netbsd:ld_ataraid_start_raid0+0x1be
 ldstart(6,c3443564,0,0,c04b51db,101,0,d1870db8,0,c32dda00) at netbsd:ldstart+0x62
 ldstrategy(c3443564,10000,10000,2,0,d1870da4,d1870db8,d1870dbc,8065000,d1f01d20) at netbsd:ldstrategy+0x165
 physio(c01f4b50,0,4500,100000,c01f3950,d1cfbc7c,d1cfbb5c,c04d8200,4500,d1cfbc7c) at netbsd:physio+0x251
 ldread(4500,d1cfbc7c,0,c0524964,0,d1cfbc7c,6,d1cfbc04,20001,d1b0a680) at netbsd:ldread+0x38
 cdev_read(4500,d1cfbc7c,0,1,d1f01d20,10,6,1,d1b0a720,d180d000) at netbsd:cdev_read+0x70
 spec_read(d1cfbc04,d1f01d20,c0888a80,d1b0a680,1,20001,d1cfbc1c,c052fd28,c0888540,d1b0a680) at netbsd:spec_read+0x234
 VOP_READ(d1b0a680,d1cfbc7c,0,cc4c6c00,cc4c4700,0,d1cfbc6c,16,200000,8065000) at netbsd:VOP_READ+0x6c
 vn_read(d1e1ea80,d1e1ea80,d1cfbc7c,cc4c6c00,1,d1e76524,cc4c4700,c04b26fa,8,c04f7839) at netbsd:vn_read+0x93
 dofileread(3,d1e1ea80,8065000,200000,d1e1ea80,1,d1cfbd28,d1cfbd48,0,d1f01d20) at netbsd:dofileread+0x75
 sys_read(d1f01d20,d1cfbd00,d1cfbd28,d1cfbd40,c059ccef,d1841120,1,3,8065000,200000) at netbsd:sys_read+0x6f
 syscall(d1cfbd48,b3,2000ab,bfbf001f,bfbf001f,0,200000,bfbfe418,0,0) at netbsd:syscall+0xab
 db{1}> ps /l
  PID         LID S     FLAGS       STRUCT LWP *               NAME WAIT
 >578       >   1 7         4           d1f01d20                 dd
  730           1 3        84           d1e74380                ksh pause
  310           1 3        84           d1e745e0                ksh pause
  315           1 7         4           d1e74840              xterm
  281           1 3        84           d1e74aa0               rshd select
  299           1 3        84           d1e74d00              getty ttyraw
  297           1 3        84           d1e17100              getty ttyraw
  291           1 3        84           d1e17360              getty ttyraw
  304           1 3        84           d1e175c0              getty ttyraw
  287           1 3        84           d1e17820              getty ttyraw
  293           1 3        84           d1e17a80              getty ttyraw
  280           1 3        84           d1e17ce0              getty ttyraw
  296           1 3        84           d1847580              getty ttyraw
  302           1 3        84           d18477e0              getty ttyraw
  268           1 3        84           d19310e0               cron nanoslp
  279           1 3        84           d1847320              inetd kqueue
  237           1 3        84           d1931800               ntpd pause
  156           1 3        84           d1931cc0          mount_mfs mfsidl
  114           1 3        84           d18470c0            syslogd kqueue
  1             1 3        84           cc4d97c0               init wait
  0            45 3       204           d19315a0            physiod physiod
               44 3       204           d1847a40        vmem_rehash vmem_rehash
               43 3       204           d1847ca0           aiodoned aiodoned
               42 3       204           cc4d90a0            ioflush syncer
               41 3       204           cc4d9300           pgdaemon pgdaemon
               40 3       204           cc4d9560          cryptoret crypto_wait
               39 3       204           cc4d8a00          atapibus0 sccomp
               38 3       204           cc4d8c60               usb2 usbevt
               37 3       204           cc4d5060               usb1 usbevt
               36 3       204           cc4d82e0         usbtask-dr usbtsk
               35 3       204           cc4d8080         usbtask-hc usbtsk
               34 3       204           cc4d9c80               usb0 usbevt
               33 3       204           cc4d9a20            acpitz0 acpitz0
               24 3       204           cc4d52c0               iic0 iicintr
               23 3       204           cc4d5520            atabus3 atath
               22 3       204           cc4d5780            atabus2 atath
               21 3       204           cc4d59e0            atabus1 atath
               20 3       204           cc4d5c40            atabus0 atath
               19 3       204           cc4d2040               pms0 pmsreset
               18 3       204           cc4d22a0               apm1 apmev
               17 3       204           cc4d2500            xcall/1 xcall
               16 1       204           cc4d2760          softser/1
               15 1       204           cc4d29c0          softclk/1
               14 1       204           cc4d2c20          softbio/1
               13 1       204           cc4ca020          softnet/1
               12 1       205           cc4ca280             idle/1
               11 3       204           cc4ca4e0             sysmon smtaskq
               10 3       204           cc4ca740           pmfevent pmfevent
                9 3       204           cc4ca9a0            cachegc cachegc
                8 3       204           cc4cac00              vrele vrele
                7 3       204           cc4c7000            xcall/0 xcall
                6 1       204           cc4c7260          softser/0
                5 7       204           cc4c74c0          softclk/0
                4 1       204           cc4c7720          softbio/0
                3 1       204           cc4c7980          softnet/0
                2 1       205           cc4c7be0             idle/0
                1 3       204           c0b359e0            swapper schedule
 db{1}>


 (since this was just a read operation there was nothing dirty in the
 buffer cache, and so a reboot did _not_ hang on syncing disks)


 -- 
 						Greg A. Woods
 						Planix, Inc.

 <woods@planix.com>     +1 416 489-5852 x122     http://www.planix.com/

 --pgp-sign-Multipart_Fri_Sep_19_16:26:58_2008-1
 Content-Type: application/pgp-signature
 Content-Transfer-Encoding: 7bit

 -----BEGIN PGP SIGNATURE-----
 Version: PGPfreeware 5.0i for non-commercial use
 MessageID: pmUWCtzEd5ke94uK3VCNmspsYQYz7Mmd

 iQA/AwUBSNQLO2Z9cbd4v/R/EQIJKACgq3mv/VyQQWlrJlx6dZ/zaXKueHUAoP3C
 XyZToNwhZFh33zNix8OwQKSA
 =Yrou
 -----END PGP SIGNATURE-----

 --pgp-sign-Multipart_Fri_Sep_19_16:26:58_2008-1--

From: Juan RP <xtraeme@gmail.com>
To: NetBSD GNATS <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/38273 panic: LOCKDEBUG,
 "lockdebug_barrier: spin lock held", from ld_ataraid_start_raid0()
Date: Sat, 20 Sep 2008 01:06:02 +0200

 On Fri, 19 Sep 2008 16:20:17 -0400
 "Greg A. Woods" <woods@planix.com> wrote:

 > Perhaps we should post some further details about these issues on
 > tech-kern.
 > 
 > I'm certainly not knowledgeable enough about the twisty new maze of
 > locking necessary for disk drivers, especially not middle-layer drivers
 > like this one, to be of much help sorting this out.  (I would like to
 > learn, but I'd prefer to do so from a detailed design document, and I
 > don't think one currently exists.)
 > 
 > Perhaps someone else cognizant of all the issues could lend a hand.

 I did:

 http://mail-index.netbsd.org/tech-kern/2008/09/17/msg002734.html

 Nobody has answered this email yet.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/38273: panic: LOCKDEBUG, "lockdebug_barrier: spin lock
	held", from ld_ataraid_start_raid0()
Date: Wed, 17 Mar 2010 19:47:22 +0000

 Note that there is now a newer report in PR 42985, which may or may
 not reflect the same underlying problem. It's not clear to me whether
 this PR should be closed, so for now I'll just make this
 cross-reference. Maybe sometime we can get the problem(s) fixed...

 -- 
 David A. Holland
 dholland@netbsd.org

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.