NetBSD Problem Report #42205

From www@NetBSD.org  Tue Oct 20 16:07:41 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id F2FF063B902
	for <gnats-bugs@gnats.netbsd.org>; Tue, 20 Oct 2009 16:07:40 +0000 (UTC)
Message-Id: <20091020160740.A3BFF63B877@www.NetBSD.org>
Date: Tue, 20 Oct 2009 16:07:40 +0000 (UTC)
From: 6bone@6bone.informatik.uni-leipzig.de
Reply-To: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Subject: kernel panic  at activated userquota
X-Send-Pr-Version: www-1.0

>Number:         42205
>Category:       kern
>Synopsis:       kernel panic  at activated userquota
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bouyer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Oct 20 16:10:00 +0000 2009
>Closed-Date:    Wed Jan 27 21:38:06 +0000 2010
>Last-Modified:  Wed Jan 27 21:38:06 +0000 2010
>Originator:     Uwe Toenjes
>Release:        NetBSD 5.0_STABLE
>Organization:
University of Leipzig
>Environment:
NetBSD 6bone.informatik.uni-leipzig.de 5.0_STABLE NetBSD 5.0_STABLE (MYCONF) #0: Fri Oct 16 11:16:05 CEST 2009  root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF amd64
>Description:
Under high disk I/O, a kernel panic can occur when user quotas are enabled. Without user quotas, everything works fine.

uvm_fault(0xffffffff80c79620, 0x0, 1) -> e                                      
fatal page fault in supervisor mode                                             
trap type 6 code 0 rip ffffffff803e46b3 cs 8 rflags 10246 cr2  70 cpl 0 rsp fff0
kernel: page fault trap, code=0                                                 
Stopped in pid 0.61 (system) at netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14
db{0}> trace                                                                    
qsync() at netbsd:qsync+0x103                                                   
ffs_sync() at netbsd:ffs_sync+0x2eb                                             
VFS_SYNC() at netbsd:VFS_SYNC+0x33                                              
sync_fsync() at netbsd:sync_fsync+0x85                                          
VOP_FSYNC() at netbsd:VOP_FSYNC+0x71                                            
sched_sync() at netbsd:sched_sync+0x15d                                         
db{0}> show registers
ds          0xfd28                                                              
es          0x60a3                                                              
fs          0xaac0                                                              
gs          0x12                                                                
rdi         0xffff80007d9cfd28                                                  
rsi         0                                                                   
rbp         0xffff80007282ab20                                                  
rbx         0xffff80007d9cfd28                                                  
rdx         0                                                                   
rcx         0                                                                   
rax         0                                                                   
r8          0x7000000                                                           
r9          0x420                                                               
r10         0x2c0e57                                                            
r11         0x1                                                                 
r12         0xffff8000728164a0                                                  
r13         0xffff80008f4be758                                                  
r14         0                                                                   
r15         0xffff800087f56f28                                                  
rip         0xffffffff803e46b3  qsync+0x103                                     
cs          0x8                                                                 
rflags      0x10246                                                             
rsp         0xffff80007282aaf0                                                  
ss          0                                                                   
netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14                          
db{0}> continue                                                                 
uvm_fault(0xffffffff80c79620, 0x0, 1) -> e                                      
fatal page fault in supervisor mode                                             
trap type 6 code 0 rip ffffffff803e46b3 cs 8 rflags 10246 cr2  70 cpl 0 rsp fff0
kernel: page fault trap, code=0                                                 
Stopped in pid 0.61 (system) at netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14
db{0}> continue                                                                 
uvm_fault(0xffffffff80c79620, 0x0, 1) -> e                                      
fatal page fault in supervisor mode                                             
trap type 6 code 0 rip ffffffff803e46b3 cs 8 rflags 10246 cr2  70 cpl 0 rsp fff0
kernel: page fault trap, code=0                                                 
Stopped in pid 0.61 (system) at netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14
>How-To-Repeat:
The panic occurs only intermittently, under high disk I/O with user quotas enabled.
>Fix:

>Release-Note:

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Tue, 20 Oct 2009 21:39:02 +0200

 On Tue, Oct 20, 2009 at 04:10:00PM +0000, 6bone@6bone.informatik.uni-leipzig.de wrote:
 > uvm_fault(0xffffffff80c79620, 0x0, 1) -> e                                      
 > fatal page fault in supervisor mode                                             
 > trap type 6 code 0 rip ffffffff803e46b3 cs 8 rflags 10246 cr2  70 cpl 0 rsp fff0
 > kernel: page fault trap, code=0                                                 
 > Stopped in pid 0.61 (system) at netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14
 > db{0}> trace                                                                    
 > qsync() at netbsd:qsync+0x103                                                   
 > ffs_sync() at netbsd:ffs_sync+0x2eb                                             

 This seems to be:
 0xffffffff8025d7c3 is in qsync (/dsk/l1/misc/bouyer/netbsd-5/src/sys/ufs/ufs/ufs_quota.c:747).
 742                                     goto again;
 743                             }
 744                             continue;
 745                     }
 746                     for (i = 0; i < MAXQUOTAS; i++) {
 747                             dq = VTOI(vp)->i_dquot[i];

 (gdb) print &((struct inode *)0)->i_dquot[0]
 $1 = (struct dquot **) 0x70
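
 The faulting address can be reproduced by hand. As a minimal sketch (the
 struct layout below is a hypothetical stand-in, padded only so that i_dquot
 sits at the 0x70 offset gdb reports, and MAXQUOTAS is assumed to be 2), the
 x86 operand 0x70(%rdx,%rax,8) resolves to base + index*8 + 0x70, which with
 rdx = rax = 0 gives the cr2 value of 0x70 seen in the panic:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAXQUOTAS 2             /* assumption: user and group quotas */

struct dquot;                   /* opaque; only pointers to it are used */

/*
 * Hypothetical stand-in for struct inode. The real structure has many
 * fields; only the offset of i_dquot (0x70 per the gdb output) matters.
 */
struct fake_inode {
	unsigned char pad[0x70];
	struct dquot *i_dquot[MAXQUOTAS];
};

/*
 * Effective address of "movq 0x70(%rdx,%rax,8),%r14":
 * displacement + base (rdx) + index (rax) * scale (8).
 */
uintptr_t
qsync_fault_address(uintptr_t base, uintptr_t index)
{
	return offsetof(struct fake_inode, i_dquot) + base + index * 8;
}
```

 With a NULL inode pointer as base and i == 0, qsync_fault_address(0, 0)
 yields 0x70, which is consistent with VTOI(vp) being NULL at line 747.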

 Another case where a vnode could be vclean'ed while vget() drops the
 interlock before getting the vn_lock.
 The attached patch may help, but it's untested and probably not the
 right way of fixing this.

 Does anyone have an idea how to properly fix vget()?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Tue, 20 Oct 2009 22:12:45 +0200

 --Bn2rw/3z4jIqBvZU
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Tue, Oct 20, 2009 at 09:39:02PM +0200, Manuel Bouyer wrote:
 > The attached patch may help, but it's untested and probably not the
 > right way of fixing this.

 Oops, I did it again.
 The patch is really attached this time.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --Bn2rw/3z4jIqBvZU
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=diff

 Index: ufs_quota.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ufs/ufs_quota.c,v
 retrieving revision 1.60.10.3
 diff -u -p -u -r1.60.10.3 ufs_quota.c
 --- ufs_quota.c	7 Aug 2009 05:59:44 -0000	1.60.10.3
 +++ ufs_quota.c	20 Oct 2009 19:38:13 -0000
 @@ -743,6 +743,13 @@ qsync(struct mount *mp)
  			}
  			continue;
  		}
 +		if (VTOI(vp) == NULL) {
 +			mutex_enter(&mntvnode_lock);
 +			vunmark(mvp);
 +			vlockmgr(vp->v_vnlock, LK_RELEASE);
 +			vrele(vp);
 +			goto again;
 +		}
  		for (i = 0; i < MAXQUOTAS; i++) {
  			dq = VTOI(vp)->i_dquot[i];
  			if (dq == NODQUOT)

 --Bn2rw/3z4jIqBvZU--

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Wed, 21 Oct 2009 21:16:28 +0200 (CEST)

 On Tue, 20 Oct 2009, Manuel Bouyer wrote:
 >
 > Another case where a vnode could be vclean'ed while vget() drops the
 > interlock before getting the vn_lock.
 > The attached patch may help, but it's untested and probably not the
 > right way of fixing this.
 >
 > Does anyone have an idea how to properly fix vget()?

 I applied the patch. There has been no crash in the last 24 hours, but it
 will take a few more days to tell whether it is now stable.

 Thank you for your efforts.

 Uwe

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Mon, 26 Oct 2009 06:49:31 +0100 (CET)

 The patch works fine and solves the problem.


 Regards
 Uwe

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Mon, 26 Oct 2009 12:32:22 +0100

 --NzB8fVQJ5HfG6fxh
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Mon, Oct 26, 2009 at 06:49:31AM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
 > The patch works fine and solves the problem.

 Thanks. However, I don't think this was the right way to fix the problem.
 Can you try the attached patch instead? It should fix your problem and some
 others (including one that I fixed the wrong way some time ago; that is
 the code inside #if 0/#endif in the patch).

 I'm running with this patch on several systems now and it seems to work fine.

 -- 
 Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --NzB8fVQJ5HfG6fxh
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="vget.diff"

 Index: sys/ufs/ufs/ufs_ihash.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ufs/ufs_ihash.c,v
 retrieving revision 1.26.10.1
 diff -u -p -u -r1.26.10.1 ufs_ihash.c
 --- sys/ufs/ufs/ufs_ihash.c	28 Sep 2009 01:43:02 -0000	1.26.10.1
 +++ sys/ufs/ufs/ufs_ihash.c	24 Oct 2009 13:33:23 -0000
 @@ -152,6 +152,7 @@ ufs_ihashget(dev_t dev, ino_t inum, int 
  				mutex_exit(&ufs_ihash_lock);
  				if (vget(vp, flags | LK_INTERLOCK))
  					goto loop;
 +#if 0
  				if (VTOI(vp) != ip ||
  				    ip->i_number != inum || ip->i_dev != dev) {
  					/* lost race against vclean() */
 @@ -161,6 +162,7 @@ ufs_ihashget(dev_t dev, ino_t inum, int 
  					vp = NULL;
  					goto loop;
  				}
 +#endif
  			}
  			return (vp);
  		}
 Index: sys/kern/vfs_subr.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
 retrieving revision 1.357.4.5
 diff -u -p -u -r1.357.4.5 vfs_subr.c
 --- sys/kern/vfs_subr.c	21 Jul 2009 00:31:58 -0000	1.357.4.5
 +++ sys/kern/vfs_subr.c	24 Oct 2009 13:33:23 -0000
 @@ -370,6 +370,17 @@ try_nextlist:
  	vp->v_freelisthd = NULL;
  	mutex_exit(&vnode_free_list_lock);

 +	if (vp->v_usecount != 0) {
 +		/*
 +		 * was referenced again before we got the interlock
 +		 * Don't return to freelist - the holder of the last
 +		 * reference will destroy it.
 +		 */
 +		vrelel(vp, 0); /* releases vp->v_interlock */
 +		mutex_enter(&vnode_free_list_lock);
 +		goto retry;
 +	}
 +
  	/*
  	 * The vnode is still associated with a file system, so we must
  	 * clean it out before reusing it.  We need to add a reference
 @@ -1288,6 +1299,22 @@ vget(vnode_t *vp, int flags)
  		vrelel(vp, 0);
  		return ENOENT;
  	}
 +
 +	if ((vp->v_iflag & VI_INACTNOW) != 0) {
 +		/*
 +		 * if it's being deactivated, wait for it to complete.
 +		 * Make sure to not return a clean vnode.
 +		 */
 +		 if ((flags & LK_NOWAIT) != 0) {
 +			vrelel(vp, 0);
 +			return EBUSY;
 +		}
 +		vwait(vp, VI_INACTNOW);
 +		if ((vp->v_iflag & VI_CLEAN) != 0) {
 +			vrelel(vp, 0);
 +			return ENOENT;
 +		}
 +	}
  	if (flags & LK_TYPE_MASK) {
  		error = vn_lock(vp, flags | LK_INTERLOCK);
  		if (error != 0) {
 @@ -1427,6 +1454,7 @@ vrelel(vnode_t *vp, int flags)
  			if (++vrele_pending > (desiredvnodes >> 8))
  				cv_signal(&vrele_cv); 
  			mutex_exit(&vrele_lock);
 +			cv_broadcast(&vp->v_cv);
  			mutex_exit(&vp->v_interlock);
  			return;
  		}
 @@ -1451,6 +1479,7 @@ vrelel(vnode_t *vp, int flags)
  		VOP_INACTIVE(vp, &recycle);
  		mutex_enter(&vp->v_interlock);
  		vp->v_iflag &= ~VI_INACTNOW;
 +		cv_broadcast(&vp->v_cv);
  		if (!recycle) {
  			if (vtryrele(vp)) {
  				mutex_exit(&vp->v_interlock);

 --NzB8fVQJ5HfG6fxh--
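
 The check that the patch above adds to vget() can be modelled in user space.
 This is only a single-threaded sketch under stated assumptions: the flag
 values, struct model_vnode, and the wait callback are all hypothetical, the
 callback merely stands in for vwait(vp, VI_INACTNOW), and the vrelel() calls
 of the real patch are left out.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical flag values; the kernel's real bit assignments differ. */
#define MODEL_VI_INACTNOW 0x1   /* VOP_INACTIVE() is in progress */
#define MODEL_VI_CLEAN    0x2   /* vnode has been cleaned out */
#define MODEL_LK_NOWAIT   0x10  /* caller refuses to sleep */

struct model_vnode {
	int v_iflag;
};

/* Stand-in for vwait(): returns once deactivation has finished. */
typedef void (*model_wait_fn)(struct model_vnode *);

/* Deactivation finished and the vnode stayed usable. */
void
finish_alive(struct model_vnode *vp)
{
	vp->v_iflag &= ~MODEL_VI_INACTNOW;
}

/* Deactivation finished and the vnode was cleaned in the meantime. */
void
finish_cleaned(struct model_vnode *vp)
{
	vp->v_iflag &= ~MODEL_VI_INACTNOW;
	vp->v_iflag |= MODEL_VI_CLEAN;
}

/* Mirrors the control flow of the block added to vget(). */
int
model_vget_check(struct model_vnode *vp, int flags, model_wait_fn wait_inactive)
{
	if ((vp->v_iflag & MODEL_VI_INACTNOW) != 0) {
		if ((flags & MODEL_LK_NOWAIT) != 0)
			return EBUSY;           /* would have to sleep */
		wait_inactive(vp);
		if ((vp->v_iflag & MODEL_VI_CLEAN) != 0)
			return ENOENT;          /* never hand out a clean vnode */
	}
	return 0;
}
```

 The point of the ordering is that the VI_CLEAN test happens after the wait,
 so a vnode that was cleaned while the caller slept is rejected with ENOENT
 rather than returned.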

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Wed, 28 Oct 2009 07:39:09 +0100 (CET)

 I applied the new patch. There has been no crash in the last 24 hours, but
 it will take a few more days to tell whether it is stable.

 Thank you for your efforts.

 Uwe




From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Thu, 29 Oct 2009 08:34:15 +0100 (CET)

 The new patch does not work. There was a crash last night.
 Unfortunately I can't report a trace, because ddb.onpanic was 0.


 Regards
 Uwe


From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Fri, 30 Oct 2009 08:30:54 +0100 (CET)

 Hello,

 here is the ddb output from the last panic:

 uvm_fault(0xffffffff80c7a620, 0x0, 1) -> e
 fatal page fault in supervisor mode
 trap type 6 code 0 rip ffffffff803e4863 cs 8 rflags 10246 cr2  70 cpl 0 rsp fff0
 kernel: page fault trap, code=0
 Stopped in pid 0.61 (system) at netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14
 db{2}> trace
 qsync() at netbsd:qsync+0x103
 ffs_sync() at netbsd:ffs_sync+0x2eb
 VFS_SYNC() at netbsd:VFS_SYNC+0x33
 sync_fsync() at netbsd:sync_fsync+0x85
 VOP_FSYNC() at netbsd:VOP_FSYNC+0x71
 sched_sync() at netbsd:sched_sync+0x15d
 db{2}> show registers
 ds          0xb3e8
 es          0x66a3
 fs          0xaac0
 gs          0x12
 rdi         0xffff8000ada6b3e8
 rsi         0
 rbp         0xffff80007282ab20
 rbx         0xffff8000ada6b3e8
 rdx         0
 rcx         0
 rax         0
 r8          0xffff80001f264000
 r9          0x7c
 r10         0xffff80001f264080
 r11         0
 r12         0xffff8000728164a0
 r13         0xffff80008cd4d188
 r14         0
 r15         0xffff800072dddd48
 rip         0xffffffff803e4863  qsync+0x103
 cs          0x8
 rflags      0x10246
 rsp         0xffff80007282aaf0
 ss          0x10
 netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14
 db{2}> cont
 uvm_fault(0xffffffff80c7a620, 0x0, 1) -> e
 fatal page fault in supervisor mode
 trap type 6 code 0 rip ffffffff803e4863 cs 8 rflags 10246 cr2  70 cpl 0 rsp fff0
 kernel: page fault trap, code=0
 Stopped in pid 0.61 (system) at netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14


 regards
 Uwe



From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
        6bone@6bone.informatik.uni-leipzig.de
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Fri, 30 Oct 2009 16:27:52 +0100

 --QTprm0S8XgL7H0Dt
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Fri, Oct 30, 2009 at 07:35:02AM +0000, 6bone@6bone.informatik.uni-leipzig.de wrote:
 >  Hello,
 >  
 >  here is the ddb output from the last panic:
 >  
 >  uvm_fault(0xffffffff80c7a620, 0x0, 1) -> e
 >  fatal page fault in supervisor mode
 >  trap type 6 code 0 rip ffffffff803e4863 cs 8 rflags 10246 cr2  70 cpl 0 rsp fff0
 >  kernel: page fault trap, code=0
 >  Stopped in pid 0.61 (system) at netbsd:qsync+0x103:     movq    0x70(%rdx,%rax,8),%r14
 >  db{2}> trace
 >  qsync() at netbsd:qsync+0x103
 >  ffs_sync() at netbsd:ffs_sync+0x2eb
 >  VFS_SYNC() at netbsd:VFS_SYNC+0x33
 >  sync_fsync() at netbsd:sync_fsync+0x85
 >  VOP_FSYNC() at netbsd:VOP_FSYNC+0x71
 >  sched_sync() at netbsd:sched_sync+0x15d

 So it's still VTOI(vp) being NULL.
 Can you install the attached patch? When it runs into a NULL inode here,
 it will print the associated vnode (and hopefully avoid the panic :)
 Please monitor the console output or dmesg, and when the vprint fires,
 report it here.

 -- 
 Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
      NetBSD: 26 ans d'experience feront toujours la difference
 --

 --QTprm0S8XgL7H0Dt
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="quota.diff"

 Index: ufs/ufs/ufs_quota.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ufs/ufs_quota.c,v
 retrieving revision 1.60.10.3
 diff -u -p -u -r1.60.10.3 ufs_quota.c
 --- ufs/ufs/ufs_quota.c	7 Aug 2009 05:59:44 -0000	1.60.10.3
 +++ ufs/ufs/ufs_quota.c	30 Oct 2009 15:27:13 -0000
 @@ -743,6 +743,14 @@ qsync(struct mount *mp)
  			}
  			continue;
  		}
 +		if (VTOI(vp) == NULL) {
 +			vprint("qsync vp wihout ip", vp);
 +			mutex_enter(&mntvnode_lock);
 +			vunmark(mvp);
 +			vlockmgr(vp->v_vnlock, LK_RELEASE);
 +			vrele(vp);
 +			goto again;
 +		}
  		for (i = 0; i < MAXQUOTAS; i++) {
  			dq = VTOI(vp)->i_dquot[i];
  			if (dq == NODQUOT)

 --QTprm0S8XgL7H0Dt--

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Thu, 5 Nov 2009 07:55:41 +0100 (CET)

 qsync vp wihout ip: vnode @ 0xffff8000974df5f0, flags (10<MPSAFE>)
          tag VT_UFS(1), type VLNK(5), usecount 1, writecount 0, holdcount 0
          freelisthd 0x0, mount 0xffff800072988000, data 0xffff8000974e0dc0 lock 0xffff8000974df6f8 recursecnt 0
          tag VT_UFS, ino 55338475, on dev 19, 4 flags 0x0, effnlink 1, nlink 1


 Regards
 Uwe



From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Sat, 21 Nov 2009 18:58:17 +0100

 On Thu, Nov 05, 2009 at 07:55:41AM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
 > qsync vp wihout ip: vnode @ 0xffff8000974df5f0, flags (10<MPSAFE>)
 >         tag VT_UFS(1), type VLNK(5), usecount 1, writecount 0, holdcount 0
 >         freelisthd 0x0, mount 0xffff800072988000, data 0xffff8000974e0dc0 lock 0xffff8000974df6f8 recursecnt 0
 >         tag VT_UFS, ino 55338475, on dev 19, 4 flags 0x0, effnlink 1, nlink 1
 >         mode 0120775, owner 1007, group 100, size 31

 Wow, now that's strange. We get there because VTOI(vp) == NULL.
 VTOI is ((struct inode *)(vp)->v_data), and v_data is obviously not NULL
 in this vnode. How could this happen?
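
 To make the contradiction concrete, here is a minimal user-space sketch of
 the macro as quoted above. The struct mini_vnode type and the has_inode()
 helper are hypothetical; the real struct vnode has many more fields. Note
 that in the debugging patch the NULL test and the vprint() call read v_data
 at two different moments, so the printed value is a later snapshot than the
 one that was tested. That observation alone does not establish the cause.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct inode;                   /* opaque; never dereferenced here */

/* Hypothetical, minimal vnode: only the private data pointer is kept. */
struct mini_vnode {
	void *v_data;
};

/* Same shape as the macro quoted in the message above. */
#define VTOI(vp) ((struct inode *)(vp)->v_data)

/* Returns non-zero if the vnode currently has an inode attached. */
int
has_inode(const struct mini_vnode *vp)
{
	return VTOI(vp) != NULL;
}
```

 With v_data set to the value from the vprint output (0xffff8000974e0dc0),
 has_inode() returns non-zero, so the same vnode state could not have taken
 the VTOI(vp) == NULL branch.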

 Hmm, can you send the dmesg and the output of 'cpuctl identify cpu0' for this machine?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Sat, 21 Nov 2009 19:13:37 +0100 (CET)

 On Sat, 21 Nov 2009, Manuel Bouyer wrote:
 > Hmm, can you send the dmesg and the output of 'cpuctl identify cpu0' for this machine?

 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
      2006, 2007, 2008, 2009
      The NetBSD Foundation, Inc.  All rights reserved.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
      The Regents of the University of California.  All rights reserved.

 NetBSD 5.0_STABLE (MYCONF) #0: Thu Nov 12 13:17:15 CET 2009
  	root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF
 total memory = 16378 MB
 avail memory = 15865 MB
 timecounter: Timecounters tick every 10.000 msec
 timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
 SMBIOS rev. 2.5 @ 0xbfb9c000 (66 entries)
 Dell Inc. PowerEdge 1950
 mainbus0 (root)
 cpu0 at mainbus0 apid 0: Intel 686-class, 1995MHz, id 0x6f6
 cpu1 at mainbus0 apid 6: Intel 686-class, 1995MHz, id 0x6f6
 cpu2 at mainbus0 apid 1: Intel 686-class, 1995MHz, id 0x6f6
 cpu3 at mainbus0 apid 7: Intel 686-class, 1995MHz, id 0x6f6
 ioapic0 at mainbus0 apid 8: pa 0xfec00000, version 20, 24 pins
 ioapic1 at mainbus0 apid 9: pa 0xfec81000, version 20, 24 pins
 acpi0 at mainbus0: Intel ACPICA 20080321
 acpi0: X/RSDT: OemId <DELL  ,PE_SC3  ,00000001>, AslId <DELL,00000001>
 acpi0: SCI interrupting at int 9
 acpi0: fixed-feature power button present
 timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
 ACPI-Fast 24-bit timer
 attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x5f irq 0
 COMA (PNP0501) at acpi0 not configured
 COMB (PNP0501) at acpi0 not configured
 hpet0 at acpi0 (HPET, PNP0103-0): mem 0xfed00000-0xfed003ff
 timecounter: Timecounter "hpet0" frequency 14318179 Hz quality 2000
 ipmi0 at mainbus0
 pci0 at mainbus0 bus 0: configuration mode 1
 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 pchb0 at pci0 dev 0 function 0
 pchb0: vendor 0x8086 product 0x25c0 (rev. 0x12)
 ppb0 at pci0 dev 2 function 0: vendor 0x8086 product 0x25e2 (rev. 0x12)
 pci1 at ppb0 bus 6
 pci1: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb1 at pci1 dev 0 function 0: vendor 0x8086 product 0x3500 (rev. 0x01)
 pci2 at ppb1 bus 7
 pci2: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb2 at pci2 dev 0 function 0: vendor 0x8086 product 0x3510 (rev. 0x01)
 pci3 at ppb2 bus 8
 pci3: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb3 at pci2 dev 1 function 0: vendor 0x8086 product 0x3514 (rev. 0x01)
 pci4 at ppb3 bus 10
 pci4: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb4 at pci1 dev 0 function 3: vendor 0x8086 product 0x350c (rev. 0x01)
 ppb4: disabling notification events
 pci5 at ppb4 bus 11
 pci5: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb5 at pci0 dev 3 function 0: vendor 0x8086 product 0x25e3 (rev. 0x12)
 pci6 at ppb5 bus 1
 pci6: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb6 at pci6 dev 0 function 0: vendor 0x8086 product 0x0370 (rev. 0x00)
 ppb6: disabling notification events
 pci7 at ppb6 bus 2
 pci7: i/o space, memory space enabled, rd/line, wr/inv ok
 mfi0 at pci7 dev 14 function 0: Dell PERC 5/i integrated
 mfi0: interrupting at ioapic1 pin 14
 mfi0: logical drives 1, version 5.1.1-0040, 256MB RAM
 scsibus0 at mfi0: 64 targets, 8 luns per target
 ppb7 at pci6 dev 0 function 2: vendor 0x8086 product 0x0372 (rev. 0x00)
 ppb7: disabling notification events
 pci8 at ppb7 bus 3
 pci8: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb8 at pci0 dev 4 function 0: vendor 0x8086 product 0x25f8 (rev. 0x12)
 pci9 at ppb8 bus 12
 pci9: i/o space, memory space enabled, rd/line, wr/inv ok
 wm0 at pci9 dev 0 function 0: Intel PRO/1000 PT (82571EB), rev. 6
 wm0: interrupting at ioapic0 pin 16
 wm0: PCI-Express bus
 wm0: 65536 word (16 address bits) SPI EEPROM
 wm0: Ethernet address 00:15:17:0e:98:5e
 igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
 igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 wm1 at pci9 dev 0 function 1: Intel PRO/1000 PT (82571EB), rev. 6
 wm1: interrupting at ioapic0 pin 17
 wm1: PCI-Express bus
 wm1: 65536 word (16 address bits) SPI EEPROM
 wm1: Ethernet address 00:15:17:0e:98:5f
 igphy1 at wm1 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
 igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
 ppb9 at pci0 dev 5 function 0: vendor 0x8086 product 0x25e5 (rev. 0x12)
 pci10 at ppb9 bus 13
 pci10: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb10 at pci0 dev 6 function 0: vendor 0x8086 product 0x25f9 (rev. 0x12)
 pci11 at ppb10 bus 14
 pci11: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb11 at pci11 dev 0 function 0: vendor 0x8086 product 0x0330 (rev. 0x07)
 ppb11: disabling notification events
 pci12 at ppb11 bus 15
 pci12: i/o space, memory space enabled, rd/line, wr/inv ok
 amr0 at pci12 dev 14 function 0: AMI RAID <PERC 4e/DC>
 amr0: interrupting at ioapic0 pin 18
 amr0: firmware 522A, BIOS H430, 128MB RAM
 ld0 at amr0 unit 0: RAID 0, optimal
 ld0: 1629 GB, 212749 cyl, 255 head, 63 sec, 512 bytes/sect x 3417825280 sectors
 ld1 at amr0 unit 1: RAID 0, optimal
 ld1: 1630 GB, 212847 cyl, 255 head, 63 sec, 512 bytes/sect x 3419402240 sectors
 ld2 at amr0 unit 2: RAID 0, optimal
 ld2: 1396 GB, 182350 cyl, 255 head, 63 sec, 512 bytes/sect x 2929459200 sectors
 ld3 at amr0 unit 3: RAID 0, optimal
 ld3: 1862 GB, 243149 cyl, 255 head, 63 sec, 512 bytes/sect x 3906191360 sectors
 ld4 at amr0 unit 4: RAID 0, optimal
 ld4: 1396 GB, 182350 cyl, 255 head, 63 sec, 512 bytes/sect x 2929459200 sectors
 ppb12 at pci11 dev 0 function 2: vendor 0x8086 product 0x0332 (rev. 0x07)
 ppb12: disabling notification events
 pci13 at ppb12 bus 16
 pci13: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb13 at pci0 dev 7 function 0: vendor 0x8086 product 0x25e7 (rev. 0x12)
 pci14 at ppb13 bus 17
 pci14: i/o space, memory space enabled, rd/line, wr/inv ok
 pchb1 at pci0 dev 16 function 0
 pchb1: vendor 0x8086 product 0x25f0 (rev. 0x12)
 pchb2 at pci0 dev 16 function 1
 pchb2: vendor 0x8086 product 0x25f0 (rev. 0x12)
 pchb3 at pci0 dev 16 function 2
 pchb3: vendor 0x8086 product 0x25f0 (rev. 0x12)
 pchb4 at pci0 dev 17 function 0
 pchb4: vendor 0x8086 product 0x25f1 (rev. 0x12)
 pchb5 at pci0 dev 19 function 0
 pchb5: vendor 0x8086 product 0x25f3 (rev. 0x12)
 pchb6 at pci0 dev 21 function 0
 pchb6: vendor 0x8086 product 0x25f5 (rev. 0x12)
 pchb7 at pci0 dev 22 function 0
 pchb7: vendor 0x8086 product 0x25f6 (rev. 0x12)
 ppb14 at pci0 dev 28 function 0: vendor 0x8086 product 0x2690 (rev. 0x09)
 pci15 at ppb14 bus 4
 pci15: i/o space, memory space enabled, rd/line, wr/inv ok
 uhci0 at pci0 dev 29 function 0: vendor 0x8086 product 0x2688 (rev. 0x09)
 uhci0: interrupting at ioapic0 pin 21
 usb0 at uhci0: USB revision 1.0
 uhci1 at pci0 dev 29 function 1: vendor 0x8086 product 0x2689 (rev. 0x09)
 uhci1: interrupting at ioapic0 pin 20
 usb1 at uhci1: USB revision 1.0
 uhci2 at pci0 dev 29 function 2: vendor 0x8086 product 0x268a (rev. 0x09)
 uhci2: interrupting at ioapic0 pin 21
 usb2 at uhci2: USB revision 1.0
 ehci0 at pci0 dev 29 function 7: vendor 0x8086 product 0x268c (rev. 0x09)
 ehci0: interrupting at ioapic0 pin 21
 ehci0: EHCI version 1.0
 ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
 usb3 at ehci0: USB revision 2.0
 ppb15 at pci0 dev 30 function 0: vendor 0x8086 product 0x244e (rev. 0xd9)
 pci16 at ppb15 bus 18
 pci16: i/o space, memory space enabled
 vga0 at pci16 dev 13 function 0: vendor 0x1002 product 0x515e (rev. 0x02)
 wsdisplay0 at vga0 kbdmux 1
 wsmux1: connecting to wsdisplay0
 drm at vga0 not configured
 ichlpcib0 at pci0 dev 31 function 0
 ichlpcib0: vendor 0x8086 product 0x2670 (rev. 0x09)
 timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
 ichlpcib0: 24-bit timer
 ichlpcib0: TCO (watchdog) timer configured.
 piixide0 at pci0 dev 31 function 1
 piixide0: Intel 631xESB/632xESB IDE Controller (rev. 0x09)
 piixide0: bus-master DMA support present
 piixide0: primary channel configured to compatibility mode
 piixide0: primary channel interrupting at ioapic0 pin 14
 atabus0 at piixide0 channel 0
 piixide0: secondary channel configured to compatibility mode
 piixide0: secondary channel ignored (disabled)
 isa0 at ichlpcib0
 com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
 com0: console
 com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
 pckbc0 at isa0 port 0x60-0x64
 pcppi0 at isa0 port 0x61
 midi0 at pcppi0: PC speaker (CPU-intensive output)
 sysbeep0 at pcppi0
 attimer1: attached to pcppi0
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 timecounter: Timecounter "TSC" frequency 1995111120 Hz quality 3000
 scsibus0: waiting 2 seconds for devices to settle...
 atapibus0 at atabus0: 2 targets
 uhub0 at usb0: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub0: 2 ports with 2 removable, self powered
 uhub1 at usb1: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub1: 2 ports with 2 removable, self powered
 cd0 at atapibus0 drive 0: <HL-DT-STCD-RW/DVD-ROM GCC-4244N, , B101> cdrom removable
 cd0: 32-bit data port
 cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
 cd0(piixide0:0:0): using PIO mode 4, DMA mode 2 (using DMA)
 uhub2 at usb3: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
 uhub2: 6 ports with 6 removable, self powered
 uhub3 at usb2: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
 uhub3: 2 ports with 2 removable, self powered
 uhub4 at uhub2 port 1: vendor 0x413c product 0xa001, class 9/0, rev 2.00/0.00, addr 2
 uhub4: multiple transaction translators
 sd0 at scsibus0 target 0 lun 0: <DELL, PERC 5/i, 1.03> disk fixed
 uhub4: 2 ports with 2 removable, self powered
 sd0: fabricating a geometry
 sd0: 136 GB, 139300 cyl, 64 head, 32 sec, 512 bytes/sect x 285286400 sectors
 sd0: fabricating a geometry
 uhidev0 at uhub4 port 1 configuration 1 interface 0
 uhidev0: Dell DRAC5, rev 1.10/0.00, addr 3, iclass 3/1
 ukbd0 at uhidev0
 wskbd0 at ukbd0 mux 1
 wskbd0: connecting to wsdisplay0
 uhidev1 at uhub4 port 1 configuration 1 interface 1
 uhidev1: Dell DRAC5, rev 1.10/0.00, addr 3, iclass 3/1
 ums0 at uhidev1
 ums0: X report 0x0002 not supported
 umass0 at uhub4 port 2 configuration 1 interface 0
 umass0: DELL  INC. DRAC5 VIRTUAL  MEDIA, rev 2.00/0.00, addr 4
 umass0: using SCSI over Bulk-Only
 scsibus1 at umass0: 2 targets, 1 lun per target
 umass1 at uhub4 port 2 configuration 1 interface 1
 cd1 at scsibus1 target 0 lun 0: <Dell, Virtual  CDROM, 123> cdrom removable
 umass1: DELL  INC. DRAC5 VIRTUAL  MEDIA, rev 2.00/0.00, addr 4
 umass1: using SCSI over Bulk-Only
 scsibus2 at umass1: 2 targets, 1 lun per target
 sd1 at scsibus2 target 0 lun 0: <Dell, Virtual  Floppy, 123> disk removable
 sd1: drive offline
 sd1(umass1:0:0:0):  Check Condition on CDB: 0x00 00 00 00 00 00
      SENSE KEY:  Not Ready
       ASC/ASCQ:  Medium Not Present

 sd1: unable to open device, error = 19
 uhub5 at uhub2 port 5: vendor 0x04b4 product 0x6560, class 9/0, rev 2.00/0.0b, addr 5
 uhub5: multiple transaction translators
 uhub5: 4 ports with 4 removable, self powered
 uhidev2 at uhub1 port 1 configuration 1 interface 0
 uhidev2: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
 ukbd1 at uhidev2
 wskbd1 at ukbd1 mux 1
 wskbd1: connecting to wsdisplay0
 uhidev3 at uhub1 port 1 configuration 1 interface 1
 uhidev3: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
 uhidev3: 3 report ids
 ums1 at uhidev3 reportid 1: 5 buttons and Z dir.
 wsmouse0 at ums1 mux 0
 uhid0 at uhidev3 reportid 2: input=1, output=0, feature=0
 uhid1 at uhidev3 reportid 3: input=3, output=0, feature=0
 ipmi0: version 2.0 interface KCS iobase 0xca8/8 spacing 4
 Kernelized RAIDframe activated
 pad0: outputs: 44100Hz, 16-bit, stereo
 audio0 at pad0: half duplex, playback, capture
 sd1(umass1:0:0:0):  Check Condition on CDB: 0x00 00 00 00 00 00
      SENSE KEY:  Not Ready
       ASC/ASCQ:  Medium Not Present

 sd1(umass1:0:0:0):  Check Condition on CDB: 0x00 00 00 00 00 00
      SENSE KEY:  Not Ready
       ASC/ASCQ:  Medium Not Present

 sd1(umass1:0:0:0):  Check Condition on CDB: 0x00 00 00 00 00 00
      SENSE KEY:  Not Ready
       ASC/ASCQ:  Medium Not Present

 sd1(umass1:0:0:0):  Check Condition on CDB: 0x00 00 00 00 00 00
      SENSE KEY:  Not Ready
       ASC/ASCQ:  Medium Not Present

 boot device: sd0
 root on sd0a dumps on sd0b
 root file system type: ffs
 mfi0: normal state on 'mfi0:0' (online)
 raid0: Component /dev/ld2a being configured at col: 0
           Column: 0 Num Columns: 3
           Version: 2 Serial Number: 1223334444 Mod Counter: 350
           Clean: No Status: 0
 /dev/ld2a is not clean!
 raid0: Component /dev/ld3a being configured at col: 1
           Column: 1 Num Columns: 3
           Version: 2 Serial Number: 1223334444 Mod Counter: 350
           Clean: No Status: 0
 /dev/ld3a is not clean!
 raid0: Component /dev/ld4a being configured at col: 2
           Column: 2 Num Columns: 3
           Version: 2 Serial Number: 1223334444 Mod Counter: 350
           Clean: No Status: 0
 /dev/ld4a is not clean!
 raid0: RAID Level 0
 raid0: Components: /dev/ld2a /dev/ld3a /dev/ld4a
 raid0: Total Sectors: 8788377216 (4291199 MB)
 raid0: GPT GUID: 1e0b51e4-bbfa-11de-9484-0015170e985e
 dk0 at raid0: 1e0b51ee-bbfa-11de-9484-0015170e985e
 dk0: 134217728 blocks at 34, type: swap
 dk1 at raid0: 1e0b51ee-bbfa-11de-9485-0015170e985e
 dk1: 8654159421 blocks at 134217762, type: ffs
 wsdisplay0: screen 1 added (80x25, vt100 emulation)
 wsdisplay0: screen 2 added (80x25, vt100 emulation)
 wsdisplay0: screen 3 added (80x25, vt100 emulation)
 wsdisplay0: screen 4 added (80x25, vt100 emulation)

 #################################################################

 cpuctl list

 Num  HwId Unbound LWPs Interrupts     Last change
 ---- ---- ------------ -------------- ----------------------------
 0    0    online       intr           Thu Nov 19 08:07:34 2009
 1    1    online       intr           Thu Nov 19 08:07:34 2009
 2    2    online       intr           Thu Nov 19 08:07:34 2009
 3    3    online       intr           Thu Nov 19 08:07:34 2009

 #################################################################

 cpuctl identify 0

 cpu0: Intel Core 2 (Merom) (686-class), 1995.11 MHz, id 0x6f6
 cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
 cpu0: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
 cpu0: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
 cpu0: features2 0x4e33d<SSE3,DTES64,MONITOR,DS-CPL,VMX,TM2,SSSE3,CX16,xTPR,PDCM,DCA>
 cpu0: features3 0x20100800<SYSCALL/SYSRET,XD,EM64T>
 cpu0: "Intel(R) Xeon(R) CPU            5130  @ 2.00GHz"
 cpu0: I-cache 32KB 64B/line 8-way, D-cache 32KB 64B/line 8-way
 cpu0: L2 cache 4MB 64B/line 16-way
 cpu0: ITLB 128 4KB entries 4-way
 cpu0: DTLB 256 4KB entries 4-way, 16 4MB entries 4-way
 cpu0: Initial APIC ID 0
 cpu0: Cluster/Package ID 0
 cpu0: Core ID 0
 cpu0: family 06 model 0f extfamily 00 extmodel 00

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Wed, 6 Jan 2010 19:29:55 +0100

 --/9DWx/yDrRhgMJTb
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Sat, Nov 21, 2009 at 06:58:17PM +0100, Manuel Bouyer wrote:
 > On Thu, Nov 05, 2009 at 07:55:41AM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
 > > qsync vp wihout ip: vnode @ 0xffff8000974df5f0, flags (10<MPSAFE>)
 > >         tag VT_UFS(1), type VLNK(5), usecount 1, writecount 0, holdcount 0
 > >         freelisthd 0x0, mount 0xffff800072988000, data 0xffff8000974e0dc0 lock 0xffff8000974df6f8 recursecnt 0
 > >         tag VT_UFS, ino 55338475, on dev 19, 4 flags 0x0, effnlink 1, nlink 1
 > >         mode 0120775, owner 1007, group 100, size 31
 > 
 > Wow, now that's strange. We get there because VTOI(vp) == NULL.
 > VTOI is ((struct inode *)(vp)->v_data), and v_data is obviously not NULL
 > in this vnode. How could this happen?

 I have an idea on how this can happen; the vnode is put on the mnt list before
 initialisation is completed. But then its type should be VNON and so it should
 be skipped.

 Anyway, ffs_sync() checks for both v_type == VNON and VTOI(vp) == NULL, so
 we could do the same in qsync(). While there, also check for VI_XLOCK, like
 ffs_sync(), although this should not be needed either.
 Can you see if the attached patch prevents the vprint from firing?

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

 --/9DWx/yDrRhgMJTb
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="quota.diff2"

 Index: ufs/ufs_quota.c
 ===================================================================
 RCS file: /cvsroot/src/sys/ufs/ufs/ufs_quota.c,v
 retrieving revision 1.60.10.1
 diff -u -r1.60.10.1 ufs_quota.c
 --- ufs/ufs_quota.c	2 Feb 2009 18:24:17 -0000	1.60.10.1
 +++ ufs/ufs_quota.c	6 Jan 2010 18:22:55 -0000
 @@ -728,8 +728,9 @@
  	for (vp = TAILQ_FIRST(&mp->mnt_vnodelist); vp; vp = vunmark(mvp)) {
  		vmark(mvp, vp);
  		mutex_enter(&vp->v_interlock);
 -		if (vp->v_mount != mp || vismarker(vp) || vp->v_type == VNON ||
 -		    (vp->v_iflag & VI_CLEAN) != 0) {
 +		if (VTOI(vp) == NULL || vp->v_mount != mp || vismarker(vp) ||
 +		    vp->v_type == VNON ||
 +		    (vp->v_iflag & (VI_XLOCK | VI_CLEAN)) != 0) {
  			mutex_exit(&vp->v_interlock);
  			continue;
  		}

 --/9DWx/yDrRhgMJTb--

From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Thu, 7 Jan 2010 19:31:55 +0100 (CET)

 I installed the new kernel. No message in the last 24 hours. It will take a
 few more days to say whether the patch solves the problem or not.

 regards
 Uwe


 On Wed, 6 Jan 2010, Manuel Bouyer wrote:

 > Date: Wed,  6 Jan 2010 18:35:02 +0000 (UTC)
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > Reply-To: gnats-bugs@NetBSD.org
 > To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 >     netbsd-bugs@netbsd.org, 6bone@6bone.informatik.uni-leipzig.de
 > Subject: Re: kern/42205: kernel panic  at activated userquota
 > 
 > The following reply was made to PR kern/42205; it has been noted by GNATS.
 >
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > To: 6bone@6bone.informatik.uni-leipzig.de
 > Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
 >        netbsd-bugs@NetBSD.org
 > Subject: Re: kern/42205: kernel panic  at activated userquota
 > Date: Wed, 6 Jan 2010 19:29:55 +0100
 >
 > I have an idea on how this can happen; the vnode is put on the mnt list before
 > initialisation is completed. But then its type should be VNON and so it should
 > be skipped.
 >
 > Anyway, ffs_sync() checks for both v_type == VNON and VTOI(vp) == NULL, so
 > we could do the same in qsync(). While there, also check for VI_XLOCK, like
 > ffs_sync(), although this should not be needed either.
 > Can you see if the attached patch prevents the vprint from firing?


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic  at activated userquota
Date: Fri, 15 Jan 2010 13:26:42 +0100

 On Wed, Jan 06, 2010 at 07:29:55PM +0100, Manuel Bouyer wrote:
 > I have an idea on how this can happen; the vnode is put on the mnt list before
 > initialisation is completed. But then its type should be VNON and so it should
 > be skipped.

 It can also still be on the mnt list while being removed from the
 free list and cleaned. In particular, getcleanvnode() sets v_type to
 VNON after releasing the interlock. This patch would indeed fix this.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/42205 CVS commit: src/sys/ufs/ufs
Date: Fri, 15 Jan 2010 19:46:35 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Fri Jan 15 19:46:35 UTC 2010

 Modified Files:
 	src/sys/ufs/ufs: ufs_quota.c

 Log Message:
 vclean() actually sets v_tag to VT_NON but doesn't touch v_type.
 getcleanvnode() sets v_type to VNON after releasing v_interlock.
 So the thread doing quotaon(), quotaoff() or qsync() could vget()
 a vnode which is being recycled in getcleanvnode(), after it has
 been cleaned and v_interlock released, but before v_type has been
 reset, leading to KASSERT(vp->v_usecount == 1) firing in
 getnewvnode(), or qsync() dereferencing a NULL pointer as in
 PR kern/42205.
 Fix by using the same tests as other ffs functions traversing the mount
 list: also check for VTOI(vp) == NULL, and VI_XLOCK in addition
 to VI_CLEAN.
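
 The window described above can be illustrated with a small user-space
 model. This is only a sketch, not kernel code: the struct layouts, the flag
 values, the v_marker field, the stage names and the helpers set_stage(),
 skip_before() and skip_after() are all hypothetical stand-ins. It also
 assumes that no v_iflag bit is visible during the window, which matches the
 vnode reported in this PR (type VLNK, VTOI(vp) == NULL) but is not taken
 from the kernel source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* User-space stand-ins for kernel types; flag values are illustrative only. */
enum vtype { VNON, VREG, VDIR, VLNK };
#define VI_XLOCK 0x01u
#define VI_CLEAN 0x02u

struct inode { unsigned long i_number; };
struct mount { int unused; };
struct vnode {
	struct mount *v_mount;
	enum vtype    v_type;
	unsigned      v_iflag;
	void         *v_data;
	bool          v_marker;	/* hypothetical stand-in for vismarker(vp) */
};

#define VTOI(vp) ((struct inode *)(vp)->v_data)

/* Hypothetical recycle stages, loosely following the commit message. */
enum stage {
	ST_LIVE,		/* in use, inode attached */
	ST_CLEANING,		/* being cleaned, VI_XLOCK held */
	ST_CLEANED_UNLOCKED,	/* cleaned, interlock released, v_type stale */
	ST_TYPE_RESET		/* v_type finally reset to VNON */
};

/* Put the model vnode into the state assumed for the given stage. */
static void
set_stage(struct vnode *vp, struct inode *ip, enum stage st)
{
	assert(vp != NULL);
	vp->v_type = VLNK;
	vp->v_iflag = 0;
	vp->v_data = ip;
	switch (st) {
	case ST_LIVE:
		break;
	case ST_CLEANING:
		vp->v_iflag = VI_XLOCK;
		break;
	case ST_CLEANED_UNLOCKED:
		vp->v_data = NULL;	/* inode gone, type not yet reset */
		break;
	case ST_TYPE_RESET:
		vp->v_data = NULL;
		vp->v_type = VNON;
		break;
	}
}

/* Skip test as it was before the patch in this PR. */
static bool
skip_before(const struct vnode *vp, const struct mount *mp)
{
	return vp->v_mount != mp || vp->v_marker || vp->v_type == VNON ||
	    (vp->v_iflag & VI_CLEAN) != 0;
}

/* Skip test with the two added checks: VTOI(vp) == NULL and VI_XLOCK. */
static bool
skip_after(const struct vnode *vp, const struct mount *mp)
{
	return VTOI(vp) == NULL || vp->v_mount != mp || vp->v_marker ||
	    vp->v_type == VNON ||
	    (vp->v_iflag & (VI_XLOCK | VI_CLEAN)) != 0;
}
```

 In this model only the ST_CLEANING and ST_CLEANED_UNLOCKED stages differ
 between the two tests: the old test lets the vnode through, so a later
 VTOI(vp) dereference would hit a NULL pointer, while the new test skips it.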


 To generate a diff of this commit:
 cvs rdiff -u -r1.64 -r1.65 src/sys/ufs/ufs/ufs_quota.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->bouyer
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Sat, 16 Jan 2010 17:09:16 +0000
Responsible-Changed-Why:
I tried to track the problem down ...


State-Changed-From-To: open->feedback
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sat, 16 Jan 2010 17:09:16 +0000
State-Changed-Why:
Hi,
any news from the last patch ?


From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: bouyer@NetBSD.org, kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org, 
    gnats-admin@netbsd.org, bouyer@NetBSD.org
Subject: Re: kern/42205 (kernel panic  at activated userquota)
Date: Sun, 17 Jan 2010 19:58:55 +0100 (CET)

 On Sat, 16 Jan 2010, bouyer@NetBSD.org wrote:
 > any news from the last patch ?

 I applied the patch of 6 Jan 2010 against the kernel including the 
 previous patches. No crashes or vprint messages in the last few days.

 I think the patches solve the problem.


 Thank you for your efforts
 Uwe

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
        gnats-admin@NetBSD.org
Subject: Re: kern/42205 (kernel panic  at activated userquota)
Date: Mon, 18 Jan 2010 12:37:59 +0100

 On Sun, Jan 17, 2010 at 07:58:55PM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
 > On Sat, 16 Jan 2010, bouyer@NetBSD.org wrote:
 >> any news from the last patch ?
 >
 > I applied the patch of 6 Jan 2010 against the kernel including the  
 > previous patches. No crashes or vprint messages in the last few days.
 >
 > I think the patches solve the problem.

 Good, that's great news!
 I applied the same patch to HEAD, along with a similar fix for
 quotaon() and quotaoff(). I will request a pullup to netbsd-5.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

State-Changed-From-To: feedback->pending-pullups
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Mon, 18 Jan 2010 20:42:00 +0000
State-Changed-Why:
Ticket pullup-5/1252


From: Stephen Borrill <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/42205 CVS commit: [netbsd-5] src/sys/ufs/ufs
Date: Wed, 27 Jan 2010 21:26:45 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Wed Jan 27 21:26:45 UTC 2010

 Modified Files:
 	src/sys/ufs/ufs [netbsd-5]: ufs_quota.c

 Log Message:
 Pull up the following revision(s) (requested by bouyer in ticket #1252):
 	sys/ufs/ufs/ufs_quota.c:	revision 1.65

 vclean() actually sets v_tag to VT_NON but doesn't touch v_type.
 getcleanvnode() sets v_type to VNON after releasing v_interlock.
 So the thread doing quotaon(), quotaoff() or qsync() could vget()
 a vnode which is being recycled in getcleanvnode(), after it has
 been cleaned and v_interlock released, but before v_type has been
 reset, leading to KASSERT(vp->v_usecount == 1) firing in
 getnewvnode(), or qsync() dereferencing a NULL pointer as in
 PR kern/42205.
 Fix by using the same tests as other ffs functions traversing the mount
 list: also check for VTOI(vp) == NULL, and VI_XLOCK in addition
 to VI_CLEAN.


 To generate a diff of this commit:
 cvs rdiff -u -r1.60.10.3 -r1.60.10.4 src/sys/ufs/ufs/ufs_quota.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Wed, 27 Jan 2010 21:38:06 +0000
State-Changed-Why:
Pulled up to netbsd-5


>Unformatted:
