NetBSD Problem Report #42205
From www@NetBSD.org Tue Oct 20 16:07:41 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id F2FF063B902
for <gnats-bugs@gnats.netbsd.org>; Tue, 20 Oct 2009 16:07:40 +0000 (UTC)
Message-Id: <20091020160740.A3BFF63B877@www.NetBSD.org>
Date: Tue, 20 Oct 2009 16:07:40 +0000 (UTC)
From: 6bone@6bone.informatik.uni-leipzig.de
Reply-To: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Subject: kernel panic at activated userquota
X-Send-Pr-Version: www-1.0
>Number: 42205
>Category: kern
>Synopsis: kernel panic at activated userquota
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: bouyer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Oct 20 16:10:00 +0000 2009
>Closed-Date: Wed Jan 27 21:38:06 +0000 2010
>Last-Modified: Wed Jan 27 21:38:06 +0000 2010
>Originator: Uwe Toenjes
>Release: NetBSD 5.0_STABLE
>Organization:
University of Leipzig
>Environment:
NetBSD 6bone.informatik.uni-leipzig.de 5.0_STABLE NetBSD 5.0_STABLE (MYCONF) #0: Fri Oct 16 11:16:05 CEST 2009 root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF amd64
>Description:
Under high I/O load, a kernel panic can occur when quotas are in use. Without user quotas, everything works fine.
uvm_fault(0xffffffff80c79620, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803e46b3 cs 8 rflags 10246 cr2 70 cpl 0 rsp fff0
kernel: page fault trap, code=0
Stopped in pid 0.61 (system) at netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
db{0}> trace
qsync() at netbsd:qsync+0x103
ffs_sync() at netbsd:ffs_sync+0x2eb
VFS_SYNC() at netbsd:VFS_SYNC+0x33
sync_fsync() at netbsd:sync_fsync+0x85
VOP_FSYNC() at netbsd:VOP_FSYNC+0x71
sched_sync() at netbsd:sched_sync+0x15d
db{0}> show registers
ds 0xfd28
es 0x60a3
fs 0xaac0
gs 0x12
rdi 0xffff80007d9cfd28
rsi 0
rbp 0xffff80007282ab20
rbx 0xffff80007d9cfd28
rdx 0
rcx 0
rax 0
r8 0x7000000
r9 0x420
r10 0x2c0e57
r11 0x1
r12 0xffff8000728164a0
r13 0xffff80008f4be758
r14 0
r15 0xffff800087f56f28
rip 0xffffffff803e46b3 qsync+0x103
cs 0x8
rflags 0x10246
rsp 0xffff80007282aaf0
ss 0
netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
db{0}> continue
uvm_fault(0xffffffff80c79620, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803e46b3 cs 8 rflags 10246 cr2 70 cpl 0 rsp fff0
kernel: page fault trap, code=0
Stopped in pid 0.61 (system) at netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
db{0}> continue
uvm_fault(0xffffffff80c79620, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803e46b3 cs 8 rflags 10246 cr2 70 cpl 0 rsp fff0
kernel: page fault trap, code=0
Stopped in pid 0.61 (system) at netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
>How-To-Repeat:
The panic occurs only sporadically, under heavy disk I/O with user quotas enabled.
>Fix:
>Release-Note:
>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Tue, 20 Oct 2009 21:39:02 +0200
On Tue, Oct 20, 2009 at 04:10:00PM +0000, 6bone@6bone.informatik.uni-leipzig.de wrote:
> >Number: 42205
> >Category: kern
> >Synopsis: kernel panic at activated userquota
> >Confidential: no
> >Severity: serious
> >Priority: high
> >Responsible: kern-bug-people
> >State: open
> >Class: sw-bug
> >Submitter-Id: net
> >Arrival-Date: Tue Oct 20 16:10:00 +0000 2009
> >Originator: Uwe Toenjes
> >Release: NetBSD 5.0_STABLE
> >Organization:
> University of Leipzig
> >Environment:
> NetBSD 6bone.informatik.uni-leipzig.de 5.0_STABLE NetBSD 5.0_STABLE (MYCONF) #0: Fri Oct 16 11:16:05 CEST 2009 root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF amd64
> >Description:
> Under high I/O load, a kernel panic can occur when quotas are in use. Without user quotas, everything works fine.
>
> uvm_fault(0xffffffff80c79620, 0x0, 1) -> e
> fatal page fault in supervisor mode
> trap type 6 code 0 rip ffffffff803e46b3 cs 8 rflags 10246 cr2 70 cpl 0 rsp fff0
> kernel: page fault trap, code=0
> Stopped in pid 0.61 (system) at netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
> db{0}> trace
> qsync() at netbsd:qsync+0x103
> ffs_sync() at netbsd:ffs_sync+0x2eb
This seems to be:
0xffffffff8025d7c3 is in qsync (/dsk/l1/misc/bouyer/netbsd-5/src/sys/ufs/ufs/ufs_quota.c:747).
742 goto again;
743 }
744 continue;
745 }
746 for (i = 0; i < MAXQUOTAS; i++) {
747 dq = VTOI(vp)->i_dquot[i];
(gdb) print &((struct inode *)0)->i_dquot[0]
$1 = (struct dquot **) 0x70
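The fault address in the trap message (cr2 70) follows from the faulting instruction and the gdb output. In AT&T syntax, movq 0x70(%rdx,%rax,8),%r14 loads from %rdx + %rax * 8 + 0x70. Assuming, as the source line suggests, that %rdx holds VTOI(vp) and %rax holds the loop index i, the minimal sketch below reproduces the computation; the helper names are hypothetical, not kernel functions.

```c
#include <stdint.h>

/*
 * Effective address of an AT&T-syntax memory operand disp(base,index,scale),
 * computed as base + index * scale + disp.
 */
uint64_t
effective_address(uint64_t base, uint64_t index, uint64_t scale, uint64_t disp)
{
	return base + index * scale + disp;
}

/*
 * Hypothetical helper mirroring "movq 0x70(%rdx,%rax,8),%r14", assumed to be
 * the load of VTOI(vp)->i_dquot[i] on amd64: pointers are 8 bytes, and 0x70
 * is the offset of i_dquot[0] inside struct inode per the gdb print.
 */
uint64_t
i_dquot_address(uint64_t inode_ptr, uint64_t i)
{
	return effective_address(inode_ptr, i, 8, 0x70);
}
```

With both registers 0, as in the register dump, i_dquot_address(0, 0) yields 0x70, matching cr2. A bad index on a valid inode would instead have produced a kernel address, so the panic points at a NULL inode pointer.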
Another case where a vnode could be vclean'ed while vget() drops the
interlock before getting the vn_lock.
The attached patch may help, but it's untested and probably not the
right way of fixing this.
Any idea how to properly fix vget(), anyone?
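The race described here follows a common pattern: a reference is taken under a short-term lock (the interlock), that lock is dropped, and only then is the long-term lock acquired. In between, another thread can tear the object down. The usual defense, and what the patch attached in the follow-up message does in qsync() by testing VTOI(vp) == NULL after vget(), is to re-check the object's state once the long-term lock is held. Below is a userland sketch of that pattern with POSIX threads; struct obj, obj_get() and its fields are hypothetical stand-ins, not the NetBSD vnode API.

```c
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a vnode; "data" plays the role of v_data. */
struct obj {
	pthread_mutex_t interlock;	/* protects refcnt, like v_interlock */
	pthread_mutex_t lock;		/* long-term lock, like vn_lock */
	int refcnt;
	void *data;			/* set to NULL when the object is cleaned */
};

/*
 * Take a reference under the interlock, drop it, then acquire the long-term
 * lock. A cleaner may run in the window between the two, so the object's
 * state is re-checked once the lock is held.
 * Returns 0 with the lock held and a reference taken (the caller unlocks
 * later), or ENOENT with no lock held and the reference dropped.
 */
int
obj_get(struct obj *o)
{
	pthread_mutex_lock(&o->interlock);
	o->refcnt++;
	pthread_mutex_unlock(&o->interlock);

	/* Window: another thread may clean the object (data = NULL) here. */
	pthread_mutex_lock(&o->lock);

	if (o->data == NULL) {
		/* Lost the race: undo everything and let the caller skip/retry. */
		pthread_mutex_unlock(&o->lock);
		pthread_mutex_lock(&o->interlock);
		o->refcnt--;
		pthread_mutex_unlock(&o->interlock);
		return ENOENT;
	}
	return 0;
}
```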
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Tue, 20 Oct 2009 22:12:45 +0200
On Tue, Oct 20, 2009 at 09:39:02PM +0200, Manuel Bouyer wrote:
> The attached patch may help, but it's untested and probably not the
> right way of fixing this.
Oops, I did it again.
The patch is really attached this time.
--
Manuel Bouyer <bouyer@antioche.eu.org>
Content-Disposition: attachment; filename=diff
Index: ufs_quota.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_quota.c,v
retrieving revision 1.60.10.3
diff -u -p -u -r1.60.10.3 ufs_quota.c
--- ufs_quota.c 7 Aug 2009 05:59:44 -0000 1.60.10.3
+++ ufs_quota.c 20 Oct 2009 19:38:13 -0000
@@ -743,6 +743,13 @@ qsync(struct mount *mp)
}
continue;
}
+ if (VTOI(vp) == NULL) {
+ mutex_enter(&mntvnode_lock);
+ vunmark(mvp);
+ vlockmgr(vp->v_vnlock, LK_RELEASE);
+ vrele(vp);
+ goto again;
+ }
for (i = 0; i < MAXQUOTAS; i++) {
dq = VTOI(vp)->i_dquot[i];
if (dq == NODQUOT)
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Wed, 21 Oct 2009 21:16:28 +0200 (CEST)
On Tue, 20 Oct 2009, Manuel Bouyer wrote:
>
> Another case where a vnode could be vclean'ed while vget() drops the
> interlock before getting the vn_lock.
> The attached patch may help, but it's untested and probably not the
> right way of fixing this.
>
> Any idea how to properly fix vget(), anyone?
I applied the patch. There has been no crash in the last 24h, but it will
take a few more days to tell whether it is now stable.
Thank you for your efforts.
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Mon, 26 Oct 2009 06:49:31 +0100 (CET)
The patch works fine and solves the problem.
Regards
Uwe
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Mon, 26 Oct 2009 12:32:22 +0100
On Mon, Oct 26, 2009 at 06:49:31AM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
> The patch works fine and solves the problem.
Thanks. But I don't think this was the right way to fix the problem.
Can you try the attached one instead? It should fix your problem, and some
others (including one that I fixed the wrong way some time ago; this is
the code in #if 0/#endif in the patch).
I'm running with this patch on several systems now and it seems to work fine.
--
Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr
Content-Disposition: attachment; filename="vget.diff"
Index: sys/ufs/ufs/ufs_ihash.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_ihash.c,v
retrieving revision 1.26.10.1
diff -u -p -u -r1.26.10.1 ufs_ihash.c
--- sys/ufs/ufs/ufs_ihash.c 28 Sep 2009 01:43:02 -0000 1.26.10.1
+++ sys/ufs/ufs/ufs_ihash.c 24 Oct 2009 13:33:23 -0000
@@ -152,6 +152,7 @@ ufs_ihashget(dev_t dev, ino_t inum, int
mutex_exit(&ufs_ihash_lock);
if (vget(vp, flags | LK_INTERLOCK))
goto loop;
+#if 0
if (VTOI(vp) != ip ||
ip->i_number != inum || ip->i_dev != dev) {
/* lost race against vclean() */
@@ -161,6 +162,7 @@ ufs_ihashget(dev_t dev, ino_t inum, int
vp = NULL;
goto loop;
}
+#endif
}
return (vp);
}
Index: sys/kern/vfs_subr.c
===================================================================
RCS file: /cvsroot/src/sys/kern/vfs_subr.c,v
retrieving revision 1.357.4.5
diff -u -p -u -r1.357.4.5 vfs_subr.c
--- sys/kern/vfs_subr.c 21 Jul 2009 00:31:58 -0000 1.357.4.5
+++ sys/kern/vfs_subr.c 24 Oct 2009 13:33:23 -0000
@@ -370,6 +370,17 @@ try_nextlist:
vp->v_freelisthd = NULL;
mutex_exit(&vnode_free_list_lock);
+ if (vp->v_usecount != 0) {
+ /*
+ * was referenced again before we got the interlock
+ * Don't return to freelist - the holder of the last
+ * reference will destroy it.
+ */
+ vrelel(vp, 0); /* releases vp->v_interlock */
+ mutex_enter(&vnode_free_list_lock);
+ goto retry;
+ }
+
/*
* The vnode is still associated with a file system, so we must
* clean it out before reusing it. We need to add a reference
@@ -1288,6 +1299,22 @@ vget(vnode_t *vp, int flags)
vrelel(vp, 0);
return ENOENT;
}
+
+ if ((vp->v_iflag & VI_INACTNOW) != 0) {
+ /*
+ * if it's being deactivated, wait for it to complete.
+ * Make sure to not return a clean vnode.
+ */
+ if ((flags & LK_NOWAIT) != 0) {
+ vrelel(vp, 0);
+ return EBUSY;
+ }
+ vwait(vp, VI_INACTNOW);
+ if ((vp->v_iflag & VI_CLEAN) != 0) {
+ vrelel(vp, 0);
+ return ENOENT;
+ }
+ }
if (flags & LK_TYPE_MASK) {
error = vn_lock(vp, flags | LK_INTERLOCK);
if (error != 0) {
@@ -1427,6 +1454,7 @@ vrelel(vnode_t *vp, int flags)
if (++vrele_pending > (desiredvnodes >> 8))
cv_signal(&vrele_cv);
mutex_exit(&vrele_lock);
+ cv_broadcast(&vp->v_cv);
mutex_exit(&vp->v_interlock);
return;
}
@@ -1451,6 +1479,7 @@ vrelel(vnode_t *vp, int flags)
VOP_INACTIVE(vp, &recycle);
mutex_enter(&vp->v_interlock);
vp->v_iflag &= ~VI_INACTNOW;
+ cv_broadcast(&vp->v_cv);
if (!recycle) {
if (vtryrele(vp)) {
mutex_exit(&vp->v_interlock);
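The vget() hunk above makes a caller that finds VI_INACTNOW set either fail at once with EBUSY (for LK_NOWAIT) or sleep on v_cv until deactivation finishes, and then refuse a vnode that was cleaned meanwhile (ENOENT). The two cv_broadcast() calls added to vrelel() are what wake such sleepers. A minimal userland sketch of that wait/wake protocol with POSIX condition variables follows; the flag names and functions are hypothetical, and the reference counting done by vrelel() on the error paths is left out.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

#define NODE_INACTNOW	0x1u	/* hypothetical: deactivation in progress */
#define NODE_CLEAN	0x2u	/* hypothetical: node has been cleaned out */

struct node {
	pthread_mutex_t interlock;	/* protects flags, like v_interlock */
	pthread_cond_t cv;		/* broadcast on flag changes, like v_cv */
	unsigned flags;
};

/*
 * Must be called with n->interlock held; it is still held on return.
 * Returns 0 if the node may be used, EBUSY if the caller asked not to sleep
 * while deactivation is in progress, or ENOENT if the node was cleaned
 * while we slept.
 */
int
node_wait_inactive(struct node *n, bool nowait)
{
	if ((n->flags & NODE_INACTNOW) == 0)
		return 0;
	if (nowait)
		return EBUSY;
	/* Loop, since condition variables permit spurious wakeups. */
	while ((n->flags & NODE_INACTNOW) != 0)
		pthread_cond_wait(&n->cv, &n->interlock);
	if ((n->flags & NODE_CLEAN) != 0)
		return ENOENT;
	return 0;
}

/* Deactivating side: clear the flag and wake every waiter (cv_broadcast). */
void
node_inactive_done(struct node *n, bool cleaned)
{
	pthread_mutex_lock(&n->interlock);
	n->flags &= ~NODE_INACTNOW;
	if (cleaned)
		n->flags |= NODE_CLEAN;
	pthread_cond_broadcast(&n->cv);
	pthread_mutex_unlock(&n->interlock);
}
```

Broadcasting rather than signalling matters because several threads may be waiting on the same node at once.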
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Wed, 28 Oct 2009 07:39:09 +0100 (CET)
I applied the new patch. There has been no crash in the last 24h, but it
will take a few more days to tell whether it is stable.
Thank you for your efforts.
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Thu, 29 Oct 2009 08:34:15 +0100 (CET)
The new patch does not work. There was a crash last night.
Unfortunately I can't report a trace, because ddb.onpanic was 0.
Regards
Uwe
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Fri, 30 Oct 2009 08:30:54 +0100 (CET)
Hello,
here is a dump from the last panic:
uvm_fault(0xffffffff80c7a620, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803e4863 cs 8 rflags 10246 cr2 70 cpl 0 rsp fff0
kernel: page fault trap, code=0
Stopped in pid 0.61 (system) at netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
db{2}> trace
qsync() at netbsd:qsync+0x103
ffs_sync() at netbsd:ffs_sync+0x2eb
VFS_SYNC() at netbsd:VFS_SYNC+0x33
sync_fsync() at netbsd:sync_fsync+0x85
VOP_FSYNC() at netbsd:VOP_FSYNC+0x71
sched_sync() at netbsd:sched_sync+0x15d
db{2}> show registers
ds 0xb3e8
es 0x66a3
fs 0xaac0
gs 0x12
rdi 0xffff8000ada6b3e8
rsi 0
rbp 0xffff80007282ab20
rbx 0xffff8000ada6b3e8
rdx 0
rcx 0
rax 0
r8 0xffff80001f264000
r9 0x7c
r10 0xffff80001f264080
r11 0
r12 0xffff8000728164a0
r13 0xffff80008cd4d188
r14 0
r15 0xffff800072dddd48
rip 0xffffffff803e4863 qsync+0x103
cs 0x8
rflags 0x10246
rsp 0xffff80007282aaf0
ss 0x10
netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
db{2}> cont
uvm_fault(0xffffffff80c7a620, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip ffffffff803e4863 cs 8 rflags 10246 cr2 70 cpl 0 rsp fff0
kernel: page fault trap, code=0
Stopped in pid 0.61 (system) at netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
regards
Uwe
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
6bone@6bone.informatik.uni-leipzig.de
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Fri, 30 Oct 2009 16:27:52 +0100
On Fri, Oct 30, 2009 at 07:35:02AM +0000, 6bone@6bone.informatik.uni-leipzig.de wrote:
> Hello
>
> now a dump from the last panic:
>
> uvm_fault(0xffffffff80c7a620, 0x0, 1) -> e
> fatal page fault in supervisor mode
> trap type 6 code 0 rip ffffffff803e4863 cs 8 rflags 10246 cr2 70 cpl 0 rsp fff0
> kernel: page fault trap, code=0
> Stopped in pid 0.61 (system) at netbsd:qsync+0x103: movq 0x70(%rdx,%rax,8),%r14
> db{2}> trace
> qsync() at netbsd:qsync+0x103
> ffs_sync() at netbsd:ffs_sync+0x2eb
> VFS_SYNC() at netbsd:VFS_SYNC+0x33
> sync_fsync() at netbsd:sync_fsync+0x85
> VOP_FSYNC() at netbsd:VOP_FSYNC+0x71
> sched_sync() at netbsd:sched_sync+0x15d
So it's still VTOI(vp) being NULL.
Can you install the attached patch? When it runs into a NULL inode here
it will print the associated vnode (and hopefully avoid the panic :)
Please monitor the console output or dmesg, and when the vprint fires,
report it here.
--
Manuel Bouyer, LIP6, Universite Paris VI. Manuel.Bouyer@lip6.fr
Content-Disposition: attachment; filename="quota.diff"
Index: ufs/ufs/ufs_quota.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_quota.c,v
retrieving revision 1.60.10.3
diff -u -p -u -r1.60.10.3 ufs_quota.c
--- ufs/ufs/ufs_quota.c 7 Aug 2009 05:59:44 -0000 1.60.10.3
+++ ufs/ufs/ufs_quota.c 30 Oct 2009 15:27:13 -0000
@@ -743,6 +743,14 @@ qsync(struct mount *mp)
}
continue;
}
+ if (VTOI(vp) == NULL) {
+ vprint("qsync vp wihout ip", vp);
+ mutex_enter(&mntvnode_lock);
+ vunmark(mvp);
+ vlockmgr(vp->v_vnlock, LK_RELEASE);
+ vrele(vp);
+ goto again;
+ }
for (i = 0; i < MAXQUOTAS; i++) {
dq = VTOI(vp)->i_dquot[i];
if (dq == NODQUOT)
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Thu, 5 Nov 2009 07:55:41 +0100 (CET)
qsync vp wihout ip: vnode @ 0xffff8000974df5f0, flags (10<MPSAFE>)
tag VT_UFS(1), type VLNK(5), usecount 1, writecount 0, holdcount 0
freelisthd 0x0, mount 0xffff800072988000, data 0xffff8000974e0dc0 lock 0xffff8000974df6f8 recursecnt 0
tag VT_UFS, ino 55338475, on dev 19, 4 flags 0x0, effnlink 1, nlink 1
mode 0120775, owner 1007, group 100, size 31
Regards
Uwe
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Sat, 21 Nov 2009 18:58:17 +0100
On Thu, Nov 05, 2009 at 07:55:41AM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
> qsync vp wihout ip: vnode @ 0xffff8000974df5f0, flags (10<MPSAFE>)
> tag VT_UFS(1), type VLNK(5), usecount 1, writecount 0, holdcount 0
> freelisthd 0x0, mount 0xffff800072988000, data 0xffff8000974e0dc0 lock 0xffff8000974df6f8 recursecnt 0
> tag VT_UFS, ino 55338475, on dev 19, 4 flags 0x0, effnlink 1, nlink 1
> mode 0120775, owner 1007, group 100, size 31
Wow, now that's strange. We get there because VTOI(vp) == NULL.
VTOI is ((struct inode *)(vp)->v_data), and v_data is obviously not NULL
in this vnode. How could this happen?
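To make the puzzle concrete: as quoted here, VTOI() is only a cast of v_data, so the test VTOI(vp) == NULL is true exactly when v_data is NULL at the moment it is evaluated. In the diagnostic patch, vprint() runs after that test and reads v_data again, so the NULL check and the printed data pointer come from two separate reads. The sketch uses minimal hypothetical stand-ins for struct vnode and struct inode, not the real kernel definitions; it illustrates the macro and does not explain the discrepancy.

```c
#include <stddef.h>

/* Minimal hypothetical stand-ins; the real structures have many more fields. */
struct inode {
	unsigned long i_number;
};

struct vnode {
	void *v_data;	/* file-system private data: the inode for UFS */
};

/* As quoted in the message: VTOI is only a cast of v_data. */
#define VTOI(vp)	((struct inode *)(vp)->v_data)

/* Hypothetical helper mirroring the check in the diagnostic qsync() patch. */
int
vnode_has_inode(const struct vnode *vp)
{
	return VTOI(vp) != NULL;
}
```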
Hmm, can you send the dmesg and 'cpuctl identify cpu0' output for this machine?
--
Manuel Bouyer <bouyer@antioche.eu.org>
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Sat, 21 Nov 2009 19:13:37 +0100 (CET)
On Sat, 21 Nov 2009, Manuel Bouyer wrote:
> Hum, can you send the dmesg and 'cpuctl identify cpu0' for this machine ?
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 5.0_STABLE (MYCONF) #0: Thu Nov 12 13:17:15 CET 2009
root@6bone.informatik.uni-leipzig.de:/usr/obj/sys/arch/amd64/compile/MYCONF
total memory = 16378 MB
avail memory = 15865 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
SMBIOS rev. 2.5 @ 0xbfb9c000 (66 entries)
Dell Inc. PowerEdge 1950
mainbus0 (root)
cpu0 at mainbus0 apid 0: Intel 686-class, 1995MHz, id 0x6f6
cpu1 at mainbus0 apid 6: Intel 686-class, 1995MHz, id 0x6f6
cpu2 at mainbus0 apid 1: Intel 686-class, 1995MHz, id 0x6f6
cpu3 at mainbus0 apid 7: Intel 686-class, 1995MHz, id 0x6f6
ioapic0 at mainbus0 apid 8: pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 9: pa 0xfec81000, version 20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20080321
acpi0: X/RSDT: OemId <DELL ,PE_SC3 ,00000001>, AslId <DELL,00000001>
acpi0: SCI interrupting at int 9
acpi0: fixed-feature power button present
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
ACPI-Fast 24-bit timer
attimer1 at acpi0 (TMR, PNP0100): io 0x40-0x5f irq 0
COMA (PNP0501) at acpi0 not configured
COMB (PNP0501) at acpi0 not configured
hpet0 at acpi0 (HPET, PNP0103-0): mem 0xfed00000-0xfed003ff
timecounter: Timecounter "hpet0" frequency 14318179 Hz quality 2000
ipmi0 at mainbus0
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0
pchb0: vendor 0x8086 product 0x25c0 (rev. 0x12)
ppb0 at pci0 dev 2 function 0: vendor 0x8086 product 0x25e2 (rev. 0x12)
pci1 at ppb0 bus 6
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
ppb1 at pci1 dev 0 function 0: vendor 0x8086 product 0x3500 (rev. 0x01)
pci2 at ppb1 bus 7
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
ppb2 at pci2 dev 0 function 0: vendor 0x8086 product 0x3510 (rev. 0x01)
pci3 at ppb2 bus 8
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
ppb3 at pci2 dev 1 function 0: vendor 0x8086 product 0x3514 (rev. 0x01)
pci4 at ppb3 bus 10
pci4: i/o space, memory space enabled, rd/line, wr/inv ok
ppb4 at pci1 dev 0 function 3: vendor 0x8086 product 0x350c (rev. 0x01)
ppb4: disabling notification events
pci5 at ppb4 bus 11
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
ppb5 at pci0 dev 3 function 0: vendor 0x8086 product 0x25e3 (rev. 0x12)
pci6 at ppb5 bus 1
pci6: i/o space, memory space enabled, rd/line, wr/inv ok
ppb6 at pci6 dev 0 function 0: vendor 0x8086 product 0x0370 (rev. 0x00)
ppb6: disabling notification events
pci7 at ppb6 bus 2
pci7: i/o space, memory space enabled, rd/line, wr/inv ok
mfi0 at pci7 dev 14 function 0: Dell PERC 5/i integrated
mfi0: interrupting at ioapic1 pin 14
mfi0: logical drives 1, version 5.1.1-0040, 256MB RAM
scsibus0 at mfi0: 64 targets, 8 luns per target
ppb7 at pci6 dev 0 function 2: vendor 0x8086 product 0x0372 (rev. 0x00)
ppb7: disabling notification events
pci8 at ppb7 bus 3
pci8: i/o space, memory space enabled, rd/line, wr/inv ok
ppb8 at pci0 dev 4 function 0: vendor 0x8086 product 0x25f8 (rev. 0x12)
pci9 at ppb8 bus 12
pci9: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci9 dev 0 function 0: Intel PRO/1000 PT (82571EB), rev. 6
wm0: interrupting at ioapic0 pin 16
wm0: PCI-Express bus
wm0: 65536 word (16 address bits) SPI EEPROM
wm0: Ethernet address 00:15:17:0e:98:5e
igphy0 at wm0 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
wm1 at pci9 dev 0 function 1: Intel PRO/1000 PT (82571EB), rev. 6
wm1: interrupting at ioapic0 pin 17
wm1: PCI-Express bus
wm1: 65536 word (16 address bits) SPI EEPROM
wm1: Ethernet address 00:15:17:0e:98:5f
igphy1 at wm1 phy 1: Intel IGP01E1000 Gigabit PHY, rev. 0
igphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 1000baseT-FDX, auto
ppb9 at pci0 dev 5 function 0: vendor 0x8086 product 0x25e5 (rev. 0x12)
pci10 at ppb9 bus 13
pci10: i/o space, memory space enabled, rd/line, wr/inv ok
ppb10 at pci0 dev 6 function 0: vendor 0x8086 product 0x25f9 (rev. 0x12)
pci11 at ppb10 bus 14
pci11: i/o space, memory space enabled, rd/line, wr/inv ok
ppb11 at pci11 dev 0 function 0: vendor 0x8086 product 0x0330 (rev. 0x07)
ppb11: disabling notification events
pci12 at ppb11 bus 15
pci12: i/o space, memory space enabled, rd/line, wr/inv ok
amr0 at pci12 dev 14 function 0: AMI RAID <PERC 4e/DC>
amr0: interrupting at ioapic0 pin 18
amr0: firmware 522A, BIOS H430, 128MB RAM
ld0 at amr0 unit 0: RAID 0, optimal
ld0: 1629 GB, 212749 cyl, 255 head, 63 sec, 512 bytes/sect x 3417825280 sectors
ld1 at amr0 unit 1: RAID 0, optimal
ld1: 1630 GB, 212847 cyl, 255 head, 63 sec, 512 bytes/sect x 3419402240 sectors
ld2 at amr0 unit 2: RAID 0, optimal
ld2: 1396 GB, 182350 cyl, 255 head, 63 sec, 512 bytes/sect x 2929459200 sectors
ld3 at amr0 unit 3: RAID 0, optimal
ld3: 1862 GB, 243149 cyl, 255 head, 63 sec, 512 bytes/sect x 3906191360 sectors
ld4 at amr0 unit 4: RAID 0, optimal
ld4: 1396 GB, 182350 cyl, 255 head, 63 sec, 512 bytes/sect x 2929459200 sectors
ppb12 at pci11 dev 0 function 2: vendor 0x8086 product 0x0332 (rev. 0x07)
ppb12: disabling notification events
pci13 at ppb12 bus 16
pci13: i/o space, memory space enabled, rd/line, wr/inv ok
ppb13 at pci0 dev 7 function 0: vendor 0x8086 product 0x25e7 (rev. 0x12)
pci14 at ppb13 bus 17
pci14: i/o space, memory space enabled, rd/line, wr/inv ok
pchb1 at pci0 dev 16 function 0
pchb1: vendor 0x8086 product 0x25f0 (rev. 0x12)
pchb2 at pci0 dev 16 function 1
pchb2: vendor 0x8086 product 0x25f0 (rev. 0x12)
pchb3 at pci0 dev 16 function 2
pchb3: vendor 0x8086 product 0x25f0 (rev. 0x12)
pchb4 at pci0 dev 17 function 0
pchb4: vendor 0x8086 product 0x25f1 (rev. 0x12)
pchb5 at pci0 dev 19 function 0
pchb5: vendor 0x8086 product 0x25f3 (rev. 0x12)
pchb6 at pci0 dev 21 function 0
pchb6: vendor 0x8086 product 0x25f5 (rev. 0x12)
pchb7 at pci0 dev 22 function 0
pchb7: vendor 0x8086 product 0x25f6 (rev. 0x12)
ppb14 at pci0 dev 28 function 0: vendor 0x8086 product 0x2690 (rev. 0x09)
pci15 at ppb14 bus 4
pci15: i/o space, memory space enabled, rd/line, wr/inv ok
uhci0 at pci0 dev 29 function 0: vendor 0x8086 product 0x2688 (rev. 0x09)
uhci0: interrupting at ioapic0 pin 21
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 29 function 1: vendor 0x8086 product 0x2689 (rev. 0x09)
uhci1: interrupting at ioapic0 pin 20
usb1 at uhci1: USB revision 1.0
uhci2 at pci0 dev 29 function 2: vendor 0x8086 product 0x268a (rev. 0x09)
uhci2: interrupting at ioapic0 pin 21
usb2 at uhci2: USB revision 1.0
ehci0 at pci0 dev 29 function 7: vendor 0x8086 product 0x268c (rev. 0x09)
ehci0: interrupting at ioapic0 pin 21
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
usb3 at ehci0: USB revision 2.0
ppb15 at pci0 dev 30 function 0: vendor 0x8086 product 0x244e (rev. 0xd9)
pci16 at ppb15 bus 18
pci16: i/o space, memory space enabled
vga0 at pci16 dev 13 function 0: vendor 0x1002 product 0x515e (rev. 0x02)
wsdisplay0 at vga0 kbdmux 1
wsmux1: connecting to wsdisplay0
drm at vga0 not configured
ichlpcib0 at pci0 dev 31 function 0
ichlpcib0: vendor 0x8086 product 0x2670 (rev. 0x09)
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
ichlpcib0: 24-bit timer
ichlpcib0: TCO (watchdog) timer configured.
piixide0 at pci0 dev 31 function 1
piixide0: Intel 631xESB/632xESB IDE Controller (rev. 0x09)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
piixide0: primary channel interrupting at ioapic0 pin 14
atabus0 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
piixide0: secondary channel ignored (disabled)
isa0 at ichlpcib0
com0 at isa0 port 0x3f8-0x3ff irq 4: ns16550a, working fifo
com0: console
com1 at isa0 port 0x2f8-0x2ff irq 3: ns16550a, working fifo
pckbc0 at isa0 port 0x60-0x64
pcppi0 at isa0 port 0x61
midi0 at pcppi0: PC speaker (CPU-intensive output)
sysbeep0 at pcppi0
attimer1: attached to pcppi0
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
timecounter: Timecounter "TSC" frequency 1995111120 Hz quality 3000
scsibus0: waiting 2 seconds for devices to settle...
atapibus0 at atabus0: 2 targets
uhub0 at usb0: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhub1 at usb1: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
cd0 at atapibus0 drive 0: <HL-DT-STCD-RW/DVD-ROM GCC-4244N, , B101> cdrom removable
cd0: 32-bit data port
cd0: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 2 (Ultra/33)
cd0(piixide0:0:0): using PIO mode 4, DMA mode 2 (using DMA)
uhub2 at usb3: vendor 0x8086 EHCI root hub, class 9/0, rev 2.00/1.00, addr 1
uhub2: 6 ports with 6 removable, self powered
uhub3 at usb2: vendor 0x8086 UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub3: 2 ports with 2 removable, self powered
uhub4 at uhub2 port 1: vendor 0x413c product 0xa001, class 9/0, rev 2.00/0.00, addr 2
uhub4: multiple transaction translators
sd0 at scsibus0 target 0 lun 0: <DELL, PERC 5/i, 1.03> disk fixed
uhub4: 2 ports with 2 removable, self powered
sd0: fabricating a geometry
sd0: 136 GB, 139300 cyl, 64 head, 32 sec, 512 bytes/sect x 285286400 sectors
sd0: fabricating a geometry
uhidev0 at uhub4 port 1 configuration 1 interface 0
uhidev0: Dell DRAC5, rev 1.10/0.00, addr 3, iclass 3/1
ukbd0 at uhidev0
wskbd0 at ukbd0 mux 1
wskbd0: connecting to wsdisplay0
uhidev1 at uhub4 port 1 configuration 1 interface 1
uhidev1: Dell DRAC5, rev 1.10/0.00, addr 3, iclass 3/1
ums0 at uhidev1
ums0: X report 0x0002 not supported
umass0 at uhub4 port 2 configuration 1 interface 0
umass0: DELL INC. DRAC5 VIRTUAL MEDIA, rev 2.00/0.00, addr 4
umass0: using SCSI over Bulk-Only
scsibus1 at umass0: 2 targets, 1 lun per target
umass1 at uhub4 port 2 configuration 1 interface 1
cd1 at scsibus1 target 0 lun 0: <Dell, Virtual CDROM, 123> cdrom removable
umass1: DELL INC. DRAC5 VIRTUAL MEDIA, rev 2.00/0.00, addr 4
umass1: using SCSI over Bulk-Only
scsibus2 at umass1: 2 targets, 1 lun per target
sd1 at scsibus2 target 0 lun 0: <Dell, Virtual Floppy, 123> disk removable
sd1: drive offline
sd1(umass1:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
SENSE KEY: Not Ready
ASC/ASCQ: Medium Not Present
sd1: unable to open device, error = 19
uhub5 at uhub2 port 5: vendor 0x04b4 product 0x6560, class 9/0, rev 2.00/0.0b, addr 5
uhub5: multiple transaction translators
uhub5: 4 ports with 4 removable, self powered
uhidev2 at uhub1 port 1 configuration 1 interface 0
uhidev2: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
ukbd1 at uhidev2
wskbd1 at ukbd1 mux 1
wskbd1: connecting to wsdisplay0
uhidev3 at uhub1 port 1 configuration 1 interface 1
uhidev3: CHESEN PS2 to USB Converter, rev 1.10/0.10, addr 2, iclass 3/1
uhidev3: 3 report ids
ums1 at uhidev3 reportid 1: 5 buttons and Z dir.
wsmouse0 at ums1 mux 0
uhid0 at uhidev3 reportid 2: input=1, output=0, feature=0
uhid1 at uhidev3 reportid 3: input=3, output=0, feature=0
ipmi0: version 2.0 interface KCS iobase 0xca8/8 spacing 4
Kernelized RAIDframe activated
pad0: outputs: 44100Hz, 16-bit, stereo
audio0 at pad0: half duplex, playback, capture
sd1(umass1:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
SENSE KEY: Not Ready
ASC/ASCQ: Medium Not Present
sd1(umass1:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
SENSE KEY: Not Ready
ASC/ASCQ: Medium Not Present
sd1(umass1:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
SENSE KEY: Not Ready
ASC/ASCQ: Medium Not Present
sd1(umass1:0:0:0): Check Condition on CDB: 0x00 00 00 00 00 00
SENSE KEY: Not Ready
ASC/ASCQ: Medium Not Present
boot device: sd0
root on sd0a dumps on sd0b
root file system type: ffs
mfi0: normal state on 'mfi0:0' (online)
raid0: Component /dev/ld2a being configured at col: 0
Column: 0 Num Columns: 3
Version: 2 Serial Number: 1223334444 Mod Counter: 350
Clean: No Status: 0
/dev/ld2a is not clean!
raid0: Component /dev/ld3a being configured at col: 1
Column: 1 Num Columns: 3
Version: 2 Serial Number: 1223334444 Mod Counter: 350
Clean: No Status: 0
/dev/ld3a is not clean!
raid0: Component /dev/ld4a being configured at col: 2
Column: 2 Num Columns: 3
Version: 2 Serial Number: 1223334444 Mod Counter: 350
Clean: No Status: 0
/dev/ld4a is not clean!
raid0: RAID Level 0
raid0: Components: /dev/ld2a /dev/ld3a /dev/ld4a
raid0: Total Sectors: 8788377216 (4291199 MB)
raid0: GPT GUID: 1e0b51e4-bbfa-11de-9484-0015170e985e
dk0 at raid0: 1e0b51ee-bbfa-11de-9484-0015170e985e
dk0: 134217728 blocks at 34, type: swap
dk1 at raid0: 1e0b51ee-bbfa-11de-9485-0015170e985e
dk1: 8654159421 blocks at 134217762, type: ffs
wsdisplay0: screen 1 added (80x25, vt100 emulation)
wsdisplay0: screen 2 added (80x25, vt100 emulation)
wsdisplay0: screen 3 added (80x25, vt100 emulation)
wsdisplay0: screen 4 added (80x25, vt100 emulation)
#################################################################
cpuctl list
Num  HwId Unbound LWPs Interrupts     Last change
---- ---- ------------ -------------- ----------------------------
0    0    online       intr           Thu Nov 19 08:07:34 2009
1    1    online       intr           Thu Nov 19 08:07:34 2009
2    2    online       intr           Thu Nov 19 08:07:34 2009
3    3    online       intr           Thu Nov 19 08:07:34 2009
#################################################################
cpuctl identify 0
cpu0: Intel Core 2 (Merom) (686-class), 1995.11 MHz, id 0x6f6
cpu0: features 0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR>
cpu0: features 0xbfebfbff<PGE,MCA,CMOV,PAT,PSE36,CFLUSH,DS,ACPI,MMX>
cpu0: features 0xbfebfbff<FXSR,SSE,SSE2,SS,HTT,TM,SBF>
cpu0: features2 0x4e33d<SSE3,DTES64,MONITOR,DS-CPL,VMX,TM2,SSSE3,CX16,xTPR,PDCM,DCA>
cpu0: features3 0x20100800<SYSCALL/SYSRET,XD,EM64T>
cpu0: "Intel(R) Xeon(R) CPU 5130 @ 2.00GHz"
cpu0: I-cache 32KB 64B/line 8-way, D-cache 32KB 64B/line 8-way
cpu0: L2 cache 4MB 64B/line 16-way
cpu0: ITLB 128 4KB entries 4-way
cpu0: DTLB 256 4KB entries 4-way, 16 4MB entries 4-way
cpu0: Initial APIC ID 0
cpu0: Cluster/Package ID 0
cpu0: Core ID 0
cpu0: family 06 model 0f extfamily 00 extmodel 00
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Wed, 6 Jan 2010 19:29:55 +0100
On Sat, Nov 21, 2009 at 06:58:17PM +0100, Manuel Bouyer wrote:
> On Thu, Nov 05, 2009 at 07:55:41AM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
> > qsync vp wihout ip: vnode @ 0xffff8000974df5f0, flags (10<MPSAFE>)
> > tag VT_UFS(1), type VLNK(5), usecount 1, writecount 0, holdcount 0
> > freelisthd 0x0, mount 0xffff800072988000, data 0xffff8000974e0dc0 lock 0xffff8000974df6f8 recursecnt 0
> > tag VT_UFS, ino 55338475, on dev 19, 4 flags 0x0, effnlink 1, nlink 1
> > mode 0120775, owner 1007, group 100, size 31
>
> Wow, now that's strange. We get there because VTOI(vp) == NULL.
> VTOI is ((struct inode *)(vp)->v_data), and v_data is obviously not NULL
> in this vnode. How could this happen ?
I have an idea of how this can happen: the vnode is put on the mount list before
its initialisation is complete. But then its type should be VNON, so it should
be skipped.
Anyway, ffs_sync() checks for both v_type == VNON and VTOI(vp) == NULL, so
we could do the same in qsync(). While there, also check for VCLEAN as
ffs_sync() does, although this should not be needed either.
Can you see if the attached patch prevents the vprint from firing?
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
[Attachment: quota.diff2]
Index: ufs/ufs_quota.c
===================================================================
RCS file: /cvsroot/src/sys/ufs/ufs/ufs_quota.c,v
retrieving revision 1.60.10.1
diff -u -r1.60.10.1 ufs_quota.c
--- ufs/ufs_quota.c 2 Feb 2009 18:24:17 -0000 1.60.10.1
+++ ufs/ufs_quota.c 6 Jan 2010 18:22:55 -0000
@@ -728,8 +728,9 @@
for (vp = TAILQ_FIRST(&mp->mnt_vnodelist); vp; vp = vunmark(mvp)) {
vmark(mvp, vp);
mutex_enter(&vp->v_interlock);
- if (vp->v_mount != mp || vismarker(vp) || vp->v_type == VNON ||
- (vp->v_iflag & VI_CLEAN) != 0) {
+ if (VTOI(vp) == NULL || vp->v_mount != mp || vismarker(vp) ||
+ vp->v_type == VNON ||
+ (vp->v_iflag & (VI_XLOCK | VI_CLEAN)) != 0) {
mutex_exit(&vp->v_interlock);
continue;
}
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Thu, 7 Jan 2010 19:31:55 +0100 (CET)
I installed the new kernel. No message in the last 24 hours. It will take a few
more days to tell whether the patch solves the problem.
regards
Uwe
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: kern/42205: kernel panic at activated userquota
Date: Fri, 15 Jan 2010 13:26:42 +0100
On Wed, Jan 06, 2010 at 07:29:55PM +0100, Manuel Bouyer wrote:
> I have an idea on how this can happen; the vnode is put on the mnt list before
> initialisation is completed. But then its type should be VNON and so it should
> be skipped.
It can also still be on the mount list while being removed from the
free list and cleaned. In particular, getcleanvnode() sets v_type to
VNON after releasing the interlock. This patch would indeed fix that.
From: Manuel Bouyer <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/42205 CVS commit: src/sys/ufs/ufs
Date: Fri, 15 Jan 2010 19:46:35 +0000
Module Name: src
Committed By: bouyer
Date: Fri Jan 15 19:46:35 UTC 2010
Modified Files:
src/sys/ufs/ufs: ufs_quota.c
Log Message:
vclean() actually sets v_tag to VT_NON but doesn't touch v_type.
getcleanvnode() sets v_type to VNON after releasing v_interlock.
So the thread doing quotaon(), quotaoff() or qsync() could vget()
a vnode which is being recycled in getcleanvnode(), after it has
been cleaned and v_interlock released, but before v_type has been
reset, leading to KASSERT(vp->v_usecount == 1) firing in
getnewvnode(), or qsync() dereferencing a NULL pointer as in
PR kern/42205.
Fix by using the same tests as other ffs functions traversing the mount
list: also check for VTOI(vp) == NULL, and VI_XLOCK in addition
to VI_CLEAN.
To generate a diff of this commit:
cvs rdiff -u -r1.64 -r1.65 src/sys/ufs/ufs/ufs_quota.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->bouyer
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Sat, 16 Jan 2010 17:09:16 +0000
Responsible-Changed-Why:
I tried to track the problem down ...
State-Changed-From-To: open->feedback
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Sat, 16 Jan 2010 17:09:16 +0000
State-Changed-Why:
Hi,
any news on the last patch?
From: 6bone@6bone.informatik.uni-leipzig.de
To: gnats-bugs@NetBSD.org
Cc: bouyer@NetBSD.org, kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, bouyer@NetBSD.org
Subject: Re: kern/42205 (kernel panic at activated userquota)
Date: Sun, 17 Jan 2010 19:58:55 +0100 (CET)
On Sat, 16 Jan 2010, bouyer@NetBSD.org wrote:
> any news from the last patch ?
I applied the patch of 6 Jan 2010 against the kernel including the
previous patches. No crashes or vprint messages in the last few days.
I think the patches are solving the problem.
Thank you for your efforts
Uwe
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: 6bone@6bone.informatik.uni-leipzig.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@NetBSD.org, netbsd-bugs@NetBSD.org,
gnats-admin@NetBSD.org
Subject: Re: kern/42205 (kernel panic at activated userquota)
Date: Mon, 18 Jan 2010 12:37:59 +0100
On Sun, Jan 17, 2010 at 07:58:55PM +0100, 6bone@6bone.informatik.uni-leipzig.de wrote:
> On Sat, 16 Jan 2010, bouyer@NetBSD.org wrote:
>> any news from the last patch ?
>
> I applied the patch of 6 Jan 2010 against the kernel including the
> previous patches. No crashes or vprint messages in the last days.
>
> I think the patches are solving the problem.
Good, that's great news!
I committed the same patch to HEAD, along with a similar fix for
quotaon() and quotaoff(), and I will request a pullup to netbsd-5.
State-Changed-From-To: feedback->pending-pullups
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Mon, 18 Jan 2010 20:42:00 +0000
State-Changed-Why:
Ticket pullup-5/1252
From: Stephen Borrill <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/42205 CVS commit: [netbsd-5] src/sys/ufs/ufs
Date: Wed, 27 Jan 2010 21:26:45 +0000
Module Name: src
Committed By: sborrill
Date: Wed Jan 27 21:26:45 UTC 2010
Modified Files:
src/sys/ufs/ufs [netbsd-5]: ufs_quota.c
Log Message:
Pull up the following revision(s) (requested by bouyer in ticket #1252):
sys/ufs/ufs/ufs_quota.c: revision 1.65
vclean() actually sets v_tag to VT_NON but doesn't touch v_type.
getcleanvnode() sets v_type to VNON after releasing v_interlock.
So the thread doing quotaon(), quotaoff() or qsync() could vget()
a vnode which is being recycled in getcleanvnode(), after it has
been cleaned and v_interlock released, but before v_type has been
reset, leading to KASSERT(vp->v_usecount == 1) firing in
getnewvnode(), or qsync() dereferencing a NULL pointer as in
PR kern/42205.
Fix by using the same tests as other ffs functions traversing the mount
list: also check for VTOI(vp) == NULL, and VI_XLOCK in addition
to VI_CLEAN.
To generate a diff of this commit:
cvs rdiff -u -r1.60.10.3 -r1.60.10.4 src/sys/ufs/ufs/ufs_quota.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: bouyer@NetBSD.org
State-Changed-When: Wed, 27 Jan 2010 21:38:06 +0000
State-Changed-Why:
Pulled up to netbsd-5
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.