NetBSD Problem Report #44206
From martin@aprisoft.de Wed Dec 8 09:02:15 2010
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 4E5FA63B87A
for <gnats-bugs@gnats.NetBSD.org>; Wed, 8 Dec 2010 09:02:15 +0000 (UTC)
Message-Id: <20101208090207.7F440AF580E@emmas.aprisoft.de>
Date: Wed, 8 Dec 2010 10:02:07 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: reproducable (for me) NFS panic
X-Send-Pr-Version: 3.95
>Number: 44206
>Category: kern
>Synopsis: reproducable (for me) NFS panic
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Dec 08 09:05:01 +0000 2010
>Closed-Date: Tue Dec 14 16:42:14 +0000 2010
>Last-Modified: Tue Dec 14 16:42:14 +0000 2010
>Originator: Martin Husemann
>Release: NetBSD 5.99.41
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD after-hours.aprisoft.de 5.99.41 NetBSD 5.99.41 (GENERIC) #25: Tue Dec 7 12:41:46 CET 2010 martin@emmas.aprisoft.de:/nelly/usr/src/sys/arch/sparc64/compile/GENERIC sparc64
Architecture: sparc64
Machine: sparc64
>Description:
This machine uses pkgsrc (including pkgsrc/distfiles) over NFS. Whenever
I make pkgsrc download a distfile, I get this panic:
panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file "../../../../nfs/nfs_vnops.c", line 1377
Stopped in pid 0.50 (system) at netbsd:cpu_Debugger+0x4: nop
db{1}> bt
kern_assert(1726ec0, 176c488, 561, 176c588, 1, f2c57d0) at netbsd:kern_assert+0x2c
nfs_writerpc(0, e9d5d08, 1, e9d5bf0, e9d5be0, ede8a20) at netbsd:nfs_writerpc+0xe84
nfs_doio(8000, e909400, e9d5cc8, e9d5d08, ede8b30, f802cb0) at netbsd:nfs_doio+0x510
nfssvc_iod(e909400, e909400, 0, d210e00, 10b9e80, 189b000) at netbsd:nfssvc_iod+0x150
db{1}> mach cpu 0
db{0}> bt
intr_biglock_wrapper(400aa80, 0, e0017ed0, 6, 120dd80, 1c14000) at netbsd:intr_biglock_wrapper+0x4
sparc_interrupt(0, d22f400, 0, d210e00, 175f400, d213c50) at netbsd:sparc_interrupt+0x1e8
kernel_lock(0, 0, 592, 17958b0, 0, 60) at netbsd:_kernel_lock+0xc0
sleepq_block(0, 0, d22f400, 1, 0, d22f400) at netbsd:sleepq_block+0x1e4
cv_timedwait(ede8a30, ede8a20, 0, 4590d70, 0, 0) at netbsd:cv_timedwait+0x108
nfs_rcvlock(0, eec67e0, 4590d70, eec67e0, 0, 45d8da8) at netbsd:nfs_rcvlock+0xc8
nfs_request(0, ede8a20, 2034, d22f400, d20ef00, 189d000) at netbsd:nfs_request+0x320
nfs_writerpc(1b, e085d08, 1, e085bf0, e085be0, ede8a20) at netbsd:nfs_writerpc+0x3ec
nfs_doio(0, d22f400, e085cc8, e085d08, ede8b30, f802cb0) at netbsd:nfs_doio+0x510
nfssvc_iod(d22f400, d22f400, 0, d210e00, 10b9e80, 189b000) at netbsd:nfssvc_iod+0x150
>How-To-Repeat:
See above: have pkgsrc download a distfile into the NFS-mounted
pkgsrc/distfiles directory.
>Fix:
any hints welcome
>Release-Note:
>Audit-Trail:
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Wed, 8 Dec 2010 23:20:44 +0000 (UTC)
> panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file "../../../../nfs/nfs_vnops.c", line 1377
the assertion is broken. see PR/42455.
YAMAMOTO Takashi
From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Thu, 09 Dec 2010 08:22:03 +0100
On 09.12.10 00:25, YAMAMOTO Takashi wrote:
> The following reply was made to PR kern/44206; it has been noted by GNATS.
>
> From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
> netbsd-bugs@netbsd.org
> Subject: Re: kern/44206: reproducable (for me) NFS panic
> Date: Wed, 8 Dec 2010 23:20:44 +0000 (UTC)
>
> > panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file "../../../../nfs/nfs_vnops.c", line 1377
>
> the assertion is broken. see PR/42455.
When you remove it does the machine hang in cv_wait() then?
(That's the case for me hence I haven't removed it yet)
Christoph
From: Martin Husemann <martin@duskware.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Thu, 9 Dec 2010 10:11:32 +0100
On Thu, Dec 09, 2010 at 08:22:03AM +0100, Christoph Egger wrote:
> When you remove it does the machine hang in cv_wait() then?
> (That's the case for me hence I haven't removed it yet)
No, w/o the assertion it seems to work fine.
Martin
From: Reinoud Zandijk <reinoud@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Mon, 13 Dec 2010 22:08:50 +0100
I've got a new trace when running a NetBSD kernel with root on NFS. It seems
only to be triggered at writing?
Note that it can only trigger on NFSv3
-----
luiaard# mail
panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file "../../../../nfs/nfs_vnops.c", line 1377
entering kgdb
fatal breakpoint trap in supervisor mode
(gdb) where
#0  0xc0217ef4 in breakpoint ()
#1  0xc048dc1e in kgdb_connect (verbose=0)
    at ../../../../arch/i386/i386/kgdb_machdep.c:228
#2  0xc048dc72 in kgdb_panic ()
    at ../../../../arch/i386/i386/kgdb_machdep.c:245
#3  0xc0617048 in panic (
    fmt=0xc0a60574 "kernel %sassertion \"%s\" failed: file \"%s\", line %d")
    at ../../../../kern/subr_prf.c:274
#4  0xc07672f0 in kern_assert (t=0xc09b793b "diagnostic ",
    f=0xc0a17161 "../../../../nfs/nfs_vnops.c", l=1377,
    e=0xc0a17201 "mb->m_next != NULL")
    at ../../../../../../lib/libkern/kern_assert.c:50
#5  0xc04fe6a0 in nfs_writerpc (vp=0xc719f8f4,
    uiop=0xc6e1acc0, iomode=0xc6e1ace4,
    pageprotected=true, stalewriteverfp=0xc6e1aceb)
    at ../../../../nfs/nfs_vnops.c:1377
#6  0xc04e86cd in nfs_doio (bp=0xc0f55e70)
    at ../../../../nfs/nfs_bio.c:1068
#7  0xc04ee981 in nfssvc_iod (arg=0xc6e23540)
    at ../../../../nfs/nfs_iod.c:158
#8  0xc0100321 in lwp_trampoline ()
(gdb) print *mb
Cannot access memory at address 0xffffff84
(gdb) print ctx.nwc_mbufcount
$2 = 1
(???)
(gdb) print *vp
$4 = {v_obj = {vmobjlock = {u = {mtxa_owner = 0}}, pgops = 0xc09a4b08, memq = {
tqh_first = 0xc0e27740, tqh_last = 0xc0e2f554}, uo_npages = 160, uo_refs = 2,
rb_tree = {rbt_root = 0xc0e27620, rbt_ops = 0xc09a4a3c, rbt_minmax =
{0xc0e27740, 0xc0e2f540}}},
v_cv = {cv_opaque = {0x0, 0xc719f91c, 0xc0a28ca5}},
v_size = 655360, v_writesize = 655360, v_iflag = 16384,
v_vflag = 0, v_uflag = 0,
v_numoutput = 6, v_writecount = 1, v_holdcnt = 1, v_synclist_slot = 18,
v_mount = 0xc6e1f204, v_op = 0xc61ed00c, v_freelist = {tqe_next = 0x0,
tqe_prev = 0x0}, v_freelisthd = 0x0, v_mntvnodes = {tqe_next = 0xc719f848,
tqe_prev = 0xc719fa14}, v_cleanblkhd = {lh_first = 0x0}, v_dirtyblkhd = {
lh_first = 0x0}, v_synclist = {tqe_next = 0x0, tqe_prev = 0xc613ed34},
v_dnclist = {lh_first = 0x0}, v_nclist = {lh_first = 0x0}, v_un = {
vu_mountedhere = 0x0, vu_socket = 0x0, vu_specnode = 0x0,
vu_fifoinfo = 0x0,
vu_ractx = 0x0}, v_type = VREG, v_tag = VT_NFS, v_lock = {rw_owner = 0},
v_data = 0xc719e140, v_klist = {slh_first = 0x0}}
------
From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: Reinoud Zandijk <reinoud@NetBSD.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Mon, 13 Dec 2010 23:34:52 +0100
On 13.12.10 22:10, Reinoud Zandijk wrote:
> The following reply was made to PR kern/44206; it has been noted by GNATS.
>
> From: Reinoud Zandijk <reinoud@NetBSD.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/44206: reproducable (for me) NFS panic
> Date: Mon, 13 Dec 2010 22:08:50 +0100
>
> I've got a new trace when running a NetBSD kernel with root on NFS. It seems
> only to be triggered at writing?
>
> Note that it can only trigger on NFSv3
>
> -----
> luiaard# mail
> panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file
> "../../../../nfs/nfs_vnops.c", line 1377
Does it hang in cv_wait() in nfs_writerpc() when you remove the KASSERT?
Do you use nfs over tcp or udp?
Christoph
From: Reinoud Zandijk <reinoud@NetBSD.org>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, Reinoud Zandijk <reinoud@NetBSD.org>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 00:00:25 +0100
Hi Christoph,
On Mon, Dec 13, 2010 at 11:34:52PM +0100, Christoph Egger wrote:
> On 13.12.10 22:10, Reinoud Zandijk wrote:
> > Note that it can only trigger on NFSv3
> >
> > -----
> > luiaard# mail
> > panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file
> > "../../../../nfs/nfs_vnops.c", line 1377
>
> Does it hang in cv_wait() in nfs_writerpc() when you remove the KASSERT?
> Do you use nfs over tcp or udp?
Its main mounts are :
aaa.local:/usr/exports/luiaard / nfs rw 0 0
bbb.local:/home /home nfs rw 0 0
and the servers are running with:
/usr/sbin/nfsd -6 -u -t -n 6
When I remove the KASSERT() it just seems to work; I tried to reproduce the
panic, but it neither crashes nor hangs.
With regards,
Reinoud
From: Christoph Egger <Christoph_Egger@gmx.de>
To: Reinoud Zandijk <reinoud@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 00:05:06 +0100
On 14.12.10 00:00, Reinoud Zandijk wrote:
> Hi Christoph,
>
> On Mon, Dec 13, 2010 at 11:34:52PM +0100, Christoph Egger wrote:
>> On 13.12.10 22:10, Reinoud Zandijk wrote:
>>> Note that it can only trigger on NFSv3
>>>
>>> -----
>>> luiaard# mail
>>> panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file
>>> "../../../../nfs/nfs_vnops.c", line 1377
>>
>> Does it hang in cv_wait() in nfs_writerpc() when you remove the KASSERT?
>> Do you use nfs over tcp or udp?
>
> Its main mounts are :
> aaa.local:/usr/exports/luiaard / nfs rw 0 0
> bbb.local:/home /home nfs rw 0 0
Are those over tcp or udp ?
> and the servers are running with:
> /usr/sbin/nfsd -6 -u -t -n 6
>
> When i remove the KASSERT() it just seems to work; tried to replicate it but
> it won't crash nor hang.
Ok, thank you for the information.
Christoph
From: Martin Husemann <martin@duskware.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, Reinoud Zandijk <reinoud@NetBSD.org>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 09:39:32 +0100
On Mon, Dec 13, 2010 at 11:34:52PM +0100, Christoph Egger wrote:
> Does it hang in cv_wait() in nfs_writerpc() when you remove the KASSERT?
> Do you use nfs over tcp or udp?
For the record: I'm using NFS over UDP and removing the KASSERT fixes the
problem for me, no hangs.
Martin
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, martin@NetBSD.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 13:45:15 +0100
> On Mon, Dec 13, 2010 at 11:34:52PM +0100, Christoph Egger wrote:
> > Does it hang in cv_wait() in nfs_writerpc() when you
> > remove the KASSERT?
> > Do you use nfs over tcp or udp?
>
> For the record: I'm using NFS over UDP and removing
> the KASSERT fixes the problem for me, no hangs.
Thank you for the information. Can you try if you
can trigger the KASSERT with NFS over TCP, please?
If so will removing the KASSERT cause a hang
in cv_wait() in nfs_writerpc() then?
Christoph
From: Martin Husemann <martin@duskware.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, martin@NetBSD.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, kern-bug-people@netbsd.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 16:48:29 +0100
On Tue, Dec 14, 2010 at 01:45:15PM +0100, Christoph Egger wrote:
> If so will removing the KASSERT cause a hang
> in cv_wait() in nfs_writerpc() then?
With a TCP mount I can still trigger the KASSERT. Removing it makes the
machine work stable.
Martin
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, martin@NetBSD.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 17:14:58 +0100
> On Tue, Dec 14, 2010 at 01:45:15PM +0100, Christoph Egger wrote:
> > If so will removing the KASSERT cause a hang
> > in cv_wait() in nfs_writerpc() then?
>
> With a TCP mount I can still trigger the KASSERT. Removing it
> makes the machine work stable.
Ok. Which network driver are you using?
Christoph
From: Martin Husemann <martin@duskware.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 17:17:08 +0100
On Tue, Dec 14, 2010 at 05:14:58PM +0100, Christoph Egger wrote:
> Ok. Which network driver are you using?
gem(4)
Martin
From: "Christoph Egger" <cegger@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/44206 CVS commit: src/sys/nfs
Date: Tue, 14 Dec 2010 16:25:19 +0000
Module Name: src
Committed By: cegger
Date: Tue Dec 14 16:25:19 UTC 2010
Modified Files:
src/sys/nfs: nfs_vnops.c
Log Message:
back out rev. 1.285. The problem I try to hunt down
in PR 42455 is not in the network stack as shown by PR 44206.
To generate a diff of this commit:
cvs rdiff -u -r1.287 -r1.288 src/sys/nfs/nfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Tue, 14 Dec 2010 16:42:14 +0000
State-Changed-Why:
the KASSERT is gone
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.