NetBSD Problem Report #44206

From martin@aprisoft.de  Wed Dec  8 09:02:15 2010
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 4E5FA63B87A
	for <gnats-bugs@gnats.NetBSD.org>; Wed,  8 Dec 2010 09:02:15 +0000 (UTC)
Message-Id: <20101208090207.7F440AF580E@emmas.aprisoft.de>
Date: Wed,  8 Dec 2010 10:02:07 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@gnats.NetBSD.org
Subject: reproducable (for me) NFS panic
X-Send-Pr-Version: 3.95

>Number:         44206
>Category:       kern
>Synopsis:       reproducible (for me) NFS panic
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Dec 08 09:05:01 +0000 2010
>Closed-Date:    Tue Dec 14 16:42:14 +0000 2010
>Last-Modified:  Tue Dec 14 16:42:14 +0000 2010
>Originator:     Martin Husemann
>Release:        NetBSD 5.99.41
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD after-hours.aprisoft.de 5.99.41 NetBSD 5.99.41 (GENERIC) #25: Tue Dec 7 12:41:46 CET 2010 martin@emmas.aprisoft.de:/nelly/usr/src/sys/arch/sparc64/compile/GENERIC sparc64
Architecture: sparc64
Machine: sparc64
>Description:

This machine uses pkgsrc (including pkgsrc/distfiles) over NFS. Whenever
I make pkgsrc download a distfile, I get this panic:

panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file "../../../../nfs/nfs_vnops.c", line 1377
Stopped in pid 0.50 (system) at netbsd:cpu_Debugger+0x4:        nop
db{1}> bt
kern_assert(1726ec0, 176c488, 561, 176c588, 1, f2c57d0) at netbsd:kern_assert+0x2c
nfs_writerpc(0, e9d5d08, 1, e9d5bf0, e9d5be0, ede8a20) at netbsd:nfs_writerpc+0xe84
nfs_doio(8000, e909400, e9d5cc8, e9d5d08, ede8b30, f802cb0) at netbsd:nfs_doio+0x510
nfssvc_iod(e909400, e909400, 0, d210e00, 10b9e80, 189b000) at netbsd:nfssvc_iod+0x150
db{1}> mach cpu 0
db{0}> bt
intr_biglock_wrapper(400aa80, 0, e0017ed0, 6, 120dd80, 1c14000) at netbsd:intr_biglock_wrapper+0x4
sparc_interrupt(0, d22f400, 0, d210e00, 175f400, d213c50) at netbsd:sparc_interrupt+0x1e8
kernel_lock(0, 0, 592, 17958b0, 0, 60) at netbsd:_kernel_lock+0xc0
sleepq_block(0, 0, d22f400, 1, 0, d22f400) at netbsd:sleepq_block+0x1e4
cv_timedwait(ede8a30, ede8a20, 0, 4590d70, 0, 0) at netbsd:cv_timedwait+0x108
nfs_rcvlock(0, eec67e0, 4590d70, eec67e0, 0, 45d8da8) at netbsd:nfs_rcvlock+0xc8
nfs_request(0, ede8a20, 2034, d22f400, d20ef00, 189d000) at netbsd:nfs_request+0x320
nfs_writerpc(1b, e085d08, 1, e085bf0, e085be0, ede8a20) at netbsd:nfs_writerpc+0x3ec
nfs_doio(0, d22f400, e085cc8, e085d08, ede8b30, f802cb0) at netbsd:nfs_doio+0x510
nfssvc_iod(d22f400, d22f400, 0, d210e00, 10b9e80, 189b000) at netbsd:nfssvc_iod+0x150


>How-To-Repeat:
See the description above: any distfile download by pkgsrc onto the NFS mount
triggers the panic.

>Fix:
any hints welcome

>Release-Note:

>Audit-Trail:
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Wed,  8 Dec 2010 23:20:44 +0000 (UTC)

 > panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file "../../../../nfs/nfs_vnops.c", line 1377

 the assertion is broken.  see PR/42455.

 YAMAMOTO Takashi
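
 The expression in the failed assertion holds only when mb is followed by
 another buffer. As a minimal user-space sketch (assuming the usual BSD
 convention that m_next links the buffers of one chain; sketch_mbuf,
 chain_count and has_successor are hypothetical names, not kernel code), a
 chain of exactly one buffer fails the check even though the chain itself is
 valid:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct mbuf. */
struct sketch_mbuf {
	struct sketch_mbuf *m_next;	/* next buffer in the chain, or NULL */
	size_t m_len;			/* bytes of data in this buffer */
};

/* Count the buffers in a chain by following m_next until NULL. */
static size_t
chain_count(const struct sketch_mbuf *m)
{
	size_t n = 0;

	for (; m != NULL; m = m->m_next)
		n++;
	return n;
}

/* The condition from the failed assertion, applied to one buffer. */
static bool
has_successor(const struct sketch_mbuf *mb)
{
	assert(mb != NULL);	/* precondition of this sketch only */
	return mb->m_next != NULL;
}
```

 For a lone buffer b with b.m_next == NULL, chain_count(&b) is 1 and
 has_successor(&b) is false. Whether that is the situation hit in
 nfs_writerpc() is not established in this report.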

From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>, 
 kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
 netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Thu, 09 Dec 2010 08:22:03 +0100

 On 09.12.10 00:25, YAMAMOTO Takashi wrote:
 > The following reply was made to PR kern/44206; it has been noted by GNATS.
 > 
 > From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 > 	netbsd-bugs@netbsd.org
 > Subject: Re: kern/44206: reproducable (for me) NFS panic
 > Date: Wed,  8 Dec 2010 23:20:44 +0000 (UTC)
 > 
 >  > panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file "../../../../nfs/nfs_vnops.c", line 1377
 >  
 >  the assertion is broken.  see PR/42455.

 When you remove it does the machine hang in cv_wait() then?
 (That's the case for me hence I haven't removed it yet)

 Christoph

From: Martin Husemann <martin@duskware.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>,
	kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Thu, 9 Dec 2010 10:11:32 +0100

 On Thu, Dec 09, 2010 at 08:22:03AM +0100, Christoph Egger wrote:
 > When you remove it does the machine hang in cv_wait() then?
 > (That's the case for me hence I haven't removed it yet)

 No, w/o the assertion it seems to work fine.

 Martin
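
 The panic text reads "kernel diagnostic assertion", which is what a failed
 KASSERT prints; such checks are compiled into DIAGNOSTIC kernels only. The
 following is a minimal user-space sketch of that pattern, not the kernel's
 actual macro. SKETCH_DIAGNOSTIC, SKETCH_KASSERT, sketch_assert_fail and
 has_next_buffer are hypothetical names:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch standing in for a DIAGNOSTIC kernel option. */
#ifndef SKETCH_DIAGNOSTIC
#define SKETCH_DIAGNOSTIC 0
#endif

#if SKETCH_DIAGNOSTIC
/* Report the failed expression and stop, loosely like a kernel panic. */
static void
sketch_assert_fail(const char *expr, const char *file, int line)
{
	fprintf(stderr,
	    "diagnostic assertion \"%s\" failed: file \"%s\", line %d\n",
	    expr, file, line);
	abort();
}
#define SKETCH_KASSERT(e) \
	((e) ? (void)0 : sketch_assert_fail(#e, __FILE__, __LINE__))
#else
/* Without diagnostics the check compiles to nothing. */
#define SKETCH_KASSERT(e) ((void)0)
#endif

/*
 * Returns 1 if a successor exists, 0 otherwise.  With SKETCH_DIAGNOSTIC
 * set to 1, the 0 case aborts instead of returning.
 */
static int
has_next_buffer(const void *next)
{
	SKETCH_KASSERT(next != NULL);
	return next != NULL;
}
```

 Building with -DSKETCH_DIAGNOSTIC=1 turns has_next_buffer(NULL) into an
 abort; with the default of 0 the same call simply returns 0. That is loosely
 analogous to the observation in this thread that the machine runs once the
 assertion is removed.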

From: Reinoud Zandijk <reinoud@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Mon, 13 Dec 2010 22:08:50 +0100

 I've got a new trace from running a NetBSD kernel with root on NFS. It seems
 to be triggered only when writing?

 Note that it can only trigger on NFSv3

 -----
 luiaard# mail
 panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file
 "../../../../nfs/nfs_vnops.c", line 1377
 entering kgdb
 fatal breakpoint trap in supervisor mode

 (gdb) where
 #0  0xc0217ef4 in breakpoint ()
 #1  0xc048dc1e in kgdb_connect (verbose=0)
     at ../../../../arch/i386/i386/kgdb_machdep.c:228
 #2  0xc048dc72 in kgdb_panic ()
     at ../../../../arch/i386/i386/kgdb_machdep.c:245
 #3  0xc0617048 in panic (
     fmt=0xc0a60574 "kernel %sassertion \"%s\" failed: file \"%s\", line %d")
     at ../../../../kern/subr_prf.c:274
 #4  0xc07672f0 in kern_assert (t=0xc09b793b "diagnostic ",
     f=0xc0a17161 "../../../../nfs/nfs_vnops.c", l=1377,
     e=0xc0a17201 "mb->m_next != NULL")
     at ../../../../../../lib/libkern/kern_assert.c:50
 #5  0xc04fe6a0 in nfs_writerpc (vp=0xc719f8f4,
     uiop=0xc6e1acc0, iomode=0xc6e1ace4,
     pageprotected=true, stalewriteverfp=0xc6e1aceb)
     at ../../../../nfs/nfs_vnops.c:1377
 #6  0xc04e86cd in nfs_doio (bp=0xc0f55e70)
     at ../../../../nfs/nfs_bio.c:1068
 #7  0xc04ee981 in nfssvc_iod (arg=0xc6e23540)
     at ../../../../nfs/nfs_iod.c:158
 #8  0xc0100321 in lwp_trampoline ()

 (gdb) print *mb
 Cannot access memory at address 0xffffff84
 (gdb) print ctx.nwc_mbufcount
 $2 = 1
 (???)

 (gdb) print *vp
 $4 = {v_obj = {vmobjlock = {u = {mtxa_owner = 0}}, pgops = 0xc09a4b08, memq = {
   tqh_first = 0xc0e27740, tqh_last = 0xc0e2f554}, uo_npages = 160, uo_refs = 2, 
   rb_tree = {rbt_root = 0xc0e27620, rbt_ops = 0xc09a4a3c, rbt_minmax =
 	{0xc0e27740, 0xc0e2f540}}},
 	 v_cv = {cv_opaque = {0x0, 0xc719f91c, 0xc0a28ca5}}, 
   v_size = 655360, v_writesize = 655360, v_iflag = 16384,
   v_vflag = 0, v_uflag = 0, 
   v_numoutput = 6, v_writecount = 1, v_holdcnt = 1, v_synclist_slot = 18, 
   v_mount = 0xc6e1f204, v_op = 0xc61ed00c, v_freelist = {tqe_next = 0x0, 
     tqe_prev = 0x0}, v_freelisthd = 0x0, v_mntvnodes = {tqe_next = 0xc719f848, 
     tqe_prev = 0xc719fa14}, v_cleanblkhd = {lh_first = 0x0}, v_dirtyblkhd = {
     lh_first = 0x0}, v_synclist = {tqe_next = 0x0, tqe_prev = 0xc613ed34}, 
   v_dnclist = {lh_first = 0x0}, v_nclist = {lh_first = 0x0}, v_un = {
     vu_mountedhere = 0x0, vu_socket = 0x0, vu_specnode = 0x0,
     vu_fifoinfo = 0x0, 
     vu_ractx = 0x0}, v_type = VREG, v_tag = VT_NFS, v_lock = {rw_owner = 0}, 
   v_data = 0xc719e140, v_klist = {slh_first = 0x0}}

 ------


From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: Reinoud Zandijk <reinoud@NetBSD.org>, kern-bug-people@netbsd.org, 
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Mon, 13 Dec 2010 23:34:52 +0100

 On 13.12.10 22:10, Reinoud Zandijk wrote:
 > The following reply was made to PR kern/44206; it has been noted by GNATS.
 > 
 > From: Reinoud Zandijk <reinoud@NetBSD.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/44206: reproducable (for me) NFS panic
 > Date: Mon, 13 Dec 2010 22:08:50 +0100
 > 
 >  I've got a new trace from running a NetBSD kernel with root on NFS. It seems
 >  to be triggered only when writing?
 >  
 >  Note that it can only trigger on NFSv3
 >  
 >  -----
 >  luiaard# mail
 >  panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file
 >  "../../../../nfs/nfs_vnops.c", line 1377

 Does it hang in cv_wait() in nfs_writerpc() when you remove the KASSERT?
 Do you use nfs over tcp or udp?

 Christoph

From: Reinoud Zandijk <reinoud@NetBSD.org>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, Reinoud Zandijk <reinoud@NetBSD.org>,
	kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 00:00:25 +0100

 Hi Christoph,

 On Mon, Dec 13, 2010 at 11:34:52PM +0100, Christoph Egger wrote:
 > On 13.12.10 22:10, Reinoud Zandijk wrote:
 > >  Note that it can only trigger on NFSv3
 > >  
 > >  -----
 > >  luiaard# mail
 > >  panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file
 > >  "../../../../nfs/nfs_vnops.c", line 1377
 > 
 > Does it hang in cv_wait() in nfs_writerpc() when you remove the KASSERT?
 > Do you use nfs over tcp or udp?

 Its main mounts are :
 aaa.local:/usr/exports/luiaard         / nfs rw 0 0
 bbb.local:/home                        /home nfs rw 0 0

 and the servers are running with:
 /usr/sbin/nfsd -6 -u -t -n 6 

 When I remove the KASSERT() it just seems to work; I tried to reproduce the
 problem, but it neither crashes nor hangs.

 With regards,
 Reinoud

From: Christoph Egger <Christoph_Egger@gmx.de>
To: Reinoud Zandijk <reinoud@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 00:05:06 +0100

 On 14.12.10 00:00, Reinoud Zandijk wrote:
 > Hi Christoph,
 > 
 > On Mon, Dec 13, 2010 at 11:34:52PM +0100, Christoph Egger wrote:
 >> On 13.12.10 22:10, Reinoud Zandijk wrote:
 >>>  Note that it can only trigger on NFSv3
 >>>  
 >>>  -----
 >>>  luiaard# mail
 >>>  panic: kernel diagnostic assertion "mb->m_next != NULL" failed: file
 >>>  "../../../../nfs/nfs_vnops.c", line 1377
 >>
 >> Does it hang in cv_wait() in nfs_writerpc() when you remove the KASSERT?
 >> Do you use nfs over tcp or udp?
 > 
 > Its main mounts are :
 > aaa.local:/usr/exports/luiaard         / nfs rw 0 0
 > bbb.local:/home                        /home nfs rw 0 0

 Are those over tcp or udp ?

 > and the servers are running with:
 > /usr/sbin/nfsd -6 -u -t -n 6 
 > 
 > When I remove the KASSERT() it just seems to work; I tried to reproduce the
 > problem, but it neither crashes nor hangs.

 Ok, thank you for the information.

 Christoph

From: Martin Husemann <martin@duskware.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, Reinoud Zandijk <reinoud@NetBSD.org>,
	kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, martin@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 09:39:32 +0100

 On Mon, Dec 13, 2010 at 11:34:52PM +0100, Christoph Egger wrote:
 > Does it hang in cv_wait() in nfs_writerpc() when you remove the KASSERT?
 > Do you use nfs over tcp or udp?

 For the record: I'm using NFS over UDP and removing the KASSERT fixes the
 problem for me, no hangs.

 Martin

From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, martin@NetBSD.org, netbsd-bugs@netbsd.org,
 gnats-admin@netbsd.org, kern-bug-people@netbsd.org
Cc: 
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 13:45:15 +0100

 >  On Mon, Dec 13, 2010 at 11:34:52PM +0100, Christoph Egger wrote:
 >  > Does it hang in cv_wait() in nfs_writerpc() when you
 >  > remove the KASSERT?
 >  > Do you use nfs over tcp or udp?
 >  
 >  For the record: I'm using NFS over UDP and removing
 >  the KASSERT fixes the problem for me, no hangs.

 Thank you for the information. Could you check whether
 you can trigger the KASSERT with NFS over TCP, please?
 If so will removing the KASSERT cause a hang
 in cv_wait() in nfs_writerpc() then?

 Christoph

From: Martin Husemann <martin@duskware.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, martin@NetBSD.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, kern-bug-people@netbsd.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 16:48:29 +0100

 On Tue, Dec 14, 2010 at 01:45:15PM +0100, Christoph Egger wrote:
 > If so will removing the KASSERT cause a hang
 > in cv_wait() in nfs_writerpc() then?

 With a TCP mount I can still trigger the KASSERT. Removing it makes the
 machine work stable.

 Martin

From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, martin@NetBSD.org, netbsd-bugs@netbsd.org,
 gnats-admin@netbsd.org, kern-bug-people@netbsd.org
Cc: 
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 17:14:58 +0100

 >  On Tue, Dec 14, 2010 at 01:45:15PM +0100, Christoph Egger wrote:
 >  > If so will removing the KASSERT cause a hang
 >  > in cv_wait() in nfs_writerpc() then?
 >  
 >  With a TCP mount I can still trigger the KASSERT. Removing it
 >  makes the machine work stable.

 Ok. Which network driver are you using?

 Christoph

From: Martin Husemann <martin@duskware.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/44206: reproducable (for me) NFS panic
Date: Tue, 14 Dec 2010 17:17:08 +0100

 On Tue, Dec 14, 2010 at 05:14:58PM +0100, Christoph Egger wrote:
 > Ok. Which network driver are you using?

 gem(4)

 Martin

From: "Christoph Egger" <cegger@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44206 CVS commit: src/sys/nfs
Date: Tue, 14 Dec 2010 16:25:19 +0000

 Module Name:	src
 Committed By:	cegger
 Date:		Tue Dec 14 16:25:19 UTC 2010

 Modified Files:
 	src/sys/nfs: nfs_vnops.c

 Log Message:
 back out rev. 1.285. The problem I try to hunt down
 in PR 42455 is not in the network stack as shown by PR 44206.


 To generate a diff of this commit:
 cvs rdiff -u -r1.287 -r1.288 src/sys/nfs/nfs_vnops.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Tue, 14 Dec 2010 16:42:14 +0000
State-Changed-Why:
the KASSERT is gone


>Unformatted:
