NetBSD Problem Report #42455
From www@NetBSD.org Tue Dec 15 09:37:12 2009
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id 7FD8D63C3A9
for <gnats-bugs@gnats.NetBSD.org>; Tue, 15 Dec 2009 09:37:12 +0000 (UTC)
Message-Id: <20091215093712.3EC3F63B844@www.NetBSD.org>
Date: Tue, 15 Dec 2009 09:37:12 +0000 (UTC)
From: Christoph_Egger@gmx.de
Reply-To: Christoph_Egger@gmx.de
To: gnats-bugs@NetBSD.org
Subject: tstile hang with nfs
X-Send-Pr-Version: www-1.0
>Number: 42455
>Category: kern
>Synopsis: tstile hang with nfs
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Dec 15 09:40:01 +0000 2009
>Last-Modified: Sat Feb 04 14:25:05 +0000 2012
>Originator: Christoph Egger
>Release: NetBSD 5.99.22
>Organization:
>Environment:
NetBSD 5.99.22/Xen amd64
>Description:
The Dom0 has guest images on nfs.
The DomUs do IO on vnd(4). They suddenly freeze while they do IO.
When this happens, commands like 'ls' still work in the Dom0,
but commands like 'sync' hang in tstile and never return to the
shell prompt.
>How-To-Repeat:
Boot Xen Dom0 and launch a DomU. The guest image for the DomU
must be on nfs.
Do heavy IO in the DomU and wait for the freeze (may take a few
minutes, may also take a few hours).
You notice the freeze when the DomU no longer reacts to interrupts.
Then in the Dom0, commands like 'ls' on the nfs mount still work, but
'sync' hangs in tstile.
>Fix:
>Audit-Trail:
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 12 Jan 2010 17:27:21 +0100
I have a backtrace of the 'sync' process:
trace: pid 1068 lid 1 at 0xffffa00026ba7a00
sleepq_block() at netbsd:sleepq_block+0xbf
turnstile_block() at netbsd:turnstile_block+0x2c7
rw_enter() at netbsd:rw_enter+0x1d7
vlockmgr() at netbsd:vlockmgr+0xdf
VOP_LOCK() at netbsd:VOP_LOCK+0x28
vn_lock() at netbsd:vn_lock+0xd7
vget() at netbsd:vget+0xeb
nfs_sync() at netbsd:nfs_sync+0xba
sys_sync() at netbsd:sys_sync+0xe6
syscall() at netbsd:syscall+0xa8
Christoph
From: David Holland <dholland-bugs@netbsd.org>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 12 Jan 2010 16:29:20 +0000
On Tue, Jan 12, 2010 at 05:27:21PM +0100, Christoph Egger wrote:
> I have a backtrace of the 'sync' process:
> [snip]
Sure, it's blocked on a vnode lock. What's holding the vnode lock?
--
David A. Holland
dholland@netbsd.org
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 12 Jan 2010 18:20:23 +0100
db> ps /a
PID COMMAND STRUCT PROC * UAREA * VMSPACE/VM_MAP
1068 sync ffffa00026b8a8a0 ffffa00026ba7d80 ffffa000264e55c0
[...]
db> show files 0xffffa00026b8a8a0
OBJECT 0xffffa00026bc6ac0: locked=0, pgops=0xffffffff808c7b00, npages=0, refs=2
VNODE flags 0x4<ISTTY>
mp 0xffffa000269dd008 numoutput 0 size 0x0 writesize 0x0
data 0xffffa00001261380 writecount 1 holdcnt 0
tag VT_PTYFS(24) type VCHR(4) mount 0xffffa000269dd008 typedata 0xffffa00026592128
v_lock 0xffffa00026bc6bc8 v_vnlock 0xffffa00026bc6bc8
v_uobj.vmobjlock lock details:
lock address : 0xffffa00026bc6ac0 type : sleep/adaptive
initialized : 0xffffffff806850ea
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xffffa0002320eba0 last held: 000000000000000000
last locked : 0xffffffff8051b18f unlocked : 0xffffffff8051b1b5
owner field : 000000000000000000 wait/spin: 0/0
Turnstile chain at 0xffffffff80c24600.
=> No active turnstile for this lock.
OBJECT 0xffffa00026bc6ac0: locked=0, pgops=0xffffffff808c7b00, npages=0, refs=2
VNODE flags 0x4<ISTTY>
mp 0xffffa000269dd008 numoutput 0 size 0x0 writesize 0x0
data 0xffffa00001261380 writecount 1 holdcnt 0
tag VT_PTYFS(24) type VCHR(4) mount 0xffffa000269dd008 typedata 0xffffa00026592128
v_lock 0xffffa00026bc6bc8 v_vnlock 0xffffa00026bc6bc8
v_uobj.vmobjlock lock details:
lock address : 0xffffa00026bc6ac0 type : sleep/adaptive
initialized : 0xffffffff806850ea
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xffffa0002320eba0 last held: 000000000000000000
last locked : 0xffffffff8051b18f unlocked : 0xffffffff8051b1b5
owner field : 000000000000000000 wait/spin: 0/0
Turnstile chain at 0xffffffff80c24600.
=> No active turnstile for this lock.
OBJECT 0xffffa00026bc6ac0: locked=0, pgops=0xffffffff808c7b00, npages=0, refs=2
VNODE flags 0x4<ISTTY>
mp 0xffffa000269dd008 numoutput 0 size 0x0 writesize 0x0
data 0xffffa00001261380 writecount 1 holdcnt 0
tag VT_PTYFS(24) type VCHR(4) mount 0xffffa000269dd008 typedata 0xffffa00026592128
v_lock 0xffffa00026bc6bc8 v_vnlock 0xffffa00026bc6bc8
v_uobj.vmobjlock lock details:
lock address : 0xffffa00026bc6ac0 type : sleep/adaptive
initialized : 0xffffffff806850ea
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xffffa0002320eba0 last held: 000000000000000000
last locked : 0xffffffff8051b18f unlocked : 0xffffffff8051b1b5
owner field : 000000000000000000 wait/spin: 0/0
Turnstile chain at 0xffffffff80c24600.
=> No active turnstile for this lock.
db> show lock 0xffffa00026bc6bc8
lock address : 0xffffa00026bc6bc8 type : sleep/adaptive
initialized : 0xffffffff8068516b
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xffffa0002320eba0 last held: 000000000000000000
last locked : 0xffffffff806829f1 unlocked : 0xffffffff80682a67
owner/count : 000000000000000000 flags : 0x0000000000000008
Turnstile chain at 0xffffffff80c24810.
=> No active turnstile for this lock.
How should I proceed to find the vnode lock holder?
Christoph
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Fri, 30 Apr 2010 13:24:53 +0200
db> ps /w
PID LID COMMAND EMUL PRI WAIT-MSG WAIT-CHANNEL
1206 1 sync netbsd 43 tstile ffffa00026b52498
327 237 qemu-dm netbsd 43 0
327 2 qemu-dm netbsd 43 netio ffffa0000114f3b8
327 1 qemu-dm netbsd 34 genput ffffa000005a0840
db> show lock 0xffffa00026b52498
lock address : 0xffffa00026b52498 type : sleep/adaptive
initialized : 0xffffffff80693b1b
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 1
current cpu : 0 last held: 0
current lwp : 0xffffa0002320ec00 last held: 0xffffa00026628400
last locked : 0xffffffff8069161d unlocked : 0xffffffff80691696
owner/count : 0xffffa00026628400 flags : 0x0000000000000007
Turnstile chain at 0xffffffff80c383f0.
=> Turnstile at 0xffffa000260de308 (wrq=0xffffa000260de328, rdq=0xffffa000260de338).
=> 0 waiting readers:
=> 1 waiting writers: 0xffffa00026b57000
This is the hanging 'sync' process:
db> tr /a 0xffffa00026b57000
trace: pid 1206 lid 1 at 0xffffa00026bffa00
sleepq_block() at netbsd:sleepq_block+0xbf
turnstile_block() at netbsd:turnstile_block+0x2c7
rw_enter() at netbsd:rw_enter+0x1d0
vlockmgr() at netbsd:vlockmgr+0xe2
VOP_LOCK() at netbsd:VOP_LOCK+0x28
vn_lock() at netbsd:vn_lock+0xd5
vget() at netbsd:vget+0xe9
nfs_sync() at netbsd:nfs_sync+0xb9
sys_sync() at netbsd:sys_sync+0xe6
syscall() at netbsd:syscall+0xa8
This is the hanging 'qemu-dm' process:
db> tr /a 0xffffa00026628400
trace: pid 327 lid 1 at 0xffffa0002661e880
sleepq_block() at netbsd:sleepq_block+0xbf
mtsleep() at netbsd:mtsleep+0x128
genfs_do_putpages() at netbsd:genfs_do_putpages+0x7f9
VOP_PUTPAGES() at netbsd:VOP_PUTPAGES+0x30
nfs_flush() at netbsd:nfs_flush+0x27
VOP_FSYNC() at netbsd:VOP_FSYNC+0x34
sys_fsync() at netbsd:sys_fsync+0x53
syscall() at netbsd:syscall+0xa8
So it is qemu-dm that hangs first, and any subsequent 'sync'
call hangs, too.
That explains why the Xen DomU suddenly hangs during IO.
The question is why the 'sync' syscall hangs in the first
place.
Christoph
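The wait-for relationship read off the 'show lock' dump above can also be
extracted mechanically. The following is only a minimal Python sketch, not
part of the original report: the function name parse_show_lock and the lwps
table are hypothetical, the input is an excerpt of the dump in this message
(with the wrapped turnstile line omitted), and the LWP descriptions are taken
from the two 'tr /a' traces.

```python
import re

# Excerpt of the ddb "show lock 0xffffa00026b52498" output from this message.
SHOW_LOCK = """\
lock address : 0xffffa00026b52498 type : sleep/adaptive
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 1
owner/count : 0xffffa00026628400 flags : 0x0000000000000007
=> 0 waiting readers:
=> 1 waiting writers: 0xffffa00026b57000
"""


def parse_show_lock(text):
    """Return (owner, waiters) as hex strings; owner is None if unowned."""
    owner = None
    waiters = []
    for line in text.splitlines():
        m = re.search(r"owner/count\s*:\s*(0x[0-9a-f]+)", line)
        if m:
            owner = m.group(1)
        m = re.search(r"waiting (?:readers|writers):\s*(.*)", line)
        if m:
            waiters.extend(re.findall(r"0x[0-9a-f]+", m.group(1)))
    return owner, waiters


# LWP address -> description, as identified by the 'tr /a' traces.
lwps = {
    "0xffffa00026b57000": "sync (pid 1206 lid 1)",
    "0xffffa00026628400": "qemu-dm (pid 327 lid 1)",
}

owner, waiters = parse_show_lock(SHOW_LOCK)
edges = [(lwps.get(w, w), lwps.get(owner, owner)) for w in waiters]
for waiter, holder in edges:
    print(f"{waiter} waits for lock held by {holder}")
```

An unowned lock prints its owner field without a 0x prefix in these dumps,
so the sketch reports owner as None in that case.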
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Sun, 2 May 2010 01:44:16 +0000
On Fri, Apr 30, 2010 at 11:25:02AM +0000, Christoph Egger wrote:
> 327 2 qemu-dm netbsd 43 netio ffffa0000114f3b8
> 327 1 qemu-dm netbsd 34 genput ffffa000005a0840
>
> [...]
> This is the hanging 'qemu-dm' process:
>
> db> tr /a 0xffffa00026628400
> trace: pid 327 lid 1 at 0xffffa0002661e880
>
> [...]
> The question is why does the 'sync' syscall hang in first
> place ?
It looks to me as if pid 327 lid 1 is probably waiting for pid 327 lid
2, which is hanging on the network. It's blocked in genfs_do_putpages
waiting for the page it's trying to write out to become unbusy, and
lid 2 there is a clear candidate for holding it.
I don't immediately see how to inspect the uvm structures to find out
whether this is in fact the case, but maybe someone who knows more
about uvm can help.
(btw, sorry I never got around to answering your previous mail in this
PR... it's been marked unread-and-urgent the last three months. sigh)
--
David A. Holland
dholland@netbsd.org
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Mon, 03 May 2010 18:10:21 +0200
I reproduced it again:
db> ps /w
PID LID COMMAND EMUL PRI WAIT-MSG WAIT-CHANNEL
619 732 qemu-dm netbsd 43 0
619 2 qemu-dm netbsd 43 netio ffffa00001310a88
619 1 qemu-dm netbsd 34 genput ffffa00000415d40
db> show lock 0xffffa00001310a88
lock address : 0xffffa00001310a88 type : spin
initialized : 0xffffffff806449e3 interlock: 0xffffa000260da780
'initialized' points to sys/kern/uipc_socket2.c:326
db> show lock 0xffffa000260da780
lock address : 0xffffa000260da780 type : sleep/adaptive
initialized : 0xffffffff80405e28
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xffffa0002320ec00 last held: 000000000000000000
last locked : 0xffffffff8064238e unlocked : 0xffffffff803ef060
owner field : 000000000000000000 wait/spin: 0/0
Turnstile chain at 0xffffffff80c385c0.
=> No active turnstile for this lock.
'initialized' points to sys/kern/kern_mutex_obj.c:92,
'last locked' points to sys/sys/socketvar.h:475,
and 'unlocked' points to sys/kern/kern_condvar.c:154
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 04 May 2010 09:00:08 +0200
>
> I reproduced it again:
>
> db> ps /w
> PID LID COMMAND EMUL PRI WAIT-MSG WAIT-CHANNEL
> 619 732 qemu-dm netbsd 43 0
> 619 2 qemu-dm netbsd 43 netio ffffa00001310a88
> 619 1 qemu-dm netbsd 34 genput ffffa00000415d40
>
> db> show lock 0xffffa00001310a88
> lock address : 0xffffa00001310a88 type : spin
> initialized : 0xffffffff806449e3 interlock: 0xffffa000260da780
>
> 'initialized' points to sys/kern/uipc_socket2.c:326
>
> db> show lock 0xffffa000260da780
> lock address : 0xffffa000260da780 type : sleep/adaptive
> initialized : 0xffffffff80405e28
> shared holds : 0 exclusive: 0
> shares wanted: 0 exclusive: 0
> current cpu : 0 last held: 0
> current lwp : 0xffffa0002320ec00 last held: 000000000000000000
> last locked : 0xffffffff8064238e unlocked : 0xffffffff803ef060
> owner field : 000000000000000000 wait/spin: 0/0
>
> Turnstile chain at 0xffffffff80c385c0.
> => No active turnstile for this lock.
>
> 'initialized' points to sys/kern/kern_mutex_obj.c:92 ,
> 'last locked' points to sys/sys/socketvar.h:475
> and 'unlocked' points to sys/kern/kern_condvar.c:154
>
some more information:
db> ps /l
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
619 732 5 0 0 ffffa0002666d800 qemu-dm
619 2 3 0 80 ffffa00026c24800 qemu-dm netio
619 1 3 0 0 ffffa00026ba6c00 qemu-dm genput
db> tr /a ffffa00026c24800
trace: pid 619 lid 2 at 0xffffa00026c4e9f0
sleepq_block() at netbsd:sleepq_block+0xbf
cv_timedwait_sig() at netbsd:cv_timedwait_sig+0x10c
sbwait() at netbsd:sbwait+0x73
soreceive() at netbsd:soreceive+0xb0d
dofileread() at netbsd:dofileread+0x6f
sys_read() at netbsd:sys_read+0x73
syscall() at netbsd:syscall+0xa8
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 04 May 2010 09:50:09 +0200
I used nfs with tcp and r/w size = 8192.
I tried with r/w size = 1024 and the issue is a lot easier
to reproduce.
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 04 May 2010 11:07:03 +0200
netstat shows that bridge(4) is unable to send out data.
The Oerrs increase very fast.
netstat also shows that bge(4) still receives packets.
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
bge0 1500 <Link> 00:e0:81:80:6f:be 277851 0 358606 0 0
bge0 1500 fe80::/64 fe80::2e0:81ff:fe 277851 0 358606 0 0
bge0 1500 165.204.15/24 165.204.15.79 277851 0 358606 0 0
bge1* 1500 <Link> 00:e0:81:80:6f:bf 0 0 0 0 0
lo0 33648 <Link> 0 0 0 0 0
lo0 33648 127/8 127.0.0.1 0 0 0 0 0
lo0 33648 ::1/128 ::1 0 0 0 0 0
lo0 33648 fe80::/64 fe80::1 0 0 0 0 0
bridg 1500 <Link> 21358 0 361184 0 0
xvif1 1500 <Link> 00:16:3e:6e:39:6f 0 0 0 0 0
xvif1 1500 fe80::/64 fe80::216:3eff:fe 0 0 0 0 0
tap0 1500 <Link> f2:0b:a4:6d:d8:04 5275 0 15769 0 0
tap0 1500 fe80::/64 fe80::f00b:a4ff:f 5275 0 15769 0 0
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
bge0 1500 <Link> 00:e0:81:80:6f:be 277902 0 358621 0 0
bge0 1500 fe80::/64 fe80::2e0:81ff:fe 277902 0 358621 0 0
bge0 1500 165.204.15/24 165.204.15.79 277902 0 358621 0 0
bge1* 1500 <Link> 00:e0:81:80:6f:bf 0 0 0 0 0
lo0 33648 <Link> 0 0 0 0 0
lo0 33648 127/8 127.0.0.1 0 0 0 0 0
lo0 33648 ::1/128 ::1 0 0 0 0 0
lo0 33648 fe80::/64 fe80::1 0 0 0 0 0
bridg 1500 <Link> 21399 0 361210 24 0
xvif1 1500 <Link> 00:16:3e:6e:39:6f 0 0 0 0 0
xvif1 1500 fe80::/64 fe80::216:3eff:fe 0 0 0 0 0
tap0 1500 <Link> f2:0b:a4:6d:d8:04 5275 0 15769 0 0
tap0 1500 fe80::/64 fe80::f00b:a4ff:f 5275 0 15769 0 0
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
bge0 1500 <Link> 00:e0:81:80:6f:be 278000 0 358667 0 0
bge0 1500 fe80::/64 fe80::2e0:81ff:fe 278000 0 358667 0 0
bge0 1500 165.204.15/24 165.204.15.79 278000 0 358667 0 0
bge1* 1500 <Link> 00:e0:81:80:6f:bf 0 0 0 0 0
lo0 33648 <Link> 0 0 0 0 0
lo0 33648 127/8 127.0.0.1 0 0 0 0 0
lo0 33648 ::1/128 ::1 0 0 0 0 0
lo0 33648 fe80::/64 fe80::1 0 0 0 0 0
bridg 1500 <Link> 21452 0 361256 77 0
xvif1 1500 <Link> 00:16:3e:6e:39:6f 0 0 0 0 0
xvif1 1500 fe80::/64 fe80::216:3eff:fe 0 0 0 0 0
tap0 1500 <Link> f2:0b:a4:6d:d8:04 5275 0 15769 0 0
tap0 1500 fe80::/64 fe80::f00b:a4ff:f 5275 0 15769 0 0
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
bge0 1500 <Link> 00:e0:81:80:6f:be 278020 0 358683 0 0
bge0 1500 fe80::/64 fe80::2e0:81ff:fe 278020 0 358683 0 0
bge0 1500 165.204.15/24 165.204.15.79 278020 0 358683 0 0
bge1* 1500 <Link> 00:e0:81:80:6f:bf 0 0 0 0 0
lo0 33648 <Link> 0 0 0 0 0
lo0 33648 127/8 127.0.0.1 0 0 0 0 0
lo0 33648 ::1/128 ::1 0 0 0 0 0
lo0 33648 fe80::/64 fe80::1 0 0 0 0 0
bridg 1500 <Link> 21457 0 361272 82 0
xvif1 1500 <Link> 00:16:3e:6e:39:6f 0 0 0 0 0
xvif1 1500 fe80::/64 fe80::216:3eff:fe 0 0 0 0 0
tap0 1500 <Link> f2:0b:a4:6d:d8:04 5275 0 15769 0 0
tap0 1500 fe80::/64 fe80::f00b:a4ff:f 5275 0 15769 0 0
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
bge0 1500 <Link> 00:e0:81:80:6f:be 278041 0 358698 0 0
bge0 1500 fe80::/64 fe80::2e0:81ff:fe 278041 0 358698 0 0
bge0 1500 165.204.15/24 165.204.15.79 278041 0 358698 0 0
bge1* 1500 <Link> 00:e0:81:80:6f:bf 0 0 0 0 0
lo0 33648 <Link> 0 0 0 0 0
lo0 33648 127/8 127.0.0.1 0 0 0 0 0
lo0 33648 ::1/128 ::1 0 0 0 0 0
lo0 33648 fe80::/64 fe80::1 0 0 0 0 0
bridg 1500 <Link> 21464 0 361287 89 0
xvif1 1500 <Link> 00:16:3e:6e:39:6f 0 0 0 0 0
xvif1 1500 fe80::/64 fe80::216:3eff:fe 0 0 0 0 0
tap0 1500 <Link> f2:0b:a4:6d:d8:04 5275 0 15769 0 0
tap0 1500 fe80::/64 fe80::f00b:a4ff:f 5275 0 15769 0 0
# netstat -in
Name Mtu Network Address Ipkts Ierrs Opkts Oerrs Colls
bge0 1500 <Link> 00:e0:81:80:6f:be 278530 0 358977 0 0
bge0 1500 fe80::/64 fe80::2e0:81ff:fe 278530 0 358977 0 0
bge0 1500 165.204.15/24 165.204.15.79 278530 0 358977 0 0
bge1* 1500 <Link> 00:e0:81:80:6f:bf 0 0 0 0 0
lo0 33648 <Link> 0 0 0 0 0
lo0 33648 127/8 127.0.0.1 0 0 0 0 0
lo0 33648 ::1/128 ::1 0 0 0 0 0
lo0 33648 fe80::/64 fe80::1 0 0 0 0 0
bridg 1500 <Link> 21703 0 361566 307 0
xvif1 1500 <Link> 00:16:3e:6e:39:6f 0 0 0 0 0
xvif1 1500 fe80::/64 fe80::216:3eff:fe 0 0 0 0 0
tap0 1500 <Link> f2:0b:a4:6d:d8:04 5275 0 15769 0 0
tap0 1500 fe80::/64 fe80::f00b:a4ff:f 5275 0 15769 0 0
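How fast the bridge Oerrs counter grows can be quantified by differencing
successive samples. The following is a minimal Python sketch, not part of the
original report: the helper names parse_counters and deltas are hypothetical,
and the rows are the bridge lines copied from the six 'netstat -in' outputs
above. Because the Address column is empty for the bridge, the counters are
taken from the right-hand end of each row.

```python
# Bridge rows from the six "netstat -in" samples in this message.
BRIDGE_ROWS = [
    "bridg 1500 <Link> 21358 0 361184 0 0",
    "bridg 1500 <Link> 21399 0 361210 24 0",
    "bridg 1500 <Link> 21452 0 361256 77 0",
    "bridg 1500 <Link> 21457 0 361272 82 0",
    "bridg 1500 <Link> 21464 0 361287 89 0",
    "bridg 1500 <Link> 21703 0 361566 307 0",
]


def parse_counters(line):
    """Return (ipkts, ierrs, opkts, oerrs, colls) from one netstat -in row."""
    return tuple(int(field) for field in line.split()[-5:])


def deltas(rows):
    """Per-interval change in Opkts and Oerrs between consecutive rows."""
    counters = [parse_counters(row) for row in rows]
    return [
        {"opkts": cur[2] - prev[2], "oerrs": cur[3] - prev[3]}
        for prev, cur in zip(counters, counters[1:])
    ]


for i, d in enumerate(deltas(BRIDGE_ROWS), start=1):
    print(f"interval {i}: Opkts +{d['opkts']}, Oerrs +{d['oerrs']}")
```

The samples carry no timestamps, so the sketch reports per-interval
differences rather than rates.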
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org, gnats-admin@NetBSD.org,
kern-bug-people@NetBSD.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 4 May 2010 13:24:05 +0200
On Tue, May 04, 2010 at 11:07:03AM +0200, Christoph Egger wrote:
>
> netstat shows that bridge(4) is unable to send out data.
> The Oerrs increase very fast.
>
> netstat also shows that bge(4) still receive packets.
What do netstat -m and vmstat -m show ?
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 04 May 2010 14:24:53 +0200
> What do netstat -m and vmstat -m show ?
# netstat -m
341 mbufs in use:
329 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
# netstat -m
354 mbufs in use:
342 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
# netstat -m
359 mbufs in use:
347 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
# netstat -m
365 mbufs in use:
353 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
So more and more mbufs are being allocated over time.
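The growth can be read off the four 'netstat -m' samples above by parsing the
counters and differencing them. This is a minimal Python sketch, not part of
the original report; the function name parse_netstat_m is hypothetical and the
input strings are copied from the samples in this message.

```python
import re

# The four "netstat -m" samples from this message.
SAMPLES = [
    """341 mbufs in use:
329 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines""",
    """354 mbufs in use:
342 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines""",
    """359 mbufs in use:
347 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines""",
    """365 mbufs in use:
353 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines""",
]


def parse_netstat_m(text):
    """Map e.g. 'in use' / 'allocated to data' to its mbuf count."""
    counts = {}
    for line in text.splitlines():
        m = re.match(r"^\s*(\d+) mbufs (.+?):?\s*$", line)
        if m:
            counts[m.group(2)] = int(m.group(1))
    return counts


parsed = [parse_netstat_m(s) for s in SAMPLES]
total = [p["in use"] for p in parsed]
data = [p["allocated to data"] for p in parsed]
total_growth = [b - a for a, b in zip(total, total[1:])]
data_growth = [b - a for a, b in zip(data, data[1:])]
print("total growth:", total_growth)
print("data growth: ", data_growth)
```

In these samples the increase is entirely in mbufs allocated to data; the
packet header and socket name counts stay at 6.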
# vmstat -m
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
32 1344 64 6965 640 0
64 263 57 4763 320 0
128 2838 10 3340 160 0
256 175 1 299 80 0
512 32 0 39 40 0
1024 641 3 991 20 0
2048 278 0 288 10 0
4096 21 1 39 5 0
8192 5 0 6 5 0
16384 3 0 21 5 0
32768 1 0 1 5 0
65536 4 0 4 5 0
131072 1 0 1 5 0
Memory usage type by bucket size
Size Type(s)
32 ptyfs mount, kernfs mount, USB, USB device, soname, sockaddr, vmem,
NETSMBDEV, sysctldata, devbuf, pcb, temp, routetbl, in_multi, ifmedia,
ifaddr, ether_multi, acpi
64 USB, packet tags, tcpcongctl, RAIDframe, NDP, sysctldata, devbuf, pcb,
temp, routetbl, fragtbl, ifaddr, acpi
128 USB, USB device, soname, vmem, NDP, sysctlnode, sysctldata, devbuf,
DMA map, pcb, temp, acpi
256 prop dictionary, VM page, USB, USB device, sysctldata, devbuf, temp,
routetbl, ip_moptions, in_multi, ifaddr, acpi
512 USB, sysctldata, devbuf, DMA map, temp, ifaddr, acpi
1024 VM map, USB, vmem, sysctlnode, sysctldata, devbuf, DMA map, temp,
crypto, acpi
2048 USB device, RAIDframe, sysctlnode, sysctldata, devbuf, DMA map, temp
4096 vmem, RAIDframe, sysctlnode, devbuf, temp
8192 USB, vmem, sysctlnode, devbuf
16384 sysctlnode, devbuf
32768 devbuf
65536 devbuf
131072 devbuf
Memory statistics by type Type Kern
Type InUse MemUse HighUse Limit Requests Limit Limit Size(s)
ptyfs mount 1 1K 1K 314573K 1 0 0 32
kernfs mount 1 1K 1K 314573K 1 0 0 32
prop dictionary 95 24K 24K 314573K 97 0 0 256
VM page 1 1K 1K 314573K 1 0 0 256
VM map 2 2K 2K 314573K 2 0 0 1024
USB 45 16K 16K 314573K 51 0 0 32,64,128,256,512,1024,8192
USB device 18 19K 19K 314573K 18 0 0 32,128,256,2048
soname 32 4K 5K 314573K 101 0 0 32,128
packet tags 0 0K 1K 314573K 28 0 0 64
sockaddr 78 3K 3K 314573K 185 0 0 32
tcpcongctl 2 1K 1K 314573K 2 0 0 64
vmem 8 25K 26K 314573K 12 0 0 32,128,1024,4096,8192
NETSMBDEV 1 1K 1K 314573K 1 0 0 32
RAIDframe 10 35K 35K 314573K 10 0 0 64,2048,4096
NDP 10 1K 1K 314573K 12 0 0 64,128
sysctlnode 139 191K 191K 314573K 170 0 0 128,1024,2048,4096,8192,16384
sysctldata 85 9K 9K 314573K 119 0 0 32,64,128,256,512,1024,2048
devbuf 417 304K 307K 314573K 2224 0 0 32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072
DMA map 1072 1064K 1064K 314573K 1072 0 0 128,512,1024,2048
pcb 95 12K 13K 314573K 172 0 0 32,64,128
temp 20 7K 8K 314573K 321 0 0 32,64,128,256,512,1024,2048,4096
routetbl 40 7K 7K 314573K 76 0 0 32,64,256
fragtbl 0 0K 1K 314573K 131 0 0 64
ip_moptions 0 0K 1K 314573K 1 0 0 256
in_multi 38 5K 5K 314573K 38 0 0 32,256
ifmedia 23 1K 1K 314573K 23 0 0 32
ifaddr 39 10K 10K 314573K 40 0 0 32,64,256,512
ether_multi 9 1K 1K 314573K 10 0 0 32
crypto 1 1K 1K 314573K 1 0 0 1024
acpi 3324 330K 332K 314573K 11837 0 0 32,64,128,256,512,1024
Memory totals: In Use Free Requests
2062K 15K 16757
Memory resource pool statistics
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
amappl 80 591 0 0 12 0 12 12 0 inf 0
anonpl 24 11407 0 0 68 0 68 68 0 inf 0
aobj 128 10 0 0 1 0 1 1 0 inf 0
ataspl 96 21 0 21 1 0 1 1 0 inf 1
biopl 280 6 0 0 1 0 1 1 0 inf 0
brtpl 56 91 0 5 2 0 2 2 0 inf 0
buf1k 1024 53 0 0 1 0 1 1 1 1 0
buf2k 2048 3 0 0 1 0 1 1 1 1 0
buf8k 8192 62 0 0 8 0 8 8 1 1 0
bufpl 280 118 0 0 9 0 9 9 0 inf 0
cwdi 40 25 0 0 1 0 1 1 0 inf 0
dirent_tmpfs_0xffffa000265f6008 40 1 0 0 1 0 1 1 0 inf 0
execargs 262144 267 0 267 1 0 1 1 0 16 1
fdfile 48 371 0 0 5 0 5 5 0 inf 0
file 96 192 0 0 5 0 5 5 0 inf 0
filedesc 776 25 0 0 5 0 5 5 0 inf 0
in6pcbpl 232 16 0 15 1 0 1 1 0 inf 0
inmltpl 48 3 0 1 1 0 1 1 0 inf 0
inpcbpl 192 58 0 53 1 0 1 1 0 inf 0
ipqepl 80 1404 0 1404 1 0 1 1 0 inf 1
kcredpl 176 173 0 0 8 0 8 8 0 inf 0
kmem-1024 1024 331 0 0 331 0 331 331 0 inf 0
kmem-112 112 137 0 0 16 0 16 16 0 inf 0
kmem-128 128 23 0 0 3 0 3 3 0 inf 0
kmem-144 144 186 0 0 27 0 27 27 0 inf 0
kmem-1536 1536 34 0 0 34 0 34 34 0 inf 0
kmem-16 16 1 0 0 1 0 1 1 0 inf 0
kmem-168 168 81 0 0 14 0 14 14 0 inf 0
kmem-200 200 29 0 0 6 0 6 6 0 inf 0
kmem-2048 2048 78 0 0 78 0 78 78 0 inf 0
kmem-24 24 607 0 0 15 0 15 15 0 inf 0
kmem-256 256 49 0 0 13 0 13 13 0 inf 0
kmem-2560 2560 22 0 0 22 0 22 22 0 inf 0
kmem-3072 3072 2 0 0 2 0 2 2 0 inf 0
kmem-32 32 384 0 0 12 0 12 12 0 inf 0
kmem-3584 3584 3 0 0 3 0 3 3 0 inf 0
kmem-40 40 65 0 0 3 0 3 3 0 inf 0
kmem-4096 4096 2 0 0 2 0 2 2 0 inf 0
kmem-48 48 169 0 0 9 0 9 9 0 inf 0
kmem-56 56 4146 0 0 231 0 231 231 0 inf 0
kmem-64 64 81 0 0 6 0 6 6 0 inf 0
kmem-72 72 39 0 0 3 0 3 3 0 inf 0
kmem-80 80 310 0 0 26 0 26 26 0 inf 0
kmem-88 88 57 0 0 6 0 6 6 0 inf 0
kmem-96 96 26 0 0 3 0 3 3 0 inf 0
ksiginfo 72 1 0 0 1 0 1 1 0 inf 0
kvakernel 4096 1276 0 0 40 0 40 40 0 inf 0
kvakmem 4096 462 0 0 15 0 15 15 0 inf 0
lockf 112 9 0 0 1 0 1 1 0 inf 0
lwppl 1024 80 0 0 20 0 20 20 0 inf 0
mbpl 512 783 0 0 100 0 100 100 2 inf 2
mclpl 2048 518 0 0 263 0 263 263 4 8192 4
mutex 24 427 0 0 3 0 3 3 0 inf 0
ncache 160 6837 0 0 274 0 274 274 0 inf 0
nfsnodepl 304 3624 0 67 274 0 274 274 0 inf 0
nfsvapl 176 3624 0 67 162 0 162 162 0 inf 0
node_tmpfs_0xffffa000265f6008 208 2 0 0 1 0 1 1 0 inf 0
pcache 896 79 0 14 17 0 17 17 0 inf 0
pcglarge 1024 118 0 0 30 0 30 30 0 inf 0
pcgnormal 256 172 0 0 11 0 11 11 0 inf 0
pdict16 72 8 0 0 1 0 1 1 0 inf 0
pdict32 88 3 0 0 1 0 1 1 0 inf 0
pdppl 4096 25 0 0 25 0 25 25 0 inf 0
phpool-0 56 472 0 0 7 0 7 7 0 inf 0
phpool-64 56 1002 0 0 14 0 14 14 0 inf 0
piperd 312 11 0 0 1 0 1 1 0 inf 0
pipewr 312 12 0 0 1 0 1 1 0 inf 0
plimitpl 208 7 0 0 1 0 1 1 0 inf 0
pmappl 328 25 0 0 3 0 3 3 0 inf 0
pnbufpl 1024 7 0 0 2 0 2 2 0 inf 0
procpl 736 25 0 0 5 0 5 5 0 inf 0
propdict 48 98 0 2 2 0 2 2 0 inf 0
propnmbr 56 24 0 1 1 0 1 1 0 inf 0
propstng 40 91 0 1 1 0 1 1 0 inf 0
pstatspl 448 25 0 0 3 0 3 3 0 inf 0
ptimerpl 264 7 0 2 1 0 1 1 0 inf 0
ptimerspl 280 7 0 2 1 0 1 1 0 inf 0
pvpl 40 13624 0 0 135 0 135 135 0 inf 0
ractx 32 731 0 0 6 0 6 6 0 inf 0
rndsample 536 6 0 0 1 0 1 1 0 inf 0
rtentpl 264 39 0 1 3 0 3 3 0 inf 0
scxspl 256 63 0 63 1 0 1 1 1 inf 1
sigacts 3088 25 0 0 25 0 25 25 0 inf 0
socket 568 106 0 0 16 0 16 16 0 inf 0
str_tmpfs_0xffffa000265f6008 24 1 0 0 1 0 1 1 0 inf 0
synpl 280 1 0 1 1 0 1 1 0 inf 1
tcpcbpl 792 23 0 17 3 0 3 3 0 inf 1
tcpipqepl 80 4 0 4 1 0 1 1 0 inf 1
tstilepl 96 80 0 0 2 0 2 2 0 inf 0
uarea 16384 80 0 0 80 0 80 80 0 inf 0
vmembt 56 1536 0 0 22 0 22 22 0 inf 0
vmmpepl 136 1613 0 0 56 0 56 56 0 inf 0
vmsppl 368 25 0 0 3 0 3 3 0 inf 0
vndxpl 288 4 0 4 1 0 1 1 0 inf 1
vndxpl 288 3 0 3 1 0 1 1 0 inf 1
vnodepl 304 3584 0 0 276 0 276 276 0 inf 0
In use 10073K, total allocated 10891K; utilization 92.5%
# vmstat -m
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
32 1344 64 6965 640 0
64 263 57 4766 320 0
128 2838 10 3340 160 0
256 175 1 299 80 0
512 32 0 39 40 0
1024 641 3 991 20 0
2048 278 0 288 10 0
4096 21 1 39 5 0
8192 5 0 6 5 0
16384 3 0 21 5 0
32768 1 0 1 5 0
65536 4 0 4 5 0
131072 1 0 1 5 0
Memory usage type by bucket size
Size Type(s)
32 ptyfs mount, kernfs mount, USB, USB device, soname, sockaddr, vmem,
NETSMBDEV, sysctldata, devbuf, pcb, temp, routetbl, in_multi, ifmedia,
ifaddr, ether_multi, acpi
64 USB, packet tags, tcpcongctl, RAIDframe, NDP, sysctldata, devbuf, pcb,
temp, routetbl, fragtbl, ifaddr, acpi
128 USB, USB device, soname, vmem, NDP, sysctlnode, sysctldata, devbuf,
DMA map, pcb, temp, acpi
256 prop dictionary, VM page, USB, USB device, sysctldata, devbuf, temp,
routetbl, ip_moptions, in_multi, ifaddr, acpi
512 USB, sysctldata, devbuf, DMA map, temp, ifaddr, acpi
1024 VM map, USB, vmem, sysctlnode, sysctldata, devbuf, DMA map, temp,
crypto, acpi
2048 USB device, RAIDframe, sysctlnode, sysctldata, devbuf, DMA map, temp
4096 vmem, RAIDframe, sysctlnode, devbuf, temp
8192 USB, vmem, sysctlnode, devbuf
16384 sysctlnode, devbuf
32768 devbuf
65536 devbuf
131072 devbuf
Memory statistics by type Type Kern
Type InUse MemUse HighUse Limit Requests Limit Limit Size(s)
ptyfs mount 1 1K 1K 314573K 1 0 0 32
kernfs mount 1 1K 1K 314573K 1 0 0 32
prop dictionary 95 24K 24K 314573K 97 0 0 256
VM page 1 1K 1K 314573K 1 0 0 256
VM map 2 2K 2K 314573K 2 0 0 1024
USB 45 16K 16K 314573K 51 0 0 32,64,128,256,512,1024,8192
USB device 18 19K 19K 314573K 18 0 0 32,128,256,2048
soname 32 4K 5K 314573K 101 0 0 32,128
packet tags 0 0K 1K 314573K 30 0 0 64
sockaddr 78 3K 3K 314573K 185 0 0 32
tcpcongctl 2 1K 1K 314573K 2 0 0 64
vmem 8 25K 26K 314573K 12 0 0 32,128,1024,4096,8192
NETSMBDEV 1 1K 1K 314573K 1 0 0 32
RAIDframe 10 35K 35K 314573K 10 0 0 64,2048,4096
NDP 10 1K 1K 314573K 12 0 0 64,128
sysctlnode 139 191K 191K 314573K 170 0 0 128,1024,2048,4096,8192,16384
sysctldata 85 9K 9K 314573K 119 0 0 32,64,128,256,512,1024,2048
devbuf 417 304K 307K 314573K 2224 0 0 32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072
DMA map 1072 1064K 1064K 314573K 1072 0 0 128,512,1024,2048
pcb 95 12K 13K 314573K 172 0 0 32,64,128
temp 20 7K 8K 314573K 322 0 0 32,64,128,256,512,1024,2048,4096
routetbl 40 7K 7K 314573K 76 0 0 32,64,256
fragtbl 0 0K 1K 314573K 131 0 0 64
ip_moptions 0 0K 1K 314573K 1 0 0 256
in_multi 38 5K 5K 314573K 38 0 0 32,256
ifmedia 23 1K 1K 314573K 23 0 0 32
ifaddr 39 10K 10K 314573K 40 0 0 32,64,256,512
ether_multi 9 1K 1K 314573K 10 0 0 32
crypto 1 1K 1K 314573K 1 0 0 1024
acpi 3324 330K 332K 314573K 11837 0 0 32,64,128,256,512,1024
Memory totals: In Use Free Requests
2062K 15K 16760
Memory resource pool statistics
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
amappl 80 591 0 0 12 0 12 12 0 inf 0
anonpl 24 11407 0 0 68 0 68 68 0 inf 0
aobj 128 10 0 0 1 0 1 1 0 inf 0
ataspl 96 21 0 21 1 0 1 1 0 inf 1
biopl 280 6 0 0 1 0 1 1 0 inf 0
brtpl 56 92 0 5 2 0 2 2 0 inf 0
buf1k 1024 53 0 0 1 0 1 1 1 1 0
buf2k 2048 3 0 0 1 0 1 1 1 1 0
buf8k 8192 62 0 0 8 0 8 8 1 1 0
bufpl 280 118 0 0 9 0 9 9 0 inf 0
cwdi 40 25 0 0 1 0 1 1 0 inf 0
dirent_tmpfs_0xffffa000265f6008 40 1 0 0 1 0 1 1 0 inf 0
execargs 262144 268 0 268 1 0 1 1 0 16 1
fdfile 48 371 0 0 5 0 5 5 0 inf 0
file 96 192 0 0 5 0 5 5 0 inf 0
filedesc 776 25 0 0 5 0 5 5 0 inf 0
in6pcbpl 232 16 0 15 1 0 1 1 0 inf 0
inmltpl 48 3 0 1 1 0 1 1 0 inf 0
inpcbpl 192 58 0 53 1 0 1 1 0 inf 0
ipqepl 80 1404 0 1404 1 0 1 1 0 inf 1
kcredpl 176 173 0 0 8 0 8 8 0 inf 0
kmem-1024 1024 331 0 0 331 0 331 331 0 inf 0
kmem-112 112 137 0 0 16 0 16 16 0 inf 0
kmem-128 128 23 0 0 3 0 3 3 0 inf 0
kmem-144 144 186 0 0 27 0 27 27 0 inf 0
kmem-1536 1536 34 0 0 34 0 34 34 0 inf 0
kmem-16 16 1 0 0 1 0 1 1 0 inf 0
kmem-168 168 81 0 0 14 0 14 14 0 inf 0
kmem-200 200 29 0 0 6 0 6 6 0 inf 0
kmem-2048 2048 78 0 0 78 0 78 78 0 inf 0
kmem-24 24 607 0 0 15 0 15 15 0 inf 0
kmem-256 256 49 0 0 13 0 13 13 0 inf 0
kmem-2560 2560 22 0 0 22 0 22 22 0 inf 0
kmem-3072 3072 2 0 0 2 0 2 2 0 inf 0
kmem-32 32 384 0 0 12 0 12 12 0 inf 0
kmem-3584 3584 3 0 0 3 0 3 3 0 inf 0
kmem-40 40 65 0 0 3 0 3 3 0 inf 0
kmem-4096 4096 2 0 0 2 0 2 2 0 inf 0
kmem-48 48 169 0 0 9 0 9 9 0 inf 0
kmem-56 56 4146 0 0 231 0 231 231 0 inf 0
kmem-64 64 81 0 0 6 0 6 6 0 inf 0
kmem-72 72 39 0 0 3 0 3 3 0 inf 0
kmem-80 80 310 0 0 26 0 26 26 0 inf 0
kmem-88 88 57 0 0 6 0 6 6 0 inf 0
kmem-96 96 26 0 0 3 0 3 3 0 inf 0
ksiginfo 72 1 0 0 1 0 1 1 0 inf 0
kvakernel 4096 1276 0 0 40 0 40 40 0 inf 0
kvakmem 4096 462 0 0 15 0 15 15 0 inf 0
lockf 112 9 0 0 1 0 1 1 0 inf 0
lwppl 1024 80 0 0 20 0 20 20 0 inf 0
mbpl 512 783 0 0 100 0 100 100 2 inf 2
mclpl 2048 518 0 0 263 0 263 263 4 8192 4
mutex 24 427 0 0 3 0 3 3 0 inf 0
ncache 160 6837 0 0 274 0 274 274 0 inf 0
nfsnodepl 304 3624 0 67 274 0 274 274 0 inf 0
nfsvapl 176 3624 0 67 162 0 162 162 0 inf 0
node_tmpfs_0xffffa000265f6008 208 2 0 0 1 0 1 1 0 inf 0
pcache 896 79 0 14 17 0 17 17 0 inf 0
pcglarge 1024 118 0 0 30 0 30 30 0 inf 0
pcgnormal 256 172 0 0 11 0 11 11 0 inf 0
pdict16 72 8 0 0 1 0 1 1 0 inf 0
pdict32 88 3 0 0 1 0 1 1 0 inf 0
pdppl 4096 25 0 0 25 0 25 25 0 inf 0
phpool-0 56 472 0 0 7 0 7 7 0 inf 0
phpool-64 56 1002 0 0 14 0 14 14 0 inf 0
piperd 312 11 0 0 1 0 1 1 0 inf 0
pipewr 312 12 0 0 1 0 1 1 0 inf 0
plimitpl 208 7 0 0 1 0 1 1 0 inf 0
pmappl 328 25 0 0 3 0 3 3 0 inf 0
pnbufpl 1024 7 0 0 2 0 2 2 0 inf 0
procpl 736 25 0 0 5 0 5 5 0 inf 0
propdict 48 98 0 2 2 0 2 2 0 inf 0
propnmbr 56 24 0 1 1 0 1 1 0 inf 0
propstng 40 91 0 1 1 0 1 1 0 inf 0
pstatspl 448 25 0 0 3 0 3 3 0 inf 0
ptimerpl 264 7 0 2 1 0 1 1 0 inf 0
ptimerspl 280 7 0 2 1 0 1 1 0 inf 0
pvpl 40 13624 0 0 135 0 135 135 0 inf 0
ractx 32 731 0 0 6 0 6 6 0 inf 0
rndsample 536 6 0 0 1 0 1 1 0 inf 0
rtentpl 264 39 0 1 3 0 3 3 0 inf 0
scxspl 256 63 0 63 1 0 1 1 1 inf 1
sigacts 3088 25 0 0 25 0 25 25 0 inf 0
socket 568 106 0 0 16 0 16 16 0 inf 0
str_tmpfs_0xffffa000265f6008 24 1 0 0 1 0 1 1 0 inf 0
synpl 280 1 0 1 1 0 1 1 0 inf 1
tcpcbpl 792 23 0 17 3 0 3 3 0 inf 1
tcpipqepl 80 4 0 4 1 0 1 1 0 inf 1
tstilepl 96 80 0 0 2 0 2 2 0 inf 0
uarea 16384 80 0 0 80 0 80 80 0 inf 0
vmembt 56 1536 0 0 22 0 22 22 0 inf 0
vmmpepl 136 1613 0 0 56 0 56 56 0 inf 0
vmsppl 368 25 0 0 3 0 3 3 0 inf 0
vndxpl 288 4 0 4 1 0 1 1 0 inf 1
vndxpl 288 3 0 3 1 0 1 1 0 inf 1
vnodepl 304 3584 0 0 276 0 276 276 0 inf 0
In use 10074K, total allocated 10891K; utilization 92.5%
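Since the 'vmstat -m' dumps are long and nearly identical, the counters that
actually moved are easier to find with a small diff over the pool rows. This
is a minimal Python sketch, not part of the original report: the names
parse_pool_rows and changed_requests are hypothetical, and only four pool rows
from each of the two dumps above are included as input.

```python
# Four pool rows from the first and second "vmstat -m" dumps in this message.
# Columns: Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
FIRST = """\
brtpl 56 91 0 5 2 0 2 2 0 inf 0
execargs 262144 267 0 267 1 0 1 1 0 16 1
mbpl 512 783 0 0 100 0 100 100 2 inf 2
mclpl 2048 518 0 0 263 0 263 263 4 8192 4
"""
SECOND = """\
brtpl 56 92 0 5 2 0 2 2 0 inf 0
execargs 262144 268 0 268 1 0 1 1 0 16 1
mbpl 512 783 0 0 100 0 100 100 2 inf 2
mclpl 2048 518 0 0 263 0 263 263 4 8192 4
"""


def parse_pool_rows(text):
    """Map pool name -> Requests count, skipping headers and other lines."""
    pools = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) == 12 and fields[2].isdigit():
            pools[fields[0]] = int(fields[2])
    return pools


def changed_requests(before, after):
    """Return {pool: (old, new)} for pools whose Requests count differs."""
    old, new = parse_pool_rows(before), parse_pool_rows(after)
    return {name: (old[name], new[name])
            for name in old if name in new and old[name] != new[name]}


for name, (a, b) in changed_requests(FIRST, SECOND).items():
    print(f"{name}: Requests {a} -> {b}")
```

Pool names that occur more than once in a dump (vndxpl here) would overwrite
each other in this simple mapping and need a different key if they matter.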
Output of vmstat -m (three times for number comparison):
# vmstat -m
Memory statistics by bucket size
Size In Use Free Requests HighWater Couldfree
32 1344 64 6965 640 0
64 263 57 4767 320 0
128 2838 10 3344 160 0
256 175 1 299 80 0
512 32 0 39 40 0
1024 641 3 991 20 0
2048 278 0 288 10 0
4096 21 1 39 5 0
8192 5 0 6 5 0
16384 3 0 21 5 0
32768 1 0 1 5 0
65536 4 0 4 5 0
131072 1 0 1 5 0
Memory usage type by bucket size
Size Type(s)
32 ptyfs mount, kernfs mount, USB, USB device, soname, sockaddr, vmem,
NETSMBDEV, sysctldata, devbuf, pcb, temp, routetbl, in_multi, ifmedia,
ifaddr, ether_multi, acpi
64 USB, packet tags, tcpcongctl, RAIDframe, NDP, sysctldata, devbuf, pcb,
temp, routetbl, fragtbl, ifaddr, acpi
128 USB, USB device, soname, vmem, NDP, sysctlnode, sysctldata, devbuf,
DMA map, pcb, temp, acpi
256 prop dictionary, VM page, USB, USB device, sysctldata, devbuf, temp,
routetbl, ip_moptions, in_multi, ifaddr, acpi
512 USB, sysctldata, devbuf, DMA map, temp, ifaddr, acpi
1024 VM map, USB, vmem, sysctlnode, sysctldata, devbuf, DMA map, temp,
crypto, acpi
2048 USB device, RAIDframe, sysctlnode, sysctldata, devbuf, DMA map, temp
4096 vmem, RAIDframe, sysctlnode, devbuf, temp
8192 USB, vmem, sysctlnode, devbuf
16384 sysctlnode, devbuf
32768 devbuf
65536 devbuf
131072 devbuf
Memory statistics by type Type Kern
Type InUse MemUse HighUse Limit Requests Limit Limit Size(s)
ptyfs mount 1 1K 1K 314573K 1 0 0 32
kernfs mount 1 1K 1K 314573K 1 0 0 32
prop dictionary 95 24K 24K 314573K 97 0 0 256
VM page 1 1K 1K 314573K 1 0 0 256
VM map 2 2K 2K 314573K 2 0 0 1024
USB 45 16K 16K 314573K 51 0 0 32,64,128,256,512,1024,8192
USB device 18 19K 19K 314573K 18 0 0 32,128,256,2048
soname 32 4K 5K 314573K 103 0 0 32,128
packet tags 0 0K 1K 314573K 30 0 0 64
sockaddr 78 3K 3K 314573K 185 0 0 32
tcpcongctl 2 1K 1K 314573K 2 0 0 64
vmem 8 25K 26K 314573K 12 0 0 32,128,1024,4096,8192
NETSMBDEV 1 1K 1K 314573K 1 0 0 32
RAIDframe 10 35K 35K 314573K 10 0 0 64,2048,4096
NDP 10 1K 1K 314573K 12 0 0 64,128
sysctlnode 139 191K 191K 314573K 170 0 0 128,1024,2048,4096,8192,16384
sysctldata 85 9K 9K 314573K 119 0 0 32,64,128,256,512,1024,2048
devbuf 417 304K 307K 314573K 2224 0 0 32,64,128,256,512,1024,2048,4096,8192,16384,32768,65536,131072
DMA map 1072 1064K 1064K 314573K 1072 0 0 128,512,1024,2048
pcb 95 12K 13K 314573K 174 0 0 32,64,128
temp 20 7K 8K 314573K 323 0 0 32,64,128,256,512,1024,2048,4096
routetbl 40 7K 7K 314573K 76 0 0 32,64,256
fragtbl 0 0K 1K 314573K 131 0 0 64
ip_moptions 0 0K 1K 314573K 1 0 0 256
in_multi 38 5K 5K 314573K 38 0 0 32,256
ifmedia 23 1K 1K 314573K 23 0 0 32
ifaddr 39 10K 10K 314573K 40 0 0 32,64,256,512
ether_multi 9 1K 1K 314573K 10 0 0 32
crypto 1 1K 1K 314573K 1 0 0 1024
acpi 3324 330K 332K 314573K 11837 0 0 32,64,128,256,512,1024
Memory totals: In Use Free Requests
2062K 15K 16765
Memory resource pool statistics
Name Size Requests Fail Releases Pgreq Pgrel Npage Hiwat Minpg Maxpg Idle
amappl 80 591 0 0 12 0 12 12 0 inf 0
anonpl 24 11407 0 0 68 0 68 68 0 inf 0
aobj 128 10 0 0 1 0 1 1 0 inf 0
ataspl 96 21 0 21 1 0 1 1 0 inf 1
biopl 280 6 0 0 1 0 1 1 0 inf 0
brtpl 56 94 0 5 2 0 2 2 0 inf 0
buf1k 1024 53 0 0 1 0 1 1 1 1 0
buf2k 2048 3 0 0 1 0 1 1 1 1 0
buf8k 8192 62 0 0 8 0 8 8 1 1 0
bufpl 280 118 0 0 9 0 9 9 0 inf 0
cwdi 40 25 0 0 1 0 1 1 0 inf 0
dirent_tmpfs_0xffffa000265f6008 40 1 0 0 1 0 1 1 0 inf 0
execargs 262144 269 0 269 1 0 1 1 0 16 1
fdfile 48 371 0 0 5 0 5 5 0 inf 0
file 96 192 0 0 5 0 5 5 0 inf 0
filedesc 776 25 0 0 5 0 5 5 0 inf 0
in6pcbpl 232 16 0 15 1 0 1 1 0 inf 0
inmltpl 48 3 0 1 1 0 1 1 0 inf 0
inpcbpl 192 58 0 53 1 0 1 1 0 inf 0
ipqepl 80 1404 0 1404 1 0 1 1 0 inf 1
kcredpl 176 173 0 0 8 0 8 8 0 inf 0
kmem-1024 1024 331 0 0 331 0 331 331 0 inf 0
kmem-112 112 137 0 0 16 0 16 16 0 inf 0
kmem-128 128 23 0 0 3 0 3 3 0 inf 0
kmem-144 144 186 0 0 27 0 27 27 0 inf 0
kmem-1536 1536 34 0 0 34 0 34 34 0 inf 0
kmem-16 16 1 0 0 1 0 1 1 0 inf 0
kmem-168 168 81 0 0 14 0 14 14 0 inf 0
kmem-200 200 29 0 0 6 0 6 6 0 inf 0
kmem-2048 2048 78 0 0 78 0 78 78 0 inf 0
kmem-24 24 607 0 0 15 0 15 15 0 inf 0
kmem-256 256 49 0 0 13 0 13 13 0 inf 0
kmem-2560 2560 22 0 0 22 0 22 22 0 inf 0
kmem-3072 3072 2 0 0 2 0 2 2 0 inf 0
kmem-32 32 384 0 0 12 0 12 12 0 inf 0
kmem-3584 3584 3 0 0 3 0 3 3 0 inf 0
kmem-40 40 65 0 0 3 0 3 3 0 inf 0
kmem-4096 4096 2 0 0 2 0 2 2 0 inf 0
kmem-48 48 169 0 0 9 0 9 9 0 inf 0
kmem-56 56 4146 0 0 231 0 231 231 0 inf 0
kmem-64 64 81 0 0 6 0 6 6 0 inf 0
kmem-72 72 39 0 0 3 0 3 3 0 inf 0
kmem-80 80 310 0 0 26 0 26 26 0 inf 0
kmem-88 88 57 0 0 6 0 6 6 0 inf 0
kmem-96 96 26 0 0 3 0 3 3 0 inf 0
ksiginfo 72 1 0 0 1 0 1 1 0 inf 0
kvakernel 4096 1276 0 0 40 0 40 40 0 inf 0
kvakmem 4096 462 0 0 15 0 15 15 0 inf 0
lockf 112 9 0 0 1 0 1 1 0 inf 0
lwppl 1024 80 0 0 20 0 20 20 0 inf 0
mbpl 512 783 0 0 100 0 100 100 2 inf 2
mclpl 2048 518 0 0 263 0 263 263 4 8192 4
mutex 24 427 0 0 3 0 3 3 0 inf 0
ncache 160 6837 0 0 274 0 274 274 0 inf 0
nfsnodepl 304 3624 0 67 274 0 274 274 0 inf 0
nfsvapl 176 3624 0 67 162 0 162 162 0 inf 0
node_tmpfs_0xffffa000265f6008 208 2 0 0 1 0 1 1 0 inf 0
pcache 896 79 0 14 17 0 17 17 0 inf 0
pcglarge 1024 118 0 0 30 0 30 30 0 inf 0
pcgnormal 256 172 0 0 11 0 11 11 0 inf 0
pdict16 72 8 0 0 1 0 1 1 0 inf 0
pdict32 88 3 0 0 1 0 1 1 0 inf 0
pdppl 4096 25 0 0 25 0 25 25 0 inf 0
phpool-0 56 472 0 0 7 0 7 7 0 inf 0
phpool-64 56 1002 0 0 14 0 14 14 0 inf 0
piperd 312 11 0 0 1 0 1 1 0 inf 0
pipewr 312 12 0 0 1 0 1 1 0 inf 0
plimitpl 208 7 0 0 1 0 1 1 0 inf 0
pmappl 328 25 0 0 3 0 3 3 0 inf 0
pnbufpl 1024 7 0 0 2 0 2 2 0 inf 0
procpl 736 25 0 0 5 0 5 5 0 inf 0
propdict 48 98 0 2 2 0 2 2 0 inf 0
propnmbr 56 24 0 1 1 0 1 1 0 inf 0
propstng 40 91 0 1 1 0 1 1 0 inf 0
pstatspl 448 25 0 0 3 0 3 3 0 inf 0
ptimerpl 264 7 0 2 1 0 1 1 0 inf 0
ptimerspl 280 7 0 2 1 0 1 1 0 inf 0
pvpl 40 13624 0 0 135 0 135 135 0 inf 0
ractx 32 731 0 0 6 0 6 6 0 inf 0
rndsample 536 6 0 0 1 0 1 1 0 inf 0
rtentpl 264 39 0 1 3 0 3 3 0 inf 0
scxspl 256 63 0 63 1 0 1 1 1 inf 1
sigacts 3088 25 0 0 25 0 25 25 0 inf 0
socket 568 106 0 0 16 0 16 16 0 inf 0
str_tmpfs_0xffffa000265f6008 24 1 0 0 1 0 1 1 0 inf 0
synpl 280 1 0 1 1 0 1 1 0 inf 1
tcpcbpl 792 23 0 17 3 0 3 3 0 inf 1
tcpipqepl 80 4 0 4 1 0 1 1 0 inf 1
tstilepl 96 80 0 0 2 0 2 2 0 inf 0
uarea 16384 80 0 0 80 0 80 80 0 inf 0
vmembt 56 1536 0 0 22 0 22 22 0 inf 0
vmmpepl 136 1613 0 0 56 0 56 56 0 inf 0
vmsppl 368 25 0 0 3 0 3 3 0 inf 0
vndxpl 288 4 0 4 1 0 1 1 0 inf 1
vndxpl 288 3 0 3 1 0 1 1 0 inf 1
vnodepl 304 3584 0 0 276 0 276 276 0 inf 0
In use 10074K, total allocated 10891K; utilization 92.5%
# netstat -m
526 mbufs in use:
514 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
# netstat -m
525 mbufs in use:
513 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
# netstat -m
525 mbufs in use:
513 mbufs allocated to data
6 mbufs allocated to packet headers
6 mbufs allocated to socket names and addresses
0 calls to protocol drain routines
So 514 seems to be the maximum number of mbufs allocated to data.
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Wed, 05 May 2010 15:55:40 +0200
Per request from ad@, here is all of 'ps /l':
db> ps /l
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
816 1 3 0 0 ffffa00026bbcc00 sync tstile
692 83 5 0 0 ffffa0002618a800 qemu-dm
692 2 3 0 80 ffffa00026bbc800 qemu-dm netio
692 1 3 0 0 ffffa00026b7ec00 qemu-dm genput
467 2 3 0 80 ffffa00026b7e400 xenconsole netio
467 1 3 0 80 ffffa00026627000 xenconsole select
543 1 3 0 80 ffffa00026627800 sh wait
562 13 5 0 0 ffffa0002666c000 python2.5
562 10 3 0 80 ffffa00026b7e800 python2.5 netio
562 7 3 0 80 ffffa00026a5b000 python2.5 socket
562 6 3 0 80 ffffa00026a5b400 python2.5 socket
562 5 3 0 80 ffffa00026a5b800 python2.5 netio
562 4 3 0 80 ffffa00026a5bc00 python2.5 parked
562 3 3 0 80 ffffa00026a4b000 python2.5 select
562 2 3 0 80 ffffa00026a4b400 python2.5 socket
562 1 3 0 80 ffffa00026a4b800 python2.5 select
561 1 3 0 80 ffffa0002666c400 python2.5 wait
556 2 3 0 80 ffffa00026a4bc00 xenconsoled netio
556 1 3 0 80 ffffa000269ef400 xenconsoled select
71 2 3 0 80 ffffa000269ef000 xenbackendd netio
71 1 3 0 80 ffffa000269efc00 xenbackendd parked
539 1 3 0 80 ffffa0002666c800 xenstored select
441 1 3 0 80 ffffa0002618a400 ksh pause
517 1 3 0 80 ffffa000260d0000 sshd select
477 1 3 0 80 ffffa0002618ac00 ksh pause
490 1 3 0 80 ffffa000260d0400 login wait
438 1 3 0 80 ffffa00026627400 cron nanoslp
474 1 3 0 80 ffffa0002619bc00 qmgr kqueue
452 1 3 0 80 ffffa0002618a000 pickup kqueue
446 1 3 0 80 ffffa0002666cc00 inetd kqueue
439 1 3 0 80 ffffa0002619b000 master kqueue
303 1 3 0 80 ffffa00026627c00 sshd select
131 1 2 0 0 ffffa0002619b800 syslogd
1 1 3 0 80 ffffa0002321d000 init wait
0 57 3 0 200 ffffa00026b7e000 vnd1 vndbp
0 56 3 0 200 ffffa0002321a800 vnd0 vndbp
0 55 3 0 200 ffffa0002619b400 nfskqpoll nfskqpw
0 52 3 0 200 ffffa000260d0800 aiodoned aiodoned
0 51 3 0 200 ffffa000260d0c00 ioflush syncer
0 50 3 0 200 ffffa000260cd000 pgdaemon pgdaemon
0 49 3 0 200 ffffa000260cd400 nfsio nfsiod
0 48 3 0 200 ffffa000260cd800 nfsio nfsiod
0 47 3 0 200 ffffa000260cdc00 nfsio nfsmblk
0 46 3 0 200 ffffa000260ca000 nfsio nfsiod
0 45 3 0 200 ffffa000260ca400 xbdbackd xbdbackd
0 44 3 0 200 ffffa0002321dc00 cryptoret crypto_wait
0 43 3 0 200 ffffa000260ca800 atapibus0 sccomp
0 41 3 0 200 ffffa0002321c000 usb2 usbevt
0 40 3 0 200 ffffa0002321c400 usb1 usbevt
0 39 3 0 200 ffffa000260cac00 usbtask-dr usbtsk
0 38 3 0 200 ffffa0002321d800 usbtask-hc usbtsk
0 37 3 0 200 ffffa0002321d400 usb0 usbevt
0 35 3 0 200 ffffa0002321ec00 unpgc unpgc
0 34 3 0 200 ffffa0002321e800 vmem_rehash vmem_rehash
0 33 3 0 200 ffffa0002321e000 xenbus rdst
0 32 3 0 200 ffffa0002321e400 xenwatch evtsq
0 23 3 0 200 ffffa0002321c800 scsibus0 sccomp
0 22 3 0 200 ffffa0002321cc00 atabus5 atath
0 21 3 0 200 ffffa0002321b000 atabus4 atath
0 20 3 0 200 ffffa0002321b400 atabus3 atath
0 19 3 0 200 ffffa0002321b800 atabus2 atath
0 18 3 0 200 ffffa0002321bc00 atabus1 atath
0 17 3 0 200 ffffa0002321a000 atabus0 atath
0 16 3 0 200 ffffa0002321a400 pms0 pmsreset
0 14 3 0 200 ffffa0002321ac00 sysmon smtaskq
0 13 3 0 200 ffffa00023212000 pmfsuspend pmfsuspend
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org, ad@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Wed, 05 May 2010 16:35:27 +0200
Some information about the nfsmblk process:
db> tr /a ffffa000260cdc00
trace: pid 0 lid 47 at 0xffffa00026110980
sleepq_block() at netbsd:sleepq_block+0xbf
cv_wait() at netbsd:cv_wait+0x106
nfs_writerpc() at netbsd:nfs_writerpc+0xda1
nfs_doio() at netbsd:nfs_doio+0x4c2
nfssvc_iod() at netbsd:nfssvc_iod+0x17b
db> ps /w
PID LID COMMAND EMUL PRI WAIT-MSG WAIT-CHANNEL
0 47 system netbsd 96 nfsmblk ffffa00026110a88
db> show lock ffffa00026110a88
lock address : 0xffffa00026110a88 type : spin
initialized : 0xffffffff804da720 interlock: 0xffffa00026110a80
db> show lock 0xffffa00026110a80
lock address : 0xffffa00026110a80 type : spin
initialized : 0xffffffff804da709
shared holds : 0 exclusive: 0
shares wanted: 0 exclusive: 0
current cpu : 0 last held: 0
current lwp : 0xffffa0002320ec00 last held: 000000000000000000
last locked : 0xffffffff804db406 unlocked : 0xffffffff803eeae7
owner field : 0x0000000000000600 wait/spin: 0/1
'last locked' points to sys/nfs/nfs_vnops.c:1468
'unlocked' points to sys/kern/kern_condvar.c:154
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 03 Aug 2010 11:18:55 +0200
I tried this patch:
Index: nfs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/nfs/nfs_vnops.c,v
retrieving revision 1.284
diff -u -p -r1.284 nfs_vnops.c
--- nfs_vnops.c 24 Jun 2010 13:03:17 -0000 1.284
+++ nfs_vnops.c 3 Aug 2010 09:13:57 -0000
@@ -1468,7 +1468,12 @@ nfsmout:
 		mutex_enter(&ctx.nwc_lock);
 		ctx.nwc_mbufcount--;
 		while (ctx.nwc_mbufcount > 0) {
-			cv_wait(&ctx.nwc_cv, &ctx.nwc_lock);
+			error = cv_timedwait(&ctx.nwc_cv, &ctx.nwc_lock,
+			    mstohz(1000));
+			if (error) {
+				printf("nfsmblk timeout\n");
+				break;
+			}
 		}
 		mutex_exit(&ctx.nwc_lock);
 	}
The result is this:
nfsmblk timeout
panic: lockdebug_lookup: uninitialized lock (lock=0xffffa00026111a70, from=ffffffff804dd6ad)
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff80201d65 cs e030 rflags 246 cr2 ffffa000221e7064 cpl 8 rsp ffffa00026b93980
Stopped in pid 750.1 (qemu-dm) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
panic() at netbsd:panic+0x28f
lockdebug_abort() at netbsd:lockdebug_abort
mutex_enter() at netbsd:mutex_enter+0x263
nfs_writerpc_extfree() at netbsd:nfs_writerpc_extfree+0x35
m_ext_free() at netbsd:m_ext_free+0xcf
tap_dev_read() at netbsd:tap_dev_read+0x100
dofileread() at netbsd:dofileread+0x6f
sys_read() at netbsd:sys_read+0x73
syscall() at netbsd:syscall+0xa8
ds 0x3960
es 0
fs 0x7b40
gs 0x51f8
rdi 0
rsi 0xffffffff805c0119 printf+0xbc
rbp 0xffffa00026b93980
rbx 0xffffa00026b93990
rdx 0
rcx 0
rax 0x1
r8 0xffffa00026b938a0
r9 0x400
r10 0xffffa00026b938a0
r11 0x1
r12 0x104
r13 0xffffffff80a351f8
r14 0xffffa00026b67b40
r15 0
rip 0xffffffff80201d65 breakpoint+0x5
cs 0xe030
rflags 0x246
rsp 0xffffa00026b93980
ss 0xe02b
netbsd:breakpoint+0x5: leave
db>
The lockdebug address 0xffffffff804dd6ad is at
sys/nfs/nfs_vnops.c:1259
/*
 * free mbuf used to refer protected pages while write rpc call.
 * called at splvm.
 */
static void
nfs_writerpc_extfree(struct mbuf *m, void *tbuf, size_t size, void *arg)
{
	struct nfs_writerpc_context *ctx = arg;
	KASSERT(m != NULL);
	KASSERT(ctx != NULL);
	pool_cache_put(mb_cache, m);
	mutex_enter(&ctx->nwc_lock);	<-- line 1259
	if (--ctx->nwc_mbufcount == 0) {
		cv_signal(&ctx->nwc_cv);
	}
	mutex_exit(&ctx->nwc_lock);
}
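One reading of the panic above, offered as an assumption rather than a
confirmed diagnosis: once the timeout path leaves the wait loop with
references still outstanding, the write context and its lock are torn
down, and a later ext-free callback then touches a lock that no longer
exists. The userspace sketch below models only that ordering. 'struct wctx'
and its helpers are hypothetical names, and the 'initialized' flag stands
in for the LOCKDEBUG check; none of this is kernel code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the write context; 'initialized' stands in for the lock. */
struct wctx {
	int initialized;
	int mbufcount;
};

void
wctx_init(struct wctx *c, int attached)
{
	assert(c != NULL && attached >= 0);
	c->initialized = 1;
	c->mbufcount = 1 + attached;	/* writer's reference plus attached mbufs */
}

/*
 * Models the end of the write RPC. 'timed_out' selects the patched
 * behaviour of leaving the wait loop early. Returns -1 if the writer
 * would keep sleeping, otherwise the number of references that were
 * still outstanding when the context was torn down.
 */
int
wctx_finish(struct wctx *c, int timed_out)
{
	c->mbufcount--;
	if (c->mbufcount > 0 && !timed_out)
		return -1;	/* context stays valid while the writer sleeps */
	c->initialized = 0;	/* models destroying the lock */
	return c->mbufcount;
}

/*
 * Models the ext-free callback. Returns -1 where the kernel would
 * panic on an uninitialized lock, 0 otherwise.
 */
int
wctx_ext_free(struct wctx *c)
{
	if (!c->initialized)
		return -1;
	c->mbufcount--;
	return 0;
}
```

In this model, calling wctx_finish() with 'timed_out' set while one
reference is outstanding tears the context down, and a following
wctx_ext_free() reports the failure case. That suggests 'break' on
timeout is only safe if the context outlives every attached mbuf.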
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org, rmind@netbsd.org, enami@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Wed, 06 Oct 2010 12:49:07 +0200
Those two changes [1,2] to IPv4 reassembly help a lot.
The hang is still reproducible, but it now takes much more
IO traffic to trigger.
It seems to me that the IPv4 reassembly changes fix this
PR at least partially.
[1] http://mail-index.netbsd.org/source-changes/2010/10/03/msg013558.html
[2] http://mail-index.netbsd.org/source-changes/2010/10/06/msg013584.html
Christoph
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 19 Oct 2010 14:13:12 +0200
With the patch below, I see that ctx.nwc_mbufcount is
always 1.
Christoph
Index: nfs_vnops.c
===================================================================
RCS file: /cvsroot/src/sys/nfs/nfs_vnops.c,v
retrieving revision 1.284
diff -u -p -r1.284 nfs_vnops.c
--- nfs_vnops.c 24 Jun 2010 13:03:17 -0000 1.284
+++ nfs_vnops.c 19 Oct 2010 12:11:11 -0000
@@ -1468,7 +1468,16 @@ nfsmout:
 		mutex_enter(&ctx.nwc_lock);
 		ctx.nwc_mbufcount--;
 		while (ctx.nwc_mbufcount > 0) {
-			cv_wait(&ctx.nwc_cv, &ctx.nwc_lock);
+			error = cv_timedwait(&ctx.nwc_cv, &ctx.nwc_lock,
+			    mstohz(1000));
+			if (error) {
+				printf("nfsmblk timeout, mbufcount %i\n",
+				    ctx.nwc_mbufcount);
+				mutex_exit(&ctx.nwc_lock);
+				Debugger();
+				mutex_enter(&ctx.nwc_lock);
+				continue;
+			}
 		}
 		mutex_exit(&ctx.nwc_lock);
 	}
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Wed, 20 Oct 2010 15:10:57 +0200
I added debug printf's right before every 'goto nfsmout'
in the macros.
When the bug happened again I saw none of the debug printf's.
I also printed the 'mb' address and in ddb 'show mbuf' says:
db> show mbuf 0xffffa00001170400
MBUF 0xffffa00001170400
data=0xffffa000011a2802, len=2046, type=1, flags=0x0x9000003<EXT,PKTHDR,
EXT_CLUSTER,EXT_RW>
owner=0x0, next=0x0, nextpkt=0x0
leadingspace=2, trailingspace=0, readonly=0
pktlen=2046, rcvif=0x0, csum_flags=0x0x0, csum_data=0x0, segsz=402653184
ext_refcnt=1, ext_buf=0xffffa000011a2800, ext_size=2048, ext_free=0x0,
ext_arg=0xffffa0002320d3d0
next=0x0 => mb is not a chained mbuf.
ctx.nwc_mbufcount is initialized to 1
and is decremented by one right before the while loop
with 'cv_wait' is entered, so ctx.nwc_mbufcount must have
been 2 before it entered the loop.
There is only one way to increase ctx.nwc_mbufcount: the code must
take the path where MEXTADD() is used, which chains an mbuf to 'mb'.
Christoph
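The counting argument above can be sketched as a small userspace model.
This is an illustration only, not the kernel code: 'struct nwc_model' and
the nwc_*() helpers are hypothetical stand-ins for the nwc_mbufcount
bookkeeping, assuming the count starts at 1, is bumped once per mbuf
attached via the MEXTADD() path, and is dropped once by the writer and
once per ext-free callback.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for the nwc_mbufcount bookkeeping. */
struct nwc_model {
	int mbufcount;	/* outstanding references, including the writer's own */
};

/* The writer starts out holding one reference. */
void
nwc_init(struct nwc_model *c)
{
	assert(c != NULL);
	c->mbufcount = 1;
}

/* Models attaching external storage to an mbuf (the MEXTADD() path). */
void
nwc_attach(struct nwc_model *c)
{
	c->mbufcount++;
}

/* Models the ext-free callback running when such an mbuf is freed. */
void
nwc_ext_free(struct nwc_model *c)
{
	c->mbufcount--;
}

/*
 * Models the writer dropping its own reference after the RPC.
 * Returns nonzero if the writer would have to sleep in cv_wait().
 */
int
nwc_writer_done(struct nwc_model *c)
{
	c->mbufcount--;
	return c->mbufcount > 0;
}
```

With one attached mbuf whose callback never runs, nwc_writer_done()
reports that the writer must sleep and the count stays at 1, which is
the value observed with the debugging patch in this PR.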
From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Thu, 21 Oct 2010 19:38:33 +0200
> next=0x0 => mb is not a chained mbuf.
>
> nwc.mbuf_count is initialized with 1
> and nwc.mbuf_count is decreased by one right before it enters
> the while loop with 'cv_wait' so ctxt.nwc_mbuf_count must have
> been 2 before it entered the loop.
>
> There is only one way to increase nwc.mbuf_count: it must run
> the path where MEXTADD() is used which chains an mbuf to 'mb'.
I added some more debug lines and figured out that the macro
nfsm_wcc_data() drops the mbuf chain without decreasing ctx.nwc_mbufcount.
Christoph
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org, rmind@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Mon, 25 Oct 2010 18:39:22 +0200
> > next=0x0 => mb is not a chained mbuf.
> >
> > nwc.mbuf_count is initialized with 1
> > and nwc.mbuf_count is decreased by one right before it enters
> > the while loop with 'cv_wait' so ctxt.nwc_mbuf_count must have
> > been 2 before it entered the loop.
> >
> > There is only one way to increase nwc.mbuf_count: it must run
> > the path where MEXTADD() is used which chains an mbuf to 'mb'.
>
> I added some more debug lines and figured out that the macro
> nfsm_wcc_data() drops the mbuf chain w/o decreasing
> ctxt.nwc_mbufcount.
The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
The nfsm_postop_attr() macro calls the nfsm_loadattrcache() function.
The nfsm_loadattrcache() function calls the nfsm_disct() function.
nfsm_disct() is the function at fault; it drops the mbuf chain.
nfs_subs.c:968 is the line at fault:
	do {
		m2 = m_get(M_WAIT, MT_DATA);
		MCLAIM(m2, m1->m_owner);
		if (left >= MINCLSIZE) {
			MCLGET(m2, M_WAIT);
		}
		m2->m_next = *nextp;	<-- BUG happens here!!
		*nextp = m2;
		nextp = &m2->m_next;
Christoph
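For readers unfamiliar with the idiom on the quoted lines, here is a
minimal userspace sketch of splicing new nodes into a singly linked chain
through a pointer-to-pointer. 'struct node' and splice_after() are
hypothetical names, not NetBSD APIs. The sketch shows what the three
statements do on an intact chain and makes no claim about where the bug
in this PR lies.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an mbuf: only the chain pointer matters here. */
struct node {
	int id;
	struct node *next;
};

/*
 * Insert the 'n' nodes in 'fresh' directly after 'head', in order,
 * using the same three-statement idiom as the quoted loop.
 */
void
splice_after(struct node *head, struct node *fresh, size_t n)
{
	struct node **nextp;
	size_t i;

	assert(head != NULL);
	nextp = &head->next;
	for (i = 0; i < n; i++) {
		struct node *m2 = &fresh[i];

		m2->next = *nextp;	/* new node inherits the old tail */
		*nextp = m2;		/* predecessor now points at new node */
		nextp = &m2->next;	/* next insertion goes after new node */
	}
}
```

On a well-formed chain the old tail is preserved: the first store copies
whatever '*nextp' held into the new node before the predecessor is
redirected. The idiom can only lose the tail if '*nextp' was already
wrong when it was read.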
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org, rmind@netbsd.org, ad@netbsd.org, yamt@netbsd.org,
enami@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 26 Oct 2010 11:01:58 +0200
> >
> > I added some more debug lines and figured out that the macro
> > nfsm_wcc_data() drops the mbuf chain w/o decreasing
> > ctxt.nwc_mbufcount.
>
> The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
> The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
> The nfsm_loadattrcache() function calls nfsm_disct() function.
>
> nfsm_disct() is the function in error which drops the mbuf chain.
>
> nfs_subs.c:968 is the line in error:
>
> do {
> m2 = m_get(M_WAIT, MT_DATA);
> MCLAIM(m2, m1->m_owner);
> if (left >= MINCLSIZE) {
> MCLGET(m2, M_WAIT);
> }
> m2->m_next = *nextp; <-- BUG happens here!!
> *nextp = m2;
> nextp = &m2->m_next;
>
This is the code path from entering nfsm_disct() till when
the bug happens:
897: m1 = *mdp;
903: while (left == 0) {
921: if ((m1->m_flags & M_EXT) != 0) {
922: if (havebuf && M_TRAILINGSPACE(havebuf) >= siz &&
923: nfsm_aligned(mtod(havebuf, char *) + havebuf->m_len)) {
959: struct mbuf **nextp = &m1->m_next;
961: m1->m_len -= left;
962: do {
963: m2 = m_get(M_WAIT, MT_DATA);
964: MCLAIM(m2, m1->m_owner);
965: if (left >= MINCLSIZE) {
968: m2->m_next = *nextp;
Christoph
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: Christoph_Egger@gmx.de
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org, rmind@netbsd.org, ad@netbsd.org,
enami@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 26 Oct 2010 09:36:53 +0000 (UTC)
hi,
>> > I added some more debug lines and figured out that the macro
>> > nfsm_wcc_data() drops the mbuf chain w/o decreasing
>> > ctxt.nwc_mbufcount.
>>
>> The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
>> The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
>> The nfsm_loadattrcache() function calls nfsm_disct() function.
>>
>> nfsm_disct() is the function in error which drops the mbuf chain.
are you sure?
iirc, nwc_mbufcount is about sending mbuf. otoh, nfsm_disct is for
received mbuf. i guess that the ancient macros confused you.
YAMAMOTO Takashi
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Tue, 26 Oct 2010 11:53:49 +0200
> hi,
>
> >> > I added some more debug lines and figured out that the macro
> >> > nfsm_wcc_data() drops the mbuf chain w/o decreasing
> >> > ctxt.nwc_mbufcount.
> >>
> >> The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
> >> The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
> >> The nfsm_loadattrcache() function calls nfsm_disct() function.
> >>
> >> nfsm_disct() is the function in error which drops the mbuf chain.
>
> are you sure?
yes, absolutely, and it is reproducible.
> iirc, nwc_mbufcount is about sending mbuf. otoh, nfsm_disct
> is for received mbuf.
nfs_writerpc *does* call nfsm_disct() through nfsm_wcc_data,
nfsm_postop_attr and nfsm_loadattrcache in this order.
So you are saying this should never happen?
Christoph
From: "Christoph Egger" <cegger@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/42455 CVS commit: src/sys/nfs
Date: Tue, 26 Oct 2010 11:44:53 +0000
Module Name: src
Committed By: cegger
Date: Tue Oct 26 11:44:53 UTC 2010
Modified Files:
src/sys/nfs: nfs_vnops.c
Log Message:
Add diagnostic check which hits when PR 42455 is reproduced.
Idea from hans@
To generate a diff of this commit:
cvs rdiff -u -r1.284 -r1.285 src/sys/nfs/nfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: Christoph_Egger@gmx.de
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Thu, 28 Oct 2010 04:48:41 +0000 (UTC)
hi,
>> hi,
>>
>> >> > I added some more debug lines and figured out that the macro
>> >> > nfsm_wcc_data() drops the mbuf chain w/o decreasing
>> >> > ctxt.nwc_mbufcount.
>> >>
>> >> The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
>> >> The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
>> >> The nfsm_loadattrcache() function calls nfsm_disct() function.
>> >>
>> >> nfsm_disct() is the function in error which drops the mbuf chain.
>>
>> are you sure?
>
> yes, absolutely and reproducable.
>
>> iirc, nwc_mbufcount is about sending mbuf. otoh, nfsm_disct
>> is for received mbuf.
>
> nfs_writerpc *does* call nfsm_disct() through nfsm_wcc_data,
> nfsm_postop_attr and nfsm_loadattrcache in this order.
>
> So you are saying this should never happen?
i'm saying i don't understand.
nfs_writerpc sends a request to the server, using mreq and mb.
it's what nwc_mbufcount is used for.
it then parses the reply from the server, using mrep and md.
it's what nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct are
used for.
i don't understand how a problem in the latter causes the nwc_mbufcount
problem. the above two are somehow mixed up?
YAMAMOTO Takashi
>
> Christoph
From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Thu, 28 Oct 2010 07:06:41 +0200
On 28.10.10 06:50, YAMAMOTO Takashi wrote:
> The following reply was made to PR kern/42455; it has been noted by GNATS.
>
> From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
> To: Christoph_Egger@gmx.de
> Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
> kern-bug-people@netbsd.org
> Subject: Re: kern/42455: tstile hang with nfs
> Date: Thu, 28 Oct 2010 04:48:41 +0000 (UTC)
>
> hi,
>
> >> hi,
> >>
> >> >> > I added some more debug lines and figured out that the macro
> >> >> > nfsm_wcc_data() drops the mbuf chain w/o decreasing
> >> >> > ctxt.nwc_mbufcount.
> >> >>
> >> >> The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
> >> >> The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
> >> >> The nfsm_loadattrcache() function calls nfsm_disct() function.
> >> >>
> >> >> nfsm_disct() is the function in error which drops the mbuf chain.
> >>
> >> are you sure?
> >
> > yes, absolutely and reproducable.
> >
> >> iirc, nwc_mbufcount is about sending mbuf. otoh, nfsm_disct
> >> is for received mbuf.
> >
> > nfs_writerpc *does* call nfsm_disct() through nfsm_wcc_data,
> > nfsm_postop_attr and nfsm_loadattrcache in this order.
> >
> > So you are saying this should never happen?
>
> i'm saying i don't understand.
>
> nfs_writerpc sends a request to the server, using mreq and mb.
> it's what nwc_mbufcount is used for.
>
> it then parses the reply from the server, using mrep and md.
> it's what nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct are
> used for.
Ah, I see.
> i don't understand how a problem in the latter causes the nwc_mbufcount
> problem. the above two are somehow mixed up?
nfsm_disct() creates new mbufs with m_get() and MCLAIM().
nfs_writerpc() relies on the ext hook being called on m_free.
But nfsm_disct() does *not* use MEXTADD(), so the ext hook is empty.
=> nfs_writerpc_extfree() won't be called to decrement nwc_mbufcount
=> nfs_writerpc() calls cv_wait() which waits forever.
Christoph
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: Christoph_Egger@gmx.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Thu, 28 Oct 2010 05:41:19 +0000 (UTC)
hi,
> On 28.10.10 06:50, YAMAMOTO Takashi wrote:
>> The following reply was made to PR kern/42455; it has been noted by GNATS.
>>
>> From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
>> To: Christoph_Egger@gmx.de
>> Cc: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
>> kern-bug-people@netbsd.org
>> Subject: Re: kern/42455: tstile hang with nfs
>> Date: Thu, 28 Oct 2010 04:48:41 +0000 (UTC)
>>
>> hi,
>>
>> >> hi,
>> >>
>> >> >> > I added some more debug lines and figured out that the macro
>> >> >> > nfsm_wcc_data() drops the mbuf chain w/o decreasing
>> >> >> > ctxt.nwc_mbufcount.
>> >> >>
>> >> >> The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
>> >> >> The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
>> >> >> The nfsm_loadattrcache() function calls nfsm_disct() function.
>> >> >>
>> >> >> nfsm_disct() is the function in error which drops the mbuf chain.
>> >>
>> >> are you sure?
>> >
>> > yes, absolutely and reproducable.
>> >
>> >> iirc, nwc_mbufcount is about sending mbuf. otoh, nfsm_disct
>> >> is for received mbuf.
>> >
>> > nfs_writerpc *does* call nfsm_disct() through nfsm_wcc_data,
>> > nfsm_postop_attr and nfsm_loadattrcache in this order.
>> >
>> > So you are saying this should never happen?
>>
>> i'm saying i don't understand.
>>
>> nfs_writerpc sends a request to the server, using mreq and mb.
>> it's what nwc_mbufcount is used for.
>>
>> it then parses the reply from the server, using mrep and md.
>> it's what nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct are
>> used for.
>
> Ah, I see.
>
>> i don't understand how a problem in the latter causes the nwc_mbufcount
>> problem. the above two are somehow mixed up?
>
> nfsm_disct() creates new mbufs with m_get() and MCLAIM().
> nfs_writerpc() relies on that the ext hook is called on m_free.
>
> But nfsm_disct() does *not* use MEXTADD(), so the ext hook is empty.
> => nfs_writerpc_extfree() won't be called to decrement nwc_mbufcount
how is it a problem? nwc_mbufcount is not incremented for the mbuf
allocated by nfsm_disct.
> => nfs_writerpc() calls cv_wait() which waits forever.
it waits for the sending mbuf chain being consumed. it's a separate mbuf
chain from the one nfsm_disct works on.
if i were you, i'd look for mbuf leak in the underlying network stack
and driver. sprinkling MCLAIM might help.
YAMAMOTO Takashi
>
> Christoph
From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Thu, 28 Oct 2010 08:27:25 +0200
On 28.10.10 07:45, YAMAMOTO Takashi wrote:
> >> >> >> > I added some more debug lines and figured out that the macro
> >> >> >> > nfsm_wcc_data() drops the mbuf chain w/o decreasing
> >> >> >> > ctxt.nwc_mbufcount.
> >> >> >>
> >> >> >> The nfsm_wcc_data() macro calls the nfsm_postop_attr() macro.
> >> >> >> The nfsm_postop_attr() macro calls nfsm_loadattrcache() function.
> >> >> >> The nfsm_loadattrcache() function calls nfsm_disct() function.
> >> >> >>
> >> >> >> nfsm_disct() is the function in error which drops the mbuf chain.
> >> >>
> >> >> are you sure?
> >> >
> >> > yes, absolutely and reproducable.
> >> >
> >> >> iirc, nwc_mbufcount is about sending mbuf. otoh, nfsm_disct
> >> >> is for received mbuf.
> >> >
> >> > nfs_writerpc *does* call nfsm_disct() through nfsm_wcc_data,
> >> > nfsm_postop_attr and nfsm_loadattrcache in this order.
> >> >
> >> > So you are saying this should never happen?
> >>
> >> i'm saying i don't understand.
> >>
> >> nfs_writerpc sends a request to the server, using mreq and mb.
> >> it's what nwc_mbufcount is used for.
> >>
> >> it then parses the reply from the server, using mrep and md.
> >> it's what nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct are
> >> used for.
> >
> > Ah, I see.
> >
> >> i don't understand how a problem in the latter causes the nwc_mbufcount
> >> problem. the above two are somehow mixed up?
> >
> > nfsm_disct() creates new mbufs with m_get() and MCLAIM().
> > nfs_writerpc() relies on that the ext hook is called on m_free.
> >
> > But nfsm_disct() does *not* use MEXTADD(), so the ext hook is empty.
> > => nfs_writerpc_extfree() won't be called to decrement nwc_mbufcount
>
> how is it a problem? nwc_mbufcount is not incremented for the mbuf
> allocated by nfsm_disct.
>
> > => nfs_writerpc() calls cv_wait() which waits forever.
>
> it waits for the sending mbuf chain being consumed. it's a separate mbuf
> chain from the one nfsm_disct works on.
Or at least it should be a separate mbuf chain.
Per suggestion of rmind@ I looked at the sending mbuf chain life cycle
which is 'mb' used in nfs_writerpc().
I looked at it by duplicating
nfsm_wcc_data/nfsm_postop_attr/nfsm_loadattrcache/nfsm_disct, renamed
them to
nfsm_wcc_data1/nfsm_postop_attr1/nfsm_loadattrcache1/nfsm_disct1
in my local tree.
I extended them by passing nwc_mbufcount and 'mb' as arguments.
I check for the condition where nwc_mbufcount is >= 2 and mb->m_next
unexpectedly becomes NULL.
This happens in nfsm_disct() at line 968, as
I already reported in this PR.
Christoph
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/42455: tstile hang with nfs
Date: Mon, 8 Nov 2010 03:16:42 +0000
(three messages not sent to gnats)
------
From: Christoph Egger <Christoph_Egger@gmx.de>
To: Christoph Egger <Christoph_Egger@gmx.de>
Cc: netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org, yamt@mwd.biglobe.ne.jp,
rmind@netbsd.org, enami@netsbd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Fri, 05 Nov 2010 10:13:39 +0100
yamt: Your guess is right: There is an mbuf leak through
the use of pool_cache(9) on 'mb_cache'.
In nfsm_disct() at line 963 m_get() is called.
m2 = m_get(M_WAIT, MT_DATA); <-- line 963
m_get() calls pool_cache_get().
There is a race where pool_cache_get() returns an mbuf
for the receiving mbuf chain that is still used
in the sending mbuf chain.
The sending mbuf chain is this (and nwc_mbufcount is 2):
db> show mbuf 0xffffa000013eea00
MBUF 0xffffa000013eea00
data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
owner=0xffffffff80bdd558, next=0xffffa000013c4c00, nextpkt=0x0
leadingspace=0, trailingspace=400, readonly=0
MBUF 0xffffa000013c4c00
data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
leadingspace=0, trailingspace=0, readonly=1
ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
m_get() initializes the returned mbuf with m_next set to NULL.
So when m_get() does m->m_next = NULL; the sending mbuf
chain is this:
db> show mbuf 0xffffa000013eea00
MBUF 0xffffa000013eea00
data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
owner=0xffffffff80bdd558, next=0x0, nextpkt=0x0
leadingspace=0, trailingspace=400, readonly=0
db> show mbuf 0xffffa000013c4c00
MBUF 0xffffa000013c4c00
data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
leadingspace=0, trailingspace=0, readonly=1
ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
The second mbuf is lost, ext_free hook is never called
to decrease the nwc_mbufcount.
Christoph
From: Christoph Egger <Christoph_Egger@gmx.de>
To: netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc: enami@netbsd.org, rmind@netbsd.org, yamt@mwd.biglobe.ne.jp,
kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Fri, 05 Nov 2010 14:27:19 +0100
>
> yamt: Your guess is right: There is an mbuf leak through
> the use of pool_cache(9) on 'mb_cache'.
>
> In nfsm_disct() at line 963 m_get() is called.
>
> m2 = m_get(M_WAIT, MT_DATA); <-- line 963
>
> m_get() calls pool_cache_get().
> There is a race where pool_cache_get() returns an mbuf
> for the receiving mbuf chain that is still used
> in the sending mbuf chain.
>
> The sending mbuf chain is this (and nwc_mbufcount is 2):
>
> db> show mbuf 0xffffa000013eea00
> MBUF 0xffffa000013eea00
> data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
> owner=0xffffffff80bdd558, next=0xffffa000013c4c00, nextpkt=0x0
> leadingspace=0, trailingspace=400, readonly=0
> MBUF 0xffffa000013c4c00
> data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
> owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
> leadingspace=0, trailingspace=0, readonly=1
> ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
>
>
> m_get() initializes the returned mbuf with m_next set to NULL.
> So when m_get() does m->m_next = NULL; the sending mbuf
> chain is this:
>
>
> db> show mbuf 0xffffa000013eea00
> MBUF 0xffffa000013eea00
> data=0xffffa000013eea38, len=56, type=1, flags=0x0x0
> owner=0xffffffff80bdd558, next=0x0, nextpkt=0x0
> leadingspace=0, trailingspace=400, readonly=0
> db> show mbuf 0xffffa000013c4c00
> MBUF 0xffffa000013c4c00
> data=0xffffa000221e6000, len=8192, type=1, flags=0x0x4000001<EXT,EXT_ROMAP>
> owner=0xffffffff80bdd6e0, next=0x0, nextpkt=0x0
> leadingspace=0, trailingspace=0, readonly=1
> ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e6ca7, ext_arg=0xffffa00026119a70
>
>
> The second mbuf is lost, ext_free hook is never called
> to decrease the nwc_mbufcount.
ok, that mbuf is not lost, at least not in m_get().
I figured out m_ext_free() decreases ext_refcnt first.
This is what mreq contains at this point:
db> show mbuf /c 0xffffa00001203600
MBUF 0xffffa00001203600
data=0xffffa0000131f048, len=60, type=1, flags=0x9000403<EXT,PKTHDR,CANFASTFWD,EXT_CLUSTER,EXT_RW>
owner=0xffffffff80bd6500, next=0x0, nextpkt=0x0
leadingspace=72, trailingspace=1916, readonly=0
pktlen=164, rcvif=0xffffa000248f6010, csum_flags=0x0x4b<TCPv4,UDPv4,DATA,IPv4>, csum_data=0xffff, segsz=32136531
ext_refcnt=1, ext_buf=0xffffa0000131f000, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0
I wish I could get some help/guidance in hunting down this
bug. The networking area is completely new to me. *sigh*
Christoph
From: Christoph Egger <Christoph_Egger@gmx.de>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Cc: yamt@mwd.biglobe.ne.jp, rmind@netbsd.org, enami@netbsd.org,
matt@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Fri, 05 Nov 2010 18:29:44 +0100
I have attached my current debug code.
When the bug hit I got this below.
Does anyone have an idea what is going wrong?
Can anyone tell me how to proceed?
m_get1: m 0xffffa00000fa1000, mb 0xffffa00000fa1000, mb->m_next 0xffffa000013d5e00, mreq 0xffffa000012ce800
nfsm_disct1: mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000
nfs_writerpc: mbufcnt 2 mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000, mrep 0xffffa000012ce800, md 0xffffa00000fa1000
nfsmblk timeout, mbufcount 1, mb 0xffffa00000fa1000, mreq 0xffffa00000fa1000, mrep 0xffffa000012ce800, md 0xffffa00000fa1000
fatal breakpoint trap in supervisor mode
trap type 1 code 0 rip ffffffff802021a5 cs e030 rflags 286 cr2 7f7ffdfdc000 cpl 0 rsp ffffa0002610d9d0
Stopped in pid 0.46 (system) at netbsd:breakpoint+0x5: leave
breakpoint() at netbsd:breakpoint+0x5
nfs_writerpc() at netbsd:nfs_writerpc+0x1033
nfs_doio() at netbsd:nfs_doio+0x4d0
nfssvc_iod() at netbsd:nfssvc_iod+0x17b
ds 0
es 0
fs 0xe033
gs 0x1000
rdi 0
rsi 0xffffffff80ee9000
rbp 0xffffa0002610d9d0
rbx 0x20c49ba5e353f7cf
rdx 0xffffffff80b73008 cpu_info_primary+0x1c8
rcx 0
rax 0
r8 0x400
r9 0
r10 0xffffa0002610d9c0
r11 0xe033
r12 0x1
r13 0x1
r14 0x23
r15 0xffffa0002610dbc0
rip 0xffffffff802021a5 breakpoint+0x5
cs 0xe030
rflags 0x286
rsp 0xffffa0002610d9d0
ss 0xe02b
netbsd:breakpoint+0x5: leave
db> show mbuf /c 0xffffa00000fa1000
MBUF 0xffffa00000fa1000
data=0xffffa000011b5800, len=90, type=1, flags=0x900010b<EXT,PKTHDR,PROTO1,BCAST,EXT_CLUSTER,EXT_RW>
owner=0xffffa000248f63b0, next=0x0, nextpkt=0xffffa000011a9200
leadingspace=0, trailingspace=1958, readonly=0
pktlen=90, rcvif=0xffffa000248f6010, csum_flags=0x0x0, csum_data=0xffff, segsz=32136531
ext_refcnt=1, ext_buf=0xffffa000011b5800, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0
db> show mbuf /c 0xffffa00000fa1000
MBUF 0xffffa00000fa1000
data=0xffffa000011b5800, len=90, type=1, flags=0x900010b<EXT,PKTHDR,PROTO1,BCAST,EXT_CLUSTER,EXT_RW>
owner=0xffffa000248f63b0, next=0x0, nextpkt=0xffffa000011a9200
leadingspace=0, trailingspace=1958, readonly=0
pktlen=90, rcvif=0xffffa000248f6010, csum_flags=0x0x0, csum_data=0xffff, segsz=32136531
ext_refcnt=1, ext_buf=0xffffa000011b5800, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0
db> show mbuf /c 0xffffa000013d5e00
MBUF 0xffffa000013d5e00
data=0xffffa000221e6000, len=8192, type=1, flags=0x4000001<EXT,EXT_ROMAP>
owner=0xffffffff80bdafe0, next=0x0, nextpkt=0x0
leadingspace=0, trailingspace=0, readonly=1
ext_refcnt=4, ext_buf=0xffffa000221e6000, ext_size=8192, ext_free=0xffffffff804e5b6f, ext_arg=0xffffa0002610da70
db> show mbuf /c 0xffffa000012ce800
MBUF 0xffffa000012ce800
data=0xffffa00001201802, len=2046, type=1, flags=0x9000003<EXT,PKTHDR,EXT_CLUSTER,EXT_RW>
owner=0xffffffff80bdae58, next=0x0, nextpkt=0x0
leadingspace=2, trailingspace=0, readonly=0
pktlen=2046, rcvif=0x0, csum_flags=0x0x0, csum_data=0x0, segsz=32136531
ext_refcnt=1, ext_buf=0xffffa00001201800, ext_size=2048, ext_free=0x0, ext_arg=0xffffa0002320d3d0
db>
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: Christoph_Egger@gmx.de
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/42455: tstile hang with nfs
Date: Thu, 11 Nov 2010 05:52:21 +0000 (UTC)
hi,
the debug code (nfs_vnops.c rev.1.285) seems racy at best.
i don't understand what the check is trying to detect.
the mbuf chain is handed off to the underlying layers at that point and
it's completely legal for e.g. network drivers to clear mb->m_next while
keeping the next mbuf in the mbuf chain.
can you please fix?
YAMAMOTO Takashi
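The objection above can be illustrated with a small user-space sketch. It is
only a model under stated assumptions: the fake_* names are hypothetical, and
the "driver" is a few lines that split a chain the way a lower layer is allowed
to once it owns the mbufs. It shows that a check for mb->m_next == NULL can
fire even though every mbuf is still freed and the counter still reaches zero.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf; not the kernel structure. */
struct fake_mbuf {
	struct fake_mbuf *next;
	int *counter;		/* decremented on free, or NULL if untracked */
};

/* Allocate one mbuf; a non-NULL counter marks it as tracked. */
struct fake_mbuf *
fake_new(int *counter)
{
	struct fake_mbuf *m = calloc(1, sizeof(*m));

	assert(m != NULL);
	m->counter = counter;
	if (counter != NULL)
		(*counter)++;
	return m;
}

/* Free a single mbuf, dropping the count if it was tracked. */
void
fake_free_one(struct fake_mbuf *m)
{
	if (m->counter != NULL)
		(*m->counter)--;
	free(m);
}

/*
 * A lower layer takes ownership of a two-mbuf chain, splits it, and
 * frees both halves separately.  *saw_null_next records what a check
 * on the original head pointer would observe in the meantime.
 * Returns the number of tracked mbufs still outstanding at the end.
 */
int
fake_detach_demo(int *saw_null_next)
{
	int count = 0;
	struct fake_mbuf *head = fake_new(NULL);
	struct fake_mbuf *rest;

	head->next = fake_new(&count);

	/* The "driver" keeps its own pointer before cutting the chain. */
	rest = head->next;
	head->next = NULL;

	/* The debug check would trip here, yet nothing has been lost. */
	*saw_null_next = (head->next == NULL);

	fake_free_one(head);
	fake_free_one(rest);
	return count;
}
```

So in this model a NULL m_next on the head says nothing about whether the
second mbuf was leaked; only the final count does.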
From: "Christoph Egger" <cegger@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/42455 CVS commit: src/sys/nfs
Date: Tue, 14 Dec 2010 16:25:19 +0000
Module Name: src
Committed By: cegger
Date: Tue Dec 14 16:25:19 UTC 2010
Modified Files:
src/sys/nfs: nfs_vnops.c
Log Message:
back out rev. 1.285. The problem I try to hunt down
in PR 42455 is not in the network stack as shown by PR 44206.
To generate a diff of this commit:
cvs rdiff -u -r1.287 -r1.288 src/sys/nfs/nfs_vnops.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: PR/42455
Date: Tue, 14 Dec 2010 17:32:57 +0100
According to the test results in PR 44206 I can exclude
- tcp layer
- udp layer
- ip layer
- ethernet layer
So only the driver remains which is bge(4) in my case.
Christoph
From: "Christoph Egger" <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
kern-bug-people@netbsd.org
Cc:
Subject: Re: PR/42455
Date: Tue, 14 Dec 2010 17:48:45 +0100
This is the bge I have in my machine:
bge0: ASIC BCM5715 A3 (0x9003)
Christoph
From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: PR/42455
Date: Thu, 10 Mar 2011 11:10:24 +0100
I want to mention this OpenBSD commit log as it gives an idea where
to start to search in the driver: Handling of empty rings.
Christoph
http://article.gmane.org/gmane.os.openbsd.cvs/102392
-------------------------------------------------------------------------
CVSROOT: /cvs
Module name: src
Changes by: claudio <at> cvs.openbsd.org 2010/12/21 07:00:43
Modified files:
sys/kern : uipc_mbuf.c
Log message:
Ugly workaround in nmbclust_update(). Additionally to setting the limit
also modify the hiwat mark. This was done in pool_sethardlimit() until
rev. 1.99. Without this the mbuf cluster pool may return free pages too
quickly with the result that m_clget() may fail while populating DMA
rings. Seems to fix some hangs seen on MCLGETI() interfaces on i386
e.g. PR 6524. A proper fix is to make all drivers handle empty rings
but that will take a while to implement. With and OK mikeb@
-------------------------------------------------------------------------
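As a rough illustration of what "handle empty rings" in the commit log above
could mean, here is a user-space sketch. Everything in it is an assumption made
for illustration: the fake_* names, the ring size, and the budget-based
allocator are hypothetical and are not taken from bge(4) or any other driver.
The idea shown is that a refill which ends with zero posted buffers must arrange
a retry itself, because an empty receive ring produces no further receive
events that would trigger another refill.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

#define RING_SLOTS 4		/* hypothetical ring size */

/* Hypothetical receive ring; not a real driver structure. */
struct fake_rxring {
	void *slot[RING_SLOTS];
	int filled;		/* number of slots holding a buffer */
	bool retry_pending;	/* stands in for an armed timeout */
};

/* Allocator with a budget, to simulate a cluster pool running dry. */
void *
fake_clget(int *budget)
{
	if (*budget <= 0)
		return NULL;
	(*budget)--;
	return malloc(2048);
}

/*
 * Post buffers into every empty slot until allocation fails.
 * If the ring ends up completely empty, flag a retry so that the
 * interface does not stall forever.  Returns the fill level.
 */
int
fake_ring_fill(struct fake_rxring *r, int *budget)
{
	for (int i = 0; i < RING_SLOTS; i++) {
		if (r->slot[i] != NULL)
			continue;
		r->slot[i] = fake_clget(budget);
		if (r->slot[i] == NULL)
			break;		/* pool exhausted, stop for now */
		r->filled++;
	}
	assert(r->filled <= RING_SLOTS);
	r->retry_pending = (r->filled == 0);
	return r->filled;
}

/* Release every posted buffer. */
void
fake_ring_drain(struct fake_rxring *r)
{
	for (int i = 0; i < RING_SLOTS; i++) {
		free(r->slot[i]);
		r->slot[i] = NULL;
	}
	r->filled = 0;
}
```

A caller would run fake_ring_fill() again when retry_pending is set, for
example from a periodic timer, until at least one buffer has been posted.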
From: "Christoph Egger" <cegger@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/42455 CVS commit: pkgsrc/sysutils/xentools41
Date: Thu, 26 Jan 2012 11:19:24 +0000
Module Name: pkgsrc
Committed By: cegger
Date: Thu Jan 26 11:19:24 UTC 2012
Modified Files:
pkgsrc/sysutils/xentools41: Makefile distinfo
Added Files:
pkgsrc/sysutils/xentools41/patches: patch-df
Log Message:
Apply patch 79d1d3311319f3390f540f547becaba9d957f84c
from qemu upstream:
Fill in word 64 of IDENTIFY data to indicate support for PIO modes 3 and 4.
This allows NetBSD guests to use UltraDMA modes instead of just PIO mode 0.
With this patch I can no longer reproduce PR 42455.
Bump package revision.
To generate a diff of this commit:
cvs rdiff -u -r1.13 -r1.14 pkgsrc/sysutils/xentools41/Makefile
cvs rdiff -u -r1.15 -r1.16 pkgsrc/sysutils/xentools41/distinfo
cvs rdiff -u -r0 -r1.1 pkgsrc/sysutils/xentools41/patches/patch-df
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/42455 CVS commit: pkgsrc/sysutils/xentools41
Date: Thu, 26 Jan 2012 16:21:03 +0000
On Thu, Jan 26, 2012 at 11:20:04AM +0000, Christoph Egger wrote:
> Apply patch 79d1d3311319f3390f540f547becaba9d957f84c
> from qemu upstream:
>
> Fill in word 64 of IDENTIFY data to indicate support for PIO modes 3 and 4.
> This allows NetBSD guests to use UltraDMA modes instead of just PIO mode 0.
>
> With this patch I can no longer reproduce PR 42455.
> Bump package revision.
(1) does the main qemu package need this patch?
(2) this seems unlikely to have fixed the real problem with nfs; do we
have any idea what it was?
--
David A. Holland
dholland@netbsd.org
From: Christoph Egger <Christoph_Egger@gmx.de>
To: gnats-bugs@NetBSD.org
Cc: David Holland <dholland-bugs@netbsd.org>, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/42455 CVS commit: pkgsrc/sysutils/xentools41
Date: Thu, 26 Jan 2012 18:56:38 +0100
On 26.01.12 17:25, David Holland wrote:
> The following reply was made to PR kern/42455; it has been noted by GNATS.
>
> From: David Holland <dholland-bugs@netbsd.org>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: PR/42455 CVS commit: pkgsrc/sysutils/xentools41
> Date: Thu, 26 Jan 2012 16:21:03 +0000
>
> On Thu, Jan 26, 2012 at 11:20:04AM +0000, Christoph Egger wrote:
> > Apply patch 79d1d3311319f3390f540f547becaba9d957f84c
> > from qemu upstream:
> >
> > Fill in word 64 of IDENTIFY data to indicate support for PIO modes 3 and 4.
> > This allows NetBSD guests to use UltraDMA modes instead of just PIO mode 0.
> >
> > With this patch I can no longer reproduce PR 42455.
> > Bump package revision.
>
> (1) does the main qemu package need this patch?
No, it is already in that version.
> (2) this seems unlikely to have fixed the real problem with nfs; do we
> have any idea what it was?
No. However, over a long time of observation every network
improvement that reduced the amount of used kernel memory seemed to have
helped...
Christoph
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/42455 CVS commit: pkgsrc/sysutils/xentools41
Date: Thu, 26 Jan 2012 18:37:29 +0000
On Thu, Jan 26, 2012 at 06:00:08PM +0000, Christoph Egger wrote:
> > (2) this seems unlikely to have fixed the real problem with nfs; do we
> > have any idea what it was?
>
> No. However, over a long time of observation every network
> improvement that reduced the amount of used kernel memory seemed to have
> helped...
Ok, but I'll leave the PR open then. At some point in the not too
distant future I intend to do a big rototill on nfs, and when that
happens maybe we can try to reproduce this.
--
David A. Holland
dholland@netbsd.org
From: "Matthias Scheler" <tron@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/42455 CVS commit: [pkgsrc-2011Q4] pkgsrc/sysutils
Date: Sat, 4 Feb 2012 14:21:29 +0000
Module Name: pkgsrc
Committed By: tron
Date: Sat Feb 4 14:21:29 UTC 2012
Modified Files:
pkgsrc/sysutils/xentools33 [pkgsrc-2011Q4]: distinfo
pkgsrc/sysutils/xentools41 [pkgsrc-2011Q4]: distinfo
Added Files:
pkgsrc/sysutils/xentools33/patches [pkgsrc-2011Q4]: patch-blktaplib_h
patch-io_ring_h
pkgsrc/sysutils/xentools41/patches [pkgsrc-2011Q4]: patch-df
Log Message:
Pullup ticket #3672 - requested by bouyer
sysutils/xentools33: build fix
sysutils/xentools41: bug fix
Revisions pulled up:
- sysutils/xentools33/Makefile 1.28
- sysutils/xentools33/distinfo 1.26
- sysutils/xentools33/patches/patch-blktaplib_h 1.1
- sysutils/xentools33/patches/patch-io_ring_h 1.1
- sysutils/xentools41/Makefile 1.14
- sysutils/xentools41/distinfo 1.16
- sysutils/xentools41/patches/patch-df 1.1
---
Module Name: pkgsrc
Committed By: cegger
Date: Mon Jan 9 14:06:35 UTC 2012
Modified Files:
pkgsrc/sysutils/xentools33: Makefile distinfo
Added Files:
pkgsrc/sysutils/xentools33/patches: patch-blktaplib_h patch-io_ring_h
Log Message:
Apply fixes for gcc 4.5. I cannot reproduce the
error message show in PR 45386.
Bump revision.
---
Module Name: pkgsrc
Committed By: cegger
Date: Thu Jan 26 11:19:24 UTC 2012
Modified Files:
pkgsrc/sysutils/xentools41: Makefile distinfo
Added Files:
pkgsrc/sysutils/xentools41/patches: patch-df
Log Message:
Apply patch 79d1d3311319f3390f540f547becaba9d957f84c
from qemu upstream:
Fill in word 64 of IDENTIFY data to indicate support for PIO modes 3 and 4.
This allows NetBSD guests to use UltraDMA modes instead of just PIO mode 0.
With this patch I can no longer reproduce PR 42455.
Bump package revision.
To generate a diff of this commit:
cvs rdiff -u -r1.25.2.1 -r1.25.2.2 pkgsrc/sysutils/xentools33/distinfo
cvs rdiff -u -r0 -r1.1.2.2 \
pkgsrc/sysutils/xentools33/patches/patch-blktaplib_h \
pkgsrc/sysutils/xentools33/patches/patch-io_ring_h
cvs rdiff -u -r1.15.2.1 -r1.15.2.2 pkgsrc/sysutils/xentools41/distinfo
cvs rdiff -u -r0 -r1.1.2.2 pkgsrc/sysutils/xentools41/patches/patch-df
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.