NetBSD Problem Report #23135
Received: (qmail 29984 invoked by uid 605); 12 Oct 2003 13:38:38 -0000
Message-Id: <200310121338.h9CDcXZ05976@desssrv.lip6.fr>
Date: Sun, 12 Oct 2003 15:38:33 +0200 (CEST)
From: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
Sender: gnats-bugs-owner@NetBSD.org
Reply-To: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
To: gnats-bugs@gnats.netbsd.org
Subject: broadcast ypbind sometimes fails to find servers
X-Send-Pr-Version: 3.95
>Number: 23135
>Category: bin
>Synopsis: broadcast ypbind sometimes fails to find servers
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: bin-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Oct 12 13:39:00 +0000 2003
>Closed-Date: Tue Jun 10 18:08:32 +0000 2014
>Last-Modified: Tue Sep 09 08:25:01 +0000 2014
>Originator: Manuel Bouyer
>Release: NetBSD 1.6.1_STABLE
>Organization:
LIP6/RP
>Environment:
System: NetBSD desssrv 1.6.1_STABLE NetBSD 1.6.1_STABLE (ANTILOQUE) #1: Tue Aug 26 12:26:24 CEST 2003 bouyer@vlaminck:/usr/src/syssrc/sys/arch/i386/compile/ANTILOQUE i386
Architecture: i386
Machine: i386
>Description:
We have a small student room composed of 24 desktops and an NFS/NIS
server, all running NetBSD 1.6.1_STABLE. We let ypbind find the server
by broadcast.
At boot time, clients sometimes hang when ypbind starts, failing
to find a server. The server is running fine (another client rebooted
at the same time has no problem finding the server).
A tcpdump on the server shows that the client is broadcasting requests,
but the server doesn't answer (broadcast address is 132.227.72.63):
15:06:07.893277 dess260-11.65533 > 132.227.72.63.sunrpc: udp 104
0x0000 4500 0084 b637 0000 4011 29f9 84e3 4833 E....7..@.)...H3
0x0010 84e3 483f fffd 006f 0070 a9e3 0000 0000 ..H?...o.p......
0x0020 0000 0000 0000 0002 0001 86a0 0000 0002 ................
0x0030 0000 0005 0000 0001 0000 0024 3f89 51bd ...........$?.Q.
0x0040 0000 000a 6465 7373 3236 302d 3131 0000 ....dess260-11..
0x0050 0000 0000 0000 0000 0000 0001 0000 0000 ................
0x0060 0000 0000 0000 0000 0001 86a4 0000 0002 ................
0x0070 0000 0002 0000 000c 0000 0007 7465 6c65 ............tele
0x0080 696e 6600 inf.
15:06:10.857553 dess260-13.65533 > 132.227.72.63.sunrpc: udp 104
0x0000 4500 0084 b632 0000 4011 29fc 84e3 4835 E....2..@.)...H5
0x0010 84e3 483f fffd 006f 0070 a9dc 0000 0000 ..H?...o.p......
0x0020 0000 0000 0000 0002 0001 86a0 0000 0002 ................
0x0030 0000 0005 0000 0001 0000 0024 3f89 51c0 ...........$?.Q.
0x0040 0000 000a 6465 7373 3236 302d 3133 0000 ....dess260-13..
0x0050 0000 0000 0000 0000 0000 0001 0000 0000 ................
0x0060 0000 0000 0000 0000 0001 86a4 0000 0002 ................
0x0070 0000 0002 0000 000c 0000 0007 7465 6c65 ............tele
0x0080 696e 6600 inf.
15:06:13.953387 dess260-11.65533 > 132.227.72.63.sunrpc: udp 104
0x0000 4500 0084 b639 0000 4011 29f7 84e3 4833 E....9..@.)...H3
0x0010 84e3 483f fffd 006f 0070 a9dd 0000 0000 ..H?...o.p......
0x0020 0000 0000 0000 0002 0001 86a0 0000 0002 ................
0x0030 0000 0005 0000 0001 0000 0024 3f89 51c3 ...........$?.Q.
0x0040 0000 000a 6465 7373 3236 302d 3131 0000 ....dess260-11..
0x0050 0000 0000 0000 0000 0000 0001 0000 0000 ................
0x0060 0000 0000 0000 0000 0001 86a4 0000 0002 ................
0x0070 0000 0002 0000 000c 0000 0007 7465 6c65 ............tele
0x0080 696e 6600 inf.
15:06:16.917674 dess260-13.65533 > 132.227.72.63.sunrpc: udp 104
0x0000 4500 0084 b634 0000 4011 29fa 84e3 4835 E....4..@.)...H5
0x0010 84e3 483f fffd 006f 0070 a9d6 0000 0000 ..H?...o.p......
0x0020 0000 0000 0000 0002 0001 86a0 0000 0002 ................
0x0030 0000 0005 0000 0001 0000 0024 3f89 51c6 ...........$?.Q.
0x0040 0000 000a 6465 7373 3236 302d 3133 0000 ....dess260-13..
0x0050 0000 0000 0000 0000 0000 0001 0000 0000 ................
0x0060 0000 0000 0000 0000 0001 86a4 0000 0002 ................
0x0070 0000 0002 0000 000c 0000 0007 7465 6c65 ............tele
0x0080 696e 6600 inf.
A ktrace of rpcbind shows that it does receive the requests, but it
seems to just ignore them:
110 rpcbind 1065963973.953476 RET poll 1
110 rpcbind 1065963973.953533 CALL recvfrom(0x6,0x805d000,0x2000,0,0xbfbfce
60,0xbfbfce5c)
110 rpcbind 1065963973.953593 GIO fd 6 read 104 bytes
"\0\0\0\0\0\0\0\0\0\0\0\^B\0\^A\M^F\240\0\0\0\^B\0\0\0\^E\0\0\0\^A\0\0\
\0$?\M^IQ\M-C\0\0\0
dess260-11\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\
\M^F\M-$\0\0\0\^B\0\0\0\^B\0\0\0\f\0\0\0\ateleinf\0"
110 rpcbind 1065963973.953639 RET recvfrom 104/0x68
110 rpcbind 1065963973.953692 CALL getsockname(0x6,0xbfbdcd60,0xbfbdcd5c)
110 rpcbind 1065963973.953742 RET getsockname 0
110 rpcbind 1065963973.953777 CALL getsockopt(0x6,0xffff,0x1008,0xbfbdcd58,
0xbfbdcd5c)
110 rpcbind 1065963973.953822 RET getsockopt 0
110 rpcbind 1065963973.953900 CALL open(0x480ff287,0,0x1b6)
110 rpcbind 1065963973.953953 NAMI "/etc/netconfig"
110 rpcbind 1065963973.954027 RET open 3
110 rpcbind 1065963973.954127 CALL __fstat13(0x3,0xbfbdcbb0)
110 rpcbind 1065963973.954175 RET __fstat13 0
110 rpcbind 1065963973.954213 CALL read(0x3,0x8062000,0x2000)
110 rpcbind 1065963973.954268 GIO fd 3 read 774 bytes
"# $NetBSD: netconfig,v 1.1 2000/06/02 22:54:10 fvdl Exp $
#
# The network configuration file. This file is currently only used in
# conjunction with the (TI-) RPC code in the C library, unlike its
# use in SVR4.
#
# Entries consist of:
#
# <network_id> <semantics> <flags> <protofamily> <protoname> \
\134
# <device> <nametoaddr_libs>
#
# The <device> and <nametoaddr_libs> fields are always empty in NetBSD\
.
#
udp6 tpi_clts v inet6 udp - -
tcp6 tpi_cots_ord v inet6 tcp - -
udp tpi_clts v inet udp - -
tcp tpi_cots_ord v inet tcp - -
rawip tpi_raw - inet - - -
local tpi_cots_ord - loopback - - -
"
110 rpcbind 1065963973.954329 RET read 774/0x306
110 rpcbind 1065963973.954420 CALL close(0x3)
110 rpcbind 1065963973.954479 RET close 0
110 rpcbind 1065963973.954549 CALL __sysctl(0xbfbdccd8,0x6,0,0xbfbdccd4,0,0
)
110 rpcbind 1065963973.954613 RET __sysctl 0
110 rpcbind 1065963973.954651 CALL __sysctl(0xbfbdccd8,0x6,0x805bc00,0xbfbd
ccd4,0,0)
110 rpcbind 1065963973.954767 RET __sysctl 0
110 rpcbind 1065963973.954944 CALL open(0x480ff287,0,0x1b6)
110 rpcbind 1065963973.954998 NAMI "/etc/netconfig"
110 rpcbind 1065963973.955064 RET open 3
110 rpcbind 1065963973.955110 CALL __fstat13(0x3,0xbfbdcba0)
110 rpcbind 1065963973.955155 RET __fstat13 0
110 rpcbind 1065963973.955191 CALL read(0x3,0x8062000,0x2000)
110 rpcbind 1065963973.955240 GIO fd 3 read 774 bytes
"# $NetBSD: netconfig,v 1.1 2000/06/02 22:54:10 fvdl Exp $
#
# The network configuration file. This file is currently only used in
# conjunction with the (TI-) RPC code in the C library, unlike its
# use in SVR4.
#
# Entries consist of:
#
# <network_id> <semantics> <flags> <protofamily> <protoname> \
\134
# <device> <nametoaddr_libs>
#
# The <device> and <nametoaddr_libs> fields are always empty in NetBSD\
.
#
udp6 tpi_clts v inet6 udp - -
tcp6 tpi_cots_ord v inet6 tcp - -
udp tpi_clts v inet udp - -
tcp tpi_cots_ord v inet tcp - -
rawip tpi_raw - inet - - -
local tpi_cots_ord - loopback - - -
"
110 rpcbind 1065963973.955306 RET read 774/0x306
110 rpcbind 1065963973.955392 CALL close(0x3)
110 rpcbind 1065963973.955451 RET close 0
110 rpcbind 1065963973.955512 CALL __sysctl(0xbfbdccc8,0x6,0,0xbfbdccc4,0,0
)
110 rpcbind 1065963973.955569 RET __sysctl 0
110 rpcbind 1065963973.955607 CALL __sysctl(0xbfbdccc8,0x6,0x805bc00,0xbfbd
ccc4,0,0)
110 rpcbind 1065963973.955708 RET __sysctl 0
110 rpcbind 1065963973.955978 CALL gettimeofday(0xbfbdcd88,0)
110 rpcbind 1065963973.956034 RET gettimeofday 0
110 rpcbind 1065963973.956075 CALL gettimeofday(0xbfbdcd88,0)
110 rpcbind 1065963973.956115 RET gettimeofday 0
110 rpcbind 1065963973.956155 CALL poll(0xbfbfd4a0,0x4,0x7530)
110 rpcbind 1065963975.100258 RET poll 1
110 rpcbind 1065963975.100399 CALL recvfrom(0x6,0x805d000,0x2000,0,0xbfbfce
60,0xbfbfce5c)
110 rpcbind 1065963975.100478 GIO fd 6 read 104 bytes
"\0\0\0\0\0\0\0\0\0\0\0\^B\0\^A\M^F\240\0\0\0\^B\0\0\0\^E\0\0\0\^A\0\0\
\0$?\M^IQ\M-G\0\0\0
dess264-04\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\
\M^F\M-$\0\0\0\^B\0\0\0\^B\0\0\0\f\0\0\0\ateleinf\0"
I don't know why rpcbind ignores the request; I guess it is something
in the packet it received. ethereal doesn't show any difference
between a request which gets handled and one which gets ignored.
I don't know the internals of RPC well enough to be able to
decode the packet and see why it would fail.
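For reference, the 104-byte UDP payload of the first request above can
be decoded with a few lines of XDR parsing. The sketch below is not part
of the original report; it assumes the standard ONC RPC call layout with
AUTH_UNIX credentials, and the helper names (XdrReader, decode_callit)
are made up for illustration. It shows a portmapper (program 100000,
version 2) CALLIT request, procedure 5, wrapping a ypserv (program
100004, version 2) call to procedure 2 (YPPROC_DOMAIN_NONACK) for the
domain "teleinf". The transaction ID decodes as 0; the report does not
establish whether that is related to the request being ignored.

```python
import struct

# UDP payload of the first captured request (tcpdump offsets 0x1c-0x83),
# i.e. the hex dump with the 20-byte IP and 8-byte UDP headers removed.
PAYLOAD = bytes.fromhex(
    "00000000 00000000 00000002 000186a0 00000002 00000005"
    "00000001 00000024 3f8951bd 0000000a 64657373 3236302d"
    "31310000 00000000 00000000 00000001 00000000 00000000"
    "00000000 000186a4 00000002 00000002 0000000c 00000007"
    "74656c65 696e6600"
)


class XdrReader:
    """Minimal big-endian XDR reader (hypothetical helper)."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def u32(self):
        (value,) = struct.unpack_from(">I", self.data, self.pos)
        self.pos += 4
        return value

    def opaque(self):
        # Variable-length opaque/string: length word, data, pad to 4 bytes.
        length = self.u32()
        body = self.data[self.pos:self.pos + length]
        self.pos += (length + 3) & ~3
        return body


def decode_callit(payload):
    r = XdrReader(payload)
    call = {}
    for field in ("xid", "msg_type", "rpcvers", "prog", "vers", "proc"):
        call[field] = r.u32()
    call["cred_flavor"] = r.u32()
    if call["cred_flavor"] != 1:  # 1 = AUTH_UNIX, the only flavor handled here
        raise ValueError("unexpected credential flavor")
    cred = XdrReader(r.opaque())
    call["stamp"] = cred.u32()
    call["machine"] = cred.opaque().decode("ascii")
    call["uid"] = cred.u32()
    call["gid"] = cred.u32()
    ngids = cred.u32()
    call["gids"] = [cred.u32() for _ in range(ngids)]
    call["verf_flavor"] = r.u32()
    call["verf_body"] = r.opaque()
    # CALLIT arguments: target program, version, procedure, encoded args.
    call["target_prog"] = r.u32()
    call["target_vers"] = r.u32()
    call["target_proc"] = r.u32()
    args = XdrReader(r.opaque())
    call["domain"] = args.opaque().decode("ascii")
    call["leftover"] = len(payload) - r.pos
    return call
```

Calling decode_callit(PAYLOAD) returns a dictionary of the decoded
fields; the same function can be applied to the other captured requests
to compare them field by field.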
>How-To-Repeat:
Not easy to reproduce on demand. A rebooting client hangs about once
in 20 reboots.
>Fix:
workaround: list servers in /var/yp/binding/<domain>.ypservers
>Release-Note:
>Audit-Trail:
From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/23135 CVS commit: src/usr.sbin/ypbind
Date: Tue, 10 Jun 2014 17:19:22 +0000
Module Name: src
Committed By: dholland
Date: Tue Jun 10 17:19:22 UTC 2014
Modified Files:
src/usr.sbin/ypbind: ypbind.c
Log Message:
Instead of using magic numbers in what looks like a boolean
(dom_alive), create a state enumeration (domainstates) and use it
instead.
Instead of three states (new, alive, and, effectively, 'troubled') go
to five: new, alive, pinging, lost, and dead.
Domains start in the NEW state. When we get a reply from a server, the
state goes to ALIVE. The state is set to PINGING when we ping the
server (once a minute normally) and if the ping times out, it goes to
LOST. If we stay lost for a minute, go to DEAD, and in DEAD, do
exponential backoff of nag_servers calls.
Getting rid of the broken logic attached to the 'troubled' state fixes
PR 15355 (ypbind defeats disk idle spindown) -- it will now only
rewrite the binding file when the binding changes.
Also, fix the HEURISTIC code so it doesn't trigger except in ALIVE
state. I think this was the source of a lot of the spamming behavior
seen in PR 32519, which is now fixed.
Might also fix PR 23135 (broadcast ypbind sometimes fails to find
servers).
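The five-state scheme described in this log message can be pictured
with a small model. This is only an illustrative sketch written from
the description above, not the code in ypbind.c; the class and method
names are hypothetical, and the 60-second limit is an assumption
standing in for "a minute".

```python
from enum import Enum, auto


class DomainState(Enum):
    NEW = auto()
    ALIVE = auto()
    PINGING = auto()
    LOST = auto()
    DEAD = auto()


LOST_LIMIT = 60  # assumed: seconds spent in LOST before going to DEAD


class Domain:
    """Hypothetical model of one domain's binding state."""

    def __init__(self):
        self.state = DomainState.NEW
        self.lost_since = None

    def on_reply(self):
        # Any reply from a server makes the domain ALIVE.
        self.state = DomainState.ALIVE
        self.lost_since = None

    def on_ping_sent(self):
        # The periodic ping moves an ALIVE domain to PINGING.
        if self.state is DomainState.ALIVE:
            self.state = DomainState.PINGING

    def on_ping_timeout(self, now):
        # A ping that times out moves the domain to LOST.
        if self.state is DomainState.PINGING:
            self.state = DomainState.LOST
            self.lost_since = now

    def on_tick(self, now):
        # Staying LOST for the whole limit moves the domain to DEAD.
        if (self.state is DomainState.LOST
                and now - self.lost_since >= LOST_LIMIT):
            self.state = DomainState.DEAD
```

Modelling the states as an enumeration rather than integers in a
boolean-looking field is the point the log message makes: each
transition is then guarded by an explicit named state.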
To generate a diff of this commit:
cvs rdiff -u -r1.95 -r1.96 src/usr.sbin/ypbind/ypbind.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 10 Jun 2014 17:22:57 +0000
State-Changed-Why:
I don't suppose you're still in a position to be able to test this? If
not I think we should assume it's fixed.
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, dholland@NetBSD.org
Subject: Re: bin/23135 (broadcast ypbind sometimes fails to find servers)
Date: Tue, 10 Jun 2014 20:03:21 +0200
On Tue, Jun 10, 2014 at 05:22:57PM +0000, dholland@NetBSD.org wrote:
> Synopsis: broadcast ypbind sometimes fails to find servers
>
> State-Changed-From-To: open->feedback
> State-Changed-By: dholland@NetBSD.org
> State-Changed-When: Tue, 10 Jun 2014 17:22:57 +0000
> State-Changed-Why:
> I don't suppose you're still in a position to be able to test this? If
> not I think we should assume it's fixed.
AFAIK the issue was on the server side; and I guess it has been fixed
(I don't have this exact same setup anymore, but we're still using NetBSD
for ypservers and I've not noticed this problem for years).
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 10 Jun 2014 18:08:32 +0000
State-Changed-Why:
ok, probably it was fixed ages ago then. Let's assume so since otherwise
we'll never get any traction.
From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/23135 CVS commit: [netbsd-6] src/usr.sbin/ypbind
Date: Tue, 9 Sep 2014 08:24:29 +0000
Module Name: src
Committed By: msaitoh
Date: Tue Sep 9 08:24:29 UTC 2014
Modified Files:
src/usr.sbin/ypbind [netbsd-6]: ypbind.8 ypbind.c
Log Message:
Pull up following revision(s) (requested by dholland in ticket #1083):
usr.sbin/ypbind/ypbind.c: revision 1.91
usr.sbin/ypbind/ypbind.c: revision 1.92
usr.sbin/ypbind/ypbind.c: revision 1.93
usr.sbin/ypbind/ypbind.c: revision 1.94
usr.sbin/ypbind/ypbind.c: revision 1.95
usr.sbin/ypbind/ypbind.c: revision 1.96
usr.sbin/ypbind/ypbind.c: revision 1.97
usr.sbin/ypbind/ypbind.c: revision 1.98
usr.sbin/ypbind/ypbind.8: revision 1.20
usr.sbin/ypbind/ypbind.8: revision 1.19
Don't store the default domain name in a global. While running we
really don't care which domain is the system's default domain.
Factor out some rpc validation code.
While there are times it's appropriate to call a state variable
"evil", this isn't one of them. Since the logic involved is to wait
until the default domain binds before backgrounding, call the variable
"started" instead.
Don't rake up the default domain until after processing arguments.
Processing arguments just sets flags -- may as well do it first, and
this way detection of silly errors isn't contingent on having things
fully configured and operating.
Load up with comments.
Instead of using magic numbers in what looks like a boolean
(dom_alive), create a state enumeration (domainstates) and use it
instead.
Instead of three states (new, alive, and, effectively, 'troubled') go
to five: new, alive, pinging, lost, and dead.
Domains start in the NEW state. When we get a reply from a server, the
state goes to ALIVE. The state is set to PINGING when we ping the
server (once a minute normally) and if the ping times out, it goes to
LOST. If we stay lost for a minute, go to DEAD, and in DEAD, do
exponential backoff of nag_servers calls.
Getting rid of the broken logic attached to the 'troubled' state fixes
PR 15355 (ypbind defeats disk idle spindown) -- it will now only
rewrite the binding file when the binding changes.
Also, fix the HEURISTIC code so it doesn't trigger except in ALIVE
state. I think this was the source of a lot of the spamming behavior
seen in PR 32519, which is now fixed.
Might also fix PR 23135 (broadcast ypbind sometimes fails to find
servers).
Add a SIGHUP handler; upon SIGHUP do an extra nag_servers on any
domain that's in DEAD state. This lets you explicitly rescue ypbind
from its exponential backoff when you know the world's back up.
Log state transitions.
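The exponential backoff in DEAD state and the SIGHUP rescue can be
sketched as follows. This is an illustration of the behavior described
in the log message, not the committed code: the initial delay, the cap,
and all names are assumptions, and the nag callback merely stands in
for a call to nag_servers.

```python
class DeadBackoff:
    """Hypothetical retry schedule for a domain in DEAD state."""

    def __init__(self, initial=60, cap=3600):
        self.delay = initial  # assumed initial delay, in seconds
        self.cap = cap        # assumed upper bound on the delay
        self.next_nag = 0

    def due(self, now):
        return now >= self.next_nag

    def nagged(self, now):
        # Schedule the next attempt, then double the delay up to the cap.
        self.next_nag = now + self.delay
        self.delay = min(self.delay * 2, self.cap)


def on_sighup(domain_states, nag):
    """Do one extra nag for every domain currently in DEAD state.

    domain_states maps a domain name to its state name; nag is a
    callback standing in for nag_servers.
    """
    for name, state in domain_states.items():
        if state == "DEAD":
            nag(name)
```

With initial=1 and cap=4, successive attempts would be scheduled at
times 1, 3, 7 and 11: the gap doubles until it reaches the cap. The
SIGHUP path bypasses that schedule for one attempt, which is what lets
an administrator force a retry once the servers are known to be back.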
Document exponential backoff behavior and SIGHUP support, plus a couple
other minor edits.
Use more markup.
To generate a diff of this commit:
cvs rdiff -u -r1.18 -r1.18.22.1 src/usr.sbin/ypbind/ypbind.8
cvs rdiff -u -r1.90 -r1.90.4.1 src/usr.sbin/ypbind/ypbind.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted: