NetBSD Problem Report #23135

Received: (qmail 29984 invoked by uid 605); 12 Oct 2003 13:38:38 -0000
Message-Id: <200310121338.h9CDcXZ05976@desssrv.lip6.fr>
Date: Sun, 12 Oct 2003 15:38:33 +0200 (CEST)
From: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
Sender: gnats-bugs-owner@NetBSD.org
Reply-To: Manuel Bouyer <Manuel.Bouyer@lip6.fr>
To: gnats-bugs@gnats.netbsd.org
Subject: broadcast ypbind sometimes fails to find servers
X-Send-Pr-Version: 3.95

>Number:         23135
>Category:       bin
>Synopsis:       broadcast ypbind sometimes fails to find servers
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Oct 12 13:39:00 +0000 2003
>Closed-Date:    Tue Jun 10 18:08:32 +0000 2014
>Last-Modified:  Tue Sep 09 08:25:01 +0000 2014
>Originator:     Manuel Bouyer
>Release:        NetBSD 1.6.1_STABLE
>Organization:
	LIP6/RP
>Environment:
System: NetBSD desssrv 1.6.1_STABLE NetBSD 1.6.1_STABLE (ANTILOQUE) #1: Tue Aug 26 12:26:24 CEST 2003 bouyer@vlaminck:/usr/src/syssrc/sys/arch/i386/compile/ANTILOQUE i386
Architecture: i386
Machine: i386
>Description:
	We have a small student room with 24 desktops and an NFS/NIS
	server, all running NetBSD 1.6.1_STABLE. We let ypbind find the server
	by broadcast.
	At boot time, clients sometimes hang when ypbind starts, failing
	to find a server. The server is running fine (another client rebooted
	at the same time has no problem finding the server).
	A tcpdump on the server shows that the client is broadcasting requests,
	but the server doesn't answer (broadcast address is 132.227.72.63):
15:06:07.893277 dess260-11.65533 > 132.227.72.63.sunrpc: udp 104
0x0000   4500 0084 b637 0000 4011 29f9 84e3 4833        E....7..@.)...H3
0x0010   84e3 483f fffd 006f 0070 a9e3 0000 0000        ..H?...o.p......
0x0020   0000 0000 0000 0002 0001 86a0 0000 0002        ................
0x0030   0000 0005 0000 0001 0000 0024 3f89 51bd        ...........$?.Q.
0x0040   0000 000a 6465 7373 3236 302d 3131 0000        ....dess260-11..
0x0050   0000 0000 0000 0000 0000 0001 0000 0000        ................
0x0060   0000 0000 0000 0000 0001 86a4 0000 0002        ................
0x0070   0000 0002 0000 000c 0000 0007 7465 6c65        ............tele
0x0080   696e 6600                                      inf.
15:06:10.857553 dess260-13.65533 > 132.227.72.63.sunrpc: udp 104
0x0000   4500 0084 b632 0000 4011 29fc 84e3 4835        E....2..@.)...H5
0x0010   84e3 483f fffd 006f 0070 a9dc 0000 0000        ..H?...o.p......
0x0020   0000 0000 0000 0002 0001 86a0 0000 0002        ................
0x0030   0000 0005 0000 0001 0000 0024 3f89 51c0        ...........$?.Q.
0x0040   0000 000a 6465 7373 3236 302d 3133 0000        ....dess260-13..
0x0050   0000 0000 0000 0000 0000 0001 0000 0000        ................
0x0060   0000 0000 0000 0000 0001 86a4 0000 0002        ................
0x0070   0000 0002 0000 000c 0000 0007 7465 6c65        ............tele
0x0080   696e 6600                                      inf.
15:06:13.953387 dess260-11.65533 > 132.227.72.63.sunrpc: udp 104
0x0000   4500 0084 b639 0000 4011 29f7 84e3 4833        E....9..@.)...H3
0x0010   84e3 483f fffd 006f 0070 a9dd 0000 0000        ..H?...o.p......
0x0020   0000 0000 0000 0002 0001 86a0 0000 0002        ................
0x0030   0000 0005 0000 0001 0000 0024 3f89 51c3        ...........$?.Q.
0x0040   0000 000a 6465 7373 3236 302d 3131 0000        ....dess260-11..
0x0050   0000 0000 0000 0000 0000 0001 0000 0000        ................
0x0060   0000 0000 0000 0000 0001 86a4 0000 0002        ................
0x0070   0000 0002 0000 000c 0000 0007 7465 6c65        ............tele
0x0080   696e 6600                                      inf.
15:06:16.917674 dess260-13.65533 > 132.227.72.63.sunrpc: udp 104
0x0000   4500 0084 b634 0000 4011 29fa 84e3 4835        E....4..@.)...H5
0x0010   84e3 483f fffd 006f 0070 a9d6 0000 0000        ..H?...o.p......
0x0020   0000 0000 0000 0002 0001 86a0 0000 0002        ................
0x0030   0000 0005 0000 0001 0000 0024 3f89 51c6        ...........$?.Q.
0x0040   0000 000a 6465 7373 3236 302d 3133 0000        ....dess260-13..
0x0050   0000 0000 0000 0000 0000 0001 0000 0000        ................
0x0060   0000 0000 0000 0000 0001 86a4 0000 0002        ................
0x0070   0000 0002 0000 000c 0000 0007 7465 6c65        ............tele
0x0080   696e 6600                                      inf.

	A ktrace of rpcbind shows that it does get the requests, but it
	seems to just ignore them:

   110 rpcbind  1065963973.953476 RET   poll 1
   110 rpcbind  1065963973.953533 CALL  recvfrom(0x6,0x805d000,0x2000,0,0xbfbfce60,0xbfbfce5c)
   110 rpcbind  1065963973.953593 GIO   fd 6 read 104 bytes
       "\0\0\0\0\0\0\0\0\0\0\0\^B\0\^A\M^F\240\0\0\0\^B\0\0\0\^E\0\0\0\^A\0\0\
        \0$?\M^IQ\M-C\0\0\0
        dess260-11\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\
        \M^F\M-$\0\0\0\^B\0\0\0\^B\0\0\0\f\0\0\0\ateleinf\0"
   110 rpcbind  1065963973.953639 RET   recvfrom 104/0x68
   110 rpcbind  1065963973.953692 CALL  getsockname(0x6,0xbfbdcd60,0xbfbdcd5c)
   110 rpcbind  1065963973.953742 RET   getsockname 0
   110 rpcbind  1065963973.953777 CALL  getsockopt(0x6,0xffff,0x1008,0xbfbdcd58,0xbfbdcd5c)
   110 rpcbind  1065963973.953822 RET   getsockopt 0
   110 rpcbind  1065963973.953900 CALL  open(0x480ff287,0,0x1b6)
   110 rpcbind  1065963973.953953 NAMI  "/etc/netconfig"
   110 rpcbind  1065963973.954027 RET   open 3
   110 rpcbind  1065963973.954127 CALL  __fstat13(0x3,0xbfbdcbb0)
   110 rpcbind  1065963973.954175 RET   __fstat13 0
   110 rpcbind  1065963973.954213 CALL  read(0x3,0x8062000,0x2000)
   110 rpcbind  1065963973.954268 GIO   fd 3 read 774 bytes
       "# $NetBSD: netconfig,v 1.1 2000/06/02 22:54:10 fvdl Exp $
        #
        # The network configuration file. This file is currently only used in
        # conjunction with the (TI-) RPC code in the C library, unlike its
        # use in SVR4.
        #
        # Entries consist of:
        #
        #       <network_id> <semantics> <flags> <protofamily> <protoname> \
        \134
        #               <device> <nametoaddr_libs>
        #
        # The <device> and <nametoaddr_libs> fields are always empty in NetBSD\
        .
        #
        udp6       tpi_clts      v     inet6    udp     -       -
        tcp6       tpi_cots_ord  v     inet6    tcp     -       -
        udp        tpi_clts      v     inet     udp     -       -
        tcp        tpi_cots_ord  v     inet     tcp     -       -
        rawip      tpi_raw       -     inet      -      -       -
        local      tpi_cots_ord  -     loopback  -      -       -
       "
   110 rpcbind  1065963973.954329 RET   read 774/0x306
   110 rpcbind  1065963973.954420 CALL  close(0x3)
   110 rpcbind  1065963973.954479 RET   close 0
   110 rpcbind  1065963973.954549 CALL  __sysctl(0xbfbdccd8,0x6,0,0xbfbdccd4,0,0)
   110 rpcbind  1065963973.954613 RET   __sysctl 0
   110 rpcbind  1065963973.954651 CALL  __sysctl(0xbfbdccd8,0x6,0x805bc00,0xbfbdccd4,0,0)
   110 rpcbind  1065963973.954767 RET   __sysctl 0
   110 rpcbind  1065963973.954944 CALL  open(0x480ff287,0,0x1b6)
   110 rpcbind  1065963973.954998 NAMI  "/etc/netconfig"
   110 rpcbind  1065963973.955064 RET   open 3
   110 rpcbind  1065963973.955110 CALL  __fstat13(0x3,0xbfbdcba0)
   110 rpcbind  1065963973.955155 RET   __fstat13 0
   110 rpcbind  1065963973.955191 CALL  read(0x3,0x8062000,0x2000)
   110 rpcbind  1065963973.955240 GIO   fd 3 read 774 bytes
       "# $NetBSD: netconfig,v 1.1 2000/06/02 22:54:10 fvdl Exp $
        #
        # The network configuration file. This file is currently only used in
        # conjunction with the (TI-) RPC code in the C library, unlike its
        # use in SVR4.
        #
        # Entries consist of:
        #
        #       <network_id> <semantics> <flags> <protofamily> <protoname> \
        \134
        #               <device> <nametoaddr_libs>
        #
        # The <device> and <nametoaddr_libs> fields are always empty in NetBSD\
        .
        #
        udp6       tpi_clts      v     inet6    udp     -       -
        tcp6       tpi_cots_ord  v     inet6    tcp     -       -
        udp        tpi_clts      v     inet     udp     -       -
        tcp        tpi_cots_ord  v     inet     tcp     -       -
        rawip      tpi_raw       -     inet      -      -       -
        local      tpi_cots_ord  -     loopback  -      -       -
       "
   110 rpcbind  1065963973.955306 RET   read 774/0x306
   110 rpcbind  1065963973.955392 CALL  close(0x3)
   110 rpcbind  1065963973.955451 RET   close 0
   110 rpcbind  1065963973.955512 CALL  __sysctl(0xbfbdccc8,0x6,0,0xbfbdccc4,0,0)
   110 rpcbind  1065963973.955569 RET   __sysctl 0
   110 rpcbind  1065963973.955607 CALL  __sysctl(0xbfbdccc8,0x6,0x805bc00,0xbfbdccc4,0,0)
   110 rpcbind  1065963973.955708 RET   __sysctl 0
   110 rpcbind  1065963973.955978 CALL  gettimeofday(0xbfbdcd88,0)
   110 rpcbind  1065963973.956034 RET   gettimeofday 0
   110 rpcbind  1065963973.956075 CALL  gettimeofday(0xbfbdcd88,0)
   110 rpcbind  1065963973.956115 RET   gettimeofday 0
   110 rpcbind  1065963973.956155 CALL  poll(0xbfbfd4a0,0x4,0x7530)
   110 rpcbind  1065963975.100258 RET   poll 1
   110 rpcbind  1065963975.100399 CALL  recvfrom(0x6,0x805d000,0x2000,0,0xbfbfce60,0xbfbfce5c)
   110 rpcbind  1065963975.100478 GIO   fd 6 read 104 bytes
       "\0\0\0\0\0\0\0\0\0\0\0\^B\0\^A\M^F\240\0\0\0\^B\0\0\0\^E\0\0\0\^A\0\0\
        \0$?\M^IQ\M-G\0\0\0
        dess264-04\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\0\0\0\0\0\0\0\0\0\0\0\0\0\^A\
        \M^F\M-$\0\0\0\^B\0\0\0\^B\0\0\0\f\0\0\0\ateleinf\0"


	I don't know why rpcbind ignores the request; I guess it is something
	in the packet it received. ethereal doesn't show any difference
	between a request which gets handled and one which gets ignored.
	I don't know the internals of RPC well enough to be able to
	decode the packet and see why it would fail.
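
For reference, the request in the capture can be decoded by hand. The sketch below is not part of the original report. It parses the 104-byte UDP payload of the first broadcast in the tcpdump above as an ONC RPC call to the portmapper's indirect-call procedure (program 100000, version 2, procedure 5), wrapping a YP call (program 100004, version 2, procedure 2) for the domain "teleinf". The helper names (`u32`, `opaque`, `decode_callit`) are made up for this illustration. It only shows what the packet contains; it does not explain why rpcbind dropped it.

```python
import struct

# UDP payload of the first captured broadcast: everything after the
# 20-byte IP header and the 8-byte UDP header in the hex dump.
PAYLOAD = bytes.fromhex(
    "00000000 "
    "00000000 00000002 000186a0 00000002 "
    "00000005 00000001 00000024 3f8951bd "
    "0000000a 64657373 3236302d 31310000 "
    "00000000 00000000 00000001 00000000 "
    "00000000 00000000 000186a4 00000002 "
    "00000002 0000000c 00000007 74656c65 "
    "696e6600"
)

AUTH_UNIX = 1  # credential flavor carrying machine name, uid and gid


def u32(buf, off):
    """Read one big-endian XDR unsigned int; return (value, next offset)."""
    return struct.unpack_from(">I", buf, off)[0], off + 4


def opaque(buf, off):
    """Read XDR variable-length opaque data, padded to a 4-byte boundary."""
    n, off = u32(buf, off)
    return buf[off:off + n], off + ((n + 3) & ~3)


def decode_callit(buf):
    """Decode an RPC call header followed by portmapper CALLIT arguments."""
    fields = {}
    off = 0
    for name in ("xid", "msg_type", "rpcvers", "prog", "vers", "proc",
                 "cred_flavor"):
        fields[name], off = u32(buf, off)
    cred, off = opaque(buf, off)
    fields["verf_flavor"], off = u32(buf, off)
    _verf, off = opaque(buf, off)
    # CALLIT arguments: target program, version, procedure, encoded args.
    for name in ("target_prog", "target_vers", "target_proc"):
        fields[name], off = u32(buf, off)
    args, off = opaque(buf, off)
    fields["consumed"] = off
    if fields["cred_flavor"] == AUTH_UNIX:
        _stamp, c = u32(cred, 0)
        fields["machine"], c = opaque(cred, c)
        fields["uid"], c = u32(cred, c)
        fields["gid"], c = u32(cred, c)
    # The wrapped YP call takes a single string argument: the domain name.
    fields["domain"], _ = opaque(args, 0)
    return fields


if __name__ == "__main__":
    for key, value in decode_callit(PAYLOAD).items():
        print(f"{key:12} {value}")
```

Within the RPC payload, the captured requests differ only in the credential timestamp and the client host name.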

>How-To-Repeat:
	Not easy to reproduce on demand. Rebooting a client will hang about
	once in 20 reboots.
>Fix:
	workaround: list servers in /var/yp/binding/<domain>.ypservers
>Release-Note:
>Audit-Trail:
From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/23135 CVS commit: src/usr.sbin/ypbind
Date: Tue, 10 Jun 2014 17:19:22 +0000

 Module Name:	src
 Committed By:	dholland
 Date:		Tue Jun 10 17:19:22 UTC 2014

 Modified Files:
 	src/usr.sbin/ypbind: ypbind.c

 Log Message:
 Instead of using magic numbers in what looks like a boolean
 (dom_alive), create a state enumeration (domainstates) and use it
 instead.

 Instead of three states (new, alive, and, effectively, 'troubled') go
 to five: new, alive, pinging, lost, and dead.

 Domains start in the NEW state. When we get a reply from a server, the
 state goes to ALIVE. The state is set to PINGING when we ping the
 server (once a minute normally) and if the ping times out, it goes to
 LOST. If we stay lost for a minute, go to DEAD, and in DEAD, do
 exponential backoff of nag_servers calls.

 Getting rid of the broken logic attached to the 'troubled' state fixes
 PR 15355 (ypbind defeats disk idle spindown) -- it will now only
 rewrite the binding file when the binding changes.

 Also, fix the HEURISTIC code so it doesn't trigger except in ALIVE
 state. I think this was the source of a lot of the spamming behavior
 seen in PR 32519, which is now fixed.

 Might also fix PR 23135 (broadcast ypbind sometimes fails to find
 servers).
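
The five-state scheme described in this log message can be sketched roughly as follows. This is an illustration only and is not taken from ypbind.c: the class and method names are hypothetical, and the backoff constants are assumptions. Only the one-minute "lost" limit comes from the log message.

```python
from enum import Enum, auto

LOST_LIMIT = 60       # seconds; "if we stay lost for a minute" (from the log)
BACKOFF_START = 10    # seconds; assumed value, not from the commit
BACKOFF_MAX = 3600    # seconds; assumed cap, not from the commit


class DomainState(Enum):
    NEW = auto()
    ALIVE = auto()
    PINGING = auto()
    LOST = auto()
    DEAD = auto()


class Domain:
    """Hypothetical per-domain binding state, following the log message."""

    def __init__(self, name):
        self.name = name
        self.state = DomainState.NEW
        self.lost_since = None
        self.backoff = BACKOFF_START

    def on_reply(self):
        # Any reply from a server makes the domain ALIVE and resets backoff.
        self.state = DomainState.ALIVE
        self.lost_since = None
        self.backoff = BACKOFF_START

    def on_ping_sent(self):
        if self.state is DomainState.ALIVE:
            self.state = DomainState.PINGING

    def on_ping_timeout(self, now):
        if self.state is DomainState.PINGING:
            self.state = DomainState.LOST
            self.lost_since = now

    def on_tick(self, now):
        # Staying LOST for a full minute moves the domain to DEAD.
        if (self.state is DomainState.LOST
                and now - self.lost_since >= LOST_LIMIT):
            self.state = DomainState.DEAD

    def next_nag_delay(self):
        """Seconds to wait before nagging servers again (doubles in DEAD)."""
        if self.state is not DomainState.DEAD:
            return BACKOFF_START
        delay = self.backoff
        self.backoff = min(self.backoff * 2, BACKOFF_MAX)
        return delay
```

Keeping the state explicit means each event handler only acts in the states where it makes sense, which is the property the log message relies on when it restricts the HEURISTIC code to the ALIVE state.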


 To generate a diff of this commit:
 cvs rdiff -u -r1.95 -r1.96 src/usr.sbin/ypbind/ypbind.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 10 Jun 2014 17:22:57 +0000
State-Changed-Why:
I don't suppose you're still in a position to be able to test this? If
not I think we should assume it's fixed.


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, dholland@NetBSD.org
Subject: Re: bin/23135 (broadcast ypbind sometimes fails to find servers)
Date: Tue, 10 Jun 2014 20:03:21 +0200

 On Tue, Jun 10, 2014 at 05:22:57PM +0000, dholland@NetBSD.org wrote:
 > Synopsis: broadcast ypbind sometimes fails to find servers
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Tue, 10 Jun 2014 17:22:57 +0000
 > State-Changed-Why:
 > I don't suppose you're still in a position to be able to test this? If
 > not I think we should assume it's fixed.

 AFAIK the issue was on the server side; and I guess it has been fixed
 (I don't have this exact same setup anymore, but we're still using NetBSD
 for ypservers and I've not noticed this problem for years).

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 years of experience will always make the difference
 --

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Tue, 10 Jun 2014 18:08:32 +0000
State-Changed-Why:
ok, probably it was fixed ages ago then. Let's assume so since otherwise
we'll never get any traction.


From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/23135 CVS commit: [netbsd-6] src/usr.sbin/ypbind
Date: Tue, 9 Sep 2014 08:24:29 +0000

 Module Name:	src
 Committed By:	msaitoh
 Date:		Tue Sep  9 08:24:29 UTC 2014

 Modified Files:
 	src/usr.sbin/ypbind [netbsd-6]: ypbind.8 ypbind.c

 Log Message:
 Pull up following revision(s) (requested by dholland in ticket #1083):
 	usr.sbin/ypbind/ypbind.c: revision 1.91
 	usr.sbin/ypbind/ypbind.c: revision 1.92
 	usr.sbin/ypbind/ypbind.c: revision 1.93
 	usr.sbin/ypbind/ypbind.c: revision 1.94
 	usr.sbin/ypbind/ypbind.c: revision 1.95
 	usr.sbin/ypbind/ypbind.c: revision 1.96
 	usr.sbin/ypbind/ypbind.c: revision 1.97
 	usr.sbin/ypbind/ypbind.c: revision 1.98
 	usr.sbin/ypbind/ypbind.8: revision 1.20
 	usr.sbin/ypbind/ypbind.8: revision 1.19
 Don't store the default domain name in a global. While running we
 really don't care which domain is the system's default domain.
 Factor out some rpc validation code.
 While there are times it's appropriate to call a state variable
 "evil", this isn't one of them. Since the logic involved is to wait
 until the default domain binds before backgrounding, call the variable
 "started" instead.
 Don't rake up the default domain until after processing arguments.
 Processing arguments just sets flags -- may as well do it first, and
 this way detection of silly errors isn't contingent on having things
 fully configured and operating.
 Load up with comments.
 Instead of using magic numbers in what looks like a boolean
 (dom_alive), create a state enumeration (domainstates) and use it
 instead.
 Instead of three states (new, alive, and, effectively, 'troubled') go
 to five: new, alive, pinging, lost, and dead.
 Domains start in the NEW state. When we get a reply from a server, the
 state goes to ALIVE. The state is set to PINGING when we ping the
 server (once a minute normally) and if the ping times out, it goes to
 LOST. If we stay lost for a minute, go to DEAD, and in DEAD, do
 exponential backoff of nag_servers calls.
 Getting rid of the broken logic attached to the 'troubled' state fixes
 PR 15355 (ypbind defeats disk idle spindown) -- it will now only
 rewrite the binding file when the binding changes.
 Also, fix the HEURISTIC code so it doesn't trigger except in ALIVE
 state. I think this was the source of a lot of the spamming behavior
 seen in PR 32519, which is now fixed.
 Might also fix PR 23135 (broadcast ypbind sometimes fails to find
 servers).
 Add a SIGHUP handler; upon SIGHUP do an extra nag_servers on any
 domain that's in DEAD state. This lets you explicitly rescue ypbind
 from its exponential backoff when you know the world's back up.
 Log state transitions.
 Document exponential backoff behavior and SIGHUP support, plus a couple
 other minor edits.
 Use more markup.
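
The SIGHUP rescue described in the log message above could look roughly like the sketch below. It is an illustration under assumptions, not the ypbind.c code: the names `on_sighup`, `service_rescue` and `install_handler` are hypothetical, domains are modelled as a plain name-to-state mapping, and `nag_servers` is passed in as a callback. The handler only sets a flag, and the main loop does the actual work.

```python
import signal

DEAD = "DEAD"              # hypothetical state label for this sketch
rescue_requested = False   # set by the signal handler, cleared by the loop


def on_sighup(signum, frame):
    """Signal handler: only record that a rescue was requested."""
    global rescue_requested
    rescue_requested = True


def service_rescue(domains, nag_servers):
    """Called from the main loop: nag every DEAD domain once per SIGHUP.

    domains maps a domain name to its state label; nag_servers is a
    callback taking the domain name. Returns the names that were nagged.
    """
    global rescue_requested
    if not rescue_requested:
        return []
    rescue_requested = False
    nagged = [name for name, state in domains.items() if state == DEAD]
    for name in nagged:
        nag_servers(name)
    return nagged


def install_handler():
    """Register the handler; call once at startup (POSIX only)."""
    signal.signal(signal.SIGHUP, on_sighup)
```

With a handler of this kind installed, sending SIGHUP to the daemon (for example with kill -HUP) would trigger one extra round of nagging for dead domains without waiting out the current backoff interval.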


 To generate a diff of this commit:
 cvs rdiff -u -r1.18 -r1.18.22.1 src/usr.sbin/ypbind/ypbind.8
 cvs rdiff -u -r1.90 -r1.90.4.1 src/usr.sbin/ypbind/ypbind.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
