NetBSD Problem Report #51505

From mark@ecs.vuw.ac.nz  Sun Sep 25 05:16:46 2016
Return-Path: <mark@ecs.vuw.ac.nz>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 15A4B7A283
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 25 Sep 2016 05:16:46 +0000 (UTC)
Message-Id: <201609250516.u8P5GeYL021193@turakirae.ecs.vuw.ac.nz>
Date: Sun, 25 Sep 2016 18:16:40 +1300 (NZDT)
From: mark@ecs.vuw.ac.nz
Reply-To: mark@ecs.vuw.ac.nz
To: gnats-bugs@NetBSD.org
Subject: amd in 7.0_STABLE can segfault
X-Send-Pr-Version: 3.95

>Number:         51505
>Category:       bin
>Synopsis:       amd in 7.0_STABLE can segfault
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Sep 25 05:20:00 +0000 2016
>Last-Modified:  Mon Oct 03 02:35:00 +0000 2016
>Originator:     Mark Davies
>Release:        NetBSD 7.0_STABLE
>Organization:
ECS, Victoria Uni. of Wellington, New Zealand.
>Environment:


System: NetBSD turakirae.ecs.vuw.ac.nz 7.0_STABLE NetBSD 7.0_STABLE (GENERIC) #6: Tue Mar 15 21:15:42 NZDT 2016 mark@turakirae.ecs.vuw.ac.nz:/local/SAVE/7_64.obj/src/work/7/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	amd occasionally segfaults on several of our machines.
	The backtrace from the core file is always:

	#0  0x00007f7ff6900610 in strcmp () from /usr/lib/libc.so.12
	#1  0x0000000000416387 in find_nfs_srvr ()
	#2  0x0000000000407fd0 in amfs_nfsl_ffserver ()
	#3  0x000000000040ff7c in locate_mntfs ()
	#4  0x000000000040fff9 in find_mntfs ()
	#5  0x0000000000406b50 in amfs_lookup_mntfs ()
	#6  0x0000000000406e52 in amfs_generic_lookup_child ()
	#7  0x0000000000411479 in nfsproc_lookup_2_svc ()
	#8  0x000000000041062b in nfs_program_2 ()
	#9  0x00007f7ff68c9401 in svc_getreq_common () from /usr/lib/libc.so.12
	#10 0x00007f7ff68c9444 in svc_getreqset () from /usr/lib/libc.so.12
	#11 0x0000000000410b59 in mount_automounter ()
	#12 0x000000000041ba85 in main ()

	The last thing amd syslogs prior to core dumping is something
	like:

Sep 24 23:44:27 paramount amd[282]: check_pmap_up: failed to contact portmapper on host "XXX": RPC: Timed out

    or:

Sep 25 06:17:24 homepages amd[282]: check_pmap_up: failed to contact portmapper on host "XXX": RPC: Timed out
Sep 25 06:17:24 homepages amd[282]: portmapper service not running on XXX
Sep 25 06:17:24 homepages amd[282]: NFS service not running on XXX

>How-To-Repeat:
	Run amd, automount some NFS filesystems, and wait for some
	anomalous condition to occur :-(

>Fix:
	Don't know; clearly it's not handling some error case.
	Some patch already in am-utils 6.2 may fix it.

>Audit-Trail:
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Sun, 25 Sep 2016 11:02:19 -0400

 On Sep 25,  5:20am, mark@ecs.vuw.ac.nz (mark@ecs.vuw.ac.nz) wrote:
 -- Subject: bin/51505: amd in 7.0_STABLE can segfault

 Can you try building and running the latest amd on it?
 Or even updating to HEAD in the NetBSD tree?
 Or adding the debug sets for 7 so we can see where that strcmp is
 coming from in the core-dump, because I can't see it?

 christos

From: Mark Davies <mark@ecs.vuw.ac.nz>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Mon, 26 Sep 2016 16:17:48 +1300

 On 26/09/16 04:02, Christos Zoulas wrote:
 > Can you try building and running the latest amd on it?
 > Or even updating to HEAD in the NetBSD tree?

 I'm now running with the amd from wip/am-utils (not stripped) but it may
 take a few weeks before it happens again, assuming whatever it is isn't
 already fixed in this version.

 > Or adding the debug sets for 7 so we can see where that strcmp is
 > coming from in the core-dump, because I can't see it?

 I presume it's one of the three STREQ() calls in find_nfs_srvr() but
 which one I don't know.

 cheers
 mark

From: christos@zoulas.com (Christos Zoulas)
To: Mark Davies <mark@ecs.vuw.ac.nz>, gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Mon, 26 Sep 2016 08:45:33 -0400

 On Sep 26,  4:17pm, mark@ecs.vuw.ac.nz (Mark Davies) wrote:
 -- Subject: Re: bin/51505: amd in 7.0_STABLE can segfault

 | On 26/09/16 04:02, Christos Zoulas wrote:
 | > Can you try building and running the latest amd on it?
 | > Or even updating to HEAD in the NetBSD tree?
 | 
 | I'm now running with the amd from wip/am-utils (not stripped) but it may
 | take a few weeks before it happens again, assuming whatever it is isn't
 | already fixed in this version.

 Great.

 | > Or adding the debug sets for 7 so we can see where that strcmp is
 | > coming from in the core-dump, because I can't see it?
 | 
 | I presume it's one of the three STREQ() calls in find_nfs_srvr() but
 | which one I don't know.

 If you add the debug sets and the core file, gdb should point out which one.

 christos

From: Mark Davies <mark@ecs.vuw.ac.nz>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Tue, 27 Sep 2016 09:05:20 +1300

 On 27/09/16 01:45, Christos Zoulas wrote:
 > If you add the debug sets and the core file, gdb should point out which one.

 Unfortunately I don't have a debug version that matches the build that 
 was running, so will have to wait for it to occur with the current amd.

 cheers
 mark

From: Mark Davies <mark@ecs.vuw.ac.nz>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Mon, 3 Oct 2016 15:30:32 +1300

 On 27/09/16 09:10, Mark Davies wrote:
 >  Unfortunately I don't have a debug version that matches the build that 
 >  was running, so will have to wait for it to occur with the current amd.

 And it has:


 #0  0x00007f7ff69005db in strcmp () from /usr/lib/libc.so.12
 #1  0x000000000041970b in find_nfs_srvr (mf=0x7f7ff773dd80) at srvr_nfs.c:945
 #2  0x000000000040abe1 in amfs_nfsl_ffserver (mf=0x7f7ff773dd80) at amfs_nfsl.c:234
 #3  0x000000000041175c in init_mntfs (mf=mf@entry=0x7f7ff773dd80, ops=ops@entry=0x630ee0 <amfs_nfsl_ops>, mo=mo@entry=0x7f7ff770d900,
     mp=mp@entry=0x7f7ff7706340 "/am/paramount/vol/ecs", info=info@entry=0x7f7ff7706380 "paramount:/am/paramount/vol/ecs",
     auto_opts=auto_opts@entry=0x7f7ff773d180 "type:=nfsl;rfs:=${autodir}/${rhost}/vol/${key};fs:=${rfs};opts:=rw,intr,nolock,vers=3,xlatecookie;remopts:=${opts};", mopts=mopts@entry=0x7f7ff771d4a0 "rw,intr,nolock,vers=3,xlatecookie",
     remopts=remopts@entry=0x7f7ff771d9e0 "rw,intr,nolock,vers=3,xlatecookie") at mntfs.c:98
 #4  0x0000000000411b29 in alloc_mntfs (remopts=0x7f7ff771d9e0 "rw,intr,nolock,vers=3,xlatecookie",
     mopts=0x7f7ff771d4a0 "rw,intr,nolock,vers=3,xlatecookie",
     auto_opts=0x7f7ff773d180 "type:=nfsl;rfs:=${autodir}/${rhost}/vol/${key};fs:=${rfs};opts:=rw,intr,nolock,vers=3,xlatecookie;remopts:=${opts};",
     info=0x7f7ff7706380 "paramount:/am/paramount/vol/ecs", mp=0x7f7ff7706340 "/am/paramount/vol/ecs", mo=0x7f7ff770d900, ops=0x630ee0 <amfs_nfsl_ops>)
     at mntfs.c:109
 #5  find_mntfs (ops=0x630ee0 <amfs_nfsl_ops>, mo=mo@entry=0x7f7ff770d900, mp=0x7f7ff7706340 "/am/paramount/vol/ecs",
     info=0x7f7ff7706380 "paramount:/am/paramount/vol/ecs",
     auto_opts=auto_opts@entry=0x7f7ff773d180 "type:=nfsl;rfs:=${autodir}/${rhost}/vol/${key};fs:=${rfs};opts:=rw,intr,nolock,vers=3,xlatecookie;remopts:=${opts};", mopts=0x7f7ff771d4a0 "rw,intr,nolock,vers=3,xlatecookie", remopts=0x7f7ff771d9e0 "rw,intr,nolock,vers=3,xlatecookie") at mntfs.c:207
 #6  0x00000000004093d6 in amfs_lookup_one_location (new_mp=0x7f7ff7701d00, mf=0x7f7ff7b19c00, pfname=0x7f7ff773eb88 "ecs",
     def_opts=0x7f7ff773d180 "type:=nfsl;rfs:=${autodir}/${rhost}/vol/${key};fs:=${rfs};opts:=rw,intr,nolock,vers=3,xlatecookie;remopts:=${opts};",
     ivec=0x7f7ff7706280 "rhost:=paramount") at amfs_generic.c:301
 #7  amfs_lookup_loc (new_mp=new_mp@entry=0x7f7ff7701d00, error_return=error_return@entry=0x7f7fffffd3ec) at amfs_generic.c:477
 #8  0x0000000000409a51 in amfs_generic_lookup_child (mp=0x7f7ff7b42150, fname=<optimized out>, error_return=0x7f7fffffd3ec, op=1) at amfs_generic.c:1184
 #9  0x0000000000413afd in nfsproc_lookup_2_svc (argp=0x7f7fffffd430, rqstp=<optimized out>) at nfs_subr.c:230
 #10 0x000000000041227b in nfs_program_2 (rqstp=0x7f7fffffd4c0, transp=0x7f7ff7b46080) at nfs_prot_svc.c:283
 #11 0x00007f7ff68c9401 in svc_getreq_common () from /usr/lib/libc.so.12
 #12 0x00007f7ff68c9444 in svc_getreqset () from /usr/lib/libc.so.12
 #13 0x0000000000412b9f in run_rpc () at nfs_start.c:289
 #14 mount_automounter (ppid=ppid@entry=441) at nfs_start.c:452
 #15 0x0000000000420b7b in main (argc=13, argv=<optimized out>) at amd.c:561


 (gdb) up
 #1  0x000000000041970b in find_nfs_srvr (mf=0x7f7ff773dd80) at srvr_nfs.c:945
 945             STREQ(nfs_proto, fs->fs_proto)) {
 (gdb) p fs
 $1 = (fserver *) 0x7f7ff7b2a5c0
 (gdb) p *fs
 $2 = {fs_q = {q_forw = 0x7f7ff7b2a6e0, q_back = 0x7f7ff7b2a800}, fs_refc = 0, fs_host = 0x7f7ff7705980 "paramount.ecs.vuw.ac.nz",
   fs_ip = 0x7f7ff773f480, fs_cid = 418546, fs_pinger = 30, fs_flags = 2, fs_type = 0x4257b9 "nfs", fs_version = 3, fs_proto = 0x0,
   fs_private = 0x7f7ff77059a0, fs_prfree = 0x405e20 <free@plt>}
 (gdb) p nfs_proto
 $3 = 0x424771 "tcp"


 cheers
 mark
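
Note on the gdb output above: the comparison at srvr_nfs.c:945 is reached with fs->fs_proto == 0x0 while nfs_proto is "tcp", so strcmp() dereferences a NULL pointer. The following is only a minimal sketch of a NULL-tolerant comparison, not the am-utils fix. The names fserver_sketch, streq_safe and fserver_matches are hypothetical, and treating two NULL pointers as equal is an assumption made for illustration. Why fs_proto was left unset in the first place is not established in this report.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for the two fserver fields involved in the
 * comparison; the real structure has more members (see "p *fs" above).
 */
struct fserver_sketch {
    const char *fs_host;
    const char *fs_proto;   /* was 0x0 in the core file */
};

/*
 * NULL-tolerant string equality: if either pointer is NULL, the strings
 * are equal only when both are NULL; otherwise defer to strcmp().
 */
static int
streq_safe(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Returns 1 if the cached entry matches the wanted host and protocol. */
static int
fserver_matches(const struct fserver_sketch *fs,
    const char *host, const char *proto)
{
    return streq_safe(host, fs->fs_host) &&
        streq_safe(proto, fs->fs_proto);
}
```

With an entry like the one in the core file (fs_proto unset), fserver_matches(&fs, "paramount.ecs.vuw.ac.nz", "tcp") returns 0 instead of crashing, so the cached entry would simply be skipped. Whether skipping is the right behaviour, as opposed to making sure fs_proto is always set, would need checking against the am-utils source.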

>Unformatted:
