NetBSD Problem Report #51505
From mark@ecs.vuw.ac.nz Sun Sep 25 05:16:46 2016
Return-Path: <mark@ecs.vuw.ac.nz>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 15A4B7A283
for <gnats-bugs@gnats.NetBSD.org>; Sun, 25 Sep 2016 05:16:46 +0000 (UTC)
Message-Id: <201609250516.u8P5GeYL021193@turakirae.ecs.vuw.ac.nz>
Date: Sun, 25 Sep 2016 18:16:40 +1300 (NZDT)
From: mark@ecs.vuw.ac.nz
Reply-To: mark@ecs.vuw.ac.nz
To: gnats-bugs@NetBSD.org
Subject: amd in 7.0_STABLE can segfault
X-Send-Pr-Version: 3.95
>Number: 51505
>Category: bin
>Synopsis: amd in 7.0_STABLE can segfault
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Sep 25 05:20:00 +0000 2016
>Last-Modified: Mon Oct 03 02:35:00 +0000 2016
>Originator: Mark Davies
>Release: NetBSD 7.0_STABLE
>Organization:
ECS, Victoria Uni. of Wellington, New Zealand.
>Environment:
System: NetBSD turakirae.ecs.vuw.ac.nz 7.0_STABLE NetBSD 7.0_STABLE (GENERIC) #6: Tue Mar 15 21:15:42 NZDT 2016 mark@turakirae.ecs.vuw.ac.nz:/local/SAVE/7_64.obj/src/work/7/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
amd occasionally segfaults on several of our machines.
The backtrace on the core file is always:
#0 0x00007f7ff6900610 in strcmp () from /usr/lib/libc.so.12
#1 0x0000000000416387 in find_nfs_srvr ()
#2 0x0000000000407fd0 in amfs_nfsl_ffserver ()
#3 0x000000000040ff7c in locate_mntfs ()
#4 0x000000000040fff9 in find_mntfs ()
#5 0x0000000000406b50 in amfs_lookup_mntfs ()
#6 0x0000000000406e52 in amfs_generic_lookup_child ()
#7 0x0000000000411479 in nfsproc_lookup_2_svc ()
#8 0x000000000041062b in nfs_program_2 ()
#9 0x00007f7ff68c9401 in svc_getreq_common () from /usr/lib/libc.so.12
#10 0x00007f7ff68c9444 in svc_getreqset () from /usr/lib/libc.so.12
#11 0x0000000000410b59 in mount_automounter ()
#12 0x000000000041ba85 in main ()
The last thing amd syslogs prior to core dumping is something
like:
Sep 24 23:44:27 paramount amd[282]: check_pmap_up: failed to contact portmapper on host "XXX": RPC: Timed out
or:
Sep 25 06:17:24 homepages amd[282]: check_pmap_up: failed to contact portmapper on host "XXX": RPC: Timed out
Sep 25 06:17:24 homepages amd[282]: portmapper service not running on XXX
Sep 25 06:17:24 homepages amd[282]: NFS service not running on XXX
>How-To-Repeat:
Run amd, automount some NFS filesystems, and wait for some
anomalous condition to occur :-(
>Fix:
Don't know; clearly it's not handling some error case.
Some patch already in am-utils 6.2 may fix it.
>Audit-Trail:
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc:
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Sun, 25 Sep 2016 11:02:19 -0400
On Sep 25, 5:20am, mark@ecs.vuw.ac.nz (mark@ecs.vuw.ac.nz) wrote:
-- Subject: bin/51505: amd in 7.0_STABLE can segfault
Can you try building and running the latest amd on it?
Or even updating to HEAD in the NetBSD tree?
Or adding the debug sets for 7 so we can see where that strcmp is
coming from in the core-dump, because I can't see it?
christos
From: Mark Davies <mark@ecs.vuw.ac.nz>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Mon, 26 Sep 2016 16:17:48 +1300
On 26/09/16 04:02, Christos Zoulas wrote:
> Can you try building and running the latest amd on it?
> Or even updating to HEAD in the NetBSD tree?
I'm now running with the amd from wip/am-utils (not stripped) but it may
take a few weeks before it happens again, assuming whatever it is isn't
already fixed in this version.
> Or adding the debug sets for 7 so we can see where that strcmp is
> coming from in the core-dump, because I can't see it?
I presume it's one of the three STREQ() calls in find_nfs_srvr() but
which one I don't know.
cheers
mark
From: christos@zoulas.com (Christos Zoulas)
To: Mark Davies <mark@ecs.vuw.ac.nz>, gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Mon, 26 Sep 2016 08:45:33 -0400
On Sep 26, 4:17pm, mark@ecs.vuw.ac.nz (Mark Davies) wrote:
-- Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
| On 26/09/16 04:02, Christos Zoulas wrote:
| > Can you try building and running the latest amd on it?
| > Or even updating to HEAD in the NetBSD tree?
|
| I'm now running with the amd from wip/am-utils (not stripped) but it may
| take a few weeks before it happens again, assuming whatever it is isn't
| already fixed in this version.
Great.
| > Or adding the debug sets for 7 so we can see where that strcmp is
| > coming from in the core-dump, because I can't see it?
|
| I presume it's one of the three STREQ() calls in find_nfs_srvr() but
| which one I don't know.
If you add the debug sets and the core file, gdb should point out which one.
christos
From: Mark Davies <mark@ecs.vuw.ac.nz>
To: Christos Zoulas <christos@zoulas.com>, gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Tue, 27 Sep 2016 09:05:20 +1300
On 27/09/16 01:45, Christos Zoulas wrote:
> If you add the debug sets and the core file, gdb should point out which one.
Unfortunately I don't have a debug version that matches the build that
was running, so will have to wait for it to occur with the current amd.
cheers
mark
From: Mark Davies <mark@ecs.vuw.ac.nz>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: bin/51505: amd in 7.0_STABLE can segfault
Date: Mon, 3 Oct 2016 15:30:32 +1300
On 27/09/16 09:10, Mark Davies wrote:
> Unfortunately I don't have a debug version that matches the build that
> was running, so will have to wait for it to occur with the current amd.
And it has:
#0 0x00007f7ff69005db in strcmp () from /usr/lib/libc.so.12
#1 0x000000000041970b in find_nfs_srvr (mf=0x7f7ff773dd80) at srvr_nfs.c:945
#2 0x000000000040abe1 in amfs_nfsl_ffserver (mf=0x7f7ff773dd80) at amfs_nfsl.c:234
#3 0x000000000041175c in init_mntfs (mf=mf@entry=0x7f7ff773dd80,
ops=ops@entry=0x630ee0 <amfs_nfsl_ops>, mo=mo@entry=0x7f7ff770d900,
mp=mp@entry=0x7f7ff7706340 "/am/paramount/vol/ecs",
info=info@entry=0x7f7ff7706380 "paramount:/am/paramount/vol/ecs",
auto_opts=auto_opts@entry=0x7f7ff773d180
"type:=nfsl;rfs:=${autodir}/${rhost}/vol/${key};fs:=${rfs};opts:=rw,intr,nolock,vers=3,xlatecookie;remopts:=${opts};",
mopts=mopts@entry=0x7f7ff771d4a0 "rw,intr,nolock,vers=3,xlatecookie",
remopts=remopts@entry=0x7f7ff771d9e0
"rw,intr,nolock,vers=3,xlatecookie") at mntfs.c:98
#4 0x0000000000411b29 in alloc_mntfs (remopts=0x7f7ff771d9e0
"rw,intr,nolock,vers=3,xlatecookie",
mopts=0x7f7ff771d4a0 "rw,intr,nolock,vers=3,xlatecookie",
auto_opts=0x7f7ff773d180
"type:=nfsl;rfs:=${autodir}/${rhost}/vol/${key};fs:=${rfs};opts:=rw,intr,nolock,vers=3,xlatecookie;remopts:=${opts};",
info=0x7f7ff7706380 "paramount:/am/paramount/vol/ecs",
mp=0x7f7ff7706340 "/am/paramount/vol/ecs", mo=0x7f7ff770d900,
ops=0x630ee0 <amfs_nfsl_ops>)
at mntfs.c:109
#5 find_mntfs (ops=0x630ee0 <amfs_nfsl_ops>,
mo=mo@entry=0x7f7ff770d900, mp=0x7f7ff7706340 "/am/paramount/vol/ecs",
info=0x7f7ff7706380 "paramount:/am/paramount/vol/ecs",
auto_opts=auto_opts@entry=0x7f7ff773d180
"type:=nfsl;rfs:=${autodir}/${rhost}/vol/${key};fs:=${rfs};opts:=rw,intr,nolock,vers=3,xlatecookie;remopts:=${opts};",
mopts=0x7f7ff771d4a0 "rw,intr,nolock,vers=3,xlatecookie",
remopts=0x7f7ff771d9e0 "rw,intr,nolock,vers=3,xlatecookie") at mntfs.c:207
#6 0x00000000004093d6 in amfs_lookup_one_location
(new_mp=0x7f7ff7701d00, mf=0x7f7ff7b19c00, pfname=0x7f7ff773eb88 "ecs",
def_opts=0x7f7ff773d180
"type:=nfsl;rfs:=${autodir}/${rhost}/vol/${key};fs:=${rfs};opts:=rw,intr,nolock,vers=3,xlatecookie;remopts:=${opts};",
ivec=0x7f7ff7706280 "rhost:=paramount") at amfs_generic.c:301
#7 amfs_lookup_loc (new_mp=new_mp@entry=0x7f7ff7701d00,
error_return=error_return@entry=0x7f7fffffd3ec) at amfs_generic.c:477
#8 0x0000000000409a51 in amfs_generic_lookup_child (mp=0x7f7ff7b42150,
fname=<optimized out>, error_return=0x7f7fffffd3ec, op=1) at amfs_generic.c:1184
#9 0x0000000000413afd in nfsproc_lookup_2_svc (argp=0x7f7fffffd430,
rqstp=<optimized out>) at nfs_subr.c:230
#10 0x000000000041227b in nfs_program_2 (rqstp=0x7f7fffffd4c0,
transp=0x7f7ff7b46080) at nfs_prot_svc.c:283
#11 0x00007f7ff68c9401 in svc_getreq_common () from /usr/lib/libc.so.12
#12 0x00007f7ff68c9444 in svc_getreqset () from /usr/lib/libc.so.12
#13 0x0000000000412b9f in run_rpc () at nfs_start.c:289
#14 mount_automounter (ppid=ppid@entry=441) at nfs_start.c:452
#15 0x0000000000420b7b in main (argc=13, argv=<optimized out>) at amd.c:561
(gdb) up
#1 0x000000000041970b in find_nfs_srvr (mf=0x7f7ff773dd80) at srvr_nfs.c:945
945 STREQ(nfs_proto, fs->fs_proto)) {
(gdb) p fs
$1 = (fserver *) 0x7f7ff7b2a5c0
(gdb) p *fs
$2 = {fs_q = {q_forw = 0x7f7ff7b2a6e0, q_back = 0x7f7ff7b2a800},
fs_refc = 0, fs_host = 0x7f7ff7705980 "paramount.ecs.vuw.ac.nz",
fs_ip = 0x7f7ff773f480, fs_cid = 418546, fs_pinger = 30, fs_flags = 2,
fs_type = 0x4257b9 "nfs", fs_version = 3, fs_proto = 0x0,
fs_private = 0x7f7ff77059a0, fs_prfree = 0x405e20 <free@plt>}
(gdb) p nfs_proto
$3 = 0x424771 "tcp"
cheers
mark
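The gdb session above shows fs_proto = 0x0 on the cached fserver entry
while nfs_proto is "tcp", so the STREQ() at srvr_nfs.c:945 appears to
hand a NULL pointer to strcmp(). The following is only a sketch of a
NULL-tolerant comparison, not the am-utils fix: struct fserver_sketch,
streq_safe() and proto_matches() are hypothetical names invented for
illustration, and the struct copies just the fields printed in the dump.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fserver fields shown in the gdb dump;
 * this is NOT the real am-utils structure. */
struct fserver_sketch {
    const char *fs_host;
    const char *fs_type;
    int fs_version;
    const char *fs_proto;   /* NULL in the core file */
};

/* NULL-tolerant string equality (hypothetical helper):
 * two NULLs compare equal, a NULL never matches a real string,
 * and strcmp() is only reached when both pointers are valid. */
static int streq_safe(const char *a, const char *b)
{
    if (a == NULL || b == NULL)
        return a == b;
    return strcmp(a, b) == 0;
}

/* Mirrors the shape of the failing test: does the cached server entry
 * use the protocol requested for this mount? */
static int proto_matches(const struct fserver_sketch *fs, const char *nfs_proto)
{
    return streq_safe(nfs_proto, fs->fs_proto);
}
```

With the values from the dump (nfs_proto "tcp", fs_proto NULL),
proto_matches() returns 0 instead of faulting inside strcmp(). Whether a
NULL fs_proto should be treated as "no match" or should never be stored
on the entry in the first place is a question for the am-utils code
itself.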
>Unformatted: