NetBSD Problem Report #57952

From www@netbsd.org  Thu Feb 22 15:26:39 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4DE031A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 22 Feb 2024 15:26:39 +0000 (UTC)
Message-Id: <20240222152637.AAC281A923A@mollari.NetBSD.org>
Date: Thu, 22 Feb 2024 15:26:37 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: dhcpcd(8) inexplicably terminated in the night
X-Send-Pr-Version: www-1.0

>Number:         57952
>Category:       bin
>Synopsis:       dhcpcd(8) inexplicably terminated in the night
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    roy
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 22 15:30:00 +0000 2024
>Last-Modified:  Sat Feb 24 07:50:01 +0000 2024
>Originator:     Taylor R Campbell
>Release:        10.0_RC3
>Organization:
The NetBSDCD Foundation
>Environment:
NetBSD nanocons.local 10.0_RC3 NetBSD 10.0_RC3 (GENERIC64) #15: Wed Jan 17 05:31:14 UTC 2024  root@manticore.local:/usr/obj/10/evbarm64/sys/arch/evbarm/compile/GENERIC64 evbarm
>Description:
Some time during the night, dhcpcd terminated on two different hosts on my network without explaining why.  No core dump, no log messages obviously explaining what happened.  I may have been configuring a network device at the time, which I'll call ${ROGUEDEVICE}, but I don't remember what stage in the misconfiguration I was in.  I may have wound up with two IPv6 routers and DHCPv6 servers on the network at the same time.

Relevant log messages from the previous half hour or so, with IPv6 addresses replaced by symbolic labels:

<daemon.info>Feb 22 02:46:12 nanocons dhcpcd[622]: ure0: Router Advertisement from fe80::${ROGUEDEVICE}
<daemon.info>Feb 22 02:46:13 nanocons dhcpcd[622]: urtwn0: Router Advertisement from fe80::${ROGUEDEVICE}
<daemon.info>Feb 22 02:46:16 nanocons dhcpcd[622]: ure0: adding address ${GLOBALPREFIX}:${URE0_GLOBAL}/64
<daemon.info>Feb 22 02:46:16 nanocons dhcpcd[622]: ure0: adding address ${LOCALPREFIX}:${URE0_LOCAL}/64
<daemon.info>Feb 22 02:46:16 nanocons dhcpcd[622]: urtwn0: adding address ${GLOBALPREFIX}:${URTWN0_GLOBAL}/64
<daemon.info>Feb 22 02:46:16 nanocons dhcpcd[622]: urtwn0: adding address ${LOCALPREFIX}:${URTWN0_LOCAL}/64
<daemon.warn>Feb 22 02:51:42 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 02:51:43 nanocons dhcpcd[622]: urtwn0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.info>Feb 22 03:04:21 nanocons dhcpcd[622]: ure0: deleting address ${GLOBALPREFIX}:${URE0_GLOBAL}/64
<daemon.info>Feb 22 03:04:21 nanocons dhcpcd[622]: ure0: deleting address ${LOCALPREFIX}:${URE0_LOCAL}/64
<daemon.info>Feb 22 03:04:21 nanocons dhcpcd[622]: urtwn0: deleting address ${GLOBALPREFIX}:${URTWN0_GLOBAL}/64
<daemon.info>Feb 22 03:04:21 nanocons dhcpcd[622]: urtwn0: deleting address ${LOCALPREFIX}:${URTWN0_LOCAL}/64
<daemon.info>Feb 22 03:04:25 nanocons dhcpcd[622]: ure0: adding address ${LOCALPREFIX}:${URE0_LOCAL}/64
<daemon.info>Feb 22 03:04:25 nanocons dhcpcd[622]: ure0: adding address ${GLOBALPREFIX}:${URE0_GLOBAL}/64
<daemon.info>Feb 22 03:04:25 nanocons dhcpcd[622]: urtwn0: adding address ${LOCALPREFIX}:${URTWN0_LOCAL}/64
<daemon.info>Feb 22 03:04:25 nanocons dhcpcd[622]: urtwn0: adding address ${GLOBALPREFIX}:${URTWN0_GLOBAL}/64
<daemon.err>Feb 22 03:07:51 nanocons dhcpcd[622]: ps_sendcmdmsg: No buffer space available
<daemon.err>Feb 22 03:07:51 nanocons dhcpcd[622]: ps_inet_recvra: No buffer space available
<daemon.warn>Feb 22 03:07:59 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:02 nanocons syslogd[423]: last message repeated 5 times
<daemon.warn>Feb 22 03:08:02 nanocons dhcpcd[622]: urtwn0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:02 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:02 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:03 nanocons dhcpcd[622]: urtwn0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:03 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:06 nanocons syslogd[423]: last message repeated 6 times
<daemon.err>Feb 22 03:08:06 nanocons dhcpcd[622]: ps_inet_dodispatch: Connection reset by peer
<daemon.err>Feb 22 03:08:06 nanocons dhcpcd[622]: control_free: No such file or directory
<daemon.err>Feb 22 03:08:06 nanocons dhcpcd[622]: ps_sendpsmmsg: Destination address required
<daemon.err>Feb 22 03:08:06 nanocons dhcpcd[622]: ps_dostop: Destination address required

>How-To-Repeat:
not sure
>Fix:
Yes, please!

Ideally, mere network conditions should not provoke dhcpcd to terminate.

Preferably, dhcpcd would at least give a reason for each way that it does terminate.

At least it didn't take down the network configuration when it did, so I was still able to ssh into the machines where it terminated.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: bin-bug-people->roy
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Thu, 22 Feb 2024 15:36:30 +0000
Responsible-Changed-Why:
Could I trouble you to take a look at this, roy?


From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57952: dhcpcd(8) inexplicably terminated in the night
Date: Fri, 23 Feb 2024 01:15:56 +0700

 As it happens, a few days ago, a friend of mine (not on any netbsd lists)
 had much the same thing happen - dhcpcd on a NetBSD 9.3 host simply
 vanished, without any log messages.

 In that case, a possible explanation was carrier being lost, and reappearing
 due to a shoddy connector (apparently wiggling the cable at the socket
 caused carrier to reappear) but no new address was acquired, after which
 upon looking, there was no dhcpcd in sight to obtain one (this would have
 been a v4 only installation - simply no v6 available).

 That cause is mere speculation - there is no question about the wonky cable
 causing carrier loss, but whether that had anything to do with dhcpcd's
 decision to vanish is unknown.

 At the time the setup there was using dhcpcd=NO in rc.conf, and "dhcp" in
 /etc/ifconfig.wm0, with -q in dhcpcd_flags (just the default setting).

 It has since been changed to dhcpcd=YES with debug enabled in dhcpcd.conf
 and no -q in dhcpcd_flags (to increase the chances of there being something
 revealing in the logs if it happens again).

 I agree with this though:

    Preferably, dhcpcd would at least give a reason for each way that
    it does terminate.

 Regardless of what options are set, or not, if a daemon that should remain
 running needs to exit, it should always say why - even if that means keeping
 a parent around that does nothing except wait on the child (which does the
 work, owns the .pid file ...) so it can report exits due to a signal if
 one should happen, which the child itself cannot reasonably log.
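
The "parent that does nothing except wait on the child" idea can be sketched in a few lines of C.  This is only an illustration of the suggestion above, not anything dhcpcd does today; describe_status(), supervise() and the two worker functions are hypothetical names invented for the sketch, and a real daemon would pass the reason string to syslog(3) rather than return it.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Render a wait(2) status as a human-readable reason. */
void
describe_status(int status, char *buf, size_t len)
{
	assert(buf != NULL && len > 0);
	if (WIFEXITED(status))
		snprintf(buf, len, "exited with status %d",
		    WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(buf, len, "terminated by signal %d",
		    WTERMSIG(status));
	else
		snprintf(buf, len, "unexpected wait status %#x",
		    (unsigned int)status);
}

/*
 * Fork a child to run work(); the parent does nothing but wait and
 * describe how the child ended.  Returns the wait status, or -1 if
 * fork(2) or waitpid(2) failed.
 */
int
supervise(void (*work)(void), char *reason, size_t len)
{
	pid_t pid;
	int status;

	assert(work != NULL);
	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		work();
		_exit(EXIT_SUCCESS);
	}
	while (waitpid(pid, &status, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	describe_status(status, reason, len);
	return status;
}

/* Hypothetical stand-in workloads, for demonstration only. */
void worker_ok(void) { }
void worker_killed(void) { raise(SIGKILL); }
```

Because the parent holds no state of its own, it can still report a child killed by a signal, which is exactly the case the child cannot log for itself.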


From: Taylor R Campbell <riastradh@NetBSD.org>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: gnats-bugs@NetBSD.org, roy@NetBSD.org, gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: bin/57952: dhcpcd(8) inexplicably terminated in the night
Date: Thu, 22 Feb 2024 18:53:06 +0000

 > Date: Fri, 23 Feb 2024 01:15:56 +0700
 > From: Robert Elz <kre@munnari.OZ.AU>
 > 
 > At the time the setup there was using dhcpcd=NO in rc.conf, and "dhcp" in
 > /etc/ifconfig.wm0, with -q in dhcpcd_flags (just the default setting).

 In this case, I'm using dhcpcd=YES in rc.conf.

 On one machine, dhcpcd.conf is unmodified from 9.99.76 (yes, that one
 needs an update).

 On the other, the only differences from 10.0_RC3 /etc/dhcpcd.conf are:

 -option rapid_commit
 +#option rapid_commit
 ...
 +nooption dhcp6_reconfigure_accept

 (I had set these back in 2020 in an attempt to diagnose something I
 have since forgotten about, can probably remove them now.)

From: Hauke Fath <h.fath@nt.tu-darmstadt.de>
To: gnats-bugs@netbsd.org, gnats-admin@netbsd.org
Cc: 
Subject: Re: bin/57952: dhcpcd(8) inexplicably terminated in the night
Date: Thu, 22 Feb 2024 19:55:53 +0100

 On 2024-02-22 16:30, campbell+netbsd@mumble.net wrote:
 > Some time during the night, dhcpcd terminated on two different hosts
 > on my network without explaining why.

 FWIW, I ran into the same issue on Arch half a year ago: 
 <https://lists.archlinux.org/hyperkitty/list/arch-general@lists.archlinux.org/thread/6VRNI7RARP3EQUWHKRGQQILEGKV6UOMQ/>

 The hourly restart script is still in place...

 Cheerio,
 Hauke

 -- 
       The ASCII Ribbon Campaign                    Hauke Fath
 ()     No HTML/RTF in email	        Institut für Nachrichtentechnik
 /\     No Word docs in email                     TU Darmstadt
       Respect for open standards              Ruf +49-6151-16-21344

From: Roy Marples <roy@marples.name>
To: "gnats-bugs" <gnats-bugs@netbsd.org>
Cc: "riastradh" <riastradh@NetBSD.org>, "roy" <roy@netbsd.org>,
	"gnats-admin" <gnats-admin@netbsd.org>,
	"netbsd-bugs" <netbsd-bugs@netbsd.org>,
	"campbell+netbsd" <campbell+netbsd@mumble.net>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Thu, 22 Feb 2024 23:37:23 +0000

 ----- On Thu, 22 Feb 2024 15:36:31 +0000    wrote --- 
   > Could I trouble you to take a look at this, roy?

 So dhcpcd crashed. As it runs in an empty chroot without the ability to create anything it has no means of saving any segfault information to the best of my knowledge.

 > dhcpcd[622]: ps_sendcmdmsg: No buffer space available
 > dhcpcd[622]: ps_inet_recvra: No buffer space available

 This means we tried to send a message over privsep that was bigger than what we allocate for.
 For reference, it's a fairly large allocation:
 #define	PS_BUFLEN		((64 * 1024) +			\
 				 sizeof(struct ps_msghdr) +	\
 				 sizeof(struct msghdr) +	\
 				 CMSG_SPACE(sizeof(struct in6_pktinfo) + \
 					    sizeof(int)))

 Basically this should be more than enough for an unfragmented message either from ICMP or UDP in any address family.
 It seems that it received a RA message dhcpcd really didn't like.

 > dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
 > syslogd[423]: last message repeated 6 times
 > dhcpcd[622]: ps_inet_dodispatch: Connection reset by peer

 This is the manager process noting that the network proxy has gone away.

 I'll see if manually tripping the above condition causes an error or not and try to fix it.
 Bit busy until after the weekend, so hopefully I'll have something next week.

 Roy
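
As a rough illustration of how a fixed-size message buffer leads to "No buffer space available": the sketch below rejects an oversized payload with ENOBUFS instead of truncating it.  It is not dhcpcd's code.  struct slot, SLOT_PAYLOAD_MAX and slot_put() are hypothetical, and the real PS_BUFLEN quoted above also reserves room for headers and control data.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical fixed-size message slot; not dhcpcd's actual layout. */
#define SLOT_PAYLOAD_MAX	(64 * 1024)

struct slot {
	size_t		len;
	unsigned char	data[SLOT_PAYLOAD_MAX];
};

/*
 * Copy a payload into the slot.  A payload larger than the slot is
 * refused with ENOBUFS, which strerror(3) renders as
 * "No buffer space available".
 */
int
slot_put(struct slot *s, const void *payload, size_t len)
{
	assert(s != NULL);
	assert(payload != NULL || len == 0);
	if (len > sizeof(s->data)) {
		errno = ENOBUFS;
		return -1;
	}
	if (len != 0)
		memcpy(s->data, payload, len);
	s->len = len;
	return 0;
}
```

The point of the sketch is only that the error is a deliberate bounds check, so the interesting question is what produced a message that large in the first place.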

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@NetBSD.org, roy@NetBSD.org,
	gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 03:10:17 +0000

 > Date: Thu, 22 Feb 2024 23:37:23 +0000
 > From: Roy Marples <roy@marples.name>
 > 
 > So dhcpcd crashed. As it runs in an empty chroot without the ability
 > to create anything it has no means of saving any segfault
 > information to the best of my knowledge.

 Could dhcpcd sprout an option to enable core dumps or something, or is
 there an easy way to do that out of the box already?

 > I'll see if manually tripping the above condition causes an error or
 > not and try to fix it.
 > 
 > Bit busy until after the weekend, so hopefully I'll have something
 > next week.

 Thanks!  This particular failure mode may not be urgent, but dhcpcd's
 silent exit without diagnostics makes it hard to figure out what's
 going on -- especially for largely-unattended network appliances.

From: Martin Husemann <martin@duskware.de>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: Roy Marples <roy@marples.name>, gnats-bugs@NetBSD.org, roy@NetBSD.org,
	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 10:07:21 +0100

 On Fri, Feb 23, 2024 at 03:10:17AM +0000, Taylor R Campbell wrote:
 > Could dhcpcd sprout an option to enable core dumps or something, or is
 > there an easy way to do that out of the box already?

 Create a writable /tmp in the chroot and set kern.defcorename = /tmp/%n.core
 (and be prepared to find any other core files in /tmp/ while this is
 in effect).

 Martin

From: Roy Marples <roy@marples.name>
To: "gnats-bugs" <gnats-bugs@netbsd.org>
Cc: "Martin Husemann" <martin@duskware.de>,
	"gnats-admin" <gnats-admin@netbsd.org>,
	"netbsd-bugs" <netbsd-bugs@netbsd.org>,
	"campbell+netbsd" <campbell+netbsd@mumble.net>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 09:26:13 +0000

  ---- On Fri, 23 Feb 2024 09:10:02 +0000  Martin Husemann  wrote --- 
  >  On Fri, Feb 23, 2024 at 03:10:17AM +0000, Taylor R Campbell wrote:
  >  > Could dhcpcd sprout an option to enable core dumps or something, or is
  >  > there an easy way to do that out of the box already?
  >  
  >  Create a writable /tmp in the chroot and set kern.defcorename = /tmp/%n.core
  >  (and be prepared to find any other core files in /tmp/ while this is
  >  in effect).

 Good idea!

 Does that work when the process is locked down so it can't create new files?
 https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/privsep.c#L146

 Roy

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 10:19:12 -0000 (UTC)

 roy@marples.name (Roy Marples) writes:

 >Does that work when the process is locked down so it can't create new files?
 >https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/privsep.c#L146

 RLIMIT_NOFILE restricts file descriptors. Dumping core doesn't use
 file descriptors. You would need RLIMIT_CORE to restrict the size
 of the core dump (if too large, no core is dumped).

 You still need a writable path in the chroot.

From: Roy Marples <roy@marples.name>
To: "gnats-bugs" <gnats-bugs@netbsd.org>
Cc: "Martin Husemann" <martin@duskware.de>, "roy" <roy@netbsd.org>,
	"gnats-admin" <gnats-admin@netbsd.org>,
	"netbsd-bugs" <netbsd-bugs@netbsd.org>,
	"campbell+netbsd" <campbell+netbsd@mumble.net>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 16:17:35 +0000

  ---- On Fri, 23 Feb 2024 09:10:02 +0000  Martin Husemann  wrote --- 
  >  Create a writable /tmp in the chroot and set kern.defcorename = /tmp/%n.core
  >  (and be prepared to find any other core files in /tmp/ while this is
  >  in effect).

 Would it be a good idea to adjust mtree to create the default defcorename directory in the dhcpcd chroot directory with appropriate permissions?
 Or is that not a good out of the box idea?

 Roy

From: Robert Elz <kre@munnari.OZ.AU>
To: Roy Marples <roy@marples.name>
Cc: "gnats-bugs" <gnats-bugs@netbsd.org>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Sat, 24 Feb 2024 05:35:05 +0700

     Date:        Fri, 23 Feb 2024 16:17:35 +0000
     From:        Roy Marples <roy@marples.name>
     Message-ID:  <18dd6c202ed.f0b7cc331864567.4485765256044008191@marples.name>

   | Would it be a good idea to adjust mtree to create the default
   | defcorename directory in the dhcpcd chroot directory with
   | appropriate permissions?

 The "default defcorename directory" (if I understand what you were
 asking about) is "." - which I assume already exists with what someone
 believes are appropriate permissions.

 The suggestion was to temporarily (globally, there's no local setting
 available for this) alter that to be "/tmp" while looking for this issue
 which would result in all core dumps being placed in /tmp - and of
 course, when in a chroot, the /tmp being used is one inside the chroot.

 Creating that (as a standard thing) would seem to be not the right thing
 to do - unless having a writeable (to the dhcpcd process owner) /tmp
 directory in the chroot would be useful for some other purpose.

 There is of course no real reason for /tmp to be the name chosen,
 except that it exists in the (non-chroot) normal environment, and is
 (almost always) writeable by anyone, which makes it work OK for this.
 But one could avoid cluttering it (given it is usually a tmpfs) with
 potentially large core files (like when firefox decides to abort) and
 instead make a similar kind of directory in a larger filesystem, and
 use that (and make the corresponding thing in the chroot).   As that
 other thing might be anywhere, depending upon available space, attempting
 to standardise it seems difficult.

 kre

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: roy@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        campbell+netbsd@mumble.net
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Sat, 24 Feb 2024 00:18:33 +0100

 On Fri, Feb 23, 2024 at 10:40:02PM +0000, Robert Elz wrote:
 >  [...]
 >  The suggestion was to temporarily (globally, there's no local setting
 >  available for this)

 Actually there is one: sysctl proc.curproc.corename, which defaults to
 kern.defcorename, but can be changed per-process.

 dhcpcd could change it, or the rc.d script could once dhcpcd has written its
 PID file (or before starting dhcpcd, I think it's inherited).

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@NetBSD.org,
	Martin Husemann <martin@duskware.de>,
	gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 23:29:05 +0000

 I tried getting a core dump by doing:

 # chown _dhcpcd:_dhcpcd /var/chroot/dhcpcd

 and then sending SIGABRT to each of the dhcpcd processes.

 Here's a sample of the process hierarchy from `ps -Adww':

 29577 ?     S      0:00.11 |-- dhcpcd: [manager] [ip4] [ip6]
  2702 ?     S      0:00.12 | |-- dhcpcd: [privileged proxy]
  9448 ?     S      0:00.01 | |-- dhcpcd: [control proxy]
 16699 ?     S      0:00.01 | `-- dhcpcd: [network proxy]

 I verified with sysctl proc.$pid.rlimit.coredumpsize.soft/hard that
 the core dump size rlimit is unlimited:

 # for pid in 29577 2702 9448 16699; do for x in soft hard; do sysctl proc.$pid.rlimit.coredumpsize.$x; done; done
 proc.29577.rlimit.coredumpsize.soft = unlimited
 proc.29577.rlimit.coredumpsize.hard = unlimited
 proc.2702.rlimit.coredumpsize.soft = unlimited
 proc.2702.rlimit.coredumpsize.hard = unlimited
 proc.9448.rlimit.coredumpsize.soft = unlimited
 proc.9448.rlimit.coredumpsize.hard = unlimited
 proc.16699.rlimit.coredumpsize.soft = unlimited
 proc.16699.rlimit.coredumpsize.hard = unlimited

 Results (pids replaced by roles in the log messages because I restart
 dhcpcd each time, of course):

 - kill -ABRT manager (cwd /var/chroot/dhcpcd): no core in / or in
   /var/chroot/dhcpcd, log messages from privileged proxy:

   <daemon.err>Feb 23 23:15:49 nanocons dhcpcd[privileged proxy]: ps_ctl_recv: read: Undefined error: 0
   <daemon.err>Feb 23 23:15:49 nanocons dhcpcd[privileged proxy]: ps_root_recvmsg: Connection reset by peer

   (This `Undefined error: 0' seems like a bug in itself -- something
   lost errno, perhaps?)

 - kill -ABRT privileged proxy (cwd /): core dumped in /, no log
   messages

 - kill -ABRT control proxy (cwd /var/chroot/dhcpcd): no core in / or
   in /var/chroot/dhcpcd, log messages from privileged proxy:

   <daemon.err>Feb 23 23:19:25 nanocons dhcpcd[privileged proxy]: ps_ctl_dodispatch: Connection reset by peer
   <daemon.err>Feb 23 23:19:25 nanocons dhcpcd[privileged proxy]: control_free: No such file or directory
   <daemon.err>Feb 23 23:19:25 nanocons dhcpcd[privileged proxy]: ps_sendpsmmsg: Destination address required
   <daemon.err>Feb 23 23:19:25 nanocons dhcpcd[privileged proxy]: ps_dostop: Destination address required

 - kill -ABRT network proxy (cwd /var/chroot/dhcpcd): no core in / or
   in /var/chroot/dhcpcd, log messages from privileged proxy:

   <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_inet_dodispatch: Connection reset by peer
   <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: control_free: No such file or directory
   <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_sendpsmmsg: Destination address required
   <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_dostop: Destination address required

 So I infer that the network proxy must have crashed, in my original
 case.  But I don't see how to trigger a core dump.
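
On the "Undefined error: 0" message: one common way to get such a log line is to report strerror(errno) after read(2) returned 0 (end of file), when errno was never set.  Whether that is what happens in ps_ctl_recv is an assumption, not something this thread confirms.  The sketch below separates the three outcomes and saves errno before any later call can overwrite it; read_and_describe() is a hypothetical helper.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Read from fd and describe the outcome in msg.  Returns the read(2)
 * result unchanged and leaves errno as read(2) set it.
 */
ssize_t
read_and_describe(int fd, void *buf, size_t len, char *msg, size_t msglen)
{
	ssize_t n;
	int saved_errno;

	assert(msg != NULL && msglen > 0);
	n = read(fd, buf, len);
	saved_errno = errno;	/* capture before anything can clobber it */
	if (n == -1)
		snprintf(msg, msglen, "read: %s", strerror(saved_errno));
	else if (n == 0)
		/* EOF is not an error: errno is meaningless here. */
		snprintf(msg, msglen, "read: peer closed the connection");
	else
		snprintf(msg, msglen, "read: %zd bytes", n);
	errno = saved_errno;
	return n;
}
```

With this split, a closed peer is logged as such rather than as an error with an errno of 0.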

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@NetBSD.org,
	Martin Husemann <martin@duskware.de>,
	gnats-admin@NetBSD.org,
	netbsd-bugs@NetBSD.org
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Sat, 24 Feb 2024 00:48:47 +0000

 > Date: Fri, 23 Feb 2024 23:29:05 +0000
 > From: Taylor R Campbell <riastradh@NetBSD.org>
 >
 > So I infer that the network proxy must have crashed, in my original
 > case.  But I don't see how to trigger a core dump.

 Just in case somehow there's set-user/group-id processes involved, I
 also tried:

 1. make /var/chroot/dhcpcd/var/crash owned by _dhcpcd:_dhcpcd
 2. add `kern.coredump.setid.dump=1' to /etc/modules.conf
 3. reboot
 4. kill -ABRT manager

 I got a zero-length core file and a console message:

 [ 1251.6878114] pid 6079 (dhcpcd): system write of 64@0xffffc000b03a79c0 at 0 failed: 27

 I tried unlimiting the file size with

 # sysctl -w proc.$pid.rlimit.filesize.hard=unlimited
 # sysctl -w proc.$pid.rlimit.filesize.soft=unlimited

 and this time, kill -ABRT manager produced a core dump at
 /var/chroot/dhcpcd/var/crash/dhcpcd.core!

 So there's several issues:

 1. dhcpcd somehow gets the kern.coredump.setid treatment (which I
    thought was reserved for executables having the set-user/group-id
    bit set, and I don't see any evidence of that in dhcpcd)

 2. /var/crash doesn't exist under the chroot, /var/chroot/dhcpcd

 3. the filesize rlimit prevents the core dump too
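
Two separate resource limits come up in this thread: RLIMIT_CORE caps the size of the core image, and RLIMIT_FSIZE caps any file the process writes, core files included (the "failed: 27" above is errno 27, EFBIG).  The sketch below shows how a process could inspect and raise them with getrlimit(2) and setrlimit(2).  raise_soft_limit() and core_would_fit() are hypothetical helpers, and the fit check is a simplified model rather than the kernel's exact accounting.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/resource.h>

/* Raise the soft limit for `resource' to its hard limit. */
int
raise_soft_limit(int resource)
{
	struct rlimit rl;

	assert(resource >= 0);
	if (getrlimit(resource, &rl) == -1)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(resource, &rl);
}

/*
 * Nonzero if a core image of `size' bytes passes both soft limits.
 * Simplified model: a core is refused when it reaches RLIMIT_CORE,
 * and the write fails with EFBIG when it exceeds RLIMIT_FSIZE.
 */
int
core_would_fit(rlim_t size)
{
	struct rlimit core, fsize;

	if (getrlimit(RLIMIT_CORE, &core) == -1 ||
	    getrlimit(RLIMIT_FSIZE, &fsize) == -1)
		return 0;
	return size < core.rlim_cur && size <= fsize.rlim_cur;
}
```

A daemon that wants debuggable crashes could call raise_soft_limit(RLIMIT_CORE) early, but a soft limit can never exceed the hard limit, so a hard limit of zero still has to be lifted by whoever started the process.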

From: Christos Zoulas <christos@zoulas.com>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: Roy Marples <roy@marples.name>,
 "gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>,
 Martin Husemann <martin@duskware.de>,
 "gnats-admin@netbsd.org" <gnats-admin@NetBSD.org>,
 "netbsd-bugs@netbsd.org" <netbsd-bugs@NetBSD.org>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 19:50:16 -0500

         if (USPACE + ctob(vm->vm_dsize + vm->vm_ssize) >=
             p->p_rlimit[RLIMIT_CORE].rlim_cur) {
                 error = EFBIG;          /* better error code? */
                 goto release;
         }

 Looks like the core resource limit is 0?

 christos

 > On Feb 23, 2024, at 7:48 PM, Taylor R Campbell <riastradh@NetBSD.org> wrote:
 >
 >> Date: Fri, 23 Feb 2024 23:29:05 +0000
 >> From: Taylor R Campbell <riastradh@NetBSD.org>
 >>
 >> So I infer that the network proxy must have crashed, in my original
 >> case.  But I don't see how to trigger a core dump.
 >
 > Just in case somehow there's set-user/group-id processes involved, I
 > also tried:
 >
 > 1. make /var/chroot/dhcpcd/var/crash owned by _dhcpcd:_dhcpcd
 > 2. add `kern.coredump.setid.dump=1' to /etc/modules.conf
 > 3. reboot
 > 4. kill -ABRT manager
 >
 > I got a zero-length core file and a console message;
 >
 > [ 1251.6878114] pid 6079 (dhcpcd): system write of 64@0xffffc000b03a79c0 at 0 failed: 27
 >
 > I tried unlimiting the file size with
 >
 > # sysctl -w proc.$pid.rlimit.filesize.hard=unlimited
 > # sysctl -w proc.$pid.rlimit.filesize.soft=unlimited
 >
 > and this time, kill -ABRT manager produced a core dump at
 > /var/chroot/dhcpcd/var/crash/dhcpcd.core!
 >
 > So there's several issues:
 >
 > 1. dhcpcd somehow gets the kern.coredump.setid treatment (which I
 >   thought was reserved for executables having the set-user/group-id
 >   bit set, and I don't see any evidence of that in dhcpcd)
 >
 > 2. /var/crash doesn't exist under the chroot, /var/chroot/dhcpcd
 >
 > 3. the filesize rlimit prevents the core dump too

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Sat, 24 Feb 2024 07:46:29 -0000 (UTC)

 riastradh@NetBSD.org (Taylor R Campbell) writes:

 >  <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_inet_dodispatch: Connection reset by peer
 >  <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: control_free: No such file or directory
 >  <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_sendpsmmsg: Destination address required
 >  <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_dostop: Destination address required

 >So I infer that the network proxy must have crashed, in my original
 >case.  But I don't see how to trigger a core dump.


 Try with kern.coredump.setid.dump=1 and kern.coredump.setid.path.

 change_root()
 {
 	...

         /* Broadcast our credentials to the process and other LWPs. */
         proc_crmod_leave(ncred, p->p_cred, true);
 }

 void  
 proc_crmod_leave(kauth_cred_t scred, kauth_cred_t fcred, bool sugid)
 {
 	...
 	if (sugid) {
                 /*
                  * Mark process as having changed credentials, stops
                  * tracing etc.
                  */
                 p->p_flag |= PK_SUGID;
         }
 	...
 }

 static int
 coredump(struct lwp *l, const char *pattern)
 {
 	...
         /*
          * Make sure the process has not set-id, to prevent data leaks,
          * unless it was specifically requested to allow set-id coredumps.
          */
         if (p->p_flag & PK_SUGID) {
                 if (!security_setidcore_dump) {
                         error = EPERM;
                         goto release;
                 }
                 pattern = security_setidcore_path;
         }
 }
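
Read together, the three kernel excerpts above say: chroot(2) marks the process PK_SUGID, and a PK_SUGID process either gets no core at all (EPERM) or gets one at the set-id path instead of the usual pattern.  The user-space model below restates that decision so it can be checked in isolation.  It is a sketch, not kernel code: MODEL_SUGID, struct setid_policy and pick_core_pattern() are hypothetical names.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's PK_SUGID process flag. */
#define MODEL_SUGID	0x01

/* Hypothetical mirror of the two kern.coredump.setid.* knobs. */
struct setid_policy {
	int		 dump;	/* kern.coredump.setid.dump */
	const char	*path;	/* kern.coredump.setid.path */
};

/*
 * Decide which core name pattern applies.  Returns 0 and stores the
 * pattern in *out, or returns EPERM when a process that changed
 * credentials (or its root) is not allowed to dump at all.
 */
int
pick_core_pattern(unsigned int flags, const struct setid_policy *pol,
    const char *pattern, const char **out)
{
	assert(pol != NULL && out != NULL);
	if (flags & MODEL_SUGID) {
		if (!pol->dump)
			return EPERM;
		pattern = pol->path;	/* overrides the normal pattern */
	}
	*out = pattern;
	return 0;
}
```

This matches what was observed earlier in the thread: with the setid dump knob enabled, the core landed under var/crash inside the chroot rather than in the process's working directory.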


>Unformatted:
