NetBSD Problem Report #57952
From www@netbsd.org Thu Feb 22 15:26:39 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4DE031A9239
for <gnats-bugs@gnats.NetBSD.org>; Thu, 22 Feb 2024 15:26:39 +0000 (UTC)
Message-Id: <20240222152637.AAC281A923A@mollari.NetBSD.org>
Date: Thu, 22 Feb 2024 15:26:37 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: dhcpcd(8) inexplicably terminated in the night
X-Send-Pr-Version: www-1.0
>Number: 57952
>Category: bin
>Synopsis: dhcpcd(8) inexplicably terminated in the night
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: roy
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Feb 22 15:30:00 +0000 2024
>Last-Modified: Fri Oct 18 22:49:44 +0000 2024
>Originator: Taylor R Campbell
>Release: 10.0_RC3
>Organization:
The NetBSDCD Foundation
>Environment:
NetBSD nanocons.local 10.0_RC3 NetBSD 10.0_RC3 (GENERIC64) #15: Wed Jan 17 05:31:14 UTC 2024 root@manticore.local:/usr/obj/10/evbarm64/sys/arch/evbarm/compile/GENERIC64 evbarm
>Description:
Some time during the night, dhcpcd terminated on two different hosts on my network without explaining why. There was no core dump and no log message obviously explaining what happened. I may have been configuring a network device at the time, which I'll call ${ROGUEDEVICE}, but I don't remember what stage of the misconfiguration I was in. I may have wound up with two IPv6 routers and DHCPv6 servers at a time on the network.
Relevant log messages from the previous half hour or so, with IPv6 addresses replaced by symbolic labels:
<daemon.info>Feb 22 02:46:12 nanocons dhcpcd[622]: ure0: Router Advertisement from fe80::${ROGUEDEVICE}
<daemon.info>Feb 22 02:46:13 nanocons dhcpcd[622]: urtwn0: Router Advertisement from fe80::${ROGUEDEVICE}
<daemon.info>Feb 22 02:46:16 nanocons dhcpcd[622]: ure0: adding address ${GLOBALPREFIX}:${URE0_GLOBAL}/64
<daemon.info>Feb 22 02:46:16 nanocons dhcpcd[622]: ure0: adding address ${LOCALPREFIX}:${URE0_LOCAL}/64
<daemon.info>Feb 22 02:46:16 nanocons dhcpcd[622]: urtwn0: adding address ${GLOBALPREFIX}:${URTWN0_GLOBAL}/64
<daemon.info>Feb 22 02:46:16 nanocons dhcpcd[622]: urtwn0: adding address ${LOCALPREFIX}:${URTWN0_LOCAL}/64
<daemon.warn>Feb 22 02:51:42 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 02:51:43 nanocons dhcpcd[622]: urtwn0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.info>Feb 22 03:04:21 nanocons dhcpcd[622]: ure0: deleting address ${GLOBALPREFIX}:${URE0_GLOBAL}/64
<daemon.info>Feb 22 03:04:21 nanocons dhcpcd[622]: ure0: deleting address ${LOCALPREFIX}:${URE0_LOCAL}/64
<daemon.info>Feb 22 03:04:21 nanocons dhcpcd[622]: urtwn0: deleting address ${GLOBALPREFIX}:${URTWN0_GLOBAL}/64
<daemon.info>Feb 22 03:04:21 nanocons dhcpcd[622]: urtwn0: deleting address ${LOCALPREFIX}:${URTWN0_LOCAL}/64
<daemon.info>Feb 22 03:04:25 nanocons dhcpcd[622]: ure0: adding address ${LOCALPREFIX}:${URE0_LOCAL}/64
<daemon.info>Feb 22 03:04:25 nanocons dhcpcd[622]: ure0: adding address ${GLOBALPREFIX}:${URE0_GLOBAL}/64
<daemon.info>Feb 22 03:04:25 nanocons dhcpcd[622]: urtwn0: adding address ${LOCALPREFIX}:${URTWN0_LOCAL}/64
<daemon.info>Feb 22 03:04:25 nanocons dhcpcd[622]: urtwn0: adding address ${GLOBALPREFIX}:${URTWN0_GLOBAL}/64
<daemon.err>Feb 22 03:07:51 nanocons dhcpcd[622]: ps_sendcmdmsg: No buffer space available
<daemon.err>Feb 22 03:07:51 nanocons dhcpcd[622]: ps_inet_recvra: No buffer space available
<daemon.warn>Feb 22 03:07:59 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:02 nanocons syslogd[423]: last message repeated 5 times
<daemon.warn>Feb 22 03:08:02 nanocons dhcpcd[622]: urtwn0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:02 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:02 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:03 nanocons dhcpcd[622]: urtwn0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:03 nanocons dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
<daemon.warn>Feb 22 03:08:06 nanocons syslogd[423]: last message repeated 6 times
<daemon.err>Feb 22 03:08:06 nanocons dhcpcd[622]: ps_inet_dodispatch: Connection reset by peer
<daemon.err>Feb 22 03:08:06 nanocons dhcpcd[622]: control_free: No such file or directory
<daemon.err>Feb 22 03:08:06 nanocons dhcpcd[622]: ps_sendpsmmsg: Destination address required
<daemon.err>Feb 22 03:08:06 nanocons dhcpcd[622]: ps_dostop: Destination address required
>How-To-Repeat:
not sure
>Fix:
Yes, please!
Ideally, mere network conditions should not provoke dhcpcd to terminate.
Preferably, dhcpcd would at least give a reason for each way that it does terminate.
At least it didn't take down the network configuration when it did, so I was still able to ssh into the machines where it terminated.
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: bin-bug-people->roy
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Thu, 22 Feb 2024 15:36:30 +0000
Responsible-Changed-Why:
Could I trouble you to take a look at this, roy?
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/57952: dhcpcd(8) inexplicably terminated in the night
Date: Fri, 23 Feb 2024 01:15:56 +0700
As it happens, a few days ago, a friend of mine (not on any netbsd lists)
had much the same thing happen - dhcpcd on a NetBSD 9.3 host simply
vanished, without any log messages.
In that case, a possible explanation was carrier being lost, and reappearing
due to a shoddy connector (apparently wiggling the cable at the socket
caused carrier to reappear) but no new address was acquired, after which
upon looking, there was no dhcpcd in sight to obtain one (this would have
been a v4 only installation - simply no v6 available).
That cause is unconfirmed - there is no question about the wonky cable
causing carrier loss, but whether that had anything to do with dhcpcd's
decision to vanish is mere speculation.
At the time the setup there was using dhcpcd=NO in rc.conf, and "dhcp" in
/etc/ifconfig.wm0, with -q in dhcpcd_flags (just the default setting).
It has since been changed to dhcpcd=YES with debug enabled in dhcpcd.conf
and no -q in dhcpcd_flags (to increase the chances of there being something
revealing in the logs if it happens again).
I agree with this though:
Preferably, dhcpcd would at least give a reason for each way that
it does terminate.
Regardless of what options are set, or not, if a daemon that should remain
running needs to exit, it should always say why - even if that means keeping
a parent around that does nothing except wait on the child (which does the
work, owns the .pid file ...) so it can report exits due to a signal if
one should happen, which the child itself cannot reasonably log.
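The parent-that-only-waits idea above could be sketched roughly as follows. This is a minimal illustration, not dhcpcd code: supervise(), describe_exit() and the two demo workers are hypothetical names invented here, and a real daemon would hand the resulting string to syslog(3) instead of returning it.

```c
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Turn a wait(2) status into a loggable string. Hypothetical helper. */
const char *
describe_exit(int status, char *buf, size_t len)
{
	if (WIFEXITED(status))
		snprintf(buf, len, "exited with status %d",
		    WEXITSTATUS(status));
	else if (WIFSIGNALED(status))
		snprintf(buf, len, "killed by signal %d", WTERMSIG(status));
	else
		snprintf(buf, len, "unexpected wait status 0x%x",
		    (unsigned)status);
	return buf;
}

/*
 * Run worker() in a child and block in the parent until it is gone.
 * The raw wait status is stored in *statusp so the caller can log it.
 * Returns 0 on success, -1 if fork or waitpid failed.
 */
int
supervise(int (*worker)(void), int *statusp)
{
	pid_t pid = fork();

	if (pid == -1)
		return -1;
	if (pid == 0)
		_exit(worker());	/* child: do the real work */
	while (waitpid(pid, statusp, 0) == -1) {
		if (errno != EINTR)
			return -1;
	}
	return 0;
}

/* Demo workers, standing in for the real daemon body. */
int
demo_worker_ok(void)
{
	return 0;
}

int
demo_worker_abort(void)
{
	raise(SIGABRT);		/* simulate a crash the child cannot log */
	return 1;
}
```

Calling supervise(demo_worker_abort, &status) and then describe_exit(status, buf, sizeof buf) yields a "killed by signal ..." message that the surviving parent can log, which is the information missing from the logs in this report.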
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Robert Elz <kre@munnari.OZ.AU>
Cc: gnats-bugs@NetBSD.org, roy@NetBSD.org, gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: bin/57952: dhcpcd(8) inexplicably terminated in the night
Date: Thu, 22 Feb 2024 18:53:06 +0000
> Date: Fri, 23 Feb 2024 01:15:56 +0700
> From: Robert Elz <kre@munnari.OZ.AU>
>
> At the time the setup there was using dhcpcd=NO in rc.conf, and "dhcp" in
> /etc/ifconfig.wm0, with -q in dhcpcd_flags (just the default setting).
In this case, I'm using dhcpcd=YES in rc.conf.
On one machine, dhcpcd.conf is unmodified from 9.99.76 (yes, that one
needs an update).
On the other, the only differences from 10.0_RC3 /etc/dhcpcd.conf are:
-option rapid_commit
+#option rapid_commit
...
+nooption dhcp6_reconfigure_accept
(I had set these back in 2020 in an attempt to diagnose something I
have since forgotten about, can probably remove them now.)
From: Hauke Fath <h.fath@nt.tu-darmstadt.de>
To: gnats-bugs@netbsd.org, gnats-admin@netbsd.org
Cc:
Subject: Re: bin/57952: dhcpcd(8) inexplicably terminated in the night
Date: Thu, 22 Feb 2024 19:55:53 +0100
On 2024-02-22 16:30, campbell+netbsd@mumble.net wrote:
> Some time during the night, dhcpcd terminated on two different hosts
> on my network without explaining why.
FWIW, I ran into the same issue on Arch half a year ago:
<https://lists.archlinux.org/hyperkitty/list/arch-general@lists.archlinux.org/thread/6VRNI7RARP3EQUWHKRGQQILEGKV6UOMQ/>
The hourly restart script is still in place...
Cheerio,
Hauke
--
The ASCII Ribbon Campaign Hauke Fath
() No HTML/RTF in email Institut für Nachrichtentechnik
/\ No Word docs in email TU Darmstadt
Respect for open standards Ruf +49-6151-16-21344
From: Roy Marples <roy@marples.name>
To: "gnats-bugs" <gnats-bugs@netbsd.org>
Cc: "riastradh" <riastradh@NetBSD.org>, "roy" <roy@netbsd.org>,
"gnats-admin" <gnats-admin@netbsd.org>,
"netbsd-bugs" <netbsd-bugs@netbsd.org>,
"campbell+netbsd" <campbell+netbsd@mumble.net>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Thu, 22 Feb 2024 23:37:23 +0000
----- On Thu, 22 Feb 2024 15:36:31 +0000 wrote ---
> Could I trouble you to take a look at this, roy?
So dhcpcd crashed. As it runs in an empty chroot without the ability to create anything it has no means of saving any segfault information to the best of my knowledge.
> dhcpcd[622]: ps_sendcmdmsg: No buffer space available
> dhcpcd[622]: ps_inet_recvra: No buffer space available
This means we tried to send a message over privsep that was bigger than what we allocate for.
For reference, it's a fairly large allocation:
#define PS_BUFLEN	((64 * 1024) + \
			 sizeof(struct ps_msghdr) + \
			 sizeof(struct msghdr) + \
			 CMSG_SPACE(sizeof(struct in6_pktinfo) + \
			     sizeof(int)))
Basically this should be more than enough for an unfragmented message either from ICMP or UDP in any address family.
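As a rough illustration of the kind of bounds check that produces "No buffer space available" (the strerror(3) text for ENOBUFS), consider the sketch below. It is an assumption about the shape of the check, not the actual dhcpcd code: struct demo_msghdr, DEMO_BUFLEN and demo_check_fits() are hypothetical, and the header layout is invented purely for the example.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for a privsep message header; not the real layout. */
struct demo_msghdr {
	unsigned short	cmd;
	unsigned long	flags;
	size_t		datalen;
};

/* Simplified buffer size: 64 KiB of payload plus the header. */
#define DEMO_BUFLEN	((64 * 1024) + sizeof(struct demo_msghdr))

/*
 * Return 0 if a payload of payload_len bytes fits in the fixed buffer
 * next to its header, otherwise return -1 with errno set to ENOBUFS.
 */
int
demo_check_fits(size_t payload_len)
{
	if (payload_len > DEMO_BUFLEN - sizeof(struct demo_msghdr)) {
		errno = ENOBUFS;
		return -1;
	}
	return 0;
}
```

Under these assumptions a 1500-byte packet or a full 64 KiB datagram passes, while anything larger is rejected with ENOBUFS instead of being truncated.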
It seems that it received an RA message dhcpcd really didn't like.
> dhcpcd[622]: ure0: fe80::${ROGUEDEVICE}: no longer a default router
> syslogd[423]: last message repeated 6 times
> dhcpcd[622]: ps_inet_dodispatch: Connection reset by peer
This is the manager process noting that the network proxy has gone away.
I'll see if manually tripping the above condition causes an error or not and try to fix it.
Bit busy until after the weekend, so hopefully I'll have something next week.
Roy
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@NetBSD.org, roy@NetBSD.org,
gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 03:10:17 +0000
> Date: Thu, 22 Feb 2024 23:37:23 +0000
> From: Roy Marples <roy@marples.name>
>
> So dhcpcd crashed. As it runs in an empty chroot without the ability
> to create anything it has no means of saving any segfault
> information to the best of my knowledge.
Could dhcpcd sprout an option to enable core dumps or something, or is
there an easy way to do that out of the box already?
> I'll see if manually tripping the above condition causes an error or
> not and try to fix it.
>
> Bit busy until after the weekend, so hopefully I'll have something
> next week.
Thanks! This particular failure mode may not be urgent, but dhcpcd's
silent exit without diagnostics makes it hard to figure out what's
going on -- especially for largely-unattended network appliances.
From: Martin Husemann <martin@duskware.de>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: Roy Marples <roy@marples.name>, gnats-bugs@NetBSD.org, roy@NetBSD.org,
gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 10:07:21 +0100
On Fri, Feb 23, 2024 at 03:10:17AM +0000, Taylor R Campbell wrote:
> Could dhcpcd sprout an option to enable core dumps or something, or is
> there an easy way to do that out of the box already?
Create a writable /tmp in the chroot and set kern.defcorename = /tmp/%n.core
(and be prepared to find any other core files in /tmp/ while this is
in effect).
Martin
From: Roy Marples <roy@marples.name>
To: "gnats-bugs" <gnats-bugs@netbsd.org>
Cc: "Martin Husemann" <martin@duskware.de>,
"gnats-admin" <gnats-admin@netbsd.org>,
"netbsd-bugs" <netbsd-bugs@netbsd.org>,
"campbell+netbsd" <campbell+netbsd@mumble.net>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 09:26:13 +0000
---- On Fri, 23 Feb 2024 09:10:02 +0000 Martin Husemann wrote ---
> On Fri, Feb 23, 2024 at 03:10:17AM +0000, Taylor R Campbell wrote:
> > Could dhcpcd sprout an option to enable core dumps or something, or is
> > there an easy way to do that out of the box already?
>
> Create a writable /tmp in the chroot and set kern.defcorename = /tmp/%n.core
> (and be prepared to find any other core files in /tmp/ while this is
> in effect).
Good idea!
Does that work when the process is locked down so it can't create new files?
https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/privsep.c#L146
Roy
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 10:19:12 -0000 (UTC)
roy@marples.name (Roy Marples) writes:
>Does that work when the process is locked down so it can't create new files?
>https://github.com/NetworkConfiguration/dhcpcd/blob/master/src/privsep.c#L146
RLIMIT_NOFILE restricts file descriptors. Dumping core doesn't use
file descriptors. You would need RLIMIT_CORE to restrict the size
of the core dump (if too large, no core is dumped).
You still need a writable path in the chroot.
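To make the distinction concrete, here is a small sketch that inspects and raises RLIMIT_CORE with the standard getrlimit(2)/setrlimit(2) calls. The function names are hypothetical, and whether dhcpcd touches this limit at all is not stated in this thread.

```c
#include <sys/resource.h>

/*
 * Report whether the soft RLIMIT_CORE forbids core dumps entirely.
 * Returns 1 if the limit is 0, 0 if it is not, -1 on error.
 */
int
core_dumps_disabled(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) == -1)
		return -1;
	return rl.rlim_cur == 0;
}

/*
 * Raise the soft core-size limit to the hard limit, which is the most
 * an unprivileged process may ask for. Returns 0 on success, -1 on error.
 */
int
raise_core_limit(void)
{
	struct rlimit rl;

	if (getrlimit(RLIMIT_CORE, &rl) == -1)
		return -1;
	rl.rlim_cur = rl.rlim_max;
	return setrlimit(RLIMIT_CORE, &rl);
}
```

RLIMIT_NOFILE is a separate resource and is left untouched here, since it only bounds the number of open descriptors.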
From: Roy Marples <roy@marples.name>
To: "gnats-bugs" <gnats-bugs@netbsd.org>
Cc: "Martin Husemann" <martin@duskware.de>, "roy" <roy@netbsd.org>,
"gnats-admin" <gnats-admin@netbsd.org>,
"netbsd-bugs" <netbsd-bugs@netbsd.org>,
"campbell+netbsd" <campbell+netbsd@mumble.net>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 16:17:35 +0000
---- On Fri, 23 Feb 2024 09:10:02 +0000 Martin Husemann wrote ---
> Create a writable /tmp in the chroot and set kern.defcorename = /tmp/%n.core
> (and be prepared to find any other core files in /tmp/ while this is
> in effect).
Would it be a good idea to adjust mtree to create the default defcorename directory in the dhcpcd chroot directory with appropriate permissions?
Or is that not a good out of the box idea?
Roy
From: Robert Elz <kre@munnari.OZ.AU>
To: Roy Marples <roy@marples.name>
Cc: "gnats-bugs" <gnats-bugs@netbsd.org>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Sat, 24 Feb 2024 05:35:05 +0700
Date: Fri, 23 Feb 2024 16:17:35 +0000
From: Roy Marples <roy@marples.name>
Message-ID: <18dd6c202ed.f0b7cc331864567.4485765256044008191@marples.name>
| Would it be a good idea to adjust mtree to create the default
| defcorename directory in the dhcpcd chroot directory with
| appropriate permissions?
The "default defcorename directory" (if I understand what you were
asking about) is "." - which I assume already exists with what someone
believes are appropriate permissions.
The suggestion was to temporarily (globally, there's no local setting
available for this) alter that to be "/tmp" while looking for this issue
which would result in all core dumps being placed in /tmp - and of
course, when in a chroot, the /tmp being used is one inside the chroot.
Creating that (as a standard thing) would seem to be not the right thing
to do - unless having a writeable (to the dhcpcd process owner) /tmp
directory in the chroot would be useful for some other purpose.
There is of course no real reason for /tmp to be the name chosen,
except that it exists in the (non-chroot) normal environment, and is
(almost always) writeable by anyone, which makes it work OK for this.
But one could avoid cluttering it (given it is usually a tmpfs) with
potentially large core files (like when firefox decides to abort) and
instead make a similar kind of directory in a larger filesystem, and
use that (and make the corresponding thing in the chroot). As that
other thing might be anywhere, depending upon available space, attempting
to standardise it seems difficult.
kre
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: roy@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
campbell+netbsd@mumble.net
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Sat, 24 Feb 2024 00:18:33 +0100
On Fri, Feb 23, 2024 at 10:40:02PM +0000, Robert Elz wrote:
> [...]
> The suggestion was to temporarily (globally, there's no local setting
> available for this)
Actually there is one: sysctl proc.curproc.corename, which defaults to
kern.defcorename, but can be changed per-process.
dhcpcd could change it, or the rc.d script could once dhcpcd has written its
PID file (or before starting dhcpcd, I think it's inherited).
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@NetBSD.org,
Martin Husemann <martin@duskware.de>,
gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 23:29:05 +0000
I tried getting a core dump by doing:
# chown _dhcpcd:_dhcpcd /var/chroot/dhcpcd
and then sending SIGABRT to each of the dhcpcd processes.
Here's a sample of the process hierarchy from `ps -Adww':
29577 ? S 0:00.11 |-- dhcpcd: [manager] [ip4] [ip6]
2702 ? S 0:00.12 | |-- dhcpcd: [privileged proxy]
9448 ? S 0:00.01 | |-- dhcpcd: [control proxy]
16699 ? S 0:00.01 | `-- dhcpcd: [network proxy]
I verified with sysctl proc.$pid.rlimit.coredumpsize.soft/hard that
the core dump size rlimit is unlimited:
# for pid in 29577 2702 9448 16699; do for x in soft hard; do sysctl proc.$pid.rlimit.coredumpsize.$x; done; done
proc.29577.rlimit.coredumpsize.soft = unlimited
proc.29577.rlimit.coredumpsize.hard = unlimited
proc.2702.rlimit.coredumpsize.soft = unlimited
proc.2702.rlimit.coredumpsize.hard = unlimited
proc.9448.rlimit.coredumpsize.soft = unlimited
proc.9448.rlimit.coredumpsize.hard = unlimited
proc.16699.rlimit.coredumpsize.soft = unlimited
proc.16699.rlimit.coredumpsize.hard = unlimited
Results (pids replaced by roles in the log messages because I restart
dhcpcd each time, of course):
- kill -ABRT manager (cwd /var/chroot/dhcpcd): no core in / or in
/var/chroot/dhcpcd, log messages from privileged proxy:
<daemon.err>Feb 23 23:15:49 nanocons dhcpcd[privileged proxy]: ps_ctl_recv: read: Undefined error: 0
<daemon.err>Feb 23 23:15:49 nanocons dhcpcd[privileged proxy]: ps_root_recvmsg: Connection reset by peer
(This `Undefined error: 0' seems like a bug in itself -- something
lost errno, perhaps?)
- kill -ABRT privileged proxy (cwd /): core dumped in /, no log
messages
- kill -ABRT control proxy (cwd /var/chroot/dhcpcd): no core in / or
in /var/chroot/dhcpcd, log messages from privileged proxy:
<daemon.err>Feb 23 23:19:25 nanocons dhcpcd[privileged proxy]: ps_ctl_dodispatch: Connection reset by peer
<daemon.err>Feb 23 23:19:25 nanocons dhcpcd[privileged proxy]: control_free: No such file or directory
<daemon.err>Feb 23 23:19:25 nanocons dhcpcd[privileged proxy]: ps_sendpsmmsg: Destination address required
<daemon.err>Feb 23 23:19:25 nanocons dhcpcd[privileged proxy]: ps_dostop: Destination address required
- kill -ABRT network proxy (cwd /var/chroot/dhcpcd): no core in / or
in /var/chroot/dhcpcd, log messages from privileged proxy:
<daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_inet_dodispatch: Connection reset by peer
<daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: control_free: No such file or directory
<daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_sendpsmmsg: Destination address required
<daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_dostop: Destination address required
So I infer that the network proxy must have crashed, in my original
case. But I don't see how to trigger a core dump.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Roy Marples <roy@marples.name>
Cc: gnats-bugs@NetBSD.org,
Martin Husemann <martin@duskware.de>,
gnats-admin@NetBSD.org,
netbsd-bugs@NetBSD.org
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Sat, 24 Feb 2024 00:48:47 +0000
> Date: Fri, 23 Feb 2024 23:29:05 +0000
> From: Taylor R Campbell <riastradh@NetBSD.org>
>
> So I infer that the network proxy must have crashed, in my original
> case. But I don't see how to trigger a core dump.
Just in case somehow there's set-user/group-id processes involved, I
also tried:
1. make /var/chroot/dhcpcd/var/crash owned by _dhcpcd:_dhcpcd
2. add `kern.coredump.setid.dump=1' to /etc/sysctl.conf
3. reboot
4. kill -ABRT manager
I got a zero-length core file and a console message;
[ 1251.6878114] pid 6079 (dhcpcd): system write of 64@0xffffc000b03a79c0 at 0 failed: 27
I tried unlimiting the file size with
# sysctl -w proc.$pid.rlimit.filesize.hard=unlimited
# sysctl -w proc.$pid.rlimit.filesize.soft=unlimited
and this time, kill -ABRT manager produced a core dump at
/var/chroot/dhcpcd/var/crash/dhcpcd.core!
So there's several issues:
1. dhcpcd somehow gets the kern.coredump.setid treatment (which I
thought was reserved for executables having the set-user/group-id
bit set, and I don't see any evidence of that in dhcpcd)
2. /var/crash doesn't exist under the chroot, /var/chroot/dhcpcd
3. the filesize rlimit prevents the core dump too
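The way two separate resource limits can each veto a core dump may be easier to see in a small model. The sketch below is a userspace illustration only, with hypothetical function names; it is not the kernel's logic verbatim. For context, errno 27 in the console message above is EFBIG ("File too large") on NetBSD, which fits a file-size limit rejecting the write.

```c
#include <sys/resource.h>

/*
 * Model: would a dump of dump_size bytes be allowed under the given
 * core-size and file-size limits? Returns 1 if yes, 0 if either limit
 * would block it. RLIM_INFINITY means "no limit".
 */
int
limits_allow_dump(rlim_t core_limit, rlim_t fsize_limit, rlim_t dump_size)
{
	if (core_limit != RLIM_INFINITY && dump_size >= core_limit)
		return 0;	/* blocked by RLIMIT_CORE */
	if (fsize_limit != RLIM_INFINITY && dump_size > fsize_limit)
		return 0;	/* blocked by RLIMIT_FSIZE */
	return 1;
}

/*
 * Apply the model to the calling process's current soft limits.
 * Returns 1 or 0 as above, or -1 if the limits cannot be read.
 */
int
current_limits_allow_dump(rlim_t dump_size)
{
	struct rlimit core, fsize;

	if (getrlimit(RLIMIT_CORE, &core) == -1 ||
	    getrlimit(RLIMIT_FSIZE, &fsize) == -1)
		return -1;
	return limits_allow_dump(core.rlim_cur, fsize.rlim_cur, dump_size);
}
```

With an unlimited core size but a file-size limit of 0, limits_allow_dump() returns 0 even for a 64-byte write, which matches the zero-length core file observed before the filesize limit was lifted.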
From: Christos Zoulas <christos@zoulas.com>
To: Taylor R Campbell <riastradh@NetBSD.org>
Cc: Roy Marples <roy@marples.name>,
"gnats-bugs@netbsd.org" <gnats-bugs@NetBSD.org>,
Martin Husemann <martin@duskware.de>,
"gnats-admin@netbsd.org" <gnats-admin@NetBSD.org>,
"netbsd-bugs@netbsd.org" <netbsd-bugs@NetBSD.org>
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Fri, 23 Feb 2024 19:50:16 -0500
	if (USPACE + ctob(vm->vm_dsize + vm->vm_ssize) >=
	    p->p_rlimit[RLIMIT_CORE].rlim_cur) {
		error = EFBIG;		/* better error code? */
		goto release;
	}
Looks like the core resource limit is 0?
christos
> On Feb 23, 2024, at 7:48 PM, Taylor R Campbell <riastradh@NetBSD.org> wrote:
>
>> Date: Fri, 23 Feb 2024 23:29:05 +0000
>> From: Taylor R Campbell <riastradh@NetBSD.org>
>>
>> So I infer that the network proxy must have crashed, in my original
>> case. But I don't see how to trigger a core dump.
>
> Just in case somehow there's set-user/group-id processes involved, I
> also tried:
>
> 1. make /var/chroot/dhcpcd/var/crash owned by _dhcpcd:_dhcpcd
> 2. add `kern.coredump.setid.dump=1' to /etc/sysctl.conf
> 3. reboot
> 4. kill -ABRT manager
>
> I got a zero-length core file and a console message;
>
> [ 1251.6878114] pid 6079 (dhcpcd): system write of 64@0xffffc000b03a79c0 at 0 failed: 27
>
> I tried unlimiting the file size with
>
> # sysctl -w proc.$pid.rlimit.filesize.hard=unlimited
> # sysctl -w proc.$pid.rlimit.filesize.soft=unlimited
>
> and this time, kill -ABRT manager produced a core dump at
> /var/chroot/dhcpcd/var/crash/dhcpcd.core!
>
> So there's several issues:
>
> 1. dhcpcd somehow gets the kern.coredump.setid treatment (which I
> thought was reserved for executables having the set-user/group-id
> bit set, and I don't see any evidence of that in dhcpcd)
>
> 2. /var/crash doesn't exist under the chroot, /var/chroot/dhcpcd
>
> 3. the filesize rlimit prevents the core dump too
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/57952 (dhcpcd(8) inexplicably terminated in the night)
Date: Sat, 24 Feb 2024 07:46:29 -0000 (UTC)
riastradh@NetBSD.org (Taylor R Campbell) writes:
> <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_inet_dodispatch: Connection reset by peer
> <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: control_free: No such file or directory
> <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_sendpsmmsg: Destination address required
> <daemon.err>Feb 23 23:21:35 nanocons dhcpcd[5931]: ps_dostop: Destination address required
>So I infer that the network proxy must have crashed, in my original
>case. But I don't see how to trigger a core dump.
Try with kern.coredump.setid.dump=1 and kern.coredump.setid.path.
change_root()
{
	...
	/* Broadcast our credentials to the process and other LWPs. */
	proc_crmod_leave(ncred, p->p_cred, true);
}

void
proc_crmod_leave(kauth_cred_t scred, kauth_cred_t fcred, bool sugid)
{
	...
	if (sugid) {
		/*
		 * Mark process as having changed credentials, stops
		 * tracing etc.
		 */
		p->p_flag |= PK_SUGID;
	}
	...
}

static int
coredump(struct lwp *l, const char *pattern)
{
	...
	/*
	 * Make sure the process has not set-id, to prevent data leaks,
	 * unless it was specifically requested to allow set-id coredumps.
	 */
	if (p->p_flag & PK_SUGID) {
		if (!security_setidcore_dump) {
			error = EPERM;
			goto release;
		}
		pattern = security_setidcore_path;
	}
}
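
Taken together, the three excerpts above amount to one decision: a process that has called chroot is flagged as set-id, and a flagged process only dumps core if set-id dumps are enabled, in which case the set-id path replaces the normal pattern. A small userspace model of that decision, with hypothetical struct and function names that do not exist in the kernel, might look like this:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical model types; the kernel uses p_flag and global variables. */
struct demo_proc {
	int		sugid;		/* models PK_SUGID */
};

struct demo_policy {
	int		setid_dump;	/* models kern.coredump.setid.dump */
	const char	*setid_path;	/* models kern.coredump.setid.path */
};

/*
 * Return the core name pattern that would be used for this process,
 * or NULL with errno set to EPERM if no dump is allowed.
 */
const char *
demo_core_pattern(const struct demo_proc *p, const struct demo_policy *pol,
    const char *pattern)
{
	if (p->sugid) {
		if (!pol->setid_dump) {
			errno = EPERM;
			return NULL;
		}
		return pol->setid_path;
	}
	return pattern;
}
```

In this model, a chrooted process with setid_dump at 0 gets NULL and EPERM, and with setid_dump at 1 it gets the set-id path regardless of the per-process pattern, which would be consistent with the core file appearing under var/crash inside the chroot earlier in this thread.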
>Unformatted: