NetBSD Problem Report #57727

From www@netbsd.org  Mon Nov 27 10:31:03 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 006F41A9238
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 27 Nov 2023 10:31:02 +0000 (UTC)
Message-Id: <20231127103101.49F791A9239@mollari.NetBSD.org>
Date: Mon, 27 Nov 2023 10:31:01 +0000 (UTC)
From: abs@absd.org
Reply-To: abs@absd.org
To: gnats-bugs@NetBSD.org
Subject: dhcpcd exits after a few days
X-Send-Pr-Version: www-1.0

>Number:         57727
>Category:       bin
>Synopsis:       dhcpcd exits after a few days
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Nov 27 10:35:00 +0000 2023
>Last-Modified:  Mon Nov 27 20:15:01 +0000 2023
>Originator:     David Brownlee
>Release:        NetBSD 10.0_RC1
>Organization:
>Environment:
NetBSD iris.absd.org 10.0_RC1 NetBSD 10.0_RC1 (GENERIC) #0: Wed Nov  8 10:37:54 UTC 2023  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64

>Description:
Seen on two systems: after a few days, dhcpcd exits.

Both are running with a single active interface (one has a single interface; the other has two interfaces, only one of which is connected).
dhcpcd_flags is not set (so defaulting to "-qM").

Appears to be similar to https://github.com/NetworkConfiguration/dhcpcd/issues/179

Relevant extract from /var/log/messages

Nov 16 19:42:47 angus dhcpcd[11414]: dhcpcd-9.4.1 starting
Nov 16 19:42:47 angus dhcpcd[1580]: DUID 00:01:00:01:1e:8d:3d:1c:00:24:8c:bc:0c:d5
Nov 16 19:42:47 angus dhcpcd[1580]: bge0: IAID 45:fc:9c:f4
Nov 16 19:42:47 angus dhcpcd[1580]: bge1: waiting for carrier
Nov 16 19:42:47 angus dhcpcd[1580]: bge0: soliciting an IPv6 router
Nov 16 19:42:47 angus dhcpcd[1580]: bge0: soliciting a DHCP lease
Nov 16 19:42:51 angus dhcpcd[1580]: bge0: offered 192.168.1.71 from 192.168.1.1
Nov 16 19:42:57 angus dhcpcd[1580]: bge0: leased 192.168.1.71 for 43200 seconds
Nov 16 19:42:57 angus dhcpcd[1580]: bge0: adding route to 192.168.1.0/24
Nov 16 19:42:57 angus dhcpcd[1580]: bge0: adding default route via 192.168.1.1
Nov 16 19:42:59 angus dhcpcd[1580]: bge0: no IPv6 Routers available
Nov 20 18:05:16 angus dhcpcd[1580]: ps_inet_dodispatch: Connection reset by peer
Nov 20 18:05:16 angus dhcpcd[1580]: control_free: No such file or directory
Nov 20 18:05:16 angus dhcpcd[1580]: ps_sendpsmmsg: Destination address required
Nov 20 18:05:16 angus dhcpcd[1580]: ps_dostop: Destination address required
>How-To-Repeat:
Run dhcpcd and leave the system running for a few days.
>Fix:
Current workaround is a cron job to restart dhcpcd nightly
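
As an alternative to an unconditional nightly restart, the workaround could be narrowed to restart dhcpcd only when it is no longer running. The following is a minimal sketch, not anything shipped with dhcpcd or NetBSD: the function name `dhcpcd_alive`, the `run` argument and the `dhcpcd-watchdog` log tag are hypothetical, and the pid file path is assumed from the directory listings in this PR.

```shell
# Hypothetical watchdog sketch; not part of dhcpcd or NetBSD.
# Returns 0 if the pid file in directory $1 names a live process.
# Intended to run as root: kill -0 fails for processes owned by another user.
dhcpcd_alive() {
    pidfile="$1/pid"
    [ -r "$pidfile" ] || return 1
    pid=$(cat "$pidfile")
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

# Act only when called with "run", so sourcing this file has no side effects.
if [ "${1:-}" = "run" ] && ! dhcpcd_alive /var/run/dhcpcd; then
    logger -t dhcpcd-watchdog "dhcpcd not running, restarting"
    /etc/rc.d/dhcpcd restart
fi
```

Saved under a path of your choosing and invoked with the `run` argument from root's crontab every few minutes, this would restart dhcpcd only after it has actually exited. It hides the symptom and does not address the underlying cause.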

>Audit-Trail:
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57727: dhcpcd exits after a few days
Date: Mon, 27 Nov 2023 18:51:48 +0700

 Is it possible you have something which indiscriminately cleans
 "old" files in /var/run?

 If so, stop it, and try again - except at boot time, files in
 /var/run should be left alone ... if you have added something there
 which needs cleaning, do that, but not any of the normal system
 added files (such as those in /var/run/dhcpcd/* - some of which
 look to have gone missing on your system).

 kre

From: David Brownlee <abs@absd.org>
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: bin/57727: dhcpcd exits after a few days
Date: Mon, 27 Nov 2023 18:15:57 +0000

 On Mon, 27 Nov 2023 at 11:55, Robert Elz <kre@munnari.oz.au> wrote:
 >
 > The following reply was made to PR bin/57727; it has been noted by GNATS.
 >
 > From: Robert Elz <kre@munnari.OZ.AU>
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: bin/57727: dhcpcd exits after a few days
 > Date: Mon, 27 Nov 2023 18:51:48 +0700
 >
 >  Is it possible you have something which indiscriminately cleans
 >  "old" files in /var/run?
 >
 >  If so, stop it, and try again - except at boot time, files in
 >  /var/run should be left alone ... if you have added something there
 >  which needs cleaning, do that, but not any of the normal system
 >  added files (such as those in /var/run/dhcpcd/* - some of which
 >  look to have gone missing on your system).

 Checking on a system where dhcpcd has exited - there do not appear
 to be any files in /var/run/dhcpcd, though there are two directories,
 and there is a selection of files in /var/db/dhcpcd, including lease
 files for interfaces not in this system (software and hardware have
 been upgraded a number of times).

 # find /var/*/dhcpcd | xargs ls -ld
 drwxr-xr-x  2 root  wheel  512 Oct 30 17:46 /var/chroot/dhcpcd
 drwxr-x---  2 root  wheel  512 Oct 30 17:46 /var/db/dhcpcd
 -rw-r-----  1 root  wheel  304 Nov 25 13:18 /var/db/dhcpcd/bge0.lease
 -rw-r--r--  1 root  wheel   42 Apr 28  2016 /var/db/dhcpcd/duid
 -r--r--r--  1 root  wheel  308 May 17  2022 /var/db/dhcpcd/re0.lease
 -r--------  1 root  wheel  192 Apr 18  2020 /var/db/dhcpcd/secret
 -r--r--r--  1 root  wheel  302 Aug  9  2011 /var/db/dhcpcd/wm0.lease
 drwxr-xr-x  3 root  wheel  512 Nov 25 18:24 /var/run/dhcpcd
 drwxr-xr-x  4 root  wheel  512 Nov 25 13:18 /var/run/dhcpcd/hook-state
 drwxr-xr-x  2 root  wheel  512 Nov  1 20:34 /var/run/dhcpcd/hook-state/ntp.conf
 drwxr-xr-x  2 root  wheel  512 Nov  1 20:34 /var/run/dhcpcd/hook-state/roaming

 I've saved a copy of them, then ran "rm -rf /var/run/dhcpcd
 /var/db/dhcpcd" and restarted dhcpcd.

 # find /var/*/dhcpcd | xargs ls -ld
 drwxr-xr-x  2 root  wheel  512 Oct 30 17:46 /var/chroot/dhcpcd
 drwxr-x---  2 root  wheel  512 Nov 27 18:13 /var/db/dhcpcd
 -rw-r-----  1 root  wheel  304 Nov 27 18:13 /var/db/dhcpcd/bge0.lease
 drwxr-xr-x  3 root  wheel  512 Nov 27 18:13 /var/run/dhcpcd
 drwxr-xr-x  3 root  wheel  512 Nov 27 18:13 /var/run/dhcpcd/hook-state
 drwxr-xr-x  2 root  wheel  512 Nov 27 18:13 /var/run/dhcpcd/hook-state/ntp.conf
 -rw-r--r--  1 root  wheel    6 Nov 27 18:12 /var/run/dhcpcd/pid
 srw-rw----  1 root  wheel    0 Nov 27 18:12 /var/run/dhcpcd/sock
 srw-rw-rw-  1 root  wheel    0 Nov 27 18:12 /var/run/dhcpcd/unpriv.sock

 Will check back in a few days to see if the issue recurs.

 Thanks

 David

From: Robert Elz <kre@munnari.OZ.AU>
To: David Brownlee <abs@absd.org>
Cc: gnats-bugs@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: bin/57727: dhcpcd exits after a few days
Date: Tue, 28 Nov 2023 03:13:39 +0700

     Date:        Mon, 27 Nov 2023 18:15:57 +0000
     From:        David Brownlee <abs@absd.org>
     Message-ID:  <CAGN_6pZ+CrRzkNSzY735rXQf-pOwyN386XvrDHY2pSgOV8Gkgg@mail.gmail.com>


   | Checking on a system where dhcpcd has exited - there does not appear
   | to be any files in /var/run/dhcpcd

 Yes, that's what I was expecting from the error you sent

   | - though there are two directories
   | and a selection of files in /var/db/dhcpcd, including lease files for
   | interfaces not in this system

 Yes, stale stuff there will hang around forever - just in case that
 old hardware comes back...   The occasional cleanup doesn't hurt, and
 clearly you can remove it all (nothing gets installed there in a new
 installation after all) but in general files from /var/db shouldn't be
 arbitrarily removed, not even after reboots.

   | srw-rw----  1 root  wheel    0 Nov 27 18:12 /var/run/dhcpcd/sock
   | srw-rw-rw-  1 root  wheel    0 Nov 27 18:12 /var/run/dhcpcd/unpriv.sock

 The absence of those two (or at least one of them) was the source of the
 errors. They should remain as long as dhcpcd is running (so should the rest
 of them, but those are the ones that are most likely to be noticed).

   | Will check back in a few days to see if the issue reoccurs.

 You might want to check crontab - make sure there's nothing there being
 a bit brutal with old files in /var/run (it is like /tmp when the system
 boots, but not while it remains running).
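
One way to carry out the crontab check suggested above is to search the crontab files for any line that mentions /var/run. This is only a sketch: the function name `find_run_cleaners` is hypothetical, the crontab locations in the usage comment are assumptions to adjust for the system in question, and a cleanup job that reaches /var/run indirectly (for example through a script it calls) would not be found this way.

```shell
# Print crontab lines that mention /var/run, with file name and line number.
# Arguments: the crontab files to search.
# /dev/null is appended so grep prints file names even for a single file.
find_run_cleaners() {
    grep -n -- '/var/run' "$@" /dev/null
}

# Example (assumed locations, adjust as needed):
#   find_run_cleaners /etc/crontab /var/cron/tabs/*
```

The function exits non-zero when nothing matches, so it can also be used in a conditional.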

 On my system:

 jacaranda$ ls -l /var/run/dhcpcd
 total 2
 drwxr-xr-x  3 root  wheel  512 Nov 28 03:08 hook-state
 -rw-r--r--  1 root  wheel    6 Nov  4 15:59 pid
 srw-rw----  1 root  wheel    0 Nov  4 15:59 sock
 srw-rw-rw-  1 root  wheel    0 Nov  4 15:59 unpriv.sock

 Those files might look to be many weeks old, but they're still all needed.
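
To notice when these files disappear while dhcpcd is still running, a small check along the following lines could be run periodically. It is a sketch under stated assumptions: the function name `missing_runtime_files` is hypothetical, and the list of expected names (pid, sock, unpriv.sock) is taken from the listings in this PR rather than from dhcpcd documentation.

```shell
# Print the name of each expected dhcpcd runtime file that is missing
# from directory $1; return non-zero if any is missing.
# The file list is assumed from the listings in this PR.
missing_runtime_files() {
    rc=0
    for f in pid sock unpriv.sock; do
        if [ ! -e "$1/$f" ]; then
            echo "$f"
            rc=1
        fi
    done
    return $rc
}

# Example:
#   missing_runtime_files /var/run/dhcpcd || echo "runtime files missing"
```

Logging the time at which a file first goes missing would help correlate the deletion with whatever job removed it.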

 kre
