NetBSD Problem Report #56842
From www@netbsd.org Mon May 16 18:40:44 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 082CA1A921F
for <gnats-bugs@gnats.NetBSD.org>; Mon, 16 May 2022 18:40:44 +0000 (UTC)
Message-Id: <20220516184042.5AF2C1A923A@mollari.NetBSD.org>
Date: Mon, 16 May 2022 18:40:42 +0000 (UTC)
From: jspath55@gmail.com
Reply-To: jspath55@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Cron hangs on Raspberry Pi Zero 2W
X-Send-Pr-Version: www-1.0
>Number: 56842
>Category: port-arm
>Synopsis: Cron hangs on Raspberry Pi Zero 2W
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-arm-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon May 16 18:45:00 +0000 2022
>Closed-Date: Fri Jul 22 18:14:18 +0000 2022
>Last-Modified: Fri Jul 22 18:14:18 +0000 2022
>Originator: Jim Spath
>Release: NetBSD 9.2_STABLE
>Organization:
>Environment:
System: NetBSD n0b 9.2_STABLE NetBSD 9.2_STABLE (GENERIC) #0: Mon Apr 25 12:39:27 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC evbarm
>Description:
New install of NetBSD 9.2 (stable) on a Raspberry Pi Zero 2W.
Followed install from: https://mail-index.netbsd.org/port-arm/2022/02/14/msg007592.html
I have a shell script that should run every minute. After several days of uptime, I noticed the cron commands were not being processed.
I stopped and restarted the cron process using its /etc/rc.d script, and that resumed processing for a little while.
I can find no obvious error messages in /var/log.
If I run a command to show a crontab listing, that works and the attempt is logged:
= = =
$ crontab -u _httpd -l
#
[...]
$ tail /var/log/cron
[...]
May 15 09:21:00 n0b cron[17560]: (_httpd) CMD FINISH (/usr/local/www/bin/graph-cputemp.sh >>/usr/local/www/logs/crontab-graph.log 2>>/usr/local/www/logs/crontab-graph.err)
May 16 18:25:30 n0b crontab[9766]: (root) LIST (_httpd)
= = =
ls -l /usr/local/www/logs/crontab-graph.???
-rw-r--r-- 1 _httpd _httpd 684 Apr 28 00:40 /usr/local/www/logs/crontab-graph.err
-rw-r--r-- 1 _httpd _httpd 695088 May 15 09:21 /usr/local/www/logs/crontab-graph.log
= = =
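The listing above puts the last completed job at May 15 09:21, more than a day before the LIST entry. A minimal sketch for pulling that last job line out of the log follows. It assumes the log format quoted above; the function name last_cron_cmd is hypothetical and not part of cron.

```shell
# Print the most recent job line that cron itself wrote to its log.
# Assumes the format shown in this report: "... cron[PID]: (user) CMD ...".
# Entries written by crontab(1), such as LIST, are deliberately skipped.
last_cron_cmd() {
    # $1: log path; defaults to /var/log/cron as used in this report.
    grep -E 'cron\[[0-9]+\]: .* CMD ' "${1:-/var/log/cron}" | tail -n 1
}
```

Running `last_cron_cmd` and comparing its timestamp with `date` gives a quick check of whether jobs have stopped being dispatched.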
The system has a USB Ethernet adapter connected, and is otherwise a stock Pi Zero 2W.
The logrotate pkg is set up in cron also:
# Thu Apr 28 00:16:45 UTC 2022
0 0 * * * /usr/pkg/sbin/logrotate /usr/pkg/etc/logrotate.conf
No email from daily root cron jobs since May 4, 2022.
The dmesg output is replicated here:
https://jspath55.blogspot.com/2022/04/raspberry-pi-zero-2-w-netbsd-dmesg-text.html
>How-To-Repeat:
Unsure how to repeat elsewhere.
Issue recurred after a reboot.
>Fix:
Unknown.
>Release-Note:
>Audit-Trail:
From: "David H. Gutteridge" <david@gutteridge.ca>
To: Gnats Bugs <gnats-bugs@netbsd.org>
Cc:
Subject: Re: port-arm/56842: Cron hangs on Raspberry Pi Zero 2W
Date: Thu, 19 May 2022 21:57:05 -0400
Hello,
FWIW, I haven't seen this issue running older NetBSD releases
(presently 8.0_STABLE) on an "old" Raspberry Pi B+. Granted, I don't
run anything every minute. That machine has been running for years
without a hiccup (other than power outages). I could try upgrading it
to 9.2_STABLE and see if I can replicate this. A few thoughts off the
top of my head follow.
If you look at the system after the point cron seems to have stopped
working, what does ps(1) tell you about the state of cron? What's the
system load at the time? (I'm assuming you have a standard CRON_WITHIN
value like 7200 and the machine isn't under an incredibly high load all
the time, as that seems very unlikely.)
You might try enabling extra debugging information with the -x option.
Have you found any core files from cron?
What happens if you disable particular cron entries, like the script
meant to run every minute?
Regards,
Dave
From: Jim Spath <jspath55@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-arm/56842
Date: Fri, 20 May 2022 09:20:51 -0400
Dave:
Thank you for the feedback. I have NetBSD running also on a Pi3 and a Pi4;
this is the first time getting a Zero 2W working. The other systems are
running current:
NetBSD [pi3] 9.99.82 NetBSD 9.99.82 (GENERIC64) #0: Tue Apr 27 05:40:29 UTC 2021 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm
NetBSD [pi4] 9.99.93 NetBSD 9.99.93 (GENERIC64) #0: Sun Jan 2 23:46:21 UTC 2022 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm
Neither of those, nor an earlier 9.2 system, has shown cron hangs, and I have
the identical script running on them.
Your questions:
- what does ps(1) tell you about the state of cron?
Nothing useful to me yet; but see below for results from top.
USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND
root 18233 0.0 0.4 6728 1824 ? Ss Mon05PM 0:00.00 /usr/sbin/cron
- You might try enabling extra debugging information with the -x option.
I tried one iteration with debug flags and captured logs but saw nothing
useful there.
- What happens if you disable particular cron entries, like the script
meant to run every minute?
I will try lowering the frequency, after doing a reboot and seeing if/when
the issue recurs. It seems this might be a “slow leak” that will take
patience to track.
I investigated further and found hangs on both top and vmstat, at varying
times.
For vmstat, the first line (summary) is returned, but then nothing:
n0b:jim> date ; vmstat 1 10
Tue May 17 13:09:14 UTC 2022
procs memory page disks faults cpu
r b avm fre flt re pi po fr sr l0 n0 in sy cs us sy id
1 0 304608 88784 23 0 0 0 0 0 0 0 8882 44 14 0 1 99
^C
n0b:jim> date
Tue May 17 13:09:45 UTC 2022
That stall is inconsistent though, as the results today are nominal:
n0b:jim> date
Fri May 20 12:58:26 UTC 2022
n0b:jim> vmstat 1 3
procs memory page disks faults cpu
r b avm fre flt re pi po fr sr l0 n0 in sy cs us sy id
1 0 310320 82568 22 0 0 0 0 0 0 0 8870 43 13 0 1 99
0 0 310320 82568 0 0 0 0 0 0 0 0 8826 32 11 0 1 99
0 0 310320 82568 0 0 0 0 0 0 0 0 8902 30 10 0 1 99
n0b:jim> date
Fri May 20 12:58:36 UTC 2022
n0b:jim>
The top command starts up, displays some data, but then does not refresh.
The data are incomplete (values are all 0):
load averages: 0.01, 0.02, 0.00; up 11+21:37:48 13:06:53
46 processes: 44 sleeping, 2 on CPU
CPU0 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU2 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU3 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
Memory: 298M Act, 104K Inact, 12M Wired, 15M Exec, 259M File, 86M Free
Swap:
Like vmstat, top worked later (except one core shows all zeroes).
load averages: 0.01, 0.02, 0.00; up 14+21:32:34 13:01:39
50 processes: 48 sleeping, 2 on CPU
CPU0 states: 0.0% user, 0.0% nice, 0.0% system, 1.6% interrupt, 98.4% idle
CPU1 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
CPU2 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 0.0% idle
CPU3 states: 0.0% user, 0.0% nice, 0.0% system, 0.0% interrupt, 100% idle
Memory: 303M Act, 96K Inact, 12M Wired, 15M Exec, 262M File, 80M Free
Swap:
However, cron commands have not run since.
My next steps will be:
1. Reboot, taking note of initial state
2. Try adding a swap device (have seen some odd Pi behavior with 0 swap)
3. Decrease the cron job frequency
Jim
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-arm/56842
Date: Fri, 20 May 2022 23:21:08 +0000
On Fri, May 20, 2022 at 01:25:01PM +0000, Jim Spath wrote:
> USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND
> root 18233 0.0 0.4 6728 1824 ? Ss Mon05PM 0:00.00 /usr/sbin/cron
ps -l might be interesting (it prints the WCHAN) but more likely not.
If other programs have similar problems, it's more likely not cron
itself.
I have no idea what the situation with timecounters on this hw is, but
if there's more than one option it might be interesting to try a
different one. (And, relatedly: is the system time behaving normally?)
--
David A. Holland
dholland@netbsd.org
From: "David H. Gutteridge" <david@gutteridge.ca>
To: Gnats Bugs <gnats-bugs@netbsd.org>
Cc:
Subject: Re: port-arm/56842: Cron hangs on Raspberry Pi Zero 2W
Date: Sat, 21 May 2022 19:22:44 -0400
It seems there's a general issue with both interactive and daemonized
processes not running as expected. Another thing to ask, then: have you
tried a -current kernel to see if there's any difference?
Dave
From: Jim Spath <jspath55@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-arm/56842
Date: Mon, 23 May 2022 13:17:50 -0400
Thank you for the tips. Answering several:
> ps -l
While I don't see anything obvious in the output from "ps -l" I did
observe several processes that cron started. These should have
finished quickly but I can't tell immediately why they stalled. It
does give me ideas for other tests from cron. I've captured the output
in a log to compare with future states.
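One way to make the captured output easier to compare is to keep only the processes forked by the cron daemon, along with their wait channels. This is a minimal sketch, not a definitive method. The function name cron_children is hypothetical, and it assumes ps(1) was run with pid and ppid as the first two columns. Note that cron forks a child of itself for each job, so the rows shown are those cron children rather than the job scripts.

```shell
# Filter ps output on stdin down to the header plus direct children of a PID.
# Expected input (an assumption, matching the usage line below):
#   ps -ax -o pid,ppid,stat,wchan,command
cron_children() {
    # $1: PID of the cron daemon (18233 in the ps output quoted earlier).
    awk -v parent="$1" 'NR == 1 || $2 == parent'
}

# Usage on the affected machine (not run here):
#   ps -ax -o pid,ppid,stat,wchan,command | cron_children 18233
```

A child that stays in the list across several captures, with the same WCHAN value, is a candidate for the stalled job.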
> timecounters on this hw?
I don't understand this suggestion, sorry. I did find a fascinating
page from 2006 on porting NetBSD to a new ARM SoC:
https://www.netbsd.org/docs/kernel/porting_netbsd_arm_soc.html
Most of those details are beyond my ken, however.
The dmesg output shows:
[ 1.000000] timecounter: Timecounters tick every 10.000 msec
[ 1.000000] timecounter: Timecounter "armgtmr0" frequency 19200000 Hz quality 500
[ 1.000003] timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
My Pi3 running NetBSD shows the same values.
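For context on the timecounter suggestion: the kernel can offer more than one timecounter, and the dmesg lines above list two on this board. The sketch below extracts them from dmesg text. The function name list_timecounters is hypothetical, and it assumes unwrapped dmesg lines in the format quoted in this report. The sysctl names in the comments are the standard kern.timecounter nodes. Switching to "clockinterrupt" is suggested only as a diagnostic, given its 100 Hz frequency and quality 0.

```shell
# Read dmesg text on stdin; print "name frequency quality" per timecounter.
# The "Timecounters tick every ..." summary line does not match and is skipped.
list_timecounters() {
    sed -n 's/.*Timecounter "\([^"]*\)" frequency \([0-9]*\) Hz quality \(-\{0,1\}[0-9]*\).*/\1 \2 \3/p'
}

# Usage on the affected machine (not run here):
#   dmesg | list_timecounters
#   sysctl kern.timecounter.choice     # candidates the kernel offers
#   sysctl kern.timecounter.hardware   # the one currently selected
#   sysctl -w kern.timecounter.hardware=clockinterrupt   # as root, to switch
```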
> system time?
The ntp daemon looked normal, but then I saw this:
May 3 11:32:01 n0b ntpd[436]: kernel reports TIME_ERROR: 0x41: Clock Unsynchronized
After reboot, that message reappeared, as well as similar messages
that I didn't spot before. On an unrelated note, I wish I could find a
way to stop ntpd from seeking IPv6 hosts, as my ISP doesn't support
that path. Just wastes time not getting responses.
The ntpdate output seems OK.
$ ntpdate 2.netbsd.pool.ntp.org
23 May 17:07:03 ntpdate[497]: adjust time server 192.227.183.3 offset -0.014973 sec
> newer kernel?
The image I'm running is the first NetBSD version I've found that will
run on the 02W. I will search for the steps to install a current
build, on different media so I can preserve the (mostly) working
install.
Thank you for the suggestions. I rebooted the system today. Alas, it
hit errors that required fsck to resolve as it did not halt cleanly
after a shutdown request, necessitating a power-off. A partial list of
recovered files:
- /var/db/entropy-file
- /var/log/cron
I found the _httpd user crontab file was corrupted, so I reinstalled
that. The /var/log/cron file was removed by fsck cleanup and I reset
that also. I think the entropy-file self-corrected after reboots.
I had not seen this before (either it had not been flushed to disk,
or I overlooked it):
# ls -l /var/cron
-rw------- 1 root wheel 415424 May 15 09:02 cron.core
I will report in a couple days one way or the other. The cron jobs are
running now.
From: Jim Spath <jspath55@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: re: port-arm/56842
Date: Thu, 9 Jun 2022 09:42:42 -0400
My original install SD card is now corrupt or suspect: a power cycle
required an fsck, which then failed to recover some files.
Looking back, I found errors related to the USB/ethernet adapter:
May 3 10:50:38 n0b /netbsd: [ 58744.3697748] ure0: autoconfiguration error: watchdog timeout
May 3 10:50:48 n0b /netbsd: [ 58754.3704177] ure0: autoconfiguration error: usb error on tx: TIMEOUT
...
May 5 15:15:10 n0b dhcpcd[153]: ure0: dhcp_sendudp: Host is down
May 5 15:15:10 n0b dhcpcd[153]: ure0: bpf_send: No buffer space available
...
May 27 00:58:39 n0b dhcpcd[153]: ure0: bpf_send: No buffer space available
...
Jun 4 21:25:03 n0b dhcpcd[260]: ure0: bpf_send: No buffer space available
The first adapter:
Jun 4 21:27:21 n0b /netbsd: [ 7.0539144] ure0: Realtek (0xbda) USB 10/100/1000 LAN (0x8153), rev 2.10/31.00, addr 4
The second adapter:
Jun 4 23:16:43 n0b /netbsd: [ 6554.5805440] ure0: Realtek (0xbda) USB 10/100 LAN (0x8152), rev 2.10/20.00, addr 4
I installed the same 9.2 stable image on a second Pi, with the same
10/100 LAN adapter model. So far, no cron hangs on either. I will let
the new system run for a few more days, then switch to the suspect
adapter and see if the problem recurs. It seems feasible that network
driver errors could be a root cause.
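To compare the two adapters over time, the error messages quoted above can be tallied per type. This is a minimal sketch under stated assumptions: the function name summarize_ure_errors and the short labels are hypothetical, and the input is unwrapped syslog lines as they appear in /var/log/messages, not the wrapped form mail clients produce.

```shell
# Count the ure0 error messages seen in this report, one line per type.
# Reads syslog text on stdin; $1 optionally names another interface.
summarize_ure_errors() {
    awk -v ifname="${1:-ure0}" '
        index($0, ifname ": ") == 0  { next }   # ignore other devices
        /watchdog timeout/           { n["watchdog timeout"]++ }
        /usb error on tx/            { n["usb error on tx"]++ }
        /No buffer space available/  { n["no buffer space"]++ }
        /Host is down/               { n["host is down"]++ }
        END { for (k in n) printf "%s %d\n", k, n[k] }
    ' | sort
}

# Usage on the affected machine (not run here):
#   summarize_ure_errors < /var/log/messages
```

Running it against the logs from each adapter period would show whether the watchdog and tx timeouts are confined to one adapter.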
Strangely, both adapter models say "Gigabit LAN" on the case, but the
one that connected at 1000BT had errors, while the 100BT connection
does not. And, the model that only connected at 100 works at 1000 on a
PC.
(apologies for missending this to gnats-admin first)
From: Jim Spath <jspath55@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: re: port-arm/56842
Date: Thu, 21 Jul 2022 18:05:27 -0400
I would like to update this problem report. I have been unable to
reproduce the original issue with cron jobs. The tests I have done
with a second Pi and two different types of Ethernet adapters lead me
to believe the root cause is one of the adapters not behaving
properly. The correctly working adapter runs at 1000BT, while the
suspect adapter drops to 100BT, and sometimes stops working
altogether, showing various symptoms. If I can isolate this adapter
issue further I will open a new PR.
Thank you to those who gave me feedback.
State-Changed-From-To: open->closed
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Fri, 22 Jul 2022 18:14:18 +0000
State-Changed-Why:
Closing ticket, per submitter. Thanks for your efforts investigating this!
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.