NetBSD Problem Report #56232
From www@netbsd.org Thu Jun 3 22:19:22 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 9C5761A921F
for <gnats-bugs@gnats.NetBSD.org>; Thu, 3 Jun 2021 22:19:22 +0000 (UTC)
Message-Id: <20210603221921.7650A1A9239@mollari.NetBSD.org>
Date: Thu, 3 Jun 2021 22:19:21 +0000 (UTC)
From: rvp@SDF.ORG
Reply-To: rvp@SDF.ORG
To: gnats-bugs@NetBSD.org
Subject: Unstable system with tar on /dev
X-Send-Pr-Version: www-1.0
>Number: 56232
>Category: kern
>Synopsis: Unstable system with tar on /dev
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Jun 03 22:20:00 +0000 2021
>Last-Modified: Mon Jun 07 04:30:01 +0000 2021
>Originator: RVP
>Release: NetBSD/amd64 9.99.82 (GENERIC)
>Organization:
>Environment:
NetBSD x202e.localdomain 9.99.82 NetBSD 9.99.82 (GENERIC) #0: Sat May 8 19:36:28 UTC 2021 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Running tar on /dev causes unstable system behaviour (the keyboard locks up, or generates random keys when pressed).
/tmp# tar -C /dev -cf /dev/null .
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
/tmp#
And, dmesg shows:
[ 730.595589] WARNING: module error: incompatible module class 1 for `nvmm' (wanted 3)
[ 730.595589] WARNING: module error: incompatible module class 1 for `nvmm' (wanted 3)
[ 730.746121] iscsi: attached. major = 203
[ 730.796534] tap0: Ethernet address f2:0b:a4:70:cf:1c
[ 730.796534] tap0: detached
[ 730.805590] pad0: outputs: 44100Hz, 16-bit, stereo
[ 730.805590] audio1 at pad0: playback
[ 730.805590] audio1: slinear_le:16 -> slinear_le:16 2ch 44100Hz, blk 1764 bytes (10ms) for playback
[ 730.805590] spkr2 at audio1: PC Speaker (synthesized)
[ 730.805590] wsbell at spkr2 not configured
[ 730.805590] spkr2: detached
[ 730.805590] audio1: detached
[ 730.805590] pad0: detached
[ 730.805590] pad1: outputs: 44100Hz, 16-bit, stereo
[ 730.805590] audio1 at pad1: playback
[ 730.805590] audio1: slinear_le:16 -> slinear_le:16 2ch 44100Hz, blk 1764 bytes (10ms) for playback
[ 730.805590] spkr2 at audio1: PC Speaker (synthesized)
[ 730.805590] wsbell at spkr2 not configured
[ 730.805590] spkr2: detached
[ 730.805590] audio1: detached
[ 730.805590] pad1: detached
[ 730.805590] pad2: outputs: 44100Hz, 16-bit, stereo
[ 730.805590] audio1 at pad2: playback
[ 730.815590] audio1: slinear_le:16 -> slinear_le:16 2ch 44100Hz, blk 1764 bytes (10ms) for playback
[ 730.815590] spkr2 at audio1: PC Speaker (synthesized)
[ 730.815590] wsbell at spkr2 not configured
[ 730.815590] spkr2: detached
[ 730.815590] audio1: detached
[ 730.815590] pad2: detached
[ 730.815590] pad3: outputs: 44100Hz, 16-bit, stereo
[ 730.815590] audio1 at pad3: playback
[ 730.815590] audio1: slinear_le:16 -> slinear_le:16 2ch 44100Hz, blk 1764 bytes (10ms) for playback
[ 730.815590] spkr2 at audio1: PC Speaker (synthesized)
[ 730.815590] wsbell at spkr2 not configured
[ 730.815590] spkr2: detached
[ 730.815590] audio1: detached
[ 730.815590] pad3: detached
[ 730.915588] WARNING: module error: incompatible module class 2 for `zfs' (wanted 3)
[ 730.915588] WARNING: module error: incompatible module class 2 for `zfs' (wanted 3)
[ 730.935587] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
[ 730.935587] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
[ 730.955587] WARNING: module error: incompatible module class 2 for `autofs' (wanted 3)
[ 730.955587] WARNING: module error: incompatible module class 2 for `autofs' (wanted 3)
[ 730.995587] WARNING: module error: incompatible module class 1 for `dtrace' (wanted 3)
[ 730.995587] WARNING: module error: incompatible module class 1 for `dtrace' (wanted 3)
>How-To-Repeat:
See above.
>Fix:
>Audit-Trail:
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 06:56:00 +0000 (UTC)
Caused by, at least, pad(4), and possibly, lua(4):
$ id
uid=1001(rvp) gid=1001(rvp) groups=1001(rvp),0(wheel),9(wsrc)
$ sudo chmod go-rw /dev/pad* /dev/lua*
$ tar -C /dev -cf /dev/null .
tar: Couldn't list extended attributes: Permission denied
tar: Couldn't list extended attributes: Invalid argument
...
$ dmesg | tail
[ 6.059407] intelfb0 at i915drmkms0
[ 6.059407] intelfb0: framebuffer at 0xe0364000, size 1366x768, depth 32, stride 5504
[ 6.789401] wsdisplay0 at intelfb0 kbdmux 1: console (default, vt100 emulation), using wskbd0
[ 6.799401] wsmux1: connecting to wsdisplay0
[ 14.529371] wsdisplay0: screen 1 added (default, vt100 emulation)
[ 14.539372] wsdisplay0: screen 2 added (default, vt100 emulation)
[ 14.539372] wsdisplay0: screen 3 added (default, vt100 emulation)
[ 14.539372] wsdisplay0: screen 4 added (default, vt100 emulation)
[ 20.379350] cpu 0: ucode 0x15->0x21
[ 20.389352] cpu 1: ucode 0x15->0x21
$
No errors reported by the kernel if read perms. are removed from
/dev/pad*, /dev/lua*
Issues I noticed:
- In sys/dev/pad/pad.c, padattach() doesn't seem to be called from
  anywhere, and therefore mutex_init() is never called.
  `pad_ca' is used in 2 places, but is defined nowhere.
- Similarly, in sys/net/if_tap.c, tapattach() is not called by anybody.
-RVP
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 10:24:00 +0200
We had this discussion before, I think there even is an open PR against tar.
Folks are in either of two camps:
- tar needs to open the file and extract ACLs from the file descriptor,
otherwise there would be races.
-> solution: the kernel should never do state changes (like rewind tapes
or similar) on plain "open" of a device node
- tar should avoid all this dance when there are no ACLs anyway on the
file system it is traversing. State changes on device open may be a hack,
but they are a very ancient unix hack and quite common.
There is an option to tar (I forgot which) to not back up ACLs - and then
everything should be fine.
IMHO this option should be on by default.
Martin
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc: Martin Husemann <martin@duskware.de>
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 09:06:16 +0000 (UTC)
On Fri, 4 Jun 2021, Martin Husemann wrote:
> There is an option to tar (I forgot which) to not backup ACLs - and then
> everything should be fine.
>
Not really an ACLs or extattrs issue, I think. Just doing an
open()/fstat()/close() on each of the entries in /dev
reliably reboots my laptop. Run the following program as root:
$ sudo ./a.out /dev
--- START CODE ---
/**
 * ic-walk.c:
 * Traverse given dir. like ``find dir'', calling iconv(3)
 * on each pathname. This causes '?' to be displayed for
 * chars. which are not valid for the current charset.
 */
#include <sys/stat.h>
#include <assert.h>
#include <dirent.h>
#include <err.h>
#include <errno.h>
#include <fcntl.h>
#include <iconv.h>
#include <langinfo.h>
#include <locale.h>
#include <stdio.h>
#include <string.h>
#include <stdlib.h>
#include <unistd.h>

static int walk(char* path, iconv_t cd);
static char* concat(char* path, char* name);
static char* convpath(char* path, iconv_t cd);
static char* prog;

int
main(int argc, char* argv[])
{
    char* tc, *fc;
    iconv_t cd;
    int rc = EXIT_FAILURE;

    prog = argv[0];
    printf("Current locale: %s\n", setlocale(LC_ALL, NULL));
    (void)setlocale(LC_ALL, "");
    fc = nl_langinfo(CODESET);
    tc = "UTF-8";
    printf("from charset: %s\n", fc);
    printf(" to charset: %s\n", tc);
    cd = iconv_open(tc, fc);
    if (cd == (iconv_t)-1)
        err(rc, "iconv_open failed");
    if (argc != 2)
        rc = walk(".", cd);
    else
        rc = walk(argv[1], cd);
    iconv_close(cd);
    return rc;
}

/*
 * Walk the directory tree recursively, starting at `path'.
 */
static int
walk(char* path, iconv_t cd)
{
    struct dirent* dent;
    DIR* dir;

    if ((dir = opendir(path)) == NULL) {
        fprintf(stderr, "%s: %s: opendir failed. ", prog, path);
        perror(NULL);
        return EXIT_FAILURE;
    }
    while ((dent = readdir(dir)) != NULL) {
        char* name = dent->d_name;
        char* np = concat(path, name);
        struct stat sb;
        int fd;

        /* The open()/fstat()/close() sequence is what triggers the bug. */
        if ((fd = open(np, O_RDONLY)) == -1) {
            warn("%s: open failed", np);
            free(np);
            continue;
        }
        if (fstat(fd, &sb) != 0) {
            warn("%s: fstat failed", np);
            free(np);
            close(fd);
            continue;
        }
        close(fd);
        char* s = convpath(np, cd);
        if (S_ISDIR(sb.st_mode)) {
            /* dir, but skip `.' and `..' */
            if (strcmp(name, ".") && strcmp(name, "..")) {
                printf("%s/\n", s);
                walk(np, cd); /* recurse */
            }
        } else /* file */
            printf("%s\n", s);
        free(np);
        free(s);
    }
    closedir(dir);
    return EXIT_SUCCESS;
}

/*
 * Join `path' and `name' with a single '/', dropping any trailing
 * slashes from `path'. The caller frees the result.
 */
static char*
concat(char* path, char* name)
{
    assert(path && name);
    size_t len = strlen(path) + strlen(name) + 2;
    char* s = malloc(len);
    if (s == NULL)
        err(EXIT_FAILURE, "malloc failed");
    char* end = path + strlen(path);
    while (end > path && *(end-1) == '/')
        --end;
    snprintf(s, len, "%.*s/%s", (int)(end-path), path, name);
    return s;
}

/*
 * Convert `path' with iconv(3), substituting '?' for bytes that
 * cannot be converted. The caller frees the result.
 */
static char*
convpath(char* path, iconv_t cd)
{
    size_t slen = strlen(path) + 1;
    size_t dlen = slen * 2 + 1;
    char* dest;
    char* sp, *dp;

    dest = malloc(dlen);
    if (dest == NULL)
        err(EXIT_FAILURE, "malloc failed");
    sp = path;
    dp = dest;
    while (slen >= 1) {
        errno = 0;
        size_t rc = iconv(cd, &sp, &slen, &dp, &dlen);
        if (rc != (size_t)-1)
            break;
        if (errno == EILSEQ || errno == EINVAL) {
            *dp++ = '?';
            sp++;
            dlen--;
            slen--;
        } else {
            warn("iconv failed");
            free(dest);
            return strdup(path);
        }
    }
    return dest;
}
--- END CODE ---
From: Martin Husemann <martin@duskware.de>
To: RVP <rvp@SDF.ORG>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 12:18:53 +0200
On Fri, Jun 04, 2021 at 09:06:16AM +0000, RVP wrote:
> Not really an ACLs or extattrs issue, I think. Just doing an
> open()/fstat()/close() on each of the entries in /dev
> reliably reboots my laptop.
You should not randomly open dev nodes.
But as I said before, this point is part of the controversy.
Martin
From: RVP <rvp@SDF.ORG>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 12:01:56 +0000 (UTC)
On Fri, 4 Jun 2021, Martin Husemann wrote:
> On Fri, Jun 04, 2021 at 09:06:16AM +0000, RVP wrote:
>> Not really an ACLs or extattrs issue, I think. Just doing an
>> open()/fstat()/close() on each of the entries in /dev
>> reliably reboots my laptop.
>
> You should not randomly open dev nodes.
>
Yeah...at least not without O_NONBLOCK. But, the OS shouldn't
panic and reboot if a regular user just happens to open some odd
device.
-RVP
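For illustration of the two positions above, a walker can classify each path with lstat(2) first, never open device nodes at all, and open everything else with O_NONBLOCK. This is only a minimal sketch and not what tar or libarchive actually does; the helper name probe_path() and its return convention are assumptions made for this example.

```c
#include <sys/stat.h>
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * probe_path() is a hypothetical helper, not part of tar or libarchive.
 * Returns  1 if the path is a device node and was deliberately not opened,
 *          0 if it was opened, fstat()ed and closed successfully,
 *         -1 on any error.
 */
int
probe_path(const char *path)
{
    struct stat sb;
    int fd;

    /* lstat() only reads the inode; it does not call the driver's open. */
    if (lstat(path, &sb) == -1)
        return -1;

    /* Skip character and block devices: opening them may have side effects. */
    if (S_ISCHR(sb.st_mode) || S_ISBLK(sb.st_mode))
        return 1;

    /* O_NONBLOCK keeps open() from hanging on FIFOs and similar nodes. */
    fd = open(path, O_RDONLY | O_NONBLOCK);
    if (fd == -1)
        return -1;
    if (fstat(fd, &sb) == -1) {
        (void)close(fd);
        return -1;
    }
    return close(fd) == -1 ? -1 : 0;
}
```

Calling probe_path("/dev/null") returns 1 without ever reaching open(). Note that checking with lstat() and then opening is racy, which is exactly the objection raised by the "open first, then inspect the descriptor" camp, so this sketch shows one side of the trade-off rather than a resolution.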
From: Martin Husemann <martin@duskware.de>
To: RVP <rvp@SDF.ORG>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 14:11:31 +0200
On Fri, Jun 04, 2021 at 12:01:56PM +0000, RVP wrote:
> Yeah...at least not without O_NONBLOCK. But, the OS shouldn't
> panic and reboot if a regular user just happens to open some odd
> device.
Uhm, no - I missed the unprivileged user part, and panic is never the
right thing to do.
Martin
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 14:13:35 -0000 (UTC)
rvp@SDF.ORG (RVP) writes:
> - In sys/dev/pad/pad.c, padattach() doesn't seem to be called from
>   anywhere, and therefore mutex_init() is never called.
pad is a pseudo device, part of the code calling the attach function
is autogenerated by config(1) when building a kernel.
> `pad_ca' is used in 2 places, but is defined nowhere.
It is defined by the CFATTACH_DECL2_NEW macro in sys/device.h.
> - Similarly, in sys/net/if_tap.c, tapattach() is not called by anybody.
Ditto.
When you look into the compile directory after a build, you see the generated
file ioconf.c.
The table 'pdevinit[]' lists all the pseudodevice attach routines that
will be called by config_finalize() in sys/kern/subr_autoconf.c.
Greetings,
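The table-driven attach mechanism described above can be modelled in userland roughly as follows. This is a simplified sketch, not the generated ioconf.c or the real kernel loop: the struct toy_pdevinit layout, the attach bodies, and run_pdevinit() are assumptions for illustration, and only the names padattach and tapattach come from the report.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-in for one entry of the generated table. */
struct toy_pdevinit {
    void (*pdev_attach)(int);   /* pseudo-device attach routine */
    int pdev_count;             /* number of instances requested */
};

static int attach_calls;        /* counts how many routines ran */

/* Toy attach routines; the real ones live in the drivers. */
static void
padattach(int n)
{
    attach_calls++;
    printf("padattach(%d)\n", n);
}

static void
tapattach(int n)
{
    attach_calls++;
    printf("tapattach(%d)\n", n);
}

/* What config(1) would generate: a NULL-terminated table. */
static const struct toy_pdevinit toy_table[] = {
    { padattach, 1 },
    { tapattach, 1 },
    { NULL, 0 },
};

/* Walk the table and call every attach routine once. */
static int
run_pdevinit(const struct toy_pdevinit *tab)
{
    int n = 0;

    for (; tab->pdev_attach != NULL; tab++) {
        (*tab->pdev_attach)(tab->pdev_count);
        n++;
    }
    return n;
}
```

Because the routines are only referenced from the generated table, a search of the hand-written sources finds no caller, which matches the observation in the original question.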
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 16:42:54 +0000
On Fri, Jun 04, 2021 at 08:25:01AM +0000, Martin Husemann wrote:
> We had this discussion before, I think there even is an open PR
> against tar.
> Folks are in either of two camps:
>
> - tar needs to open the file and extract ACLs from the filedescriptor,
> otherwise there would be races.
>
> -> solution: the kernel should never do state changes (like rewind
> tapes or similar) on plain "open" of a device node
>
> - tar should avoid all this dance when there are no ACLs anyway on the
> file system it is traversing. State changes on device open may be a
> hack, but they are a very ancient unix hack and quite common.
Devices with side-effecting open are a thing; they aren't going to go
away. Volumes with ACLs are also a thing, and might reasonably include
the root fs, and in fact devices are one of the things you might
specifically want custom access control for.
Conclusion: tar (and other things) need to be able to fetch acls
without open().
Do you remember where this discussion was, or some search keywords? My
gnats index is not what it used to be.
> There is an option to tar (I forgot which) to not backup ACLs - and then
> everything should be fine.
>
> IMHO this option should be on by default.
Silently throwing away your ACLs in your backups isn't the right
answer either :-(
--
David A. Holland
dholland@netbsd.org
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 18:52:01 +0200
On Fri, Jun 04, 2021 at 04:45:02PM +0000, David Holland wrote:
> Do you remember where this discussion was, or some search keywords? My
> gnats index is not what it used to be.
I was probably thinking of PR 55815
Martin
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 16:59:17 +0000
On Fri, Jun 04, 2021 at 04:55:02PM +0000, Martin Husemann wrote:
> The following reply was made to PR kern/56232; it has been noted by GNATS.
>
> From: Martin Husemann <martin@duskware.de>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: kern/56232: Unstable system with tar on /dev
> Date: Fri, 4 Jun 2021 18:52:01 +0200
>
> On Fri, Jun 04, 2021 at 04:45:02PM +0000, David Holland wrote:
> > Do you remember where this discussion was, or some search keywords? My
> > gnats index is not what it used to be.
>
> I was probably thinking of PR 55815
thanks, I'll take my noise there, and leave this PR for devices that
blow up the system when opened, which is not an admissible type of
side effect :-)
--
David A. Holland
dholland@netbsd.org
From: nia <nia@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 21:35:36 +0000
On Fri, Jun 04, 2021 at 07:00:02AM +0000, RVP wrote:
> No errors reported by the kernel if read perms. are removed from
> /dev/pad*, /dev/lua*
Note that it's possible in HEAD to trigger a kernel assertion failure
when reading from a pad(4) as an unprivileged user, see
PR kern/56073
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Sun, 6 Jun 2021 05:54:46 +0000 (UTC)
OK, I've found out which device is causing the kernel crashes.
It's iscsi0. Running the program below as root reproducibly
crashes my system.
$ sudo ./opentest /dev/iscsi0
/dev/iscsi0
[ The system hangs here, then reboots ]
Another issue found during this test:
$ sudo find /dev/ -not \( -type f -o -name iscsi0 \) \
-exec ./opentest {} + 2>&1 | fgrep -B1 close
/dev/ipmi0
opentest: /dev/ipmi0: close failed.: Device not configured
$
ipmi(4) should fail on open()--not later on in close(), I think.
Should I file a separate PR for this, or is leaving it here OK?
Thanks,
-RVP
----- START CODE -----
/**
 * sudo ./opentest /dev/iscsi0
 *
 * sudo find /dev -not \( -type f -o -name iscsi0 \) \
 * -exec ./opentest {} + 2>&1 | fgrep -B1 close
 */
#include <sys/stat.h>
#include <err.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

int
main(int argc, char* argv[])
{
    int i, rc = EXIT_SUCCESS;

    for (i = 1; i < argc; i++) {
        char* fn = argv[i];
        struct stat sb;
        int fd;

        /* Print and flush the name first so it is visible before a crash. */
        printf("%s\n", fn);
        fflush(stdout);
        fd = open(fn, O_RDONLY | O_NONBLOCK);
        if (fd == -1) {
            warn("%s: open failed.", fn);
            rc = EXIT_FAILURE;
            continue;
        }
        if (fstat(fd, &sb) == -1) {
            warn("%s: fstat failed.", fn);
            rc = EXIT_FAILURE;
        }
        if (close(fd) == -1) {
            warn("%s: close failed.", fn);
            rc = EXIT_FAILURE;
        }
    }
    return rc;
}
----- END CODE -----
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Sun, 6 Jun 2021 07:09:58 -0000 (UTC)
rvp@SDF.ORG (RVP) writes:
> ipmi(4) should fail on open()--not later on in close(), I think.
> Should I file a separate PR for this, or is leaving it here OK?
The crash happens because there is no read function defined.
Not sure yet why close fails.
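As a general illustration of the failure mode named above: if a driver's entry-point table leaves the read slot as a null function pointer and the dispatcher calls through it unconditionally, the call jumps through NULL. The userland sketch below shows two common defences, a stub that returns an error and a NULL check in the dispatcher. All names here (toy_cdevsw, toy_noread, toy_dispatch_read) are hypothetical; this is not the NetBSD code and not the actual fix.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Toy entry-point table, loosely modelled on a character device switch. */
struct toy_cdevsw {
    int (*d_open)(int unit);
    int (*d_read)(int unit, char *buf, size_t len);
};

int
toy_open(int unit)
{
    (void)unit;
    return 0;
}

/* Stub for drivers that do not support read: report an error instead. */
int
toy_noread(int unit, char *buf, size_t len)
{
    (void)unit;
    (void)buf;
    (void)len;
    return ENODEV;
}

/* Table with the read slot left empty, and one with the stub filled in. */
const struct toy_cdevsw toy_broken = { toy_open, NULL };
const struct toy_cdevsw toy_fixed = { toy_open, toy_noread };

/* Defensive dispatcher: never call through a null function pointer. */
int
toy_dispatch_read(const struct toy_cdevsw *sw, int unit, char *buf, size_t len)
{
    if (sw->d_read == NULL)
        return ENODEV;
    return sw->d_read(unit, buf, len);
}
```

With either defence in place, a read on a device that cannot be read yields an errno value for the caller rather than a crash.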
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Mon, 7 Jun 2021 02:22:22 +0000
On Sun, Jun 06, 2021 at 05:55:01AM +0000, RVP wrote:
> opentest: /dev/ipmi0: close failed.: Device not configured
> $
>
> ipmi(4) should fail on open()--not later on in close(), I think.
yeah, failing with "Device not configured" is a bit off...
> Should I file a separate PR for this, or is leaving it here OK?
Probably best to. As soon as there's more than one issue in the same
PR it becomes a nuisance to try to keep track of which ones are and
aren't fixed.
--
David A. Holland
dholland@netbsd.org
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Mon, 7 Jun 2021 04:27:42 -0000 (UTC)
dholland-bugs@netbsd.org (David Holland) writes:
> > ipmi(4) should fail on open()--not later on in close(), I think.
> yeah, failing with "Device not configured" is a bit off...
Fixed. Now open already fails when no ipmi device exists.
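The behaviour described in the fix, rejecting the open when no device instance exists, can be sketched as below. This is a userland model under stated assumptions, not the committed ipmi(4) change: toy_softc[], TOY_MAX_UNITS and toy_ipmi_open() are hypothetical names. ENXIO is the errno value that NetBSD reports as "Device not configured".

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAX_UNITS 4

/* One slot per unit; NULL means no device instance was attached. */
void *toy_softc[TOY_MAX_UNITS];

/*
 * Fail early in open: if the unit number is out of range or nothing
 * is attached at that unit, return ENXIO instead of succeeding and
 * leaving the error to surface later in close.
 */
int
toy_ipmi_open(int unit)
{
    if (unit < 0 || unit >= TOY_MAX_UNITS)
        return ENXIO;
    if (toy_softc[unit] == NULL)
        return ENXIO;
    return 0;
}
```

With an empty toy_softc[] table, toy_ipmi_open(0) returns ENXIO, so a program such as opentest would report the failure at open() rather than at close().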
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.