NetBSD Problem Report #56232

From www@netbsd.org  Thu Jun  3 22:19:22 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 9C5761A921F
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  3 Jun 2021 22:19:22 +0000 (UTC)
Message-Id: <20210603221921.7650A1A9239@mollari.NetBSD.org>
Date: Thu,  3 Jun 2021 22:19:21 +0000 (UTC)
From: rvp@SDF.ORG
Reply-To: rvp@SDF.ORG
To: gnats-bugs@NetBSD.org
Subject: Unstable system with tar on /dev
X-Send-Pr-Version: www-1.0

>Number:         56232
>Category:       kern
>Synopsis:       Unstable system with tar on /dev
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Jun 03 22:20:00 +0000 2021
>Last-Modified:  Mon Jun 07 04:30:01 +0000 2021
>Originator:     RVP
>Release:        NetBSD/amd64 9.99.82 (GENERIC)
>Organization:
>Environment:
NetBSD x202e.localdomain 9.99.82 NetBSD 9.99.82 (GENERIC) #0: Sat May  8 19:36:28 UTC 2021  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Running tar on /dev causes unstable system behaviour (the keyboard locks up, or generates random keys when pressed).

/tmp# tar -C /dev -cf /dev/null .
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
tar: Couldn't list extended attributes: Invalid argument
/tmp# 

And dmesg shows:

[   730.595589] WARNING: module error: incompatible module class 1 for `nvmm' (wanted 3)
[   730.595589] WARNING: module error: incompatible module class 1 for `nvmm' (wanted 3)
[   730.746121] iscsi: attached.  major = 203
[   730.796534] tap0: Ethernet address f2:0b:a4:70:cf:1c
[   730.796534] tap0: detached
[   730.805590] pad0: outputs: 44100Hz, 16-bit, stereo
[   730.805590] audio1 at pad0: playback
[   730.805590] audio1: slinear_le:16 -> slinear_le:16 2ch 44100Hz, blk 1764 bytes (10ms) for playback
[   730.805590] spkr2 at audio1: PC Speaker (synthesized)
[   730.805590] wsbell at spkr2 not configured
[   730.805590] spkr2: detached
[   730.805590] audio1: detached
[   730.805590] pad0: detached
[   730.805590] pad1: outputs: 44100Hz, 16-bit, stereo
[   730.805590] audio1 at pad1: playback
[   730.805590] audio1: slinear_le:16 -> slinear_le:16 2ch 44100Hz, blk 1764 bytes (10ms) for playback
[   730.805590] spkr2 at audio1: PC Speaker (synthesized)
[   730.805590] wsbell at spkr2 not configured
[   730.805590] spkr2: detached
[   730.805590] audio1: detached
[   730.805590] pad1: detached
[   730.805590] pad2: outputs: 44100Hz, 16-bit, stereo
[   730.805590] audio1 at pad2: playback
[   730.815590] audio1: slinear_le:16 -> slinear_le:16 2ch 44100Hz, blk 1764 bytes (10ms) for playback
[   730.815590] spkr2 at audio1: PC Speaker (synthesized)
[   730.815590] wsbell at spkr2 not configured
[   730.815590] spkr2: detached
[   730.815590] audio1: detached
[   730.815590] pad2: detached
[   730.815590] pad3: outputs: 44100Hz, 16-bit, stereo
[   730.815590] audio1 at pad3: playback
[   730.815590] audio1: slinear_le:16 -> slinear_le:16 2ch 44100Hz, blk 1764 bytes (10ms) for playback
[   730.815590] spkr2 at audio1: PC Speaker (synthesized)
[   730.815590] wsbell at spkr2 not configured
[   730.815590] spkr2: detached
[   730.815590] audio1: detached
[   730.815590] pad3: detached
[   730.915588] WARNING: module error: incompatible module class 2 for `zfs' (wanted 3)
[   730.915588] WARNING: module error: incompatible module class 2 for `zfs' (wanted 3)
[   730.935587] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
[   730.935587] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
[   730.955587] WARNING: module error: incompatible module class 2 for `autofs' (wanted 3)
[   730.955587] WARNING: module error: incompatible module class 2 for `autofs' (wanted 3)
[   730.995587] WARNING: module error: incompatible module class 1 for `dtrace' (wanted 3)
[   730.995587] WARNING: module error: incompatible module class 1 for `dtrace' (wanted 3)

>How-To-Repeat:
See above.
>Fix:

>Audit-Trail:
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 06:56:00 +0000 (UTC)

 Caused by pad(4) at least, and possibly lua(4):

 $ id
 uid=1001(rvp) gid=1001(rvp) groups=1001(rvp),0(wheel),9(wsrc)
 $ sudo chmod go-rw /dev/pad* /dev/lua*
 $ tar -C /dev -cf /dev/null .
 tar: Couldn't list extended attributes: Permission denied
 tar: Couldn't list extended attributes: Invalid argument
 ...
 $ dmesg | tail
 [     6.059407] intelfb0 at i915drmkms0
 [     6.059407] intelfb0: framebuffer at 0xe0364000, size 1366x768, depth 32, stride 5504
 [     6.789401] wsdisplay0 at intelfb0 kbdmux 1: console (default, vt100 emulation), using wskbd0
 [     6.799401] wsmux1: connecting to wsdisplay0
 [    14.529371] wsdisplay0: screen 1 added (default, vt100 emulation)
 [    14.539372] wsdisplay0: screen 2 added (default, vt100 emulation)
 [    14.539372] wsdisplay0: screen 3 added (default, vt100 emulation)
 [    14.539372] wsdisplay0: screen 4 added (default, vt100 emulation)
 [    20.379350] cpu 0: ucode 0x15->0x21
 [    20.389352] cpu 1: ucode 0x15->0x21
 $

 The kernel reports no errors once read permission is removed from
 /dev/pad* and /dev/lua*.

 Issues I noticed:

 - In sys/dev/pad/pad.c, padattach() doesn't seem to be called from
    anywhere and therefore mutex_init() is never called.

    `pad_ca' is used in 2 places, but is defined nowhere.

 - Similarly, in sys/net/if_tap.c, tapattach() is not called by anybody.


 -RVP

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 10:24:00 +0200

 We had this discussion before; I think there is even an open PR against tar.
 Folks are in either of two camps:

  - tar needs to open the file and extract ACLs from the filedescriptor,
    otherwise there would be races.

     -> solution: the kernel should never do state changes (like rewind tapes
        or similar) on plain "open" of a device node

  - tar should avoid all this dance when there are no ACLs anyway on the
    file system it is traversing. State changes on device open may be a hack,
    but they are a very ancient unix hack and quite common.

 There is an option to tar (I forgot which) to not back up ACLs - and then
 everything should be fine.

 IMHO this option should be on by default.

 Martin

From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc: Martin Husemann <martin@duskware.de>
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 09:06:16 +0000 (UTC)

 On Fri, 4 Jun 2021, Martin Husemann wrote:

 > There is an option to tar (I forgot which) to not back up ACLs - and then
 > everything should be fine.
 >

 Not really an ACLs or extattrs issue, I think. Just doing an
 open()/fstat()/close() on each of the entries in /dev
 reliably reboots my laptop. Run the following program as root:

  	$ sudo ./a.out /dev

 --- START CODE ---
 /**
   * ic-walk.c:
   * Traverse given dir. like ``find dir'', calling iconv(3)
   * on each pathname. This causes '?' to be displayed for
   * chars. which are not valid for the current charset.
   */
 #include <sys/stat.h>
 #include <assert.h>
 #include <dirent.h>
 #include <err.h>
 #include <errno.h>
 #include <fcntl.h>
 #include <iconv.h>
 #include <langinfo.h>
 #include <locale.h>
 #include <stdio.h>
 #include <string.h>
 #include <stdlib.h>
 #include <unistd.h>

 static int walk(char* path, iconv_t cd);
 static char* concat(char* path, char* name);
 static char* convpath(char* path, iconv_t cd);

 static char* prog;

 int
 main(int argc, char* argv[])
 {
  	char* tc, *fc;
  	iconv_t cd;
  	int rc = EXIT_FAILURE;

  	prog = argv[0];

  	printf("Current locale: %s\n", setlocale(LC_ALL, NULL));
  	(void)setlocale(LC_ALL, "");
  	fc = nl_langinfo(CODESET);
  	tc = "UTF-8";
  	printf("from charset: %s\n", fc);
  	printf("  to charset: %s\n", tc);
  	cd = iconv_open(tc, fc);
  	if (cd == (iconv_t)-1)
  		err(rc, "iconv_open failed");
  	if (argc != 2)
  		rc = walk(".", cd);
  	else
  		rc = walk(argv[1], cd);
  	iconv_close(cd);
  	return rc;
 }

 /*
   * Walk the directory tree recursively, starting at `path'.
   */
 static int
 walk(char* path, iconv_t cd)
 {
  	struct dirent* dent;
  	DIR* dir;

  	if ((dir = opendir(path)) == NULL) {
  		fprintf(stderr, "%s: %s: opendir failed. ", prog, path);
  		perror(NULL);
  		return EXIT_FAILURE;
  	}
  	while ((dent = readdir(dir)) != NULL) {
  		char* name = dent->d_name;
  		char* np = concat(path, name);
  		struct stat sb;
  		int fd;
  		if ((fd = open(np, O_RDONLY)) == -1) {
  			warn("%s: open failed", np);
  			free(np);
  			continue;
  		}
  		if (fstat(fd, &sb) != 0) {
  			warn("%s: fstat failed", np);
  			free(np);
  			close(fd);
  			continue;
  		}
  		close(fd);
  		char* s = convpath(np, cd);
  		if (S_ISDIR(sb.st_mode)) {
  			/* dir, but skip `.' and `..' */
  			if (strcmp(name, ".") && strcmp(name, "..")) {
  				printf("%s/\n", s);
  				walk(np, cd);	/* recurse */
  			}
  		} else	/* file */
  			printf("%s\n", s);
  		free(np);
  		free(s);
  	}
  	closedir(dir);
  	return EXIT_SUCCESS;
 }

 static char*
 concat(char* path, char* name)
 {
  	assert(path && name);

  	size_t len = strlen(path) + strlen(name) + 2;
  	char* s = malloc(len);
  	if (s == NULL)
  		err(EXIT_FAILURE, "malloc failed");

  	char* end = path + strlen(path);
  	while (end > path && *(end-1) == '/')
  		--end;
  	snprintf(s, len, "%.*s/%s", (int)(end-path), path, name);

  	return s;
 }

 static char*
 convpath(char* path, iconv_t cd)
 {
  	size_t slen = strlen(path) + 1;
  	size_t dlen = slen * 2 + 1;
  	char* dest;
  	char* sp, *dp;

  	dest = malloc(dlen);
  	if (dest == NULL)
  		err(EXIT_FAILURE, "malloc failed");

  	sp = path;
  	dp = dest;
  	while (slen >= 1) {
  		errno = 0;
  		size_t rc = iconv(cd, &sp, &slen, &dp, &dlen);
  		if (rc != (size_t)-1)
  			break;
  		if (errno == EILSEQ || errno == EINVAL) {
  			*dp++ = '?';
  			sp++;
  			dlen--;
  			slen--;
  		} else {
  			warn("iconv failed");
  			free(dest);
  			return strdup(path);
  		}
  	}
  	return dest;
 }
 --- END CODE ---

From: Martin Husemann <martin@duskware.de>
To: RVP <rvp@SDF.ORG>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 12:18:53 +0200

 On Fri, Jun 04, 2021 at 09:06:16AM +0000, RVP wrote:
 > Not really an ACLs or extattrs issue, I think. Just doing an
 > open()/fstat()/close() on each of the entries in /dev
 > reliably reboots my laptop.

 You should not randomly open dev nodes.

 But as I said before, this point is part of the controversy.

 Martin

From: RVP <rvp@SDF.ORG>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 12:01:56 +0000 (UTC)

 On Fri, 4 Jun 2021, Martin Husemann wrote:

 > On Fri, Jun 04, 2021 at 09:06:16AM +0000, RVP wrote:
 >> Not really an ACLs or extattrs issue, I think. Just doing an
 >> open()/fstat()/close() on each of the entries in /dev
 >> reliably reboots my laptop.
 >
 > You should not randomly open dev nodes.
 >

 Yeah...at least not without O_NONBLOCK. But, the OS shouldn't
 panic and reboot if a regular user just happens to open some odd
 device.

 -RVP

From: Martin Husemann <martin@duskware.de>
To: RVP <rvp@SDF.ORG>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 14:11:31 +0200

 On Fri, Jun 04, 2021 at 12:01:56PM +0000, RVP wrote:
 > Yeah...at least not without O_NONBLOCK. But, the OS shouldn't
 > panic and reboot if a regular user just happens to open some odd
 > device.

 Uhm, no - I missed the unprivileged user part, and panic is never the
 right thing to do.

 Martin

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 14:13:35 -0000 (UTC)

 rvp@SDF.ORG (RVP) writes:

 > - In sys/dev/pad/pad.c, padattach() doesn't seem to be called from
 >    anywhere and therefore mutex_init() is never called.

 pad is a pseudo-device; part of the code that calls the attach function
 is autogenerated by config(1) when building a kernel.

 >    `pad_ca' is used in 2 places, but is defined nowhere.

 It is defined by the CFATTACH_DECL2_NEW macro in sys/device.h.


 > - Similarly, in sys/net/if_tap.c, tapattach() is not called by anybody.

 Ditto.

 When you look into the compile directory after a build, you see the generated
 file ioconf.c.

 The table 'pdevinit[]' lists all the pseudodevice attach routines that
 will be called by config_finalize() in sys/kern/subr_autoconf.c.
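
 As a rough illustration of the mechanism described above, here is a
 simplified userland model. It is not the kernel source: the table and
 loop follow the description of ioconf.c and config_finalize(), while the
 attach bodies and the name attach_pseudo_devices() are hypothetical
 stand-ins.

```c
#include <stdio.h>

/* Simplified model of one entry in the table config(1) generates. */
struct pdevinit {
	void (*pdev_attach)(int);	/* attach routine for the pseudo-device */
	int pdev_count;			/* instance count from the kernel config */
};

/* Hypothetical stand-ins for the real attach routines. */
static void padattach(int n) { printf("padattach(%d)\n", n); }
static void tapattach(int n) { printf("tapattach(%d)\n", n); }

/* The table ends with a NULL sentinel entry. */
static const struct pdevinit pdevinit[] = {
	{ padattach, 1 },
	{ tapattach, 1 },
	{ NULL, 0 },
};

/* Call every attach routine in the table; return how many were called. */
int
attach_pseudo_devices(void)
{
	const struct pdevinit *pdev;
	int n = 0;

	for (pdev = pdevinit; pdev->pdev_attach != NULL; pdev++) {
		(*pdev->pdev_attach)(pdev->pdev_count);
		n++;
	}
	return n;
}
```

 Calling attach_pseudo_devices() runs both stand-in routines through the
 table. The only references to them are function pointers in generated
 data, which is why searching the source tree finds no direct call.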


 Greetings,



From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 16:42:54 +0000

 On Fri, Jun 04, 2021 at 08:25:01AM +0000, Martin Husemann wrote:
  >  We had this discussion before, I think there even is an open PR
  >  against tar.
  >  Folks are in either of two camps:
  >  
  >   - tar needs to open the file and extract ACLs from the filedescriptor,
  >     otherwise there would be races.
  >  
  >      -> solution: the kernel should never do state changes (like rewind
  >         tapes or similar) on plain "open" of a device node
  >  
  >   - tar should avoid all this dance when there are no ACLs anyway on the
  >     file system it is traversing. State changes on device open may be a
  >     hack, but they are a very ancient unix hack and quite common.

 Devices with side-effecting open are a thing; they aren't going to go
 away. Volumes with ACLs are also a thing, and might reasonably include
 the root fs, and in fact devices are one of the things you might
 specifically want custom access control for.

 Conclusion: tar (and other things) need to be able to fetch ACLs
 without open().
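
 One way to read "without open()" is to use path-based calls only. The
 sketch below assumes the Linux-style llistxattr(2) interface from
 <sys/xattr.h> is available on the target system. It covers extended
 attributes only (ACLs would need their own path-based call, not shown),
 and the helper name count_xattrs_by_path() is hypothetical. It is not
 how tar or libarchive is implemented.

```c
#include <sys/types.h>
#include <sys/xattr.h>
#include <stdlib.h>
#include <string.h>

/*
 * Count the extended-attribute names attached to `path' without ever
 * opening it. Returns the count, or -1 on error (errno is set by the
 * failing call, e.g. ENOENT, or ERANGE if the list grew between calls).
 */
int
count_xattrs_by_path(const char *path)
{
	ssize_t len, i;
	char *buf;
	int n = 0;

	/* A NULL buffer of size 0 asks only for the required size. */
	len = llistxattr(path, NULL, 0);
	if (len == -1)
		return -1;
	if (len == 0)
		return 0;

	buf = malloc((size_t)len);
	if (buf == NULL)
		return -1;

	len = llistxattr(path, buf, (size_t)len);
	if (len == -1) {
		free(buf);
		return -1;
	}

	/* The buffer holds NUL-terminated names laid end to end. */
	for (i = 0; i < len; i += (ssize_t)strlen(buf + i) + 1)
		n++;

	free(buf);
	return n;
}
```

 Because nothing is opened, no device open routine runs. The cost is the
 race between lookup and use that the descriptor-based approach was
 meant to avoid.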

 Do you remember where this discussion was, or some search keywords? My
 gnats index is not what it used to be.

  >  There is an option to tar (I forgot which) to not back up ACLs - and then
  >  everything should be fine.
  >  
  >  IMHO this option should be on by default.

 Silently throwing away your ACLs in your backups isn't the right
 answer either :-(

 -- 
 David A. Holland
 dholland@netbsd.org

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 18:52:01 +0200

 On Fri, Jun 04, 2021 at 04:45:02PM +0000, David Holland wrote:
 >  Do you remember where this discussion was, or some search keywords? My
 >  gnats index is not what it used to be.

 I was probably thinking of PR 55815

 Martin

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 16:59:17 +0000

 On Fri, Jun 04, 2021 at 04:55:02PM +0000, Martin Husemann wrote:
  > The following reply was made to PR kern/56232; it has been noted by GNATS.
  > 
  > From: Martin Husemann <martin@duskware.de>
  > To: gnats-bugs@netbsd.org
  > Cc: 
  > Subject: Re: kern/56232: Unstable system with tar on /dev
  > Date: Fri, 4 Jun 2021 18:52:01 +0200
  > 
  >  On Fri, Jun 04, 2021 at 04:45:02PM +0000, David Holland wrote:
  >  >  Do you remember where this discussion was, or some search keywords? My
  >  >  gnats index is not what it used to be.
  >  
  >  I was probably thinking of PR 55815

 thanks, I'll take my noise there, and leave this PR for devices that
 blow up the system when opened, which is not an admissible type of
 side effect :-)

 -- 
 David A. Holland
 dholland@netbsd.org

From: nia <nia@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Fri, 4 Jun 2021 21:35:36 +0000

 On Fri, Jun 04, 2021 at 07:00:02AM +0000, RVP wrote:
 >  No errors reported by the kernel if read perms. are removed from
 >  /dev/pad*, /dev/lua*

 Note that it's possible in HEAD to trigger a kernel assertion failure
 when reading from a pad(4) as an unprivileged user, see
 PR kern/56073

From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Sun, 6 Jun 2021 05:54:46 +0000 (UTC)

 OK, I've found out which device is causing the kernel crashes.
 It's iscsi0. Running the program below as root reproducibly
 crashes my system.

 $ sudo ./opentest /dev/iscsi0
 /dev/iscsi0
 [ The system hangs here, then reboots ]

 Another issue found during this test:

 $ sudo find /dev/ -not \( -type f -o -name iscsi0 \) \
  	-exec ./opentest {} + 2>&1 | fgrep -B1 close
 /dev/ipmi0
 opentest: /dev/ipmi0: close failed.: Device not configured
 $

 ipmi(4) should fail on open()--not later on in close(), I think.

 Should I file a separate PR for this, or is leaving it here OK?

 Thanks,
 -RVP

 ----- START CODE -----
 /**
   * sudo ./opentest /dev/iscsi0
   *
   * sudo find /dev -not \( -type f -o -name iscsi0 \) \
   *	-exec ./opentest {} + 2>&1 | fgrep -B1 close
   */
 #include <sys/stat.h>
 #include <err.h>
 #include <fcntl.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <unistd.h>

 int
 main(int argc, char* argv[])
 {
  	int i, rc = EXIT_SUCCESS;

  	for (i = 1; i < argc; i++) {
  		char* fn = argv[i];
  		struct stat sb;
  		int fd;

  		printf("%s\n", fn);
  		fflush(stdout);
  		fd = open(fn, O_RDONLY | O_NONBLOCK);
  		if (fd == -1) {
  			warn("%s: open failed.", fn);
  			rc = EXIT_FAILURE;
  			continue;
  		}
  		if (fstat(fd, &sb) == -1) {
  			warn("%s: fstat failed.", fn);
  			rc = EXIT_FAILURE;
  		}
  		if (close(fd) == -1) {
  			warn("%s: close failed.", fn);
  			rc = EXIT_FAILURE;
  		}
  	}
  	return rc;
 }
 ----- END CODE -----

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Sun, 6 Jun 2021 07:09:58 -0000 (UTC)

 rvp@SDF.ORG (RVP) writes:

 > ipmi(4) should fail on open()--not later on in close(), I think.
 > Should I file a separate PR for this, or is leaving it here OK?

 The crash happens because there is no read function defined.
 Not sure yet why close fails.
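
 To illustrate the general failure mode, here is a simplified userland
 model of a device switch table. It is not NetBSD's cdevsw, and every
 name in it is hypothetical. A missing entry is a NULL function pointer,
 so a dispatcher has to check it (or the table has to hold a stub that
 returns an error) before calling through it.

```c
#include <sys/types.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified device switch: one slot per operation. */
struct devsw_model {
	int (*d_open)(void);
	ssize_t (*d_read)(void *, size_t);	/* NULL when not provided */
};

static int model_open(void) { return 0; }

/* A device that defines open but no read function. */
static const struct devsw_model noread_dev = { model_open, NULL };

/*
 * Dispatch a read. Calling through a NULL d_read would crash, so a
 * missing entry is turned into an ENODEV error instead.
 */
ssize_t
dev_read(const struct devsw_model *sw, void *buf, size_t len)
{
	if (sw->d_read == NULL) {
		errno = ENODEV;
		return -1;
	}
	return sw->d_read(buf, len);
}
```

 With the check in place, dev_read(&noread_dev, buf, len) returns -1
 with errno set to ENODEV rather than jumping through a null pointer.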


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Mon, 7 Jun 2021 02:22:22 +0000

 On Sun, Jun 06, 2021 at 05:55:01AM +0000, RVP wrote:
  >  opentest: /dev/ipmi0: close failed.: Device not configured
  >  $
  >  
  >  ipmi(4) should fail on open()--not later on in close(), I think.

 yeah, failing with "Device not configured" is a bit off...

  >  Should I file a separate PR for this, or is leaving it here OK?

 Probably best to. As soon as there's more than one issue in the same
 PR it becomes a nuisance to try to keep track of which ones are and
 aren't fixed.

 -- 
 David A. Holland
 dholland@netbsd.org

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56232: Unstable system with tar on /dev
Date: Mon, 7 Jun 2021 04:27:42 -0000 (UTC)

 dholland-bugs@netbsd.org (David Holland) writes:

 >  >  ipmi(4) should fail on open()--not later on in close(), I think.
 > yeah, failing with "Device not configured" is a bit off...

 Fixed. open() now fails when no ipmi device exists.
