NetBSD Problem Report #55816

From martin@aprisoft.de  Sun Nov 22 13:30:08 2020
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id A3A101A921F
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 22 Nov 2020 13:30:08 +0000 (UTC)
Message-Id: <20201122132958.6F6095CC848@emmas.aprisoft.de>
Date: Sun, 22 Nov 2020 14:29:58 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: mdopen() kills the kernel
X-Send-Pr-Version: 3.95

>Number:         55816
>Category:       port-amd64
>Synopsis:       mdopen() kills the kernel
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-amd64-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Nov 22 13:35:00 +0000 2020
>Closed-Date:    Fri Jun 04 17:13:22 +0000 2021
>Last-Modified:  Fri Jun 04 17:13:22 +0000 2021
>Originator:     Martin Husemann
>Release:        NetBSD 9.99.75
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD seven-days-to-the-wolves.aprisoft.de 9.99.75 NetBSD 9.99.75 (GENERIC) #425: Wed Nov 4 15:34:33 CET 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

I am trying to back up some old shark disk, connected via USB to a -current 
amd64 machine. I have the ffs on that disk mounted readonly on /targetroot.

Now when I do:

 # cd /targetroot && tar cvf - . | gzip -9 > $somewherelse

I get:

[..]
a ./dev/rsd0j
a ./dev/rsd0k
uvm_fault..
fatal page fault..
..
config_devalloc+0x178
config_attach_pseudo+0x16
mdopen+0x15c
spec_open+0x176
VOP_OPEN+0x3c
vn_open+0x19d
do_open+0x119
do_sys_openat+0x74
...

where it dies on cmpq $0,0(%rcx,%rax,8)
with %rcx = 0 and %rax = ffffffffffff8000

(see also PR 55815, this of course is also a very stupid bug in tar)

>How-To-Repeat:
See above.
>Fix:
n/a

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/55816: mdopen() kills the kernel
Date: Sun, 22 Nov 2020 14:39:10 +0100

 The /dev to be backed up has:

 crw-r-----  1 root  operator   24,  524289 Mar 22  2017 rsd0j
 crw-r-----  1 root  operator   24,  524290 Mar 22  2017 rsd0k
 crw-r-----  1 root  operator   24,  524291 Mar 22  2017 rsd0l

 which on amd64 maps to some rmd:

 crw-r-----  1 root    operator   24,       3 Oct 12  2019 rmd0
 crw-r-----  1 root    operator   24,       0 Jul 18  2011 rmd0a
 crw-r-----  1 root    operator   24,       3 Jul 18  2011 rmd0d
 crw-r-----  1 root    operator   24,      19 Oct 12  2019 rmd1
 crw-r-----  1 root    operator   24,      16 Jul 18  2011 rmd1a
 crw-r-----  1 root    operator   24,      19 Jul 18  2011 rmd1d

 ... and of course the currently running kernel has no md(4) at all.

 Martin

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/55816: mdopen() kills the kernel
Date: Sun, 22 Nov 2020 14:39:08 -0000 (UTC)

 martin@duskware.de (Martin Husemann) writes:

 >The following reply was made to PR port-amd64/55816; it has been noted by GNATS.

 >From: Martin Husemann <martin@duskware.de>
 >To: gnats-bugs@netbsd.org
 >Cc: 
 >Subject: Re: port-amd64/55816: mdopen() kills the kernel
 >Date: Sun, 22 Nov 2020 14:39:10 +0100

 > The /dev to be backed up has:
 > 
 > crw-r-----  1 root  operator   24,  524289 Mar 22  2017 rsd0j
 > crw-r-----  1 root  operator   24,  524290 Mar 22  2017 rsd0k
 > crw-r-----  1 root  operator   24,  524291 Mar 22  2017 rsd0l
 > 
 > which on amd64 maps to some rmd:
 > 
 > crw-r-----  1 root    operator   24,       3 Oct 12  2019 rmd0
 > crw-r-----  1 root    operator   24,       0 Jul 18  2011 rmd0a
 > crw-r-----  1 root    operator   24,       3 Jul 18  2011 rmd0d
 > crw-r-----  1 root    operator   24,      19 Oct 12  2019 rmd1
 > crw-r-----  1 root    operator   24,      16 Jul 18  2011 rmd1a
 > crw-r-----  1 root    operator   24,      19 Jul 18  2011 rmd1d
 > 
 > ... and of course the currently running kernel has no md(4) at all.


 The crash occurs when dereferencing cd->cd_devs[unit] with
 a negative unit fetched from cf->cf_unit.

 minor(x) is a 20bit unsigned integer.

 DISKUNIT is minor(x)/MAXPARTITIONS. On amd64 this yields a
 16bit unsigned integer.

 cf_unit is used by autoconf to store this unit number but
 is a 16bit _signed_ integer. A minor of 524289 gives a unit
 of 32768 which is interpreted as -32768.
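The arithmetic above can be reproduced in a few lines of C. This is only a sketch: the `ex_` names are hypothetical and not taken from the kernel sources, and the value 16 for MAXPARTITIONS on amd64 is an assumption consistent with the numbers quoted in this PR.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: amd64 uses 16 partitions per disk unit. */
#define EX_MAXPARTITIONS 16

/* Unit number as described above: minor / MAXPARTITIONS. */
static inline uint32_t ex_diskunit(uint32_t minor_no)
{
    assert(minor_no < (1u << 20));      /* minor numbers are 20 bits wide */
    return minor_no / EX_MAXPARTITIONS;
}

/*
 * What remains once the unit is stored in a 16-bit signed field.
 * Converting an out-of-range value to short is implementation-defined
 * before C23; common two's-complement compilers simply wrap.
 */
static inline short ex_store_short(uint32_t unit)
{
    return (short)(uint16_t)unit;
}
```

With these helpers, `ex_diskunit(524289)` is 32768, and `ex_store_short(32768)` comes back as -32768 on the usual compilers, which matches the sign-extended %rax value 0xffffffffffff8000 in the trace.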

 The code assumes that any unit number < cd_ndevs is valid
 and dereferences cd_devs.

 Initially cd_ndevs is 0 and cd_devs is NULL (that's the %rcx
 value). But even when an initial array would have been allocated
 the negative unit number would cause havoc.
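A minimal sketch of why the range test does not help, assuming the check has the shape described above. `struct ex_cfdriver` and the `ex_` functions are hypothetical stand-ins, not the kernel's definitions, and the explicit `unit >= 0` test is shown only for contrast.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-driver device table. */
struct ex_cfdriver {
    void **cd_devs;     /* device pointers, NULL until something attaches */
    int    cd_ndevs;    /* number of slots in cd_devs */
};

/* Signed upper-bound test only: any negative unit passes. */
static inline int ex_unit_ok_naive(const struct ex_cfdriver *cd, int unit)
{
    return unit < cd->cd_ndevs;
}

/* Also rejecting negative units closes that hole. */
static inline int ex_unit_ok_checked(const struct ex_cfdriver *cd, int unit)
{
    return unit >= 0 && unit < cd->cd_ndevs;
}

/* Address an 8-byte-scaled indexed access touches: base + index * 8. */
static inline uint64_t ex_slot_address(uint64_t base, int64_t index)
{
    return base + (uint64_t)index * 8u;
}
```

With `cd_devs == NULL` and `cd_ndevs == 0`, the naive test accepts a unit of -32768, and `ex_slot_address(0, -32768)` evaluates to 0xfffffffffffc0000, the address the faulting `cmpq` would have read with %rcx = 0 and %rax = ffffffffffff8000.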

 If cf_unit is expanded to hold a >20bit signed integer (-1 is
 used as an invalid unit in some places), this might not fail.
 But the allocated array could be too large on some systems.

 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 "martin@netbsd.org" <martin@NetBSD.org>
Subject: Re: port-amd64/55816: mdopen() kills the kernel
Date: Sun, 22 Nov 2020 10:00:48 -0500

 I looked in the whole tree for cf_unit comparisons < 0 or -1 and could
 not find any.
 I wonder if the following would work.

 christos

 Index: device.h
 ===================================================================
 RCS file: /cvsroot/src/sys/sys/device.h,v
 retrieving revision 1.158
 diff -u -p -u -r1.158 device.h
 --- device.h    3 Oct 2020 22:32:50 -0000       1.158
 +++ device.h    22 Nov 2020 15:00:16 -0000
 @@ -279,8 +279,8 @@ struct cfparent {
  struct cfdata {
         const char *cf_name;            /* driver name */
         const char *cf_atname;          /* attachment name */
 -       short   cf_unit;                /* unit number */
 -       short   cf_fstate;              /* finding state (below) */
 +       unsigned int cf_unit:24;        /* unit number */
 +       unsigned char cf_fstate;        /* finding state (below) */
         int     *cf_loc;                /* locators (machine dependent) */
         int     cf_flags;               /* flags from config */
         const struct cfparent *cf_pspec;/* parent specification */
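To illustrate what the proposed field widths would hold, here is a small sketch. `struct ex_cfdata` is a hypothetical cut-down copy containing only the two changed members; it is not the real `struct cfdata`.

```c
#include <assert.h>

/* Only the two members touched by the patch; everything else is omitted. */
struct ex_cfdata {
    unsigned int  cf_unit:24;   /* unit number */
    unsigned char cf_fstate;    /* finding state */
};

/* Store a unit in the 24-bit field and read it back. */
static inline unsigned int ex_roundtrip_unit(unsigned int unit)
{
    struct ex_cfdata cf = { 0 };

    cf.cf_unit = unit;          /* values above 0xFFFFFF wrap modulo 2^24 */
    return cf.cf_unit;
}
```

A unit of 32768 now round-trips unchanged, as does 0xFFFF, the largest value a 20-bit minor divided by 16 can produce. Because the field is unsigned, storing -1 would read back as 0xFFFFFF rather than as a negative number, so any code relying on -1 as an invalid unit would need a different comparison.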

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55816 CVS commit: src/sys/sys
Date: Tue, 24 Nov 2020 11:17:04 -0500

 Module Name:	src
 Committed By:	christos
 Date:		Tue Nov 24 16:17:04 UTC 2020

 Modified Files:
 	src/sys/sys: device.h param.h

 Log Message:
 PR/55816: Martin Husemann: widen cfunit to 24 bits so that it fits the
 largest minor number which is 20 bits. Welcome to 2x2x19.


 To generate a diff of this commit:
 cvs rdiff -u -r1.158 -r1.159 src/sys/sys/device.h
 cvs rdiff -u -r1.679 -r1.680 src/sys/sys/param.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Fri, 04 Jun 2021 17:13:22 +0000
State-Changed-Why:
Christos fixed it.
(and while it might be nice to get the fix into -9, it looks like it'll
break the module abi so shouldn't)


>Unformatted:
