NetBSD Problem Report #55816
From martin@aprisoft.de Sun Nov 22 13:30:08 2020
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id A3A101A921F
for <gnats-bugs@gnats.NetBSD.org>; Sun, 22 Nov 2020 13:30:08 +0000 (UTC)
Message-Id: <20201122132958.6F6095CC848@emmas.aprisoft.de>
Date: Sun, 22 Nov 2020 14:29:58 +0100 (CET)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: mdopen() kills the kernel
X-Send-Pr-Version: 3.95
>Number: 55816
>Category: port-amd64
>Synopsis: mdopen() kills the kernel
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: port-amd64-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Nov 22 13:35:00 +0000 2020
>Closed-Date: Fri Jun 04 17:13:22 +0000 2021
>Last-Modified: Fri Jun 04 17:13:22 +0000 2021
>Originator: Martin Husemann
>Release: NetBSD 9.99.75
>Organization:
The NetBSD Foundation, Inc.
>Environment:
System: NetBSD seven-days-to-the-wolves.aprisoft.de 9.99.75 NetBSD 9.99.75 (GENERIC) #425: Wed Nov 4 15:34:33 CET 2020 martin@seven-days-to-the-wolves.aprisoft.de:/work/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
I am trying to back up an old shark disk, connected via USB to a -current
amd64 machine. I have the FFS on that disk mounted read-only on /targetroot.
Now when I do:
# cd /targetroot && tar cvf - . | gzip -9 > $somewherelse
I get:
[..]
a ./dev/rsd0j
a ./dev/rsd0k
uvm_fault..
fatal page fault..
..
config_devalloc+0x178
config_attach_pseudo+0x16
mdopen+0x15c
spec_open+0x176
VOP_OPEN+0x3c
vn_open+0x19d
do_open+0x119
do_sys_openat+0x74
...
where it dies on cmpq $0,0(%rcx,%rax,8)
with %rcx = 0 and %rax = ffffffffffff8000
(See also PR 55815; this is of course also a very stupid bug in tar.)
>How-To-Repeat:
s/a
>Fix:
n/a
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/55816: mdopen() kills the kernel
Date: Sun, 22 Nov 2020 14:39:10 +0100
The /dev to be backed up has:
crw-r----- 1 root operator 24, 524289 Mar 22 2017 rsd0j
crw-r----- 1 root operator 24, 524290 Mar 22 2017 rsd0k
crw-r----- 1 root operator 24, 524291 Mar 22 2017 rsd0l
which on amd64 maps to some rmd:
crw-r----- 1 root operator 24, 3 Oct 12 2019 rmd0
crw-r----- 1 root operator 24, 0 Jul 18 2011 rmd0a
crw-r----- 1 root operator 24, 3 Jul 18 2011 rmd0d
crw-r----- 1 root operator 24, 19 Oct 12 2019 rmd1
crw-r----- 1 root operator 24, 16 Jul 18 2011 rmd1a
crw-r----- 1 root operator 24, 19 Jul 18 2011 rmd1d
... and of course the currently running kernel has no md(4) at all.
Martin
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/55816: mdopen() kills the kernel
Date: Sun, 22 Nov 2020 14:39:08 -0000 (UTC)
The crash occurs when dereferencing cd->cd_devs[unit] with
a negative unit fetched from cf->cf_unit.
minor(x) is a 20-bit unsigned integer.
DISKUNIT is minor(x)/MAXPARTITIONS. On amd64 this yields a
16-bit unsigned integer.
cf_unit is used by autoconf to store this unit number, but
it is a 16-bit _signed_ integer. A minor of 524289 gives a unit
of 32768, which is interpreted as -32768.
The code assumes that any unit number < cd_ndevs is valid
and dereferences cd_devs; since -32768 < 0, the check passes.
Initially cd_ndevs is 0 and cd_devs is NULL (that's the %rcx
value). But even if an array had already been allocated,
the negative unit number would index 32768 slots before its start.
If cf_unit were widened to a signed integer of more than 20 bits (-1 is
used as an invalid unit in some places), this lookup might not fail.
But the cd_devs array allocated for such large units could be too large on some systems.
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: port-amd64-maintainer@netbsd.org,
gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org,
"martin@netbsd.org" <martin@NetBSD.org>
Subject: Re: port-amd64/55816: mdopen() kills the kernel
Date: Sun, 22 Nov 2020 10:00:48 -0500
I looked in the whole tree for cf_unit comparisons < 0 or -1 and could not find any.
I wonder if the following would work.
christos
Index: device.h
===================================================================
RCS file: /cvsroot/src/sys/sys/device.h,v
retrieving revision 1.158
diff -u -p -u -r1.158 device.h
--- device.h 3 Oct 2020 22:32:50 -0000 1.158
+++ device.h 22 Nov 2020 15:00:16 -0000
@@ -279,8 +279,8 @@ struct cfparent {
struct cfdata {
const char *cf_name; /* driver name */
const char *cf_atname; /* attachment name */
- short cf_unit; /* unit number */
- short cf_fstate; /* finding state (below) */
+ unsigned int cf_unit:24; /* unit number */
+ unsigned char cf_fstate; /* finding state (below) */
int *cf_loc; /* locators (machine dependent) */
int cf_flags; /* flags from config */
const struct cfparent *cf_pspec;/* parent specification */
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55816 CVS commit: src/sys/sys
Date: Tue, 24 Nov 2020 11:17:04 -0500
Module Name: src
Committed By: christos
Date: Tue Nov 24 16:17:04 UTC 2020
Modified Files:
src/sys/sys: device.h param.h
Log Message:
PR/55816: Martin Husemann: widen cfunit to 24 bits so that it fits the
largest minor number which is 20 bits. Welcome to 2x2x19.
To generate a diff of this commit:
cvs rdiff -u -r1.158 -r1.159 src/sys/sys/device.h
cvs rdiff -u -r1.679 -r1.680 src/sys/sys/param.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Fri, 04 Jun 2021 17:13:22 +0000
State-Changed-Why:
Christos fixed it.
(And while it might be nice to get the fix into -9, it looks like it would
break the module ABI, so it shouldn't be.)
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.