NetBSD Problem Report #49709
From www@NetBSD.org Mon Mar 2 00:06:58 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id D2CEAA5B2E
for <gnats-bugs@gnats.NetBSD.org>; Mon, 2 Mar 2015 00:06:58 +0000 (UTC)
Message-Id: <20150302000657.4A57BA6558@mollari.NetBSD.org>
Date: Mon, 2 Mar 2015 00:06:57 +0000 (UTC)
From: jdbaker@mylinuxisp.com
Reply-To: jdbaker@mylinuxisp.com
To: gnats-bugs@NetBSD.org
Subject: radeondrmkms panic if "/dev/" is on NFS root
X-Send-Pr-Version: www-1.0
>Number: 49709
>Category: kern
>Synopsis: radeondrmkms panic if "/dev/" is on NFS root
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: mrg
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Mar 02 00:10:01 +0000 2015
>Closed-Date: Tue Mar 10 18:49:46 +0000 2015
>Last-Modified: Tue Mar 10 18:49:46 +0000 2015
>Originator: John D. Baker
>Release: NetBSD/i386-7.99.5, NetBSD/amd64-7.99.5
>Organization:
>Environment:
NetBSD slab 7.99.5 NetBSD 7.99.5 (SLAB_KMS) #7: Sun Mar 1 14:17:39 CST 2015 sysop@skuld.technoskunk.fur:/d0/build/current/obj/i386/sys/arch/i386/compile/SLAB_KMS i386
>Description:
If a diskless client uses radeondrmkms, the kernel will panic during
radeondrmkms attachment if the NFS-resident "/dev" is used.
[...]
drm: initializing kernel modesetting (RV200 0x1002:0x4C58 0x1014:0x0518).
drm: register mmio base: 0xd0100000
drm: register mmio size: 65536
radeon0: info: GTT: 64M 0xE0000000 - 0xE3FFFFFF
radeon0: info: VRAM: 128M 0x00000000E8000000 - 0x00000000EFFFFFFF (64M used)
drm: Detected VRAM RAM=80M, BAR=128M
drm: RAM width 128bits DDR
Zone kernel: Available graphics memory: 801196 kiB
drm: radeon: 64M of VRAM memory ready
drm: radeon: 64M of GTT memory ready.
radeon0: info: WB disabled
radeon0: info: fence driver on ring 0 use gpu addr 0x00000000e0000000 and cpu addr 0x0xdb4f0000
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
radeon0: interrupting at irq 9 (radeon)
drm: radeon: irq initialized.
drm: Loading R100 microcode
panic: cnopen: no console device
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c02516b4 cs 8 eflags 246 cr2 bba5ae24 ilevel 0 esp db539ce0
curlwp 0xc37d3d20 pid 2 lid 1 lowest kstack 0xdb5372c0
Stopped in pid 2.1 (init) at netbsd:breakpoint+0x4: popl %ebp
db{0}> bt
breakpoint at netbsd:breakpoint+0x4
vpanic at netbsd:vpanic+0x127
panic at netbsd:panic+0x18
cnopen at netbsd:cnopen+0x112
cdev_open at netbsd:cdev_open+0xea
spec_open at netbsd:spec_open+0x20e
VOP_OPEN at netbsd:VOP_OPEN+0x58
vn_open at netbsd:vn_open+0x21b
do_open at netbsd:do_open+0xd0
do_sys_openat at netbsd:do_sys_openat+0x75
sys_open at netbsd:sys_open+0x2c
syscall() at netbsd:syscall+0x82
--- syscall (number 5) ---
bba77b07:
db{0}> sh reg
ds c06f0010 extent_insert_and_optimize.isra.0+0x70
es 10
fs 30
gs 10
edi db539cfc
esi c097335b ostype+0x144be
ebp db539cbc
ebx 104
edx 1
ecx 0
eax 1
eip c02516b4
cs 8
eflags 246
esp db539cbc
ss 10
netbsd:breakpoint+0x4: popl %ebp
db{0}>
Also observed on an HP ProLiant DL380 G5 (amd64). The HP also panicked
in this fashion with "/dev" on local disk. Output of 'uname -a'
and kernel message excerpt to follow in email addenda.
>How-To-Repeat:
Boot an i386 or amd64 system that uses radeondrmkms as a diskless client
with a fully-populated "/dev" directory on the NFS root.
Observe the panic and the claim of no console device.
>Fix:
Workaround: use a serial console.
Workaround: delete "/dev/console" from the client root on the server, causing
"/dev" to be populated on a boot-time tmpfs.
>Release-Note:
>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: re: kern/49709: radeondrmkms panic if "/dev/" is on NFS root
Date: Mon, 02 Mar 2015 16:29:31 +1100
> >Synopsis: radeondrmkms panic if "/dev/" is on NFS root
actually, it's a timing issue, not NFS root related.
the problem is that /sbin/init is allowed to run as soon as mountroot
completes, but before the paused configuration threads complete their
task and in this case, the radeondrmkms driver has not gotten to
attaching a wsdisplay0 and taking over the console.
fortunately, we have working fixes for this, and are just trying to
figure out which is the best. one that i've tested and that works is
below.
.mrg.
Index: sys/device.h
===================================================================
RCS file: /cvsroot/src/sys/sys/device.h,v
retrieving revision 1.146
diff -p -u -r1.146 device.h
--- sys/device.h 22 Nov 2014 11:04:57 -0000 1.146
+++ sys/device.h 1 Mar 2015 13:02:53 -0000
@@ -479,6 +479,7 @@ void config_create_mountrootthreads(void
int config_finalize_register(device_t, int (*)(device_t));
void config_finalize(void);
+void config_finalize_mountroot(void);
void config_twiddle_init(void);
void config_twiddle_fn(void *);
Index: kern/init_main.c
===================================================================
RCS file: /cvsroot/src/sys/kern/init_main.c,v
retrieving revision 1.461
diff -p -u -r1.461 init_main.c
--- kern/init_main.c 27 Nov 2014 14:38:09 -0000 1.461
+++ kern/init_main.c 1 Mar 2015 13:02:53 -0000
@@ -712,6 +712,9 @@ main(void)
uvm_aiodone_worker, NULL, PRI_VM, IPL_NONE, WQ_MPSAFE))
panic("fork aiodoned");
+ /* Wait for final configure threads to complete. */
+ config_finalize_mountroot();
+
/*
* Okay, now we can let init(8) exec! It's off to userland!
*/
Index: kern/subr_autoconf.c
===================================================================
RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
retrieving revision 1.233
diff -p -u -r1.233 subr_autoconf.c
--- kern/subr_autoconf.c 6 Nov 2014 08:46:04 -0000 1.233
+++ kern/subr_autoconf.c 1 Mar 2015 13:02:53 -0000
@@ -202,6 +202,7 @@ int interrupt_config_threads = 8;
struct deferred_config_head mountroot_config_queue =
TAILQ_HEAD_INITIALIZER(mountroot_config_queue);
int mountroot_config_threads = 2;
+lwp_t **mountroot_config_lwpids;
static bool root_is_mounted = false;
static void config_process_deferred(struct deferred_config_head *, device_t);
@@ -481,9 +482,32 @@ config_create_mountrootthreads(void)
if (!root_is_mounted)
root_is_mounted = true;
+ mountroot_config_lwpids = kmem_alloc(sizeof(mountroot_config_lwpids) *
+ mountroot_config_threads,
+ KM_NOSLEEP);
+ KASSERT(mountroot_config_lwpids);
for (i = 0; i < mountroot_config_threads; i++) {
- (void)kthread_create(PRI_NONE, 0, NULL,
- config_mountroot_thread, NULL, NULL, "configroot");
+ mountroot_config_lwpids[i] = 0;
+ (void)kthread_create(PRI_NONE, KTHREAD_MUSTJOIN, NULL,
+ config_mountroot_thread, NULL,
+ &mountroot_config_lwpids[i],
+ "configroot");
+ }
+}
+
+void
+config_finalize_mountroot(void)
+{
+ int i, error;
+
+ for (i = 0; i < mountroot_config_threads; i++) {
+ if (mountroot_config_lwpids[i] == 0)
+ continue;
+
+ error = kthread_join(mountroot_config_lwpids[i]);
+ if (error)
+ printf("%s: thread %x joined with error %d\n",
+ __func__, i, error);
}
}
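
The patch above creates the deferred-configuration threads as joinable
(KTHREAD_MUSTJOIN) and joins each one before init is allowed to run. As a
rough userland analogy only, the same ordering can be sketched with POSIX
threads. Everything named here (NWORKERS, deferred_attach, start_workers,
finalize_workers, console_is_ready) is hypothetical and is not kernel code;
the sketch assumes worker 0 stands in for the driver that registers the
console.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdio.h>

#define NWORKERS 2              /* mirrors mountroot_config_threads = 2 */

static pthread_t workers[NWORKERS];
static bool worker_started[NWORKERS];
static int worker_ids[NWORKERS];
static bool console_ready;      /* set by a worker, read only after the join */

/* Hypothetical stand-in for a deferred attach routine. */
static void *
deferred_attach(void *arg)
{
	int id = *(int *)arg;

	assert(id >= 0 && id < NWORKERS);
	if (id == 0)
		console_ready = true;   /* "console driver" finished attaching */
	return NULL;
}

/* Start the joinable workers; returns how many were created. */
int
start_workers(void)
{
	int i, started = 0;

	for (i = 0; i < NWORKERS; i++) {
		worker_ids[i] = i;
		worker_started[i] = pthread_create(&workers[i], NULL,
		    deferred_attach, &worker_ids[i]) == 0;
		if (worker_started[i])
			started++;
	}
	return started;
}

/* Join every started worker; returns the number of failed joins. */
int
finalize_workers(void)
{
	int i, error, failures = 0;

	for (i = 0; i < NWORKERS; i++) {
		if (!worker_started[i])
			continue;
		error = pthread_join(workers[i], NULL);
		if (error) {
			fprintf(stderr, "%s: thread %d joined with error %d\n",
			    __func__, i, error);
			failures++;
		}
	}
	return failures;
}

/* Only meaningful after finalize_workers() has returned. */
bool
console_is_ready(void)
{
	return console_ready;
}
```

A caller would invoke start_workers(), then finalize_workers(), and only then
act on console_is_ready(). Checking the flag before the join is the userland
counterpart of init opening /dev/console too early.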
From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: re: kern/49709: radeondrmkms panic if "/dev/" is on NFS root
Date: Tue, 3 Mar 2015 21:39:38 -0600 (CST)
On Mon, 2 Mar 2015, matthew green wrote:
> actually, it's a timing issue, not NFS root related.
>
> the problem is that /sbin/init is allowed to run as soon as mountroot
> completes, but before the paused configuration threads complete their
> task and in this case, the radeondrmkms driver has not gotten to
> attaching a wsdisplay0 and taking over the console.
This would explain why the HP ProLiant DL380 G5 had the same problem
even on local disk.
> fortunately, we have working fixes for this, and are just trying to
> figure out which is the best. one that i've tested and that works is
> below.
I've applied this patch and it works for me as well (at least on the
ThinkPad A31p; I will try it soon on the other machines where I observed
this).
I hope you come to a decision soon about which way to fix the issue.
Thanks.
--
|/"\ John D. Baker, KN5UKS NetBSD Darwin/MacOS X
|\ / jdbaker[snail]mylinuxisp[flyspeck]com OpenBSD FreeBSD
| X No HTML/proprietary data in email. BSD just sits there and works!
|/ \ GPGkeyID: D703 4A7E 479F 63F8 D3F4 BD99 9572 8F23 E4AD 1645
State-Changed-From-To: open->pending-pullups
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Fri, 06 Mar 2015 09:28:40 +0000
State-Changed-Why:
a fix has been committed to -current.
Responsible-Changed-From-To: kern-bug-people->mrg
Responsible-Changed-By: mrg@NetBSD.org
Responsible-Changed-When: Fri, 06 Mar 2015 09:28:50 +0000
Responsible-Changed-Why:
i fixed this.
From: "matthew green" <mrg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49709 CVS commit: src/sys
Date: Fri, 6 Mar 2015 09:28:16 +0000
Module Name: src
Committed By: mrg
Date: Fri Mar 6 09:28:16 UTC 2015
Modified Files:
src/sys/kern: init_main.c subr_autoconf.c
src/sys/sys: device.h
Log Message:
wait for config_mountroot threads to complete before we tell init it
can start up. this solves the problem where a console device needs
mountroot to complete attaching, and must create wsdisplay0 before
init tries to open /dev/console. fixes PR#49709.
XXX: pullup-7
To generate a diff of this commit:
cvs rdiff -u -r1.461 -r1.462 src/sys/kern/init_main.c
cvs rdiff -u -r1.233 -r1.234 src/sys/kern/subr_autoconf.c
cvs rdiff -u -r1.146 -r1.147 src/sys/sys/device.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49709 CVS commit: [netbsd-7] src/sys
Date: Mon, 9 Mar 2015 08:56:02 +0000
Module Name: src
Committed By: snj
Date: Mon Mar 9 08:56:02 UTC 2015
Modified Files:
src/sys/kern [netbsd-7]: init_main.c subr_autoconf.c
src/sys/sys [netbsd-7]: device.h
Log Message:
Pull up following revision(s) (requested by mrg in ticket #576):
sys/kern/init_main.c: revision 1.462
sys/kern/subr_autoconf.c: revision 1.234
sys/sys/device.h: revision 1.147
wait for config_mountroot threads to complete before we tell init it
can start up. this solves the problem where a console device needs
mountroot to complete attaching, and must create wsdisplay0 before
init tries to open /dev/console. fixes PR#49709.
XXX: pullup-7
To generate a diff of this commit:
cvs rdiff -u -r1.458.2.1 -r1.458.2.2 src/sys/kern/init_main.c
cvs rdiff -u -r1.231 -r1.231.2.1 src/sys/kern/subr_autoconf.c
cvs rdiff -u -r1.144 -r1.144.4.1 src/sys/sys/device.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Tue, 10 Mar 2015 18:49:46 +0000
State-Changed-Why:
all done.
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.