NetBSD Problem Report #49709

From www@NetBSD.org  Mon Mar  2 00:06:58 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D2CEAA5B2E
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  2 Mar 2015 00:06:58 +0000 (UTC)
Message-Id: <20150302000657.4A57BA6558@mollari.NetBSD.org>
Date: Mon,  2 Mar 2015 00:06:57 +0000 (UTC)
From: jdbaker@mylinuxisp.com
Reply-To: jdbaker@mylinuxisp.com
To: gnats-bugs@NetBSD.org
Subject: radeondrmkms panic if "/dev/" is on NFS root
X-Send-Pr-Version: www-1.0

>Number:         49709
>Category:       kern
>Synopsis:       radeondrmkms panic if "/dev/" is on NFS root
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    mrg
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Mar 02 00:10:01 +0000 2015
>Closed-Date:    Tue Mar 10 18:49:46 +0000 2015
>Last-Modified:  Tue Mar 10 18:49:46 +0000 2015
>Originator:     John D. Baker
>Release:        NetBSD/i386-7.99.5, NetBSD/amd64-7.99.5
>Organization:
>Environment:
NetBSD slab 7.99.5 NetBSD 7.99.5 (SLAB_KMS) #7: Sun Mar  1 14:17:39 CST 2015  sysop@skuld.technoskunk.fur:/d0/build/current/obj/i386/sys/arch/i386/compile/SLAB_KMS i386

>Description:
If a diskless client uses radeondrmkms, the kernel will panic during
radeondrmkms attachment if the NFS-resident "/dev" is used.

[...]
drm: initializing kernel modesetting (RV200 0x1002:0x4C58 0x1014:0x0518).
drm: register mmio base: 0xd0100000
drm: register mmio size: 65536
radeon0: info: GTT: 64M 0xE0000000 - 0xE3FFFFFF
radeon0: info: VRAM: 128M 0x00000000E8000000 - 0x00000000EFFFFFFF (64M used)
drm: Detected VRAM RAM=80M, BAR=128M
drm: RAM width 128bits DDR
Zone  kernel: Available graphics memory: 801196 kiB
drm: radeon: 64M of VRAM memory ready
drm: radeon: 64M of GTT memory ready.
radeon0: info: WB disabled
radeon0: info: fence driver on ring 0 use gpu addr 0x00000000e0000000 and cpu addr 0x0xdb4f0000
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
radeon0: interrupting at irq 9 (radeon)
drm: radeon: irq initialized.
drm: Loading R100 microcode
panic: cnopen: no console device
fatal breakpoint trap in supervisor mode
trap type 1 code 0 eip c02516b4 cs 8 eflags 246 cr2 bba5ae24 ilevel 0 esp db539ce0
curlwp 0xc37d3d20 pid 2 lid 1 lowest kstack 0xdb5372c0
Stopped in pid 2.1 (init) at    netbsd:breakpoint+0x4:  popl    %ebp
db{0}> bt
breakpoint at netbsd:breakpoint+0x4
vpanic at netbsd:vpanic+0x127
panic at netbsd:panic+0x18
cnopen at netbsd:cnopen+0x112
cdev_open at netbsd:cdev_open+0xea
spec_open at netbsd:spec_open+0x20e
VOP_OPEN at netbsd:VOP_OPEN+0x58
vn_open at netbsd:vn_open+0x21b
do_open at netbsd:do_open+0xd0
do_sys_openat at netbsd:do_sys_openat+0x75
sys_open at netbsd:sys_open+0x2c
syscall() at netbsd:syscall+0x82
--- syscall (number 5) ---
bba77b07:
db{0}> sh reg
ds          c06f0010    extent_insert_and_optimize.isra.0+0x70
es          10
fs          30
gs          10
edi         db539cfc
esi         c097335b    ostype+0x144be
ebp         db539cbc
ebx         104
edx         1
ecx         0
eax         1
eip         c02516b4
cs          8
eflags      246
esp         db539cbc
ss          10
netbsd:breakpoint+0x4:  popl     %ebp
db{0}>


Also observed on an HP ProLiant DL380 G5 (amd64).  The HP also panicked
in this fashion with "/dev" on local disk.  Output of 'uname -a'
and kernel message excerpt to follow in email addenda.

>How-To-Repeat:
Boot an i386 or amd64 system that uses radeondrmkms as a diskless client
with a fully-populated "/dev" directory on the NFS root.

Observe panic and claim of no console device.
>Fix:
Workaround:  use serial console
Workaround:  delete "/dev/console" from the client's root on the server,
which causes "/dev" to be populated on a boot-time tmpfs.

>Release-Note:

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/49709: radeondrmkms panic if "/dev/" is on NFS root
Date: Mon, 02 Mar 2015 16:29:31 +1100

 > >Synopsis:       radeondrmkms panic if "/dev/" is on NFS root

 actually, it's a timing issue, not NFS root related.

 the problem is that /sbin/init is allowed to run as soon as mountroot
 completes, but before the paused configuration threads complete their
 task and in this case, the radeondrmkms driver has not gotten to
 attaching a wsdisplay0 and taking over the console.

 fortunately, we have working fixes for this, and are just trying to
 figure out which is the best.  one that i've tested that works is
 below.


 .mrg.


 Index: sys/device.h
 ===================================================================
 RCS file: /cvsroot/src/sys/sys/device.h,v
 retrieving revision 1.146
 diff -p -u -r1.146 device.h
 --- sys/device.h	22 Nov 2014 11:04:57 -0000	1.146
 +++ sys/device.h	1 Mar 2015 13:02:53 -0000
 @@ -479,6 +479,7 @@ void	config_create_mountrootthreads(void

  int	config_finalize_register(device_t, int (*)(device_t));
  void	config_finalize(void);
 +void	config_finalize_mountroot(void);

  void	config_twiddle_init(void);
  void	config_twiddle_fn(void *);
 Index: kern/init_main.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/init_main.c,v
 retrieving revision 1.461
 diff -p -u -r1.461 init_main.c
 --- kern/init_main.c	27 Nov 2014 14:38:09 -0000	1.461
 +++ kern/init_main.c	1 Mar 2015 13:02:53 -0000
 @@ -712,6 +712,9 @@ main(void)
  	    uvm_aiodone_worker, NULL, PRI_VM, IPL_NONE, WQ_MPSAFE))
  		panic("fork aiodoned");

 +	/* Wait for final configure threads to complete. */
 +	config_finalize_mountroot();
 +
  	/*
  	 * Okay, now we can let init(8) exec!  It's off to userland!
  	 */
 Index: kern/subr_autoconf.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/subr_autoconf.c,v
 retrieving revision 1.233
 diff -p -u -r1.233 subr_autoconf.c
 --- kern/subr_autoconf.c	6 Nov 2014 08:46:04 -0000	1.233
 +++ kern/subr_autoconf.c	1 Mar 2015 13:02:53 -0000
 @@ -202,6 +202,7 @@ int interrupt_config_threads = 8;
  struct deferred_config_head mountroot_config_queue =
  	TAILQ_HEAD_INITIALIZER(mountroot_config_queue);
  int mountroot_config_threads = 2;
 +lwp_t **mountroot_config_lwpids;
  static bool root_is_mounted = false;

  static void config_process_deferred(struct deferred_config_head *, device_t);
 @@ -481,9 +482,32 @@ config_create_mountrootthreads(void)
  	if (!root_is_mounted)
  		root_is_mounted = true;

 +	mountroot_config_lwpids = kmem_alloc(sizeof(mountroot_config_lwpids) *
 +					     mountroot_config_threads,
 +					     KM_NOSLEEP);
 +	KASSERT(mountroot_config_lwpids);
  	for (i = 0; i < mountroot_config_threads; i++) {
 -		(void)kthread_create(PRI_NONE, 0, NULL,
 -		    config_mountroot_thread, NULL, NULL, "configroot");
 +		mountroot_config_lwpids[i] = 0;
 +		(void)kthread_create(PRI_NONE, KTHREAD_MUSTJOIN, NULL,
 +				     config_mountroot_thread, NULL,
 +				     &mountroot_config_lwpids[i],
 +				     "configroot");
 +	}
 +}
 +
 +void
 +config_finalize_mountroot(void)
 +{
 +	int i, error;
 +
 +	for (i = 0; i < mountroot_config_threads; i++) {
 +		if (mountroot_config_lwpids[i] == 0)
 +			continue;
 +
 +		error = kthread_join(mountroot_config_lwpids[i]);
 +		if (error)
 +			printf("%s: thread %x joined with error %d\n",
 +			       __func__, i, error);
  	}
  }


From: "John D. Baker" <jdbaker@mylinuxisp.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: re: kern/49709: radeondrmkms panic if "/dev/" is on NFS root
Date: Tue, 3 Mar 2015 21:39:38 -0600 (CST)

 On Mon, 2 Mar 2015, matthew green wrote:

 >  actually, it's a timing issue, not NFS root related.
 >  
 >  the problem is that /sbin/init is allowed to run as soon as mountroot
 >  completes, but before the paused configuration threads complete their
 >  task and in this case, the radeondrmkms driver has not gotten to
 >  attaching a wsdisplay0 and taking over the console.

 This would explain why the HP ProLiant DL380 G5 had the same problem
 even on local disk.

 >  fortunately, we have working fixes for this, and are just trying to
 >  figure out which is the best.  one that i've tested that works is
 >  below.

 I've applied this patch and it works for me as well (at least on the
 ThinkPad A31p; I will try it soon on the other machines where I observed
 this).

 I hope you come to a decision about which way to fix the issue soon.

 Thanks.

 -- 
 |/"\ John D. Baker, KN5UKS               NetBSD     Darwin/MacOS X
 |\ / jdbaker[snail]mylinuxisp[flyspeck]com    OpenBSD            FreeBSD
 | X  No HTML/proprietary data in email.   BSD just sits there and works!
 |/ \ GPGkeyID:  D703 4A7E 479F 63F8 D3F4  BD99 9572 8F23 E4AD 1645

State-Changed-From-To: open->pending-pullups
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Fri, 06 Mar 2015 09:28:40 +0000
State-Changed-Why:
a fix has been committed to -current.


Responsible-Changed-From-To: kern-bug-people->mrg
Responsible-Changed-By: mrg@NetBSD.org
Responsible-Changed-When: Fri, 06 Mar 2015 09:28:50 +0000
Responsible-Changed-Why:
i fixed this.


From: "matthew green" <mrg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49709 CVS commit: src/sys
Date: Fri, 6 Mar 2015 09:28:16 +0000

 Module Name:	src
 Committed By:	mrg
 Date:		Fri Mar  6 09:28:16 UTC 2015

 Modified Files:
 	src/sys/kern: init_main.c subr_autoconf.c
 	src/sys/sys: device.h

 Log Message:
 wait for config_mountroot threads to complete before we tell init it
 can start up.  this solves the problem where a console device needs
 mountroot to complete attaching, and must create wsdisplay0 before
 init tries to open /dev/console.  fixes PR#49709.

 XXX: pullup-7


 To generate a diff of this commit:
 cvs rdiff -u -r1.461 -r1.462 src/sys/kern/init_main.c
 cvs rdiff -u -r1.233 -r1.234 src/sys/kern/subr_autoconf.c
 cvs rdiff -u -r1.146 -r1.147 src/sys/sys/device.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/49709 CVS commit: [netbsd-7] src/sys
Date: Mon, 9 Mar 2015 08:56:02 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Mon Mar  9 08:56:02 UTC 2015

 Modified Files:
 	src/sys/kern [netbsd-7]: init_main.c subr_autoconf.c
 	src/sys/sys [netbsd-7]: device.h

 Log Message:
 Pull up following revision(s) (requested by mrg in ticket #576):
 	sys/kern/init_main.c: revision 1.462
 	sys/kern/subr_autoconf.c: revision 1.234
 	sys/sys/device.h: revision 1.147
 wait for config_mountroot threads to complete before we tell init it
 can start up.  this solves the problem where a console device needs
 mountroot to complete attaching, and must create wsdisplay0 before
 init tries to open /dev/console.  fixes PR#49709.
 XXX: pullup-7


 To generate a diff of this commit:
 cvs rdiff -u -r1.458.2.1 -r1.458.2.2 src/sys/kern/init_main.c
 cvs rdiff -u -r1.231 -r1.231.2.1 src/sys/kern/subr_autoconf.c
 cvs rdiff -u -r1.144 -r1.144.4.1 src/sys/sys/device.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Tue, 10 Mar 2015 18:49:46 +0000
State-Changed-Why:
all done.


>Unformatted:
