NetBSD Problem Report #38425

From simonb@thistledown.com.au  Mon Apr 14 04:35:50 2008
Return-Path: <simonb@thistledown.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id BF21663B898
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 14 Apr 2008 04:35:50 +0000 (UTC)
Message-Id: <20080414043548.81A48AFD03@thoreau.thistledown.com.au>
Date: Mon, 14 Apr 2008 14:35:48 +1000 (EST)
From: Simon Burge <simonb@NetBSD.org>
Reply-To: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@gnats.NetBSD.org
Subject: savecore confused if getbootfile(3) fails
X-Send-Pr-Version: 3.95

>Number:         38425
>Category:       misc
>Synopsis:       savecore confused if getbootfile(3) fails
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    misc-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 14 04:40:00 +0000 2008
>Closed-Date:    Sat Mar 21 19:35:02 +0000 2009
>Last-Modified:  Mon Jun 15 12:25:01 +0000 2009
>Originator:     Simon Burge
>Release:        NetBSD current, any time after 2008/01/15
>Organization:
>Environment:
  Architecture: all ?
  Machine: all ?
>Description:
	If you try to boot a non-root-owned kernel that isn't called
	'netbsd', getbootfile() fails and savecore will try to use
	/netbsd anyway.  The KREAD in kmem_setup() that fetches the
	'dumpcdev' value will then read some random garbage value.  In
	my case, that reads a "0".  find_dev() then searches for dev_t
	0, returns "/dev/console" and any further operations that
	try to read from "kernel" memory now block waiting for console
	input (!!).

>How-To-Repeat:
	Boot from a non-root-owned kernel that isn't called 'netbsd'

>Fix:
	Not sure..  Better error checking in kmem_setup()?

>Release-Note:

>Audit-Trail:
From: Joerg Sonnenberger <joerg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/38425 CVS commit: src/sbin/savecore
Date: Thu,  9 Oct 2008 13:59:51 +0000 (UTC)

 Module Name:	src
 Committed By:	joerg
 Date:		Thu Oct  9 13:59:51 UTC 2008

 Modified Files:
 	src/sbin/savecore: savecore.c

 Log Message:
 Explicitly check that the dump device is not the console, a tty or pty.
 While the list is adhoc, the problems reported are always with
 /dev/console. Adresses PR 38425 and similiar issues with Xen.


 To generate a diff of this commit:
 cvs rdiff -r1.74 -r1.75 src/sbin/savecore/savecore.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 08:09:40 +0000
State-Changed-Why:
Fixed?


From: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: misc-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
    gnats-admin@netbsd.org, dholland@NetBSD.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails) 
Date: Sun, 22 Feb 2009 20:05:14 +1100

 dholland@NetBSD.org wrote:

 > Synopsis: savecore confused if getbootfile(3) fails
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Sun, 22 Feb 2009 08:09:40 +0000
 > State-Changed-Why:
 > Fixed?

 You mean rev 1.75 of savecore.c?  That seems a bit hackish.  From my
 reading, it just picks a few random device classes to ignore when you
 read an incorrect value.  The /dev/console reference is that the random
 read from /netbsd now seems to be most likely a "0", but it could be any
 other valid device number.  It doesn't say protect you from reading a
 tape drive for example, or other devices where a random read might not
 be either safe or at least have random consequences.

 savecore is "safer" for my particular problem, but I don't think it's
 "fixed" yet..

 Simon.

From: Andrew Doran <ad@netbsd.org>
To: Simon Burge <simonb@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, tls@netbsd.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 22 Feb 2009 09:18:18 +0000

 On Sun, Feb 22, 2009 at 08:05:14PM +1100, Simon Burge wrote:

 > dholland@NetBSD.org wrote:
 > 
 > > Synopsis: savecore confused if getbootfile(3) fails
 > > 
 > > State-Changed-From-To: open->feedback
 > > State-Changed-By: dholland@NetBSD.org
 > > State-Changed-When: Sun, 22 Feb 2009 08:09:40 +0000
 > > State-Changed-Why:
 > > Fixed?
 > 
 > You mean rev 1.75 of savecore.c?  That seems a bit hackish.  From my
 > reading, it just picks a few random device classes to ignore when you
 > read an incorrect value.  The /dev/console reference is that the random
 > read from /netbsd now seems to be most likely a "0", but it could be any
 > other valid device number.  It doesn't say protect you from reading a
 > tape drive for example, or other devices where a random read might not
 > be either safe or at least have random consequences.
 > 
 > savecore is "safer" for my particular problem, but I don't think it's
 > "fixed" yet..

 Making it use /dev/ksyms like Thor proposed would improve the situation..

 Andrew

State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 10:03:38 +0000
State-Changed-Why:
No, not fixed, only hacked. The PR should stay open until someone does
something more solid.


From: Thor Lancelot Simon <tls@rek.tjls.com>
To: Andrew Doran <ad@netbsd.org>
Cc: Simon Burge <simonb@NetBSD.org>, gnats-bugs@NetBSD.org, tls@netbsd.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Fri, 20 Mar 2009 12:19:03 -0400

 On Sun, Feb 22, 2009 at 09:18:18AM +0000, Andrew Doran wrote:
 > 
 > Making it use /dev/ksyms like Thor proposed would improve the situation..

 Try the below - savecore had a stupid bug (it should probably not feed
 a filename to libkvm at all, but it certainly should not feed the wrong
 one) and libkvm had all the right code but the wrong default.

 On Christos' suggestion I have not changed getbootfile() as I originally
 planned.  I am sorely tempted however to remove it from the system as it
 seems to only be used wrong -- literally everywhere it is used, what is
 really wanted is "where do I get the kernel symbols", not "what name
 was supplied to the boot loader".

 Thor

 Index: sbin/savecore/savecore.c
 ===================================================================
 RCS file: /cvsroot/src/sbin/savecore/savecore.c,v
 retrieving revision 1.76
 diff -u -r1.76 savecore.c
 --- sbin/savecore/savecore.c	20 Oct 2008 10:34:54 -0000	1.76
 +++ sbin/savecore/savecore.c	20 Mar 2009 16:09:56 -0000
 @@ -229,7 +229,11 @@
  		dirname = argv[0];

  	if (kernel == NULL) {
 -		kernel = getbootfile();
 +		if (access(_PATH_KSYMS, R_OK) == 0) {
 +			kernel = _PATH_KSYMS;
 +		} else {
 +			kernel = getbootfile();
 +		}
  	}

  	(void)time(&now);
 Index: lib/libkvm/kvm.c
 ===================================================================
 RCS file: /cvsroot/src/lib/libkvm/kvm.c,v
 retrieving revision 1.92
 diff -u -r1.92 kvm.c
 --- lib/libkvm/kvm.c	15 Jan 2008 14:16:30 -0000	1.92
 +++ lib/libkvm/kvm.c	20 Mar 2009 16:09:57 -0000
 @@ -288,6 +288,10 @@

  	ufgiven = (uf != NULL);
  	if (!ufgiven) {
 +		if (access(_PATH_KSYMS, R_OK) == 0) {
 +			uf = _PATH_KSYMS;
 +			goto got_symfile;
 +		}
  #ifdef CPU_BOOTED_KERNEL
  		/* 130 is 128 + '/' + '\0' */
  		static char booted_kernel[130];
 @@ -315,6 +319,7 @@
  		_kvm_err(kd, kd->program, "exec file name too long");
  		goto failed;
  	}
 +got_symfile:
  	if (flag & ~O_RDWR) {
  		_kvm_err(kd, kd->program, "bad flags arg");
  		goto failed;

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, misc-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	Simon Burge <simonb@NetBSD.org>
Cc: 
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Fri, 20 Mar 2009 14:00:36 -0400

 On Mar 20,  4:20pm, tls@rek.tjls.com (Thor Lancelot Simon) wrote:
 -- Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)

 | The following reply was made to PR misc/38425; it has been noted by GNATS.
 | 
 | From: Thor Lancelot Simon <tls@rek.tjls.com>
 | To: Andrew Doran <ad@netbsd.org>
 | Cc: Simon Burge <simonb@NetBSD.org>, gnats-bugs@NetBSD.org, tls@netbsd.org
 | Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
 | Date: Fri, 20 Mar 2009 12:19:03 -0400
 | 
 |  On Sun, Feb 22, 2009 at 09:18:18AM +0000, Andrew Doran wrote:
 |  > 
 |  > Making it use /dev/ksyms like Thor proposed would improve the situation..
 |  
 |  Try the below - savecore had a stupid bug (it should probably not feed
 |  a filename to libkvm at all, but it certainly should not feed the wrong
 |  one) and libkvm had all the right code but the wrong default.
 |  
 |  On Christos' suggestion I have not changed getbootfile() as I originally
 |  planned.  I am sorely tempted however to remove it from the system as it
 |  seems to only be used wrong -- literally everywhere it is used, what is
 |  really wanted is "where do I get the kernel symbols", not "what name
 |  was supplied to the boot loader".
 |  
 |  Thor

 Is access enough? We don't have devfs :-) What if ksyms is not compiled
 in the kernel and /dev/ksyms exists? I guess this is not an issue for current..

 christos

From: Thor Lancelot Simon <tls@rek.tjls.com>
To: gnats-bugs@netbsd.org
Cc: Andrew Doran <ad@netbsd.org>, Simon Burge <simonb@NetBSD.org>
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Fri, 20 Mar 2009 14:34:50 -0400

 Actually, the patch below is better.  No change to libkvm required:
 though the control flow is not tremendously clear, if you pass a NULL
 kernel filename it DTRT.


 Index: sbin/savecore/savecore.c
 ===================================================================
 RCS file: /cvsroot/src/sbin/savecore/savecore.c,v
 retrieving revision 1.76
 diff -u -r1.76 savecore.c
 --- sbin/savecore/savecore.c	20 Oct 2008 10:34:54 -0000	1.76
 +++ sbin/savecore/savecore.c	20 Mar 2009 18:29:31 -0000
 @@ -228,10 +228,6 @@
  	if (!clear)
  		dirname = argv[0];

 -	if (kernel == NULL) {
 -		kernel = getbootfile();
 -	}
 -
  	(void)time(&now);
  	kmem_setup();


From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org, Thor Lancelot Simon <tls@rek.tjls.com>
Cc: misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, Simon Burge <simonb@NetBSD.org>
Subject: re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sat, 21 Mar 2009 05:52:58 +1100

 i wonder if perhaps trying to open /dev/ksyms to make sure it
 works should be done as well as the access() check?


 .mrg.

State-Changed-From-To: open->closed
State-Changed-By: tls@NetBSD.org
State-Changed-When: Sat, 21 Mar 2009 19:35:02 +0000
State-Changed-Why:
I fixed the bug.


From: Tom Spindler <dogcow@babymeat.com>
To: gnats-bugs@NetBSD.org, Thor Lancelot Simon <tls@rek.tjls.com>
Cc: misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, Simon Burge <simonb@NetBSD.org>
Subject: re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 14 Jun 2009 06:50:17 -0700

 I don't know about any of you, but savecore completely fails to work
 for me after 1.78 savecore.c; in particular, failing to set the
 booted kernel causes check_space to die a painful death:

 	if (stat(kernel, &st) < 0) {
 		syslog(LOG_ERR, "%s: %m", kernel);
 		exit(1);

 which causes the 'savecore: (null): Bad address' message seen so
 infamously as of late; I don't see how it could ever work, save 
 eliminating the (almost entirely useless, save for the mostly
 useless /minfree part) check_space logic.


From: Thor Lancelot Simon <tls@rek.tjls.com>
To: Tom Spindler <dogcow@babymeat.com>
Cc: gnats-bugs@NetBSD.org, Thor Lancelot Simon <tls@rek.tjls.com>,
	misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, Simon Burge <simonb@NetBSD.org>
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 14 Jun 2009 15:17:55 -0400

 On Sun, Jun 14, 2009 at 06:50:17AM -0700, Tom Spindler wrote:
 > I don't know about any of you, but savecore completely fails to work
 > for me after 1.78 savecore.c; in particular, failing to set the
 > booted kernel causes check_space to die a painful death:

 I'll poke at it some more.  This works in our local tree -- maybe I
 missed a change when integrating it back into NetBSD.

 It is absolutely, positively *not* the right solution to cause any
 of these utilities to try to know the kernel name.  Any operation
 they have any business doing can be done via /dev/ksyms -- the
 question is where I missed one or more references to the kernel name
 that were exposed beyond libkvm, which correctly defaults to use
 /dev/ksyms if some other kernel name is not rammed down its throat.

 Thor

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: tls@rek.tjls.com
Cc: dogcow@babymeat.com, gnats-bugs@NetBSD.org, misc-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, simonb@NetBSD.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 14 Jun 2009 22:03:52 +0000 (UTC)

 hi,

 > On Sun, Jun 14, 2009 at 06:50:17AM -0700, Tom Spindler wrote:
 >> I don't know about any of you, but savecore completely fails to work
 >> for me after 1.78 savecore.c; in particular, failing to set the
 >> booted kernel causes check_space to die a painful death:
 > 
 > I'll poke at it some more.  This works in our local tree -- maybe I
 > missed a change when integrating it back into NetBSD.
 > 
 > It is absolutely, positively *not* the right solution to cause any
 > of these utilities to try to know the kernel name.  Any operation
 > they have any business doing can be done via /dev/ksyms -- the
 > question is where I missed one or more references to the kernel name
 > that were exposed beyond libkvm, which correctly defaults to use
 > /dev/ksyms if some other kernel name is not rammed down its throat.
 > 
 > Thor

 talking about the right solution, savecore should not use /dev/ksyms
 or info from the live kernel at all.  we should store necessary info
 in the dump itself.  (as freebsd does?)

 YAMAMOTO Takashi

From: Simon Burge <simonb@NetBSD.org>
To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Cc: tls@rek.tjls.com, dogcow@babymeat.com, gnats-bugs@NetBSD.org,
    misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails) 
Date: Mon, 15 Jun 2009 09:00:41 +1000

 YAMAMOTO Takashi wrote:

 > > On Sun, Jun 14, 2009 at 06:50:17AM -0700, Tom Spindler wrote:
 > >> I don't know about any of you, but savecore completely fails to work
 > >> for me after 1.78 savecore.c; in particular, failing to set the
 > >> booted kernel causes check_space to die a painful death:
 > > 
 > > I'll poke at it some more.  This works in our local tree -- maybe I
 > > missed a change when integrating it back into NetBSD.
 > > 
 > > It is absolutely, positively *not* the right solution to cause any
 > > of these utilities to try to know the kernel name.  Any operation
 > > they have any business doing can be done via /dev/ksyms -- the
 > > question is where I missed one or more references to the kernel name
 > > that were exposed beyond libkvm, which correctly defaults to use
 > > /dev/ksyms if some other kernel name is not rammed down its throat.
 > 
 > talking about the right solution, savecore should not use /dev/ksyms
 > or info from the live kernel at all.  we should store necessary info
 > in the dump itself.  (as freebsd does?)

 I had thought that /dev/ksyms seemed right, but that obviously doesn't
 help if the kernel that left the core wasn't the currently running
 kernel (as Yamaoto aludes to).

 Cheers,
 Simon.

From: Tom Spindler <dogcow@babymeat.com>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: tls@rek.tjls.com, gnats-bugs@NetBSD.org, misc-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, simonb@NetBSD.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Mon, 15 Jun 2009 05:20:55 -0700

 > talking about the right solution, savecore should not use /dev/ksyms
 > or info from the live kernel at all.  we should store necessary info
 > in the dump itself.  (as freebsd does?)

 Well, the only thing that the current booted kernel filename is
 used for is in the check_free() routine; if you eliminate that,
 savecore works as expected. I'm somewhat tempted to just remove the
 func entirely, as these days it's difficult to guess what the size
 of the coredump + size of the kernel will actually be - especially
 if they're gzipped.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.