NetBSD Problem Report #38425
From simonb@thistledown.com.au Mon Apr 14 04:35:50 2008
Return-Path: <simonb@thistledown.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by narn.NetBSD.org (Postfix) with ESMTP id BF21663B898
for <gnats-bugs@gnats.NetBSD.org>; Mon, 14 Apr 2008 04:35:50 +0000 (UTC)
Message-Id: <20080414043548.81A48AFD03@thoreau.thistledown.com.au>
Date: Mon, 14 Apr 2008 14:35:48 +1000 (EST)
From: Simon Burge <simonb@NetBSD.org>
Reply-To: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@gnats.NetBSD.org
Subject: savecore confused if getbootfile(3) fails
X-Send-Pr-Version: 3.95
>Number: 38425
>Category: misc
>Synopsis: savecore confused if getbootfile(3) fails
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: misc-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Apr 14 04:40:00 +0000 2008
>Closed-Date: Sat Mar 21 19:35:02 +0000 2009
>Last-Modified: Mon Jun 15 12:25:01 +0000 2009
>Originator: Simon Burge
>Release: NetBSD current, any time after 2008/01/15
>Organization:
>Environment:
Architecture: all ?
Machine: all ?
>Description:
If you try to boot a non-root-owned kernel that isn't called
'netbsd', getbootfile() fails and savecore will try to use
/netbsd anyway. The KREAD in kmem_setup() that fetches the
'dumpcdev' value will then read some random garbage value. In
my case, that reads a "0". find_dev() then searches for dev_t
0, returns "/dev/console" and any further operations that
try to read from "kernel" memory now block waiting for console
input (!!).
>How-To-Repeat:
Boot from a non-root-owned kernel that isn't called 'netbsd'
>Fix:
Not sure.. Better error checking in kmem_setup()?
>Release-Note:
>Audit-Trail:
From: Joerg Sonnenberger <joerg@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/38425 CVS commit: src/sbin/savecore
Date: Thu, 9 Oct 2008 13:59:51 +0000 (UTC)
Module Name: src
Committed By: joerg
Date: Thu Oct 9 13:59:51 UTC 2008
Modified Files:
src/sbin/savecore: savecore.c
Log Message:
Explicitly check that the dump device is not the console, a tty or pty.
While the list is adhoc, the problems reported are always with
/dev/console. Adresses PR 38425 and similiar issues with Xen.
To generate a diff of this commit:
cvs rdiff -r1.74 -r1.75 src/sbin/savecore/savecore.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 08:09:40 +0000
State-Changed-Why:
Fixed?
From: Simon Burge <simonb@NetBSD.org>
To: gnats-bugs@NetBSD.org
Cc: misc-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
gnats-admin@netbsd.org, dholland@NetBSD.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 22 Feb 2009 20:05:14 +1100
dholland@NetBSD.org wrote:
> Synopsis: savecore confused if getbootfile(3) fails
>
> State-Changed-From-To: open->feedback
> State-Changed-By: dholland@NetBSD.org
> State-Changed-When: Sun, 22 Feb 2009 08:09:40 +0000
> State-Changed-Why:
> Fixed?
You mean rev 1.75 of savecore.c? That seems a bit hackish. From my
reading, it just picks a few random device classes to ignore when you
read an incorrect value. The /dev/console reference is that the random
read from /netbsd now seems to be most likely a "0", but it could be any
other valid device number. It doesn't say protect you from reading a
tape drive for example, or other devices where a random read might not
be either safe or at least have random consequences.
savecore is "safer" for my particular problem, but I don't think it's
"fixed" yet..
Simon.
From: Andrew Doran <ad@netbsd.org>
To: Simon Burge <simonb@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, tls@netbsd.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 22 Feb 2009 09:18:18 +0000
On Sun, Feb 22, 2009 at 08:05:14PM +1100, Simon Burge wrote:
> dholland@NetBSD.org wrote:
>
> > Synopsis: savecore confused if getbootfile(3) fails
> >
> > State-Changed-From-To: open->feedback
> > State-Changed-By: dholland@NetBSD.org
> > State-Changed-When: Sun, 22 Feb 2009 08:09:40 +0000
> > State-Changed-Why:
> > Fixed?
>
> You mean rev 1.75 of savecore.c? That seems a bit hackish. From my
> reading, it just picks a few random device classes to ignore when you
> read an incorrect value. The /dev/console reference is that the random
> read from /netbsd now seems to be most likely a "0", but it could be any
> other valid device number. It doesn't say protect you from reading a
> tape drive for example, or other devices where a random read might not
> be either safe or at least have random consequences.
>
> savecore is "safer" for my particular problem, but I don't think it's
> "fixed" yet..
Making it use /dev/ksyms like Thor proposed would improve the situation..
Andrew
State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sun, 22 Feb 2009 10:03:38 +0000
State-Changed-Why:
No, not fixed, only hacked. The PR should stay open until someone does
something more solid.
From: Thor Lancelot Simon <tls@rek.tjls.com>
To: Andrew Doran <ad@netbsd.org>
Cc: Simon Burge <simonb@NetBSD.org>, gnats-bugs@NetBSD.org, tls@netbsd.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Fri, 20 Mar 2009 12:19:03 -0400
On Sun, Feb 22, 2009 at 09:18:18AM +0000, Andrew Doran wrote:
>
> Making it use /dev/ksyms like Thor proposed would improve the situation..
Try the below - savecore had a stupid bug (it should probably not feed
a filename to libkvm at all, but it certainly should not feed the wrong
one) and libkvm had all the right code but the wrong default.
On Christos' suggestion I have not changed getbootfile() as I originally
planned. I am sorely tempted however to remove it from the system as it
seems to only be used wrong -- literally everywhere it is used, what is
really wanted is "where do I get the kernel symbols", not "what name
was supplied to the boot loader".
Thor
Index: sbin/savecore/savecore.c
===================================================================
RCS file: /cvsroot/src/sbin/savecore/savecore.c,v
retrieving revision 1.76
diff -u -r1.76 savecore.c
--- sbin/savecore/savecore.c 20 Oct 2008 10:34:54 -0000 1.76
+++ sbin/savecore/savecore.c 20 Mar 2009 16:09:56 -0000
@@ -229,7 +229,11 @@
dirname = argv[0];
if (kernel == NULL) {
- kernel = getbootfile();
+ if (access(_PATH_KSYMS, R_OK) == 0) {
+ kernel = _PATH_KSYMS;
+ } else {
+ kernel = getbootfile();
+ }
}
(void)time(&now);
Index: lib/libkvm/kvm.c
===================================================================
RCS file: /cvsroot/src/lib/libkvm/kvm.c,v
retrieving revision 1.92
diff -u -r1.92 kvm.c
--- lib/libkvm/kvm.c 15 Jan 2008 14:16:30 -0000 1.92
+++ lib/libkvm/kvm.c 20 Mar 2009 16:09:57 -0000
@@ -288,6 +288,10 @@
ufgiven = (uf != NULL);
if (!ufgiven) {
+ if (access(_PATH_KSYMS, R_OK) == 0) {
+ uf = _PATH_KSYMS;
+ goto got_symfile;
+ }
#ifdef CPU_BOOTED_KERNEL
/* 130 is 128 + '/' + '\0' */
static char booted_kernel[130];
@@ -315,6 +319,7 @@
_kvm_err(kd, kd->program, "exec file name too long");
goto failed;
}
+got_symfile:
if (flag & ~O_RDWR) {
_kvm_err(kd, kd->program, "bad flags arg");
goto failed;
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, misc-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
Simon Burge <simonb@NetBSD.org>
Cc:
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Fri, 20 Mar 2009 14:00:36 -0400
On Mar 20, 4:20pm, tls@rek.tjls.com (Thor Lancelot Simon) wrote:
-- Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
| The following reply was made to PR misc/38425; it has been noted by GNATS.
|
| From: Thor Lancelot Simon <tls@rek.tjls.com>
| To: Andrew Doran <ad@netbsd.org>
| Cc: Simon Burge <simonb@NetBSD.org>, gnats-bugs@NetBSD.org, tls@netbsd.org
| Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
| Date: Fri, 20 Mar 2009 12:19:03 -0400
|
| On Sun, Feb 22, 2009 at 09:18:18AM +0000, Andrew Doran wrote:
| >
| > Making it use /dev/ksyms like Thor proposed would improve the situation..
|
| Try the below - savecore had a stupid bug (it should probably not feed
| a filename to libkvm at all, but it certainly should not feed the wrong
| one) and libkvm had all the right code but the wrong default.
|
| On Christos' suggestion I have not changed getbootfile() as I originally
| planned. I am sorely tempted however to remove it from the system as it
| seems to only be used wrong -- literally everywhere it is used, what is
| really wanted is "where do I get the kernel symbols", not "what name
| was supplied to the boot loader".
|
| Thor
Is access enough? We don't have devfs :-) What if ksyms is not compiled
in the kernel and /dev/ksyms exists? I guess this is not an issue for current..
christos
From: Thor Lancelot Simon <tls@rek.tjls.com>
To: gnats-bugs@netbsd.org
Cc: Andrew Doran <ad@netbsd.org>, Simon Burge <simonb@NetBSD.org>
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Fri, 20 Mar 2009 14:34:50 -0400
Actually, the patch below is better. No change to libkvm required:
though the control flow is not tremendously clear, if you pass a NULL
kernel filename it DTRT.
Index: sbin/savecore/savecore.c
===================================================================
RCS file: /cvsroot/src/sbin/savecore/savecore.c,v
retrieving revision 1.76
diff -u -r1.76 savecore.c
--- sbin/savecore/savecore.c 20 Oct 2008 10:34:54 -0000 1.76
+++ sbin/savecore/savecore.c 20 Mar 2009 18:29:31 -0000
@@ -228,10 +228,6 @@
if (!clear)
dirname = argv[0];
- if (kernel == NULL) {
- kernel = getbootfile();
- }
-
(void)time(&now);
kmem_setup();
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org, Thor Lancelot Simon <tls@rek.tjls.com>
Cc: misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, Simon Burge <simonb@NetBSD.org>
Subject: re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sat, 21 Mar 2009 05:52:58 +1100
i wonder if perhaps trying to open /dev/ksyms to make sure it
works should be done as well as the access() check?
.mrg.
State-Changed-From-To: open->closed
State-Changed-By: tls@NetBSD.org
State-Changed-When: Sat, 21 Mar 2009 19:35:02 +0000
State-Changed-Why:
I fixed the bug.
From: Tom Spindler <dogcow@babymeat.com>
To: gnats-bugs@NetBSD.org, Thor Lancelot Simon <tls@rek.tjls.com>
Cc: misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, Simon Burge <simonb@NetBSD.org>
Subject: re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 14 Jun 2009 06:50:17 -0700
I don't know about any of you, but savecore completely fails to work
for me after 1.78 savecore.c; in particular, failing to set the
booted kernel causes check_space to die a painful death:
if (stat(kernel, &st) < 0) {
syslog(LOG_ERR, "%s: %m", kernel);
exit(1);
which causes the 'savecore: (null): Bad address' message seen so
infamously as of late; I don't see how it could ever work, save
eliminating the (almost entirely useless, save for the mostly
useless /minfree part) check_space logic.
From: Thor Lancelot Simon <tls@rek.tjls.com>
To: Tom Spindler <dogcow@babymeat.com>
Cc: gnats-bugs@NetBSD.org, Thor Lancelot Simon <tls@rek.tjls.com>,
misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, Simon Burge <simonb@NetBSD.org>
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 14 Jun 2009 15:17:55 -0400
On Sun, Jun 14, 2009 at 06:50:17AM -0700, Tom Spindler wrote:
> I don't know about any of you, but savecore completely fails to work
> for me after 1.78 savecore.c; in particular, failing to set the
> booted kernel causes check_space to die a painful death:
I'll poke at it some more. This works in our local tree -- maybe I
missed a change when integrating it back into NetBSD.
It is absolutely, positively *not* the right solution to cause any
of these utilities to try to know the kernel name. Any operation
they have any business doing can be done via /dev/ksyms -- the
question is where I missed one or more references to the kernel name
that were exposed beyond libkvm, which correctly defaults to use
/dev/ksyms if some other kernel name is not rammed down its throat.
Thor
From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: tls@rek.tjls.com
Cc: dogcow@babymeat.com, gnats-bugs@NetBSD.org, misc-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, simonb@NetBSD.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Sun, 14 Jun 2009 22:03:52 +0000 (UTC)
hi,
> On Sun, Jun 14, 2009 at 06:50:17AM -0700, Tom Spindler wrote:
>> I don't know about any of you, but savecore completely fails to work
>> for me after 1.78 savecore.c; in particular, failing to set the
>> booted kernel causes check_space to die a painful death:
>
> I'll poke at it some more. This works in our local tree -- maybe I
> missed a change when integrating it back into NetBSD.
>
> It is absolutely, positively *not* the right solution to cause any
> of these utilities to try to know the kernel name. Any operation
> they have any business doing can be done via /dev/ksyms -- the
> question is where I missed one or more references to the kernel name
> that were exposed beyond libkvm, which correctly defaults to use
> /dev/ksyms if some other kernel name is not rammed down its throat.
>
> Thor
talking about the right solution, savecore should not use /dev/ksyms
or info from the live kernel at all. we should store necessary info
in the dump itself. (as freebsd does?)
YAMAMOTO Takashi
From: Simon Burge <simonb@NetBSD.org>
To: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
Cc: tls@rek.tjls.com, dogcow@babymeat.com, gnats-bugs@NetBSD.org,
misc-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Mon, 15 Jun 2009 09:00:41 +1000
YAMAMOTO Takashi wrote:
> > On Sun, Jun 14, 2009 at 06:50:17AM -0700, Tom Spindler wrote:
> >> I don't know about any of you, but savecore completely fails to work
> >> for me after 1.78 savecore.c; in particular, failing to set the
> >> booted kernel causes check_space to die a painful death:
> >
> > I'll poke at it some more. This works in our local tree -- maybe I
> > missed a change when integrating it back into NetBSD.
> >
> > It is absolutely, positively *not* the right solution to cause any
> > of these utilities to try to know the kernel name. Any operation
> > they have any business doing can be done via /dev/ksyms -- the
> > question is where I missed one or more references to the kernel name
> > that were exposed beyond libkvm, which correctly defaults to use
> > /dev/ksyms if some other kernel name is not rammed down its throat.
>
> talking about the right solution, savecore should not use /dev/ksyms
> or info from the live kernel at all. we should store necessary info
> in the dump itself. (as freebsd does?)
I had thought that /dev/ksyms seemed right, but that obviously doesn't
help if the kernel that left the core wasn't the currently running
kernel (as Yamaoto aludes to).
Cheers,
Simon.
From: Tom Spindler <dogcow@babymeat.com>
To: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
Cc: tls@rek.tjls.com, gnats-bugs@NetBSD.org, misc-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, simonb@NetBSD.org
Subject: Re: misc/38425 (savecore confused if getbootfile(3) fails)
Date: Mon, 15 Jun 2009 05:20:55 -0700
> talking about the right solution, savecore should not use /dev/ksyms
> or info from the live kernel at all. we should store necessary info
> in the dump itself. (as freebsd does?)
Well, the only thing that the current booted kernel filename is
used for is in the check_free() routine; if you eliminate that,
savecore works as expected. I'm somewhat tempted to just remove the
func entirely, as these days it's difficult to guess what the size
of the coredump + size of the kernel will actually be - especially
if they're gzipped.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.