NetBSD Problem Report #43182
From www@NetBSD.org Mon Apr 19 16:29:33 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id B218363B8BC
for <gnats-bugs@gnats.NetBSD.org>; Mon, 19 Apr 2010 16:29:33 +0000 (UTC)
Message-Id: <20100419162933.70CAE63B873@www.NetBSD.org>
Date: Mon, 19 Apr 2010 16:29:33 +0000 (UTC)
From: frederic@fauberteau.org
Reply-To: frederic@fauberteau.org
To: gnats-bugs@NetBSD.org
Subject: Reboot due to system crash
X-Send-Pr-Version: www-1.0
>Number: 43182
>Category: kern
>Synopsis: Reboot due to system crash (probably umass)
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Apr 19 16:30:00 +0000 2010
>Closed-Date: Fri Jan 04 21:37:07 +0000 2019
>Last-Modified: Fri Jan 04 21:37:07 +0000 2019
>Originator: Frédéric Fauberteau
>Release: 5.0.2
>Organization:
Université Paris-Est
>Environment:
NetBSD trashware 5.0.2 NetBSD 5.0.2 (GENERIC) #0: Sat Feb 6 17:53:27 UTC 2010 builds@b7.netbsd.org:/home/builds/ab/netbsd-5-0-2-RELEASE/i386/201002061851Z-obj/home/builds/ab/netbsd-5-0-2-RELEASE/src/sys/arch/i386/compile/GENERIC i386
>Description:
I apologize for my inexperience in bug reporting (this is my first one).
I have just seen my server reboot, but the problem seems to be older. I found these entries in my messages files:
Mar 7 20:56:54 syslogd: restart
Mar 7 20:56:54 /netbsd: umass0: BBB bulk-out clear stall failed, IOERROR
Mar 7 20:56:54 /netbsd: uvm_fault(0xcc6309f8, 0, 1) -> 0xe
Mar 7 20:56:54 /netbsd: fatal page fault in supervisor mode
Mar 7 20:56:54 /netbsd: trap type 6 code 0 eip c03b0ea5 cs 8 eflags 10246 cr2 0 ilevel 0
Mar 7 20:56:54 /netbsd: panic: trap
Mar 7 20:56:54 /netbsd: Begin traceback...
Mar 7 20:56:54 /netbsd: End traceback...
Apr 3 07:15:22 syslogd: restart
Apr 3 07:15:22 /netbsd: fatal page fault in supervisor mode
Apr 3 07:15:22 /netbsd: trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc666c84 ilevel 2
Apr 3 07:15:22 /netbsd: panic: trap
Apr 3 07:15:22 /netbsd: Begin traceback...
Apr 3 07:15:22 /netbsd: fatal page fault in supervisor mode
Apr 3 07:15:22 /netbsd: trap type 6 code 0 eip c053ef81 cs 8 eflags 10246 cr2 0 ilevel 2
Apr 3 07:15:22 /netbsd: panic: trap
Apr 3 07:15:22 /netbsd: Faulted in mid-traceback; aborting...
Apr 3 07:15:22 /netbsd: dumping to dev 0,1 offset 3180407
Apr 3 07:15:22 /netbsd: dump succeeded
Apr 3 07:15:22 /netbsd:
Apr 3 07:15:22 /netbsd:
Apr 3 07:15:22 /netbsd: sd0(umass0:0:0:0): generic HBA error
Apr 3 07:15:22 /netbsd: fatal page fault in supervisor mode
Apr 3 07:15:22 /netbsd: trap type 6 code 0 eip 0 cs 8 eflags 10246 cr2 0 ilevel 4
Apr 3 07:15:22 /netbsd: panic: trap
Apr 3 07:15:22 /netbsd: Faulted in mid-traceback; aborting...
Apr 11 21:15:38 syslogd: restart
Apr 11 21:15:38 /netbsd: fatal page fault in supervisor mode
Apr 11 21:15:38 /netbsd: trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc863c84 ilevel 2
Apr 11 21:15:38 /netbsd: panic: trap
Apr 11 21:15:38 /netbsd: Begin traceback...
Apr 11 21:15:38 /netbsd: fatal page fault in supervisor mode
Apr 11 21:15:38 /netbsd: trap type 6 code 0 eip c053ef81 cs 8 eflags 10246 cr2 0 ilevel 2
Apr 11 21:15:38 /netbsd: panic: trap
Apr 11 21:15:38 /netbsd: Faulted in mid-traceback; aborting...
The crashes have kept occurring up to today. Maybe a bug with the USB mass storage ...
Do you have any advice to help me produce better information to understand this problem?
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/43182: Reboot due to system crash
Date: Fri, 23 Apr 2010 04:51:40 +0000
On Mon, Apr 19, 2010 at 04:30:00PM +0000, frederic@fauberteau.org wrote:
> I apologize for my inexperience in bug reporting (this is my first one).
You're not doing badly :-)
> Mar 7 20:56:54 /netbsd: umass0: BBB bulk-out clear stall failed, IOERROR
> Mar 7 20:56:54 /netbsd: uvm_fault(0xcc6309f8, 0, 1) -> 0xe
> Mar 7 20:56:54 /netbsd: fatal page fault in supervisor mode
> Mar 7 20:56:54 /netbsd: trap type 6 code 0 eip c03b0ea5 cs 8 eflags 10246 cr2 0 ilevel 0
(1)
> Apr 3 07:15:22 /netbsd: fatal page fault in supervisor mode
> Apr 3 07:15:22 /netbsd: trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc666c84 ilevel 2
(2)
> Apr 3 07:15:22 /netbsd: dumping to dev 0,1 offset 3180407
> Apr 3 07:15:22 /netbsd: dump succeeded
> Apr 3 07:15:22 /netbsd:
> Apr 3 07:15:22 /netbsd:
> Apr 3 07:15:22 /netbsd: sd0(umass0:0:0:0): generic HBA error
> Apr 3 07:15:22 /netbsd: fatal page fault in supervisor mode
> Apr 3 07:15:22 /netbsd: trap type 6 code 0 eip 0 cs 8 eflags 10246 cr2 0 ilevel 4
> Apr 3 07:15:22 /netbsd: panic: trap
> Apr 3 07:15:22 /netbsd: Faulted in mid-traceback; aborting...
(3)
> Apr 11 21:15:38 /netbsd: fatal page fault in supervisor mode
> Apr 11 21:15:38 /netbsd: trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc863c84 ilevel 2
same as (2).
> The crashes have kept occurring up to today. Maybe a bug with the
> USB mass storage ...
Yes, quite likely, although (1) and (2) may actually be different
problems, and I'm curious why in (3) it seems to be accessing sd0 on
umass0 after doing a crashdump to wd0b. (0,1 is wd0b; ls -l /dev/wd0b.)
Ordinarily by that point in crashing it shouldn't be touching anything
besides the system console.
> Do you have any advice to help me produce better information to
> understand this problem?
Some things that would probably be helpful to know:
(a) What's umass0 attached to? Is it going through an ehci, uhci, or
ohci USB controller, and what's in between? (The easiest way to answer
this question is to forward a boot log from /var/run/dmesg.boot.)
Also it might be useful to know what your USB device calls itself, which
will also be in the boot log.
(b) Is there anything unusual you're doing with the USB device that
might explain what's happening in case (3)?
(c) Where did crashes (1) and (2) happen? If you feel up to it, run
"nm -n /netbsd | less" and find the last name before the EIP address
from the crash (c03b0ea5 in case (1), c046a188 in case (2)) -- this is
the name of the function it died in. In my kernel c03b0ea5 is between
these:
c03b0e60 T i4b_dl_release_ind
c03b0f30 T i4b_dl_establish_cnf
but that doesn't mean anything; it'll be different in yours. (If you
can't do this, because you're using the prebuilt 5.0.2 GENERIC kernel,
someone else can; but if you can, it saves waiting for someone else to
get around to downloading that kernel and checking.)
Note that crash (3) jumped to 0 and trying to look that up won't yield
anything particularly interesting. :-/
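The lookup described in (c) can also be done mechanically. What follows is
only a minimal sketch in Python, assuming "nm -n" prints lines of the form
"address type name"; the helper names parse_nm and enclosing_symbol are
made up for illustration, and the two sample symbols are the ones quoted
above (they will differ in another kernel).

```python
import bisect

def parse_nm(text):
    """Parse `nm -n` style lines: '<hex address> <type> <name>'."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address)
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols

def enclosing_symbol(symbols, eip):
    """Return (name, offset) of the last symbol at or below eip, else None."""
    addrs = [addr for addr, _name in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # eip lies below every known symbol
    base, name = symbols[i]
    return name, eip - base

# Sample taken from the two lines quoted above; a real run would read the
# full output of `nm -n /netbsd` instead.
NM_SAMPLE = """\
c03b0e60 T i4b_dl_release_ind
c03b0f30 T i4b_dl_establish_cnf
"""

symbols = parse_nm(NM_SAMPLE)
print(enclosing_symbol(symbols, 0xc03b0ea5))
```

The function returns the name together with the offset into it, which is
the same "name+0xNN" form that ddb prints in its traces.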
Unfortunately, there are a number of more-or-less known but unsolved
problems with umass...
--
David A. Holland
dholland@netbsd.org
Responsible-Changed-From-To: misc-bug-people->kern-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Fri, 23 Apr 2010 05:15:46 +0000
Responsible-Changed-Why:
the submitter is seeing panics => kernel issue
From: Frédéric Fauberteau <frederic@fauberteau.org>
To: <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: misc/43182: Reboot due to system crash
Date: Fri, 23 Apr 2010 08:23:25 +0200
In case (1) c03b0ea5 is between:
c03b0e70 T ext2fs_inactive
c03b0fc0 T ext2fs_checkpath
In case (2) c046a188 is between:
c046a1d0 T kpsignal
c046a270 T kpgsignal
My USB disk was mounted as ext2fs. I have unmounted it, but my kernel
continues to panic. In ddb, I obtained this trace:
kpsignal2(cc552cf4,cad21d04,cad21d04,cbd41a5c,cc552cf4,cad21d24,cad21d40,c0476cd9,cc552cf4,cad21d04) at netbsd:kpsignal2+0x5a8
kpsignal(cc552cf4,cad21d04,0,10c,10c,cad21d04,cad21d40,0,0,0) at netbsd:kpsignal+0x7a
timer_intr(0,ca920010,ca920030,cad20010,ca920010,0,a3d360,c16b0400,0,cad21da0) at netbsd:timer_intr+0x229
softint_dispatch(ca927c80,2,0,0,0,0,cad21d90,cad21ce4,cad21d00,0) at netbsd:softint_dispatch+0x64
DDB lost frame for netbsd:Xsoftintr+0x3d, trying 0xcad21d88
Xsoftintr() at netbsd:Xsoftintr+0x3d
--- interrupt ---
fatal page fault in supervisor mode
trap type 6 code 0 eip c053f8a7 cs 8 eflags 10206 cr2 3a ilevel 8
kernel: supervisor trap page fault, code=0
Faulted in DDB; continuing...
but I don't know if it is useful ... I don't know why, but the kernel
dump fails (nothing in /var/crash except the 'minfree' file).
I have compiled my kernel from source. Now I'm waiting for a panic, and I
know that nm is my friend ;)
--
Frédéric Fauberteau
frederic@fauberteau.org
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: misc/43182: Reboot due to system crash
Date: Sat, 24 Apr 2010 02:32:21 +0000
On Fri, Apr 23, 2010 at 07:45:01AM +0000, Frédéric Fauberteau wrote:
> In case (1) c03b0ea5 is between:
> c03b0e70 T ext2fs_inactive
> c03b0fc0 T ext2fs_checkpath
For this to happen ext2fs_inactive would have to be passed either a
null vnode, a vnode with a null inode attached to it, or a vnode with
a null v_mount. In theory, none of these should happen, even on error,
so if you can get a stack trace from ddb or from a crashdump it might
be useful.
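As a cross-check on the null-pointer reading, the trap line itself can be
decoded. This is a rough sketch in Python, not anything from the kernel
sources: the function name decode_trap is invented here, the field layout
is taken from the log lines in this PR, and it assumes "type" is printed
in decimal and the other fields in hex. The meaning of the low three
"code" bits follows the x86 page-fault error code (bit 0: page was
present, bit 1: write access, bit 2: fault came from user mode), and cr2
is the faulting address.

```python
import re

# Field layout copied from the log lines in this report; the number bases
# (decimal type, hex for everything else) are an assumption.
TRAP_RE = re.compile(
    r"trap type (\d+) code ([0-9a-f]+) eip ([0-9a-f]+) cs ([0-9a-f]+) "
    r"eflags ([0-9a-f]+) cr2 ([0-9a-f]+) ilevel ([0-9a-f]+)"
)

def decode_trap(line):
    """Decode one 'trap type ...' line into a dict, or None if no match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    code = int(m.group(2), 16)
    return {
        "type": int(m.group(1)),
        "eip": int(m.group(3), 16),         # faulting instruction
        "cr2": int(m.group(6), 16),         # faulting address
        "page_present": bool(code & 0x1),   # 0: page not present
        "write": bool(code & 0x2),          # 0: read access
        "user_mode": bool(code & 0x4),      # 0: supervisor mode
    }

case1 = decode_trap(
    "trap type 6 code 0 eip c03b0ea5 cs 8 eflags 10246 cr2 0 ilevel 0"
)
case2 = decode_trap(
    "trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc666c84 ilevel 2"
)
print(case1)
print(case2)
```

Read this way, case (1) is a supervisor-mode read at address 0, which
fits a null pointer being dereferenced, while case (2) is a write to a
non-null kernel address.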
> In case (2) c046a188 is between:
> c046a1d0 T kpsignal
> c046a270 T kpgsignal
No, it isn't: a188 < a1d0... but, since you have a trace:
> My USB disk was mounted as ext2fs. I have unmounted it, but my kernel
> continues to panic. In ddb, I obtained this trace:
>
> kpsignal2(cc552cf4,cad21d04,cad21d04,cbd41a5c,cc552cf4,cad21d24,cad21d40,c0476cd9,cc552cf4,cad21d04)
> at netbsd:kpsignal2+0x5a8
> kpsignal(cc552cf4,cad21d04,0,10c,10c,cad21d04,cad21d40,0,0,0) at
> netbsd:kpsignal+0x7a
> timer_intr(0,ca920010,ca920030,cad20010,ca920010,0,a3d360,c16b0400,0,cad21da0)
> at netbsd:timer_intr+0x229
I don't see how this can be the same problem. Does this one continue
to happen if you boot the machine without the USB device and don't
insert it? Is there anything that seems to cause it or that it seems
to be related to doing?
> but I don't know if it is useful ... I don't know why, but the kernel
> dump fails (nothing in /var/crash except the 'minfree' file).
The most common reason is that the swap device isn't big enough to
hold the dump image. There's a newish feature for writing small dumps
but I'm not sure if it's in 5.0.2.
> I have compiled my kernel from source. Now I'm waiting for a panic, and I
> know that nm is my friend ;)
If you've done that, ddb is usually more your friend... although the
nm -n technique is still sometimes useful. (objdump -d /netbsd can be
useful too sometimes if one isn't afraid of wading through assembly
code.)
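To connect a ddb trace with the nm -n or objdump output, the
"netbsd:name+0xNN" locations can be turned back into absolute addresses.
The following is a small sketch in Python under stated assumptions: the
helper frame_addresses is hypothetical, the kpsignal address is the one
from the nm listing quoted above, and it only makes sense if that listing
and the trace come from the same kernel binary, which is not certain
here.

```python
import re

# Matches the 'netbsd:symbol+0xoffset' part of a ddb trace line.
FRAME_RE = re.compile(r"netbsd:(\w+)\+0x([0-9a-f]+)")

def frame_addresses(trace, symtab):
    """Return (name, offset, absolute address or None) for each frame."""
    frames = []
    for name, off in FRAME_RE.findall(trace):
        offset = int(off, 16)
        base = symtab.get(name)  # None if the symbol is not in the table
        frames.append((name, offset, None if base is None else base + offset))
    return frames

# Address from the nm listing quoted in this thread (assumed to belong to
# the same kernel as the trace).
SYMTAB = {"kpsignal": 0xc046a1d0}

TRACE = """\
kpsignal(cc552cf4,cad21d04,0,10c,10c,cad21d04,cad21d40,0,0,0) at
netbsd:kpsignal+0x7a
softint_dispatch(ca927c80,2,0,0,0,0,cad21d90,cad21ce4,cad21d00,0) at
netbsd:softint_dispatch+0x64
"""

frames = frame_addresses(TRACE, SYMTAB)
for name, offset, addr in frames:
    shown = "unknown" if addr is None else hex(addr)
    print(f"{name}+{offset:#x} -> {shown}")
```

The resulting addresses are the ones to search for in the disassembly
from objdump -d /netbsd.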
There's also some chance that if you build a kernel from the netbsd-5
branch (5.0_STABLE, what will eventually become 5.1) either or both of
these problems will go away. The 5.0.x releases (netbsd-5-0 branch)
only get critical fixes.
--
David A. Holland
dholland@netbsd.org
State-Changed-From-To: open->closed
State-Changed-By: triaxx@NetBSD.org
State-Changed-When: Fri, 04 Jan 2019 21:37:07 +0000
State-Changed-Why:
I haven't used NetBSD-5 for a long time and I no longer have this disk.
Thank you, David, for giving me the desire to stay.
>Unformatted: