NetBSD Problem Report #43182

From www@NetBSD.org  Mon Apr 19 16:29:33 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id B218363B8BC
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 19 Apr 2010 16:29:33 +0000 (UTC)
Message-Id: <20100419162933.70CAE63B873@www.NetBSD.org>
Date: Mon, 19 Apr 2010 16:29:33 +0000 (UTC)
From: frederic@fauberteau.org
Reply-To: frederic@fauberteau.org
To: gnats-bugs@NetBSD.org
Subject: Reboot due to system crash
X-Send-Pr-Version: www-1.0

>Number:         43182
>Category:       kern
>Synopsis:       Reboot due to system crash (probably umass)
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Apr 19 16:30:00 +0000 2010
>Closed-Date:    Fri Jan 04 21:37:07 +0000 2019
>Last-Modified:  Fri Jan 04 21:37:07 +0000 2019
>Originator:     Frédéric Fauberteau
>Release:        5.0.2
>Organization:
Université Paris-Est
>Environment:
NetBSD trashware 5.0.2 NetBSD 5.0.2 (GENERIC) #0: Sat Feb  6 17:53:27 UTC 2010  builds@b7.netbsd.org:/home/builds/ab/netbsd-5-0-2-RELEASE/i386/201002061851Z-obj/home/builds/ab/netbsd-5-0-2-RELEASE/src/sys/arch/i386/compile/GENERIC i386
>Description:
I apologize for my inexperience in bug reporting (this is my first one).

I just saw my server reboot, but the problem seems to be older. I found these log entries in my messages files:

Mar  7 20:56:54 syslogd: restart
Mar  7 20:56:54 /netbsd: umass0: BBB bulk-out clear stall failed, IOERROR
Mar  7 20:56:54 /netbsd: uvm_fault(0xcc6309f8, 0, 1) -> 0xe
Mar  7 20:56:54 /netbsd: fatal page fault in supervisor mode
Mar  7 20:56:54 /netbsd: trap type 6 code 0 eip c03b0ea5 cs 8 eflags 10246 cr2 0 ilevel 0
Mar  7 20:56:54 /netbsd: panic: trap
Mar  7 20:56:54 /netbsd: Begin traceback...
Mar  7 20:56:54 /netbsd: End traceback...

Apr  3 07:15:22 syslogd: restart
Apr  3 07:15:22 /netbsd: fatal page fault in supervisor mode
Apr  3 07:15:22 /netbsd: trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc666c84 ilevel 2
Apr  3 07:15:22 /netbsd: panic: trap
Apr  3 07:15:22 /netbsd: Begin traceback...
Apr  3 07:15:22 /netbsd: fatal page fault in supervisor mode
Apr  3 07:15:22 /netbsd: trap type 6 code 0 eip c053ef81 cs 8 eflags 10246 cr2 0 ilevel 2
Apr  3 07:15:22 /netbsd: panic: trap
Apr  3 07:15:22 /netbsd: Faulted in mid-traceback; aborting...
Apr  3 07:15:22 /netbsd: dumping to dev 0,1 offset 3180407
Apr  3 07:15:22 /netbsd: dump succeeded
Apr  3 07:15:22 /netbsd: 
Apr  3 07:15:22 /netbsd: 
Apr  3 07:15:22 /netbsd: sd0(umass0:0:0:0): generic HBA error
Apr  3 07:15:22 /netbsd: fatal page fault in supervisor mode
Apr  3 07:15:22 /netbsd: trap type 6 code 0 eip 0 cs 8 eflags 10246 cr2 0 ilevel 4
Apr  3 07:15:22 /netbsd: panic: trap
Apr  3 07:15:22 /netbsd: Faulted in mid-traceback; aborting...

Apr 11 21:15:38 syslogd: restart
Apr 11 21:15:38 /netbsd: fatal page fault in supervisor mode
Apr 11 21:15:38 /netbsd: trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc863c84 ilevel 2
Apr 11 21:15:38 /netbsd: panic: trap
Apr 11 21:15:38 /netbsd: Begin traceback...
Apr 11 21:15:38 /netbsd: fatal page fault in supervisor mode
Apr 11 21:15:38 /netbsd: trap type 6 code 0 eip c053ef81 cs 8 eflags 10246 cr2 0 ilevel 2
Apr 11 21:15:38 /netbsd: panic: trap
Apr 11 21:15:38 /netbsd: Faulted in mid-traceback; aborting...

The crashes have continued up to today. Maybe it is a bug in USB mass storage ...

Do you have any advice to help me provide better information for understanding this problem?

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/43182: Reboot due to system crash
Date: Fri, 23 Apr 2010 04:51:40 +0000

 On Mon, Apr 19, 2010 at 04:30:00PM +0000, frederic@fauberteau.org wrote:
  > I apologize for my inexperience in bug reporting (this is my first one).

 You're not doing badly :-)

  > Mar  7 20:56:54 /netbsd: umass0: BBB bulk-out clear stall failed, IOERROR
  > Mar  7 20:56:54 /netbsd: uvm_fault(0xcc6309f8, 0, 1) -> 0xe
  > Mar  7 20:56:54 /netbsd: fatal page fault in supervisor mode
  > Mar  7 20:56:54 /netbsd: trap type 6 code 0 eip c03b0ea5 cs 8 eflags 10246 cr2 0 ilevel 0

 (1)

  > Apr  3 07:15:22 /netbsd: fatal page fault in supervisor mode
  > Apr  3 07:15:22 /netbsd: trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc666c84 ilevel 2

 (2)

  > Apr  3 07:15:22 /netbsd: dumping to dev 0,1 offset 3180407
  > Apr  3 07:15:22 /netbsd: dump succeeded
  > Apr  3 07:15:22 /netbsd: 
  > Apr  3 07:15:22 /netbsd: 
  > Apr  3 07:15:22 /netbsd: sd0(umass0:0:0:0): generic HBA error
  > Apr  3 07:15:22 /netbsd: fatal page fault in supervisor mode
  > Apr  3 07:15:22 /netbsd: trap type 6 code 0 eip 0 cs 8 eflags 10246 cr2 0 ilevel 4
  > Apr  3 07:15:22 /netbsd: panic: trap
  > Apr  3 07:15:22 /netbsd: Faulted in mid-traceback; aborting...

 (3)

  > Apr 11 21:15:38 /netbsd: fatal page fault in supervisor mode
  > Apr 11 21:15:38 /netbsd: trap type 6 code 2 eip c046a188 cs 8 eflags 10292 cr2 cc863c84 ilevel 2

 same as (2).

  > The crashes have continued up to today. Maybe it is a bug in
  > USB mass storage ...

 Yes, quite likely, although (1) and (2) may actually be different
 problems, and I'm curious why in (3) it seems to be accessing sd0 on
 umass0 after doing a crashdump to wd0b. (0,1 is wd0b; ls -l /dev/wd0b.)
 Ordinarily by that point in crashing it shouldn't be touching anything
 besides the system console.

  > Do you have any advice to help me provide better information for
  > understanding this problem?

 Some things that would probably be helpful to know:

 (a) What's umass0 attached to? Is it going through an ehci, uhci, or
 ohci USB controller, and what's in between? (The easiest way to answer
 this question is to forward a boot log from /var/run/dmesg.boot.)
 Also it might be useful to know what your USB device calls itself, which
 will also be in the boot log.

 (b) Is there anything unusual you're doing with the USB device that
 might explain what's happening in case (3)? 

 (c) Where did crashes (1) and (2) happen? If you feel up to it, run
 "nm -n /netbsd | less" and find the last name before the EIP address
 from the crash (c03b0ea5 in case (1), c046a188 in case (2)) -- this is
 the name of the function it died in. In my kernel c03b0ea5 is between
 these:
    c03b0e60 T i4b_dl_release_ind
    c03b0f30 T i4b_dl_establish_cnf

 but that doesn't mean anything; it'll be different in yours. (If you
 can't do this, someone else can, because you're using the prebuilt
 5.0.2 GENERIC kernel; but if you can, it saves waiting for someone else
 to get around to downloading that kernel and checking.)
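
 The lookup described in (c) can also be scripted. The following is a
 minimal sketch in Python, not part of the original report: the helper
 names parse_nm and containing_symbol are hypothetical, and the sample
 symbol table is only the two lines quoted above from a different kernel,
 so the result says nothing about the submitter's kernel. In practice the
 input would be the output of "nm -n /netbsd".

```python
import bisect


def parse_nm(text):
    """Parse `nm -n` style lines ('c03b0e60 T name') into sorted (addr, name) pairs."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 3:
            continue  # skip blank lines and undefined symbols (no address column)
        addr, _kind, name = parts
        symbols.append((int(addr, 16), name))
    symbols.sort()
    return symbols


def containing_symbol(symbols, eip):
    """Return (name, offset) for the last symbol at or below eip, or None."""
    addrs = [addr for addr, _ in symbols]
    i = bisect.bisect_right(addrs, eip) - 1
    if i < 0:
        return None  # eip is below every known symbol
    addr, name = symbols[i]
    return name, eip - addr


# Sample data only: the two symbols quoted in the reply, from another kernel.
SAMPLE = """\
c03b0e60 T i4b_dl_release_ind
c03b0f30 T i4b_dl_establish_cnf
"""

if __name__ == "__main__":
    syms = parse_nm(SAMPLE)
    print(containing_symbol(syms, 0xC03B0EA5))
```

 On the sample data this prints ('i4b_dl_release_ind', 69), meaning the
 address lies 0x45 bytes past the start of that symbol.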

 Note that crash (3) jumped to 0 and trying to look that up won't yield
 anything particularly interesting. :-/

 Unfortunately, there are a number of more-or-less known but unsolved
 problems with umass...

 -- 
 David A. Holland
 dholland@netbsd.org

Responsible-Changed-From-To: misc-bug-people->kern-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Fri, 23 Apr 2010 05:15:46 +0000
Responsible-Changed-Why:
the submitter is seeing panics => kernel issue


From: Frédéric Fauberteau <frederic@fauberteau.org>
To: <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: misc/43182: Reboot due to system crash
Date: Fri, 23 Apr 2010 08:23:25 +0200

 In case (1) c03b0ea5 is between:
   c03b0e70 T ext2fs_inactive
   c03b0fc0 T ext2fs_checkpath
 In case (2) c046a188 is between:
   c046a1d0 T kpsignal
   c046a270 T kpgsignal

 My USB disk was mounted as ext2fs. I have unmounted it, but my kernel
 continues to panic. In ddb, I obtained this trace:

 kpsignal2(cc552cf4,cad21d04,cad21d04,cbd41a5c,cc552cf4,cad21d24,cad21d40,c0476cd9,cc552cf4,cad21d04) at netbsd:kpsignal2+0x5a8
 kpsignal(cc552cf4,cad21d04,0,10c,10c,cad21d04,cad21d40,0,0,0) at netbsd:kpsignal+0x7a
 timer_intr(0,ca920010,ca920030,cad20010,ca920010,0,a3d360,c16b0400,0,cad21da0) at netbsd:timer_intr+0x229
 softint_dispatch(ca927c80,2,0,0,0,0,cad21d90,cad21ce4,cad21d00,0) at netbsd:softint_dispatch+0x64
 DDB lost frame for netbsd:Xsoftintr+0x3d, trying 0xcad21d88
 Xsoftintr() at netbsd:Xsoftintr+0x3d
 --- interrupt ---
 fatal page fault in supervisor mode
 trap type 6 code 0 eip c053f8a7 cs 8 eflags 10206 cr2 3a ilevel 8
 kernel: supervisor trap page fault, code=0
 Faulted in DDB; continuing...

 but I don't know if it is useful ... I don't know why, but the kernel
 dump fails (nothing in /var/crash except the 'minfree' file).
 I have compiled my kernel from source. Now I'm waiting for a panic, and I
 know that nm is my friend ;)

 --
 Frédéric Fauberteau
 frederic@fauberteau.org

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/43182: Reboot due to system crash
Date: Sat, 24 Apr 2010 02:32:21 +0000

 On Fri, Apr 23, 2010 at 07:45:01AM +0000, Frédéric Fauberteau wrote:
  >  In case (1) c03b0ea5 is between:
  >    c03b0e70 T ext2fs_inactive
  >    c03b0fc0 T ext2fs_checkpath

 For this to happen ext2fs_inactive would have to be passed either a
 null vnode, a vnode with a null inode attached to it, or a vnode with
 a null v_mount. In theory, none of these should happen, even on error,
 so if you can get a stack trace from ddb or from a crashdump it might
 be useful.
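
 The null-pointer reading follows from the trap line for case (1), which
 reports cr2 0. A rough Python sketch for pulling the fields out of such
 a line is below. It is an illustration, not something from this report:
 parse_trap is a hypothetical helper, it assumes the type field is
 decimal and the other fields are hexadecimal, and it interprets the
 code value using the standard x86 page-fault error-code bits (bit 0:
 page present, bit 1: write access, bit 2: user mode).

```python
import re

# Matches the "trap type ... ilevel ..." line as it appears in the logs above.
TRAP_RE = re.compile(
    r"trap type (?P<type>\d+) code (?P<code>[0-9a-f]+) eip (?P<eip>[0-9a-f]+) "
    r"cs (?P<cs>[0-9a-f]+) eflags (?P<eflags>[0-9a-f]+) "
    r"cr2 (?P<cr2>[0-9a-f]+) ilevel (?P<ilevel>[0-9a-f]+)"
)


def parse_trap(line):
    """Return a dict describing the fault, or None if the line does not match."""
    m = TRAP_RE.search(line)
    if m is None:
        return None
    code = int(m.group("code"), 16)  # assumed hexadecimal
    return {
        "type": int(m.group("type")),          # assumed decimal
        "eip": int(m.group("eip"), 16),        # faulting instruction address
        "fault_addr": int(m.group("cr2"), 16), # address whose access faulted
        "page_present": bool(code & 0x1),
        "write": bool(code & 0x2),
        "user_mode": bool(code & 0x4),
    }


if __name__ == "__main__":
    case1 = ("Mar  7 20:56:54 /netbsd: trap type 6 code 0 eip c03b0ea5 "
             "cs 8 eflags 10246 cr2 0 ilevel 0")
    info = parse_trap(case1)
    print(hex(info["eip"]), hex(info["fault_addr"]), info["write"])
```

 For case (1) this gives a read of address 0 from kernel mode, which is
 consistent with dereferencing a null pointer; for case (2) the code
 value 2 would decode as a write to cc666c84.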

  >  In case (2) c046a188 is between:
  >    c046a1d0 T kpsignal
  >    c046a270 T kpgsignal

 no it isn't, a188 < a1d0... but, since you have a trace:

  >  My USB disk was mounted as ext2fs. I have unmounted it, but my kernel
  >  continues to panic. In ddb, I obtained this trace:
  >  
  >  kpsignal2(cc552cf4,cad21d04,cad21d04,cbd41a5c,cc552cf4,cad21d24,cad21d40,c0476cd9,cc552cf4,cad21d04) at netbsd:kpsignal2+0x5a8
  >  kpsignal(cc552cf4,cad21d04,0,10c,10c,cad21d04,cad21d40,0,0,0) at netbsd:kpsignal+0x7a
  >  timer_intr(0,ca920010,ca920030,cad20010,ca920010,0,a3d360,c16b0400,0,cad21da0) at netbsd:timer_intr+0x229

 I don't see how this can be the same problem. Does this one continue
 to happen if you boot the machine without the USB device and don't
 insert it? Is there anything that seems to cause it or that it seems
 to be related to doing?

  >  but I don't know if it is useful ... I don't know why, but the kernel
  >  dump fails (nothing in /var/crash except the 'minfree' file).

 The most common reason is that the swap device isn't big enough to
 hold the dump image. There's a newish feature for writing small dumps
 but I'm not sure if it's in 5.0.2.

  >  I have compiled my kernel from source. Now I'm waiting for a panic, and I
  >  know that nm is my friend ;)

 If you've done that, ddb is usually more your friend... although the
 nm -n technique is still sometimes useful. (objdump -d /netbsd can be
 useful too sometimes if one isn't afraid of wading through assembly
 code.)

 There's also some chance that if you build a kernel from the netbsd-5
 branch (5.0_STABLE, what will eventually become 5.1) either or both of
 these problems will go away. The 5.0.x releases (netbsd-5-0 branch)
 only get critical fixes.

 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: open->closed
State-Changed-By: triaxx@NetBSD.org
State-Changed-When: Fri, 04 Jan 2019 21:37:07 +0000
State-Changed-Why:
I haven't used NetBSD-5 for a long time and I no longer have this disk.

Thank you David for giving me the desire to stay.


>Unformatted: