NetBSD Problem Report #44292

From kilbi@kilbi.de  Wed Dec 29 11:40:10 2010
Return-Path: <kilbi@kilbi.de>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id AA8F163B89A
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 29 Dec 2010 11:40:10 +0000 (UTC)
Message-Id: <20101229113454.8C43F38E63@mail.kilbi.de>
Date: Wed, 29 Dec 2010 12:34:53 +0100 (MET)
From: mk@kilbi.de
Reply-To: mk@kilbi.de
To: gnats-bugs@gnats.NetBSD.org
Subject: -current kernels do not work on (my) cobalt qube 2 since one (1!) year
X-Send-Pr-Version: 3.95

>Number:         44292
>Category:       port-cobalt
>Synopsis:       -current kernels do not work on (my) cobalt qube 2 since one (1!) year
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    tsutsui
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Dec 29 11:45:00 +0000 2010
>Closed-Date:    Sat Jan 22 18:31:51 +0000 2011
>Last-Modified:  Sun Jan 23 23:10:02 +0000 2011
>Originator:     Markus W Kilbinger
>Release:        NetBSD 5.99.22
>Organization:
>Environment:


System: NetBSD qube 5.99.22 NetBSD 5.99.22 (QUBE) #5: Fri Dec 11 09:24:20 MET 2009 root@q:/usr/u/NetBSD/HEAD/src/sys/arch/cobalt/compile/QUBE cobalt
Architecture: mipsel
Machine: cobalt
>Description:
	After one year of waiting (hoping)

	  http://mail-index.netbsd.org/port-cobalt/2010/06/05/msg000425.html

	I followed Izumi's advice

	  http://mail-index.netbsd.org/port-cobalt/2010/06/05/msg000426.html

	to send-pr: Longer than one year now I cannot run a -current
	kernel on my cobalt qube 2!

	Besides some panics in the meantime (see my older post/link
	above), a -current kernel (compiled from yesterday's
	sources) now gets stuck at:

	  [...]
	  root on wd0a dumps on wd0b
	  root file system type: ffs
	  pid 1(init): ABI set to O32 (e_flags=0x1007)

	... and goes no further!

	But: The same kernel boots fine under gxemul (simulating
	cobalt hardware)!?

	Last working -current kernel on my qube:

	  NetBSD qube 5.99.22 NetBSD 5.99.22 (QUBE) #5: Fri Dec 11 09:24:20 MET 2009

>How-To-Repeat:
	Try to boot/run a recent -current kernel on a cobalt qube 2 machine.
>Fix:
	unknown

>Release-Note:

>Audit-Trail:
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: gnats-bugs@NetBSD.org
Cc: port-cobalt-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: port-cobalt/44292: -current kernels do not work on (my) cobalt
	 qube 2 since one (1!) year
Date: Sat, 1 Jan 2011 00:36:07 +0900

 FYI,

 > 	to send-pr: Longer than one year now I cannot run a -current
 > 	kernel on my cobalt qube 2!

 At least 201006290000Z GENERIC kernel seems to work:
 ftp://ftp.jp.NetBSD.org/pub/NetBSD-daily/HEAD/201006290000Z/cobalt/binary/kernel/
 so there is some problem newer than the mips64 merge.

 201007300000Z GENERIC doesn't start init(8) though.
 ftp://ftp.jp.NetBSD.org/pub/NetBSD-daily/HEAD/201007300000Z/cobalt/binary/kernel/

 > 	But: The same kernel boots fine under gxemul (simulating
 > 	cobalt hardware)!?

 Emulation in gxemul is not so precise since it's designed to run OSes,
 rather than emulating exact hardware. Probably an Rm52xx specific quirk?
 ---
 Izumi Tsutsui

Responsible-Changed-From-To: port-cobalt-maintainer->tsutsui
Responsible-Changed-By: tsutsui@NetBSD.org
Responsible-Changed-When: Wed, 19 Jan 2011 23:36:13 +0900
Responsible-Changed-Why:


State-Changed-From-To: open->feedback
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Wed, 19 Jan 2011 23:36:13 +0900
State-Changed-Why:
Can you try patch?


From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: gnats-bugs@NetBSD.org
Cc: port-cobalt-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, jklos@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: port-cobalt/44292: -current kernels do not work on (my) cobalt
	 qube 2 since one (1!) year
Date: Wed, 19 Jan 2011 23:34:11 +0900

 Reverting part of src/sys/dev/ic/com.c rev 1.298 seems to fix the problem.
 (201007200000Z works, 201007210000Z doesn't)

 Needs to use device properties to enable prescaler?
 More initialization is required?
 ---

 Index: com.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ic/com.c,v
 retrieving revision 1.298
 retrieving revision 1.297
 diff -u -p -r1.298 -r1.297
 --- com.c	20 Jul 2010 06:17:20 -0000	1.298
 +++ com.c	19 Apr 2010 18:24:26 -0000	1.297
 @@ -465,8 +465,6 @@ com_attach_subr(struct com_softc *sc)
  					sc->sc_fifolen = 0;
  				} else {
  					SET(sc->sc_hwflags, COM_HW_FLOW);
 -					SET(sc->sc_mcr, MCR_PRESCALE);
 -					sc->sc_frequency /= 4;
  					sc->sc_fifolen = 32;
  				}
  			} else

 ---
 Izumi Tsutsui

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: jklos@NetBSD.org
Cc: gnats-bugs@NetBSD.org, port-cobalt-maintainer@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
        tsutsui@ceres.dti.ne.jp
Subject: Re: port-cobalt/44292: -current kernels do not work on (my) cobaltqube
	 2 since one (1!) year
Date: Thu, 20 Jan 2011 00:58:19 +0900

 I wrote:
 > Needs to use device properties to enable prescaler?
 > More initialization is required?

 We can't touch frequency or prescaler in the attach function
 if the device is already initialized in earlier cnattach.

 I'm not sure which variants actually require prescaler
 but I'll fix the code to disable it if comconsattached.
 ---
 Izumi Tsutsui

From: Nick Hudson <nick.hudson@gmx.co.uk>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: gnats-bugs@netbsd.org,
 port-cobalt-maintainer@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 jklos@netbsd.org
Subject: Re: port-cobalt/44292: -current kernels do not work on (my) cobalt qube 2 since one (1!) year
Date: Wed, 19 Jan 2011 15:59:11 +0000

 On Wednesday 19 January 2011 14:34:11 Izumi Tsutsui wrote:
 [...]
 > Index: com.c
 > ===================================================================
 > RCS file: /cvsroot/src/sys/dev/ic/com.c,v
 > retrieving revision 1.298
 > retrieving revision 1.297
 > diff -u -p -r1.298 -r1.297
 > --- com.c	20 Jul 2010 06:17:20 -0000	1.298
 > +++ com.c	19 Apr 2010 18:24:26 -0000	1.297
 > @@ -465,8 +465,6 @@ com_attach_subr(struct com_softc *sc)
 >  					sc->sc_fifolen = 0;
 >  				} else {
 >  					SET(sc->sc_hwflags, COM_HW_FLOW);
 > -					SET(sc->sc_mcr, MCR_PRESCALE);
 > -					sc->sc_frequency /= 4;
 >  					sc->sc_fifolen = 32;
 >  				}
 >  			} else
 > 

 My Cobalt RaQ 2 boots 5.99.44 with this patch.

 Nick

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-cobalt/44292: -current kernels do not work on (my) cobalt qube 2 since one (1!) year
Date: Wed, 19 Jan 2011 16:19:57 +0000

 On Wed, Jan 19, 2011 at 04:00:26PM +0000, Nick Hudson wrote:
 >  > -					SET(sc->sc_mcr, MCR_PRESCALE);
 >  > -					sc->sc_frequency /= 4;

 Presumably MCR_PRESCALE is generating a divide by 4.
 So doing this more than once generates an invalid sc_frequency ??
 So should this just be dependant on whether MCR_PRESCALE is already set?

 I guess SET(a,b) is '(a) |= (b)' just to confuse things.

 	David

 -- 
 David Laight: david@l8s.co.uk

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: gnats-bugs@NetBSD.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: Re: port-cobalt/44292: -current kernels do not work on (my) cobalt
	 qube 2 since one (1!) year
Date: Thu, 20 Jan 2011 01:27:05 +0900

 >  >  > -					SET(sc->sc_mcr, MCR_PRESCALE);
 >  >  > -					sc->sc_frequency /= 4;
 >  
 >  Presumably MCR_PRESCALE is generating a divide by 4.
 >  So doing this more than once generates an invalid sc_frequency ??

 It's in com_attach_subr() so only once per attach,
 but it causes a problem if com(4) is already initialized
 in cnattach(), which doesn't care about the prescaler.

 >  So should this just be dependant on whether MCR_PRESCALE is already set?

 The prescaler is quite device dependent, so MCR_PRESCALE should be set
 and sc_frequency should be adjusted in the MD sys/arch/amiga/dev/com_supio.c
 attachment, I think.

 >  I guess SET(a,b) is '(a) |= (b)' just to confuse things.

 It's a separate discussion. (unless you are going to clean up the whole source tree)
 ---
 Izumi Tsutsui

From: Markus W Kilbinger <mk@kilbi.de>
To: tsutsui@NetBSD.org, Martin Mersberger <gremlin@portal-to-web.de>,
    gnats-bugs@NetBSD.org
Cc: port-cobalt-maintainer@netbsd.org,
    netbsd-bugs@netbsd.org,
    gnats-admin@netbsd.org,
Subject: Re: port-cobalt/44292 (-current kernels do not work on (my) cobalt qube 2 since one (1!) year) [and 1 more messages]
Date: Thu, 20 Jan 2011 21:25:11 +0100

 >>>>> "tsutsui" == tsutsui  <tsutsui@NetBSD.org> writes:

     tsutsui> State-Changed-Why: Can you try patch?

 Sorry for the delay, now got some minutes to test your patch and
 indeed it makes the kernel pass the old stuck point:

   [...]
   root on wd0a dumps on wd0b
   root file system type: ffs
   pid 1(init): ABI set to O32 (e_flags=0x1007)
   Thu Jan 20 19:50:40 GMT 2011
   Starting root file system check:
   /dev/rwd0a: file system is clean; not checking
   swapctl: adding /dev/wd0b as swap device at priority 0
   Starting file system checks:
   /dev/rwd0e: file system is clean; not checking
   /dev/rwd0f: file system is clean; not checking
   /dev/rwd0a: file system is mounted read-write on /; not checking
   Setting tty flags.
   Setting sysctl variables:
   [...]

 but later (my qube is a heavily loaded machine ;-)) it crashes (when
 swapping starts?):

   [...]
   Starting squid.
   Starting spamd.
   pid 381(squid): trap: TLB miss (store) in kernel mode
   status=0xfc03, cause=0xc, epc=0x8000152c, vaddr=0xcc54f00c tf=0xcc54ef98 ksp=0xcc54eff8 ra=0x80001528
   Stopped in pid 381.1 (squid) at netbsd:MachFPInterrupt+0xc0:    sw      ra,20(sp)
   db>

 ... as does Martin's cobalt machine under some load:

 >>>>> "Martin" == Martin Mersberger <gremlin@portal-to-web.de> writes:

     Martin>  My cobalt with -current just crashed while building
     Martin>  pkgsrc/shells/bash

     Martin>  db> bt
     Martin>  pid 18288(conftest): trap: TLB miss (load or instr. fetch) in kernel mode
     Martin>  status=0x3, cause=0x8808, epc=0x802a1b98, vaddr=0xcb0bb014 tf=0xcb0bac78 ksp=0xcb0bacd8 ra=0x802a20bc
     Martin>  Stopped in pid 18288.1 (conftest) at netbsd:kdbrpeek+0x30: lw v0,0(v1)

 These reproducible crashes were the other reason for writing the PR.

 Any more info I can provide?

 Markus.

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: mk@kilbi.de
Cc: gremlin@portal-to-web.de, gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: port-cobalt/44292 (-current kernels do not work on (my) cobalt
	 qube 2 since one (1!) year) [and 1 more messages]
Date: Fri, 21 Jan 2011 18:57:46 +0900

 > but later (my qube is a heavy loaded machine ;-)) it crashes (at
 > starting swapping?):
  :
 > This reproducable crash(es) were the other reason for writing the pr.

 Could you file a new PR for this TLB miss problem?
 It's a bit annoying to find necessary info from
 a long PR including multiple replies.

 The original problem (init(8) not starting) was caused by
 two independent mistakes, but this one seems to be a mips pmap issue.

 > Any more info I can provide?

 - userland version
 - kernel config (GENERIC or your custom)
 - (if custom) test with GENERIC
 - test GENERIC + options DIAGNOSTIC kernel
 etc?

 ---
 Izumi Tsutsui

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: mk@kilbi.de, gremlin@portal-to-web.de
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: port-cobalt/44292 (-current kernels do not work on (my) cobaltqube
	 2 since one (1!) year) [and 1 more messages]
Date: Sat, 22 Jan 2011 01:16:59 +0900

 > Could you file a new PR for this TLB miss problem?

 Ah, never mind, could you try the following patch?

 Index: arch/mips/mips/locore.S
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/mips/mips/locore.S,v
 retrieving revision 1.173
 diff -u -p -r1.173 locore.S
 --- arch/mips/mips/locore.S	22 Dec 2010 01:34:17 -0000	1.173
 +++ arch/mips/mips/locore.S	21 Jan 2011 16:13:39 -0000
 @@ -750,7 +750,7 @@ XNESTED(MachFPTrap)
   */
  FPReturn:
  	mfc0		t0, MIPS_COP_0_STATUS
 -	REG_S		ra, CALLFRAME_RA(sp)
 +	REG_L		ra, CALLFRAME_RA(sp)
  	and		t0, t0, ~MIPS_SR_COP_1_BIT
  	mtc0		t0, MIPS_COP_0_STATUS
  	COP0_SYNC

 ---
 Izumi Tsutsui

From: Markus W Kilbinger <mk@kilbi.de>
To: gnats-bugs@NetBSD.org
Cc: tsutsui@NetBSD.org,
    gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: Re: port-cobalt/44292 (-current kernels do not work on (my) cobaltqube
	 2 since one (1!) year) [and 1 more messages]
Date: Sat, 22 Jan 2011 16:22:12 +0100

 >>>>> "Izumi" == Izumi Tsutsui <tsutsui@ceres.dti.ne.jp> writes:

     IZUMI >> Could you file a new PR for this TLB miss problem?

     Izumi>  Ah, never mind, could you try the following patch?

     Izumi>  Index: arch/mips/mips/locore.S
     Izumi>  ===================================================================
     Izumi>  RCS file: /cvsroot/src/sys/arch/mips/mips/locore.S,v
     Izumi>  retrieving revision 1.173
     Izumi>  diff -u -p -r1.173 locore.S
     Izumi>  --- arch/mips/mips/locore.S	22 Dec 2010 01:34:17 -0000	1.173
     Izumi>  +++ arch/mips/mips/locore.S	21 Jan 2011 16:13:39 -0000
     Izumi>  @@ -750,7 +750,7 @@ XNESTED(MachFPTrap)
     Izumi>    */
     Izumi>   FPReturn:
     Izumi>   	mfc0		t0, MIPS_COP_0_STATUS
     Izumi>  -	REG_S		ra, CALLFRAME_RA(sp)
     Izumi>  +	REG_L		ra, CALLFRAME_RA(sp)
     Izumi>   	and		t0, t0, ~MIPS_SR_COP_1_BIT
     Izumi>   	mtc0		t0, MIPS_COP_0_STATUS
     Izumi>   	COP0_SYNC

 That helped!! My qube 2 is up and running a -current kernel and now
 userland quite flawlessly. No crash/panic so far!

 Thanks a lot!

 I guess, you can close the pr.

 What about -current's cobalt64 capabilities? Worth trying?

 Markus.

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44292 CVS commit: src/sys/dev/ic
Date: Sat, 22 Jan 2011 16:59:27 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sat Jan 22 16:59:27 UTC 2011

 Modified Files:
 	src/sys/dev/ic: com.c

 Log Message:
 Revert part of changes in rev 1.298:
  - it breaks cobalt's serial console as mentioned in PR port-cobalt/44292
  - MCR_PRESCALE doesn't affect unless EFR_EFCR is set in the EFR register
  - even if MCR_PRESCALE is enabled we should define appropriate sc_type
    variants and BRG values should be adjusted in comspeed() per sc_type
  - sc_frequency should be adjusted in MD attachment if necessary
 Tested on cobalt by several people, ok from jklos@


 To generate a diff of this commit:
 cvs rdiff -u -r1.298 -r1.299 src/sys/dev/ic/com.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/44292 CVS commit: src/sys/arch/mips/mips
Date: Sat, 22 Jan 2011 17:31:32 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sat Jan 22 17:31:31 UTC 2011

 Modified Files:
 	src/sys/arch/mips/mips: locore.S

 Log Message:
 Fix a fatal typo that causes TLB miss panic in MachFPInterrupt().
 Reported in followups of PR port-cobalt/44292.


 To generate a diff of this commit:
 cvs rdiff -u -r1.173 -r1.174 src/sys/arch/mips/mips/locore.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Sun, 23 Jan 2011 03:31:51 +0900
State-Changed-Why:
Now they work properly by fixing misc small but critical bugs.


From: Martin Mersberger <gremlin@portal-to-web.de>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: mk@kilbi.de, gnats-bugs@NetBSD.org
Subject: Re: port-cobalt/44292 (-current kernels do not work on (my) cobaltqube
  2 since one (1!) year) [and 1 more messages]
Date: Sun, 23 Jan 2011 11:42:54 +0100

 Hi...


 >> Could you file a new PR for this TLB miss problem?
 > Ah, never mind, could you try the following patch?
 > 
 > Index: arch/mips/mips/locore.S
 > ===================================================================
 > RCS file: /cvsroot/src/sys/arch/mips/mips/locore.S,v
 > retrieving revision 1.173
 ....

 Since my RAQ has been running with that patch applied, I haven't seen
 any problems (for 2 days now..) - Thanks for your help!!
 (... I'm preparing to run a ./build.sh build on that box and check if it
 survives that one as well ;-) )


 Markus, how is your Cube? ;-)


 regards
  Martin


From: Markus W Kilbinger <mk@kilbi.de>
To: gnats-bugs@NetBSD.org,tsutsui@NetBSD.org,gnats-admin@netbsd.org,netbsd-bugs@netbsd.org
Cc: 
Subject: Re: port-cobalt/44292 (-current kernels do not work on (my) cobaltqube  2 since one (1!) year) [and 1 more messages]
Date: Mon, 24 Jan 2011 00:08:38 +0100

 With the new kernel the old and new userland and pkgs seem to have problems: Some daemons suddenly die. My mail queue stalls randomly. From that point on it is quite unusable :-/
 I had to go back to a 5.1 kernel/system just to be able to write this mail.
 Maybe I will find some time to investigate further.

 Markus.

>Unformatted:
