NetBSD Problem Report #57400

From www@netbsd.org  Wed May 10 10:59:26 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EC8851A923B
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 10 May 2023 10:59:25 +0000 (UTC)
Message-Id: <20230510105924.5B2401A923D@mollari.NetBSD.org>
Date: Wed, 10 May 2023 10:59:24 +0000 (UTC)
From: dave_daniels@argonet.co.uk
Reply-To: dave_daniels@argonet.co.uk
To: gnats-bugs@NetBSD.org
Subject: Fatal kernel mode data abort 'translation fault' at wsevent_inject+0x7c
X-Send-Pr-Version: www-1.0

>Number:         57400
>Category:       kern
>Synopsis:       Fatal kernel mode data abort 'translation fault' at wsevent_inject+0x7c
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed May 10 11:00:00 +0000 2023
>Last-Modified:  Fri May 19 18:35:01 +0000 2023
>Originator:     Dave Daniels
>Release:        9.3
>Organization:
>Environment:
NetBSD bluebox.nowhere.com  9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org: /usr/src/sys/arch/evbarm/compile/GENERIC evbarm
>Description:
After exiting from ctwm I have had several immediate kernel panics. The messages are the same each time:

uvm_fault(0x809baa20, 0, 2) -> e
Fatal kernel mode data abort: 'Translation Fault (S)'
trapframe: 0xb955fa60
FSR=00000805 FAR=00001560 spsr=20070013
r0 =00000002, r1 =00000000, r2 =00001560, r3 =00001560
r4 =90b5e450, r5 =b955fab0, r6 =b955faf8, r7 =00000018
r8 =800d47bc, r9 =90de0000, r10=00000000, r11=b955fadc
r12=b955faf8, ssp=b955fab0, slr=00001568, pc =802b4268
Stopped in pid 0.5 (system) at netbsd:wsevent_inject+0x7c: str r0, [r1, r3]

This causes the machine to lock up completely and the only option is to switch it off and on again.

The fault is intermittent but frequent: it happens often, though not every time. I am doing nothing more than starting an extra 'uxterm' or 'xterm' session in ctwm before quitting from ctwm.




The machine is a Raspberry Pi 2. I have only just installed NetBSD and have not added any other applications yet, so it is running 'as supplied'.

>How-To-Repeat:
Start an extra 'xterm' or 'uxterm' session in ctwm, then exit from ctwm using the 'quit' option on the 'NetBSD' menu.


>Fix:

>Audit-Trail:
From: David Daniels <dave_daniels@argonet.co.uk>
To: gnat-bugs <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/57400
Date: Thu, 11 May 2023 20:15:32 +0100

 I have carried out some tests and it appears that this problem is
 linked in some way to my use of a KVM switch.

 I noticed that I kept getting the following messages coming out on
 the first console (ttyE0):

 wsmouse0: detached
 ums0: detached
 uhidev1: detached
 uhidev1: at uhub2 port 3 (addr 6) disconnected
 uhidev1 at uhub2 port 3 configuration 1 interface 0
 uhidev1: pixart... (details of USB mouse)
 ums0 at uhidev1: 3 buttons and Z dir
 wsmouse0 at ums0 mux 0

 These were coming out roughly every 55 seconds. To see what
 would happen, I connected a keyboard and mouse directly to the
 Raspberry Pi and disconnected it from the KVM. As expected, these
 messages stopped. **The important thing is that I also stopped
 getting the kernel panics.** Later I reconnected the Pi to the
 KVM and started seeing the kernel panics again. I can reproduce
 them fairly readily by switching away from the Raspberry Pi to a
 different machine, waiting a few minutes, and then switching back
 to the Pi and quitting from ctwm.

 I hope that helps. As I said, I think that my use of a KVM switch
 is linked to this problem. The KVM, by the way, is a Rytaki
 KM401B-RY-C.

 Regards,
 Dave Daniels




From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57400
Date: Fri, 12 May 2023 14:06:34 -0000 (UTC)

 dave_daniels@argonet.co.uk (David Daniels) writes:

 > wsmouse0: detached
 > ums0: detached
 > uhidev1: detached
 > uhidev1: at uhub2 port 3 (addr 6) disconnected
 > uhidev1 at uhub2 port 3 configuration 1 interface 0
 > uhidev1: pixart... (details of USB mouse)
 > ums0 at uhidev1: 3 buttons and Z dir
 > wsmouse0 at ums0 mux 0
 > 
 > These were coming out roughly every 55 seconds.


 That's common behaviour of hardware "mice" these days.
 They disconnect and reconnect continuously unless they
 are used.

 So either start an X server that handles the mouse,
 or run wsmoused to keep the connection open.
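To make "keep the connection open" concrete, here is a minimal userspace sketch of what holding the mouse open amounts to. It assumes the device node is /dev/wsmouse0 and that simply keeping a descriptor open is enough for the driver to keep polling the mouse; wsmoused(8) and the X server do much more (they also read and act on events). The helper names hold_device_open and release_device are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/*
 * Hypothetical helper: open a device node read-only and return the
 * descriptor, or -1 on failure. While the descriptor stays open the
 * device is "in use" (assumption: this is what keeps it from idling).
 */
static int
hold_device_open(const char *path)
{
	assert(path != NULL);
	int fd = open(path, O_RDONLY);
	if (fd == -1)
		perror(path);
	return fd;
}

/* Hypothetical counterpart: drop the hold and let the device idle. */
static void
release_device(int fd)
{
	if (fd >= 0)
		(void)close(fd);
}
```

A caller would do something like int fd = hold_device_open("/dev/wsmouse0"); and then block in pause() or read() without closing fd. This addresses only the idle disconnects; it would not prevent the detach that happens when the KVM switch physically disconnects the USB devices.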

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57400
Date: Sun, 14 May 2023 00:46:41 +0000

 On Fri, May 12, 2023 at 02:10:01PM +0000, Michael van Elst wrote:
  >  > wsmouse0: detached
  >  > ums0: detached
  >  > uhidev1: detached
  >  > uhidev1: at uhub2 port 3 (addr 6) disconnected
  >  > uhidev1 at uhub2 port 3 configuration 1 interface 0
  >  > uhidev1: pixart... (details of USB mouse)
  >  > ums0 at uhidev1: 3 buttons and Z dir
  >  > wsmouse0 at ums0 mux 0
  >  > 
  >  > These were coming out roughly every 55 seconds.
  >  
  >  
  >  That's common behaviour of hardware "mice" these days.
  >  They disconnect and reconnect continuously unless they
  >  are used.
  >  
  >  So either start an X server that handles the mouse,
  >  or run wsmoused to keep the connection open.

 It's odd that it would happen only when connected through a KVM switch
 though.

 -- 
 David A. Holland
 dholland@netbsd.org

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, dave_daniels@argonet.co.uk
Subject: re: kern/57400
Date: Sun, 14 May 2023 16:46:35 +1000

 >  That's common behaviour of hardware "mice" these days.
 >  They disconnect and reconnect continuously unless they
 >  are used.
 >  
 >  So either start an X server that handles the mouse,
 >  or run wsmoused to keep the connection open.

 actually we have a method to solve this on a per-device basis:

 sys/dev/usb/usb_quirks.h:54:#define UQ_ALWAYS_ON   0x100000       /* for mice that keep disconnecting */

 sys/dev/usb/usb_quirks.c: { USB_VENDOR_CHICONY,         USB_PRODUCT_CHICONY_OPTMOUSE0939,       ANY,
 sys/dev/usb/usb_quirks.c:       { UQ_ALWAYS_ON, NULL }},

 adding this keyboard to the quirks list should fix the problem
 without having to run X or something.


 .mrg.
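The quirk mechanism above is a table keyed on vendor ID, product ID and device release (bcdDevice), with ANY as a release wildcard. The sketch below is a simplified, hypothetical model of such a lookup, not the actual usb_quirks.c code: the struct layout and the function name quirk_lookup are assumptions, UQ_ALWAYS_ON is the value from the excerpt, and the entry uses the PixArt mouse IDs (0x093a/0x2510) printed in the attach messages later in this PR. That entry is illustrative only and does not exist in the real table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value from the usb_quirks.h excerpt above. */
#define UQ_ALWAYS_ON	0x100000
/* Wildcard for the release field, mirroring ANY in the excerpt. */
#define ANY		0xffff

/* Hypothetical, simplified quirk table entry. */
struct quirk_entry {
	uint16_t vendor;
	uint16_t product;
	uint16_t release;	/* bcdDevice, or ANY */
	uint32_t flags;
};

static const struct quirk_entry quirks[] = {
	/* Illustrative entry: PixArt (0x93a) USB Optical Mouse (0x2510). */
	{ 0x093a, 0x2510, ANY, UQ_ALWAYS_ON },
};

/* Return the quirk flags for a device, or 0 if no entry matches. */
static uint32_t
quirk_lookup(uint16_t vendor, uint16_t product, uint16_t release)
{
	for (size_t i = 0; i < sizeof(quirks) / sizeof(quirks[0]); i++) {
		const struct quirk_entry *q = &quirks[i];
		if (q->vendor == vendor && q->product == product &&
		    (q->release == ANY || q->release == release))
			return q->flags;
	}
	return 0;
}
```

A driver would consult the flags at attach time and, when UQ_ALWAYS_ON is set, keep the device active without waiting for an open. The per-device table keeps the behaviour opt-in, which is the trade-off discussed in the following messages.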

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57400
Date: Sun, 14 May 2023 06:53:00 -0000 (UTC)

 mrg@eterna.com.au (matthew green) writes:

 >sys/dev/usb/usb_quirks.c: { USB_VENDOR_CHICONY,         USB_PRODUCT_CHICONY_OPTMOUSE0939,       ANY,
 >sys/dev/usb/usb_quirks.c:       { UQ_ALWAYS_ON, NULL }},

 >adding this keyboard to the quirks list should fix the problem
 >without having to run X or something.


 In another OS, that's the default behaviour.

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, dave_daniels@argonet.co.uk
Subject: re: kern/57400
Date: Mon, 15 May 2023 11:23:43 +1000

 >  mrg@eterna.com.au (matthew green) writes:
 >  
 >  >sys/dev/usb/usb_quirks.c: { USB_VENDOR_CHICONY,         USB_PRODUCT_CHICONY_OPTMOUSE0939,       ANY,
 >  >sys/dev/usb/usb_quirks.c:       { UQ_ALWAYS_ON, NULL }},
 >  
 >  >adding this keyboard to the quirks list should fix the problem
 >  >without having to run X or something.
 >  
 >  In another OS, that's the default behaviour.

 i think i recall talking with jmcneill about doing this normally,
 so if it just works everywhere, perhaps we just should..


 .mrg.

From: David daniels <dave_daniels@argonet.co.uk>
To: gnat-bugs <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/57400
Date: Mon, 15 May 2023 12:54:32 +0100

 Thank you for all your updates. I appreciate your thoughts on
 getting rid of the 'wsmouse0 detached' messages but my main
 concern is the kernel panic I was seeing. Has anyone had any
 thoughts on this?

 Regards,
 Dave Daniels


From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57400
Date: Mon, 15 May 2023 12:41:12 -0000 (UTC)

 dave_daniels@argonet.co.uk (David daniels) writes:

 > Thank you for all your updates. I appreciate your thoughts on
 > getting rid of the 'wsmouse0 detached' messages but my main
 > concern is the kernel panic I was seeing. Has anyone had any
 > thoughts on this?

 The crash comes from:

 Stopped in pid 0.5 (system) at netbsd:wsevent_inject+0x7c: str r0, [r1, r3]

                 we = EVARRAY(ev, ev->put);
                 we->type = events[i].type;

 where we is derived from a NULL pointer, i.e. ev->q is NULL.
 This isn't checked in wsbell_detach, wskbd_detach, wsmouse_detach.
 But wsmux_do_ioctl for WSMUXIO_INJECTEVENT does not check ev->q.
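The register dump is consistent with this reading. Assuming EVARRAY(ev, i) expands to &ev->q[i] and that struct wscons_event (u_int type, int value, struct timespec time) occupies 24 bytes on 32-bit ARM, a NULL ev->q makes the store to we->type land at put * 24. The faulting instruction str r0, [r1, r3] had r1 = 0 and r3 = 0x1560, matching FAR=00001560. The sketch below just does that arithmetic; the 24-byte size and the helper names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed sizeof(struct wscons_event) on NetBSD/evbarm (32-bit EABI):
 * 4 (type) + 4 (value) + 16 (struct timespec: 64-bit time_t plus a
 * 32-bit long, padded to 8-byte alignment).
 */
#define ASSUMED_EVENT_SIZE	24u

/* Address written by "we->type = ..." when ev->q is NULL. */
static uint32_t
fault_address(uint32_t put)
{
	return put * ASSUMED_EVENT_SIZE;
}

/* Inverse: the queue slot that would produce a given fault address. */
static uint32_t
slot_for_address(uint32_t far)
{
	assert(far % ASSUMED_EVENT_SIZE == 0);
	return far / ASSUMED_EVENT_SIZE;
}
```

0x1560 is 5472, which divides exactly by 24 to give slot 228, a plausible queue index. That fits a store through a NULL queue pointer rather than a random wild pointer.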

 wskbd_deliver_event and wsmouse_input only check ev->q if compiled
 with DIAGNOSTIC (that's a bug by itself).
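The DIAGNOSTIC point can be illustrated with a small userspace model. The types and function below (model_event, model_wseventvar, model_inject) are hypothetical simplifications, not the kernel's wseventvar or wsevent_inject. The only thing the sketch shows is an emptiness check on the queue pointer that is always compiled in, rather than existing only under DIAGNOSTIC. Without such a check, a non-DIAGNOSTIC kernel performs the store and faults as in this PR. Returning ENXIO is an arbitrary choice for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct model_event {
	unsigned int type;
	int value;
};

/* Hypothetical stand-in for the per-device event queue state. */
struct model_wseventvar {
	struct model_event *q;	/* NULL once the queue is torn down */
	size_t put;		/* next slot to write */
	size_t size;		/* number of slots in q */
};

/*
 * Always-on check: refuse the event when the queue is gone instead of
 * writing through a NULL-derived pointer. Returns 0 or an errno value.
 */
static int
model_inject(struct model_wseventvar *ev, unsigned int type, int value)
{
	assert(ev != NULL);
	if (ev->q == NULL)
		return ENXIO;
	assert(ev->size > 0 && ev->put < ev->size);
	struct model_event *we = &ev->q[ev->put];
	we->type = type;
	we->value = value;
	ev->put = (ev->put + 1) % ev->size;
	return 0;
}
```

A check like this only helps if ev->q cannot change between the test and the store. That is the race question raised next.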


 So if you test a kernel with DIAGNOSTIC and instead of a crash
 you get a diagnostic message, then we know where the problem is.

 Otherwise it's probably a race with 'wsmouse detached' and
 preventing the auto-detach is a crude workaround.
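If it is a race with detach, a NULL check on its own is not enough: detach can clear or free the queue between the check and the store. One generic remedy is to do the check and the store under the same lock that detach holds while tearing the queue down. The model below shows this pattern in userspace. A C11 atomic_flag spinlock stands in for a kernel lock such as mutex(9). All names are hypothetical, and this is not a description of how NetBSD did or should fix the bug.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>

struct model_event {
	unsigned int type;
	int value;
};

struct model_queue {
	atomic_flag lock;		/* stand-in for a kernel mutex */
	struct model_event *q;		/* NULL after detach */
	size_t put, size;
};

static void
q_lock(struct model_queue *mq)
{
	while (atomic_flag_test_and_set_explicit(&mq->lock, memory_order_acquire))
		;	/* spin until the holder releases the flag */
}

static void
q_unlock(struct model_queue *mq)
{
	atomic_flag_clear_explicit(&mq->lock, memory_order_release);
}

/* Detach path: free and clear the queue while holding the lock. */
static void
model_detach(struct model_queue *mq)
{
	q_lock(mq);
	free(mq->q);
	mq->q = NULL;
	q_unlock(mq);
}

/* Input path: the check and the store cannot interleave with detach. */
static int
model_input(struct model_queue *mq, unsigned int type, int value)
{
	int error = 0;

	q_lock(mq);
	if (mq->q == NULL) {
		error = ENXIO;
	} else {
		assert(mq->size > 0);
		mq->q[mq->put].type = type;
		mq->q[mq->put].value = value;
		mq->put = (mq->put + 1) % mq->size;
	}
	q_unlock(mq);
	return error;
}
```

Compared with suppressing auto-detach through a quirk, this closes the window for any device that detaches, including the whole-hub detach seen when the KVM switches.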



From: David Daniels <dave_daniels@argonet.co.uk>
To: gnat-bugs <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: kern/57400
Date: Fri, 19 May 2023 19:11:34 +0100

 I have managed to compile a kernel with the DIAGNOSTIC option by
 adding the line 'option DIAGNOSTIC' to the configuration file for
 the kernel (RPI2). I assume this is how you do it. I have carried
 out some tests with this kernel and now get the message:

 wskbd_input: evar->q=NULL

 What I did was:

 1) Boot with the diagnostic kernel
 2) Log on and start an X session
 3) Switch to a different machine on the KVM
 4) Wait for a minute or so
 5) Switch back to the NetBSD machine
 6) Press Ctrl-Alt-F1 to go to the ttyE0 session

 After this, the message appeared twice whenever I pressed and
 released any key: once when the key was pressed and once when
 it was released. There was no kernel panic this time.

 I found that if I switched to a different machine on the KVM
 again and then switched back, the keyboard started to work and I
 no longer got the wskbd_input message. The following messages
 showed that it had reconnected to the keyboard:

 wskbd_input: evar->q=NULL
 wskbd_input: evar->q=NULL
 ukbd0: was console keyboard
 wskbd0: detached
 ukbd0: detached
 uhidev0: detached
 uhidev0: at uhub2 port 2 (addr 5) disconnected
 wsmouse0: detached
 ums0: detached
 uhidev1: detached
 uhidev1: at uhub2 port 3 (addr 6) disconnected
 uhub2: detached
 uhub2: at uhub1 port 2 (addr 4) disconnected
 uhub2 at uhub1 port 2: vendor 1a40 (0x1a40) USB 2.0 Hub [Safe] (0x101), class 9/0, rev 2.00/1.11, addr 4
 uhub2: 4 ports with 4 removable, self powered
 uhidev0 at uhub2 port 2 configuration 1 interface 0
 uhidev0: Chicony (0x4f2) USB Keyboard (0x110), rev 1.10/1.01, addr 5, iclass 3/1
 ukbd0 at uhidev0: 8 Variable keys, 6 Array codes
 wskbd0 at ukbd0: console keyboard, using wsdisplay0
 uhidev1 at uhub2 port 3 configuration 1 interface 0
 uhidev1: PixArt (0x93a) USB Optical Mouse (0x2510), rev 1.10/1.00, addr 6, iclass 3/1
 ums0 at uhidev1: 3 buttons and Z dir
 wsmouse0 at ums0 mux 0

 Hopefully this will give you some more information. I do not
 get the kernel panic when using the diagnostic kernel, so I will
 continue using it for now. I also have a workaround for when it
 starts printing the wskbd_input messages: switch to another
 machine on the KVM and back again. So the machine is usable
 with the KVM now.

 I would be happy to test any patch you write for this problem.

 Regards,
 Dave Daniels