NetBSD Problem Report #57400
From www@netbsd.org Wed May 10 10:59:26 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id EC8851A923B
for <gnats-bugs@gnats.NetBSD.org>; Wed, 10 May 2023 10:59:25 +0000 (UTC)
Message-Id: <20230510105924.5B2401A923D@mollari.NetBSD.org>
Date: Wed, 10 May 2023 10:59:24 +0000 (UTC)
From: dave_daniels@argonet.co.uk
Reply-To: dave_daniels@argonet.co.uk
To: gnats-bugs@NetBSD.org
Subject: Fatal kernel mode data abort 'translation fault' at wsevent_inject+0x7c
X-Send-Pr-Version: www-1.0
>Number: 57400
>Category: kern
>Synopsis: Fatal kernel mode data abort 'translation fault' at wsevent_inject+0x7c
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed May 10 11:00:00 +0000 2023
>Last-Modified: Fri May 19 18:35:01 +0000 2023
>Originator: Dave Daniels
>Release: 9.3
>Organization:
>Environment:
NetBSD bluebox.nowhere.com 9.3 NetBSD 9.3 (GENERIC) #0: Thu Aug 4 15:30:37 UTC 2022 mkrepro@mkrepro.NetBSD.org: /usr/src/sys/arch/evbarm/compile/GENERIC evbarm
>Description:
After exiting from ctwm I have had several immediate kernel panics. The messages are the same each time:
uvm_fault(0x809baa20, 0, 2) -> e
Fatal kernel mode data abort: 'Translation Fault (S)'
trapframe: 0xb955fa60
FSR=00000805 FAR=00001560 spsr=20070013
r0 =00000002, r1 =00000000, r2 =00001560, r3 =00001560
r4 =90b5e450, r5 =b955fab0, r6 =b955faf8, r7 =00000018
r8 =800d47bc, r9 =90de0000, r10=00000000, r11=b955fadc
r12=b955faf8, ssp=b955fab0, slr=00001568, pc =802b4268
Stopped in pid 0.5 (system) at netbsd:wsevent_inject+0x7c: str r0,[r1,r3]
This causes the machine to lock up completely and the only option is to switch it off and on again.
The fault is intermittent but common, that is, it happens a lot but not every time. I am doing nothing more than starting an extra 'uxterm' or 'xterm' session in ctwm before quitting from ctwm.
The machine is a Raspberry Pi 2. I have only just installed NetBSD and have not added any other applications yet so it is running 'as supplied'.
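
The trap output above can be cross-checked with a little arithmetic. The following is a minimal sketch, assuming the FSR value uses the ARMv7 short-descriptor DFSR layout (fault status in bit 10 and bits 3:0, domain in bits 7:4, write flag in bit 11). The struct and function names are made up for this illustration and are not kernel code.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of an ARMv7 DFSR value, short-descriptor format (assumed here). */
struct dfsr_fields {
    unsigned fs;       /* fault status: bit 10 combined with bits 3:0 */
    unsigned domain;   /* bits 7:4 */
    unsigned is_write; /* bit 11 (WnR): 1 means the faulting access was a write */
};

/* Split a raw FSR value into the fields above. */
struct dfsr_fields decode_dfsr(uint32_t fsr)
{
    struct dfsr_fields f;

    f.fs = (((fsr >> 10) & 1u) << 4) | (fsr & 0xfu);
    f.domain = (fsr >> 4) & 0xfu;
    f.is_write = (fsr >> 11) & 1u;
    return f;
}

/* Address written by "str r0,[r1,r3]": base register plus index register. */
uint32_t str_reg_offset_addr(uint32_t base, uint32_t index)
{
    return base + index;
}

/* Check the values from the report against each other. */
void check_report_values(void)
{
    struct dfsr_fields f = decode_dfsr(0x00000805u);

    assert(f.fs == 0x05);     /* 0b00101: translation fault, first level (section) */
    assert(f.is_write == 1);  /* consistent with a faulting store */
    /* r1 = 0 and r3 = 0x1560 give the reported FAR. */
    assert(str_reg_offset_addr(0x00000000u, 0x00001560u) == 0x00001560u);
}
```

Read this way, the dump describes a store to a small offset from a zero base address, which is what a NULL pointer plus an offset would look like.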
>How-To-Repeat:
Exit from ctwm using the 'quit' option on the 'NetBSD' menu.
>Fix:
>Audit-Trail:
From: David Daniels <dave_daniels@argonet.co.uk>
To: gnat-bugs <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: kern/57400
Date: Thu, 11 May 2023 20:15:32 +0100
I have carried out some tests and it appears that this problem is
linked in some way to my use of a KVM switch.
I noticed that I kept getting the following messages coming out on
the first console (ttyE0):
wsmouse0: detached
ums0: detached
uhidev1: detached
uhidev1 at uhub2 port3 (addr 6) disconnected
uhidev1 at uhub2 port3 configuration 1 interface 0
uhidev1: pixart... (details of USB mouse)
ums0: at uhidev1: 3 button and Z dir
wsmouse at ums0 mux0
These were coming out roughly every 55 seconds. Just to see
what happened I connected a keyboard and mouse directly to the
Raspberry Pi and disconnected it from the KVM. As expected, these
messages stopped coming out. **The important thing is that I also
stopped getting the kernel panics.** Later I reconnected the Pi to
the KVM and I started seeing the kernel panics again. I can
reproduce them fairly readily by switching away from the Raspberry
Pi to a different machine, waiting a few minutes and then
switching back to the Pi and quitting from ctwm.
I hope that helps. As I said, I think that my use of a KVM switch
is linked to this problem. The KVM, by the way, is a Rytaki
KM401B-RY-C.
Regards,
Dave Daniels
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57400
Date: Fri, 12 May 2023 14:06:34 -0000 (UTC)
dave_daniels@argonet.co.uk (David Daniels) writes:
> wsmouse0: detached
> ums0: detached
> uhidev1: detached
> uhidev1 at uhub2 port3 (addr 6) disconnected
> uhidev1 at uhub2 port3 configuration 1 interface 0
> uhidev1: pixart... (details of USB mouse)
> ums0: at uhidev1: 3 button and Z dir
> wsmouse at ums0 mux0
>
> These were coming out roughly every 55 seconds.
That's common behaviour of hardware "mice" these days.
They disconnect and reconnect continuously unless they
are used.
So either start an X server that handles the mouse
or run wsmoused to keep the connection open.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57400
Date: Sun, 14 May 2023 00:46:41 +0000
On Fri, May 12, 2023 at 02:10:01PM +0000, Michael van Elst wrote:
> > wsmouse0: detached
> > ums0: detached
> > uhidev1: detached
> > uhidev1 at uhub2 port3 (addr 6) disconnected
> > uhidev1 at uhub2 port3 configuration 1 interface 0
> > uhidev1: pixart... (details of USB mouse)
> > ums0: at uhidev1: 3 button and Z dir
> > wsmouse at ums0 mux0
> >
> > These were coming out roughly every 55 seconds.
>
>
> That's common behaviour of hardware "mice" these days.
> They disconnect and reconnect continuously unless they
> are used.
>
> So either, start an X server that handles the mouse
> or run wsmoused to keep the connection open.
It's odd that it would happen only when connected through a KVM switch
though.
--
David A. Holland
dholland@netbsd.org
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, dave_daniels@argonet.co.uk
Subject: re: kern/57400
Date: Sun, 14 May 2023 16:46:35 +1000
> That's common behaviour of hardware "mice" these days.
> They disconnect and reconnect continuously unless they
> are used.
>
> So either, start an X server that handles the mouse
> or run wsmoused to keep the connection open.
actually we have a method to solve this on a per-device basis:
sys/dev/usb/usb_quirks.h:54:#define UQ_ALWAYS_ON 0x100000 /* for mice that keep disconnecting */
sys/dev/usb/usb_quirks.c: { USB_VENDOR_CHICONY, USB_PRODUCT_CHICONY_OPTMOUSE0939, ANY,
sys/dev/usb/usb_quirks.c: { UQ_ALWAYS_ON, NULL }},
adding this keyboard to the quirks list should fix the problem
without having to run X or something.
.mrg.
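
To illustrate the per-device quirk idea, here is a minimal user-space sketch of a quirk-table lookup. It is a model only, not the code in sys/dev/usb/usb_quirks.c: the struct layout, the ANY_REV wildcard, and lookup_quirks() are assumptions made for this example. The UQ_ALWAYS_ON value is the one quoted above, and the vendor/product IDs 0x093a/0x2510 are those of the PixArt mouse shown in the dmesg output later in this report.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define UQ_ALWAYS_ON 0x100000u /* value quoted from usb_quirks.h in this thread */
#define ANY_REV      0xffffu   /* hypothetical wildcard for "any device revision" */

/* Hypothetical quirk entry: matches a device and carries its quirk flags. */
struct quirk_entry {
    uint16_t vendor;
    uint16_t product;
    uint16_t rev;
    uint32_t flags;
};

/* One illustrative entry for the PixArt mouse reported in this PR. */
static const struct quirk_entry quirk_table[] = {
    { 0x093a, 0x2510, ANY_REV, UQ_ALWAYS_ON },
};

/* Return the quirk flags for a device, or 0 if it has no entry. */
uint32_t lookup_quirks(uint16_t vendor, uint16_t product, uint16_t rev)
{
    size_t n = sizeof(quirk_table) / sizeof(quirk_table[0]);

    for (size_t i = 0; i < n; i++) {
        const struct quirk_entry *q = &quirk_table[i];

        if (q->vendor == vendor && q->product == product &&
            (q->rev == ANY_REV || q->rev == rev))
            return q->flags;
    }
    return 0;
}

/* Small self-check: the mouse matches, the Chicony keyboard does not. */
void quirk_demo(void)
{
    assert((lookup_quirks(0x093a, 0x2510, 0x0100) & UQ_ALWAYS_ON) != 0);
    assert(lookup_quirks(0x04f2, 0x0110, 0x0101) == 0);
}
```

A real change would use the symbolic vendor and product names from the usbdevs list rather than raw numbers; those names are not given in this thread, so they are left out here.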
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57400
Date: Sun, 14 May 2023 06:53:00 -0000 (UTC)
mrg@eterna.com.au (matthew green) writes:
>Y_OPTMOUSE0939, ANY,
>sys/dev/usb/usb_quirks.c: { UQ_ALWAYS_ON, NULL }},
>adding this keyboard to the quirks list should fix the problem
>without having to run X or something.
In another OS, that's the default behaviour.
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, dave_daniels@argonet.co.uk
Subject: re: kern/57400
Date: Mon, 15 May 2023 11:23:43 +1000
> mrg@eterna.com.au (matthew green) writes:
>
> >Y_OPTMOUSE0939, ANY,
> >sys/dev/usb/usb_quirks.c: { UQ_ALWAYS_ON, NULL }},
>
> >adding this keyboard to the quirks list should fix the problem
> >without having to run X or something.
>
> In another OS, that's the default behaviour.
i think i recall talking with jmcneill about doing this normally,
so if it just works everywhere, perhaps we just should..
.mrg.
From: David daniels <dave_daniels@argonet.co.uk>
To: gnat-bugs <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: kern/57400
Date: Mon, 15 May 2023 12:54:32 +0100
Thank you for all your updates. I appreciate your thoughts on
getting rid of the 'wsmouse0 detached' messages but my main
concern is the kernel panic I was seeing. Has anyone had any
thoughts on this?
Regards,
Dave Daniels
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/57400
Date: Mon, 15 May 2023 12:41:12 -0000 (UTC)
dave_daniels@argonet.co.uk (David daniels) writes:
> Thank you for all your updates. I appreciate your thoughts on
> getting rid of the 'wsmouse0 detached' messages but my main
> concern is the kernel panic I was seeing. Has anyone had any
> thoughts on this?
The crash comes from:
Stopped in pid 0.5 (system) at netbsd:wsevent_inject+0x7c: str r0,[r1,r3]
we = EVARRAY(ev, ev->put);
we->type = events[i].type;
where we is a NULL pointer, i.e. ev->q is a NULL pointer.
This is checked in wsbell_detach, wskbd_detach, wsmouse_detach.
But wsmux_do_ioctl for WSMUXIO_INJECTEVENT does not check ev->q.
wskbd_deliver_event and wsmouse_input only check ev->q if compiled
with DIAGNOSTIC (that's a bug by itself).
So if you test a kernel with DIAGNOSTIC and instead of a crash
you get a diagnostic message, then we know where the problem is.
Otherwise it's probably a race with 'wsmouse detached' and
preventing the auto-detach is a crude workaround.
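
The failure mode described above can be modelled in user space. The sketch below is not the kernel's wsevent code: the struct layouts, QSIZE, inject_checked() and null_base_offset() are assumptions made for illustration. It shows a ring-buffer inject that refuses to write when the queue pointer is NULL, and the arithmetic by which a NULL base plus an index becomes a small non-zero fault address. Under the further assumption of a 24-byte event, the reported FAR of 0x1560 would correspond to a put index of 228.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define QSIZE 256 /* hypothetical queue length for this model */

/* Hypothetical event record; the real wscons event layout may differ. */
struct event {
    uint32_t type;
    int32_t  value;
    int64_t  tv_sec;
    int32_t  tv_nsec;
};

/* Hypothetical event queue: q is NULL while no queue is set up. */
struct evqueue {
    struct event *q;
    unsigned put; /* next slot to write */
    unsigned get; /* next slot to read */
};

#define EVARRAY(ev, idx) (&(ev)->q[(idx)])

/*
 * Append n events. Returns 0 on success, -1 if there is no queue,
 * -2 if the queue lacks room. The NULL test is the guard under discussion.
 */
int inject_checked(struct evqueue *ev, const struct event *events, size_t n)
{
    assert(ev != NULL && events != NULL);

    if (ev->q == NULL)
        return -1; /* without this, EVARRAY() would yield NULL + offset */

    unsigned used = (ev->put + QSIZE - ev->get) % QSIZE;
    if (n > (size_t)(QSIZE - 1 - used))
        return -2; /* one slot stays empty to tell full from empty */

    for (size_t i = 0; i < n; i++) {
        struct event *we = EVARRAY(ev, ev->put);

        *we = events[i];
        ev->put = (ev->put + 1) % QSIZE;
    }
    return 0;
}

/* Byte offset EVARRAY() would produce from a NULL base pointer. */
uintptr_t null_base_offset(unsigned put, size_t event_size)
{
    return (uintptr_t)put * event_size;
}
```

Returning an error is only one possible policy; whether the right fix is a check of this kind or closing the race with the detach path is left open in this thread.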
From: David Daniels <dave_daniels@argonet.co.uk>
To: gnat-bugs <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: kern/57400
Date: Fri, 19 May 2023 19:11:34 +0100
I have managed to compile a kernel with the DIAGNOSTIC option by
adding the line 'option DIAGNOSTIC' to the configuration file for
the kernel (RPI2). I assume this is how you do it. I have carried
out some tests with this kernel and now get the message:
wskbd_input: evar->q=NULL
What I did was:
1) Boot with the diagnostic kernel
2) Log on and start an X Window System session
3) Switch to a different machine on the KVM
4) Wait for a minute or so
5) Switch back to the NetBSD machine
6) Press Ctrl-Alt-F1 to go to the ttyE0 session
The message came out twice after this whenever I pressed and
released any key, that is, one message came out when the key
was pressed and one came out when it was released. There was no
kernel panic this time.
I found that if I switched to a different machine on the
KVM again and then switched back the keyboard started to work
and I no longer got the wskbd_input message. There were messages
to indicate that it had reconnected to the keyboard as follows:
wskbd_input: evar->q=NULL
wskbd_input: evar->q=NULL
ukbd0: was console keyboard
wskbd0: detached
ukbd0: detached
uhidev0: detached
uhidev0: at uhub2 port 2 (addr 5) disconnected
wsmouse0: detached
ums0: detached
uhidev1: detached
uhidev1: at uhub2 port 3 (addr 6) disconnected
uhub2: detached
uhub2: at uhub1 port 2 (addr 4) disconnected
uhub2 at uhub1 port 2: vendor 1a40 (0x1a40) USB 2.0 Hub [Safe] (0x101), class 9/0, rev 2.00/1.11, addr 4
uhub2: 4 ports with 4 removable, self powered
uhidev0 at uhub2 port 2 configuration 1 interface 0
uhidev0: Chicony (0x4f2) USB Keyboard (0x110), rev 1.10/1.01, addr 5, iclass 3/1
ukbd0 at uhidev0: 8 Variable keys, 6 Array codes
wskbd0 at ukbd0: console keyboard, using wsdisplay0
uhidev1 at uhub2 port 3 configuration 1 interface 0
uhidev1: PixArt (0x93a) USB Optical Mouse (0x2510), rev 1.10/1.00, addr 6, iclass 3/1
ums0 at uhidev1: 3 buttons and Z dir
wsmouse0 at ums0 mux 0
Hopefully this will give you some more information. I do not
get the kernel panic when using the diagnostic kernel so I will
continue using this for now. I have also got a workaround for when
it starts putting out the wskbd_input messages, that is, to switch
to another session on the KVM and back again, so I am in a
position where the machine is usable with the KVM now.
I would be happy to test any patch you write for this problem.
Regards,
Dave Daniels
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.