NetBSD Problem Report #53788
From dholland@macaran.eecs.harvard.edu Sat Dec 15 10:50:37 2018
Return-Path: <dholland@macaran.eecs.harvard.edu>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 515EF7A1A9
for <gnats-bugs@gnats.NetBSD.org>; Sat, 15 Dec 2018 10:50:37 +0000 (UTC)
Message-Id: <20181215093123.DAC7A6E2AE@macaran.eecs.harvard.edu>
Date: Sat, 15 Dec 2018 04:31:23 -0500 (EST)
From: dholland@eecs.harvard.edu
Reply-To: dholland@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: vt switching panic
X-Send-Pr-Version: 3.95
>Number: 53788
>Category: kern
>Synopsis: vt switching panic
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Dec 15 10:55:00 +0000 2018
>Last-Modified: Sat Dec 15 21:05:01 +0000 2018
>Originator: David Holland
>Release: NetBSD 8.99.27 (20181205)
>Organization:
>Environment:
System: NetBSD macaran 8.99.27 NetBSD 8.99.27 (MACARAN) #50: Wed Dec 5 19:34:11 EST 2018 dholland@macaran:/usr/src/sys/arch/amd64/compile/MACARAN amd64
Architecture: x86_64
Machine: amd64
>Description:
Switching from console 4 (running X :0) to console 1 (which
had formerly been running X :1, but the X server had
apparently cored a couple hours earlier without me noticing)
with ctrl-alt-F2 triggered this panic:
panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file "../../../../kern/subr_kmem.c", line 389
I got a crashdump; trace follows:
#2 0xffffffff8074db35 in kern_assert (
fmt=fmt@entry=0xffffffff8089b1f8 "kernel %sassertion \"%s\" failed: file \"%s\", line %d ") at ../../../../../../lib/libkern/kern_assert.c:51
#3 0xffffffff8051cc52 in kmem_free (p=0xffff80ea92db17f0, size=216)
at ../../../../kern/subr_kmem.c:389
#4 0xffffffff804129b2 in usl_sync_done (sd=<optimized out>)
at ../../../../dev/wscons/wsdisplay_compat_usl.c:138
#5 0xffffffff80412a40 in usl_sync_check_sig (sd=0xffff80ea92db17f0, sig=30,
flags=2) at ../../../../dev/wscons/wsdisplay_compat_usl.c:156
#6 0xffffffff80412ba2 in usl_attachproc (cookie=0xffff80ea92db17f0,
waitok=<optimized out>, callback=<optimized out>, cbarg=<optimized out>)
at ../../../../dev/wscons/wsdisplay_compat_usl.c:253
#7 0xffffffff80411eab in wsdisplay_switch2 (dv=0xffff80ec217f8c88, error=0,
waitok=0) at ../../../../dev/wscons/wsdisplay.c:1978
#8 0xffffffff80508ba6 in callout_softclock (v=<optimized out>)
at ../../../../kern/kern_timeout.c:739
#9 0xffffffff804fcc0b in softint_execute (l=<optimized out>, s=2,
si=0xffff84808adb00c0) at ../../../../kern/kern_softint.c:592
#10 softint_dispatch (pinned=<optimized out>, s=2)
at ../../../../kern/kern_softint.c:874
#11 0xffffffff8021d21f in Xsoftintr ()
>How-To-Repeat:
Unclear. Haven't seen this before, but this kernel was new
last week so there have been relatively few opportunities to
trigger it. (But I often vt switch a lot and often run
multiple X servers.)
>Fix:
Dunno. Not obvious from the trace why it doesn't do this every
time, but it clearly doesn't.
>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53788: vt switching panic
Date: Sat, 15 Dec 2018 11:44:14 -0000 (UTC)
dholland@eecs.harvard.edu writes:
> Switching from console 4 (running X :0) to console 1 (which
> had formerly been running X :1, but the X server had
> apparently cored a couple hours earlier without me noticing)
> with ctrl-alt-F2 triggered this panic:
> panic: kernel diagnostic assertion "!cpu_softintr_p()" failed: file "../../../../kern/subr_kmem.c", line 389
The usl_syncdata structure is usually allocated by the program that
tries to own the display and freed when it returns ownership after
being sent a signal. Both operations are done by an ioctl in process
context.
But when the owner process has died and someone tries to switch
the display with ctrl-alt-fX, the free operation is done by the
keyboard interrupt handler.
Usually you don't see this as the X server catches most signals,
even segfaults, and tries to give back ownership before it exits.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
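The failure mode described above can be modelled in userland C. This is
only a sketch under stated assumptions: every model_* name below is
hypothetical and none of it is the wsdisplay code. The only detail taken
from the report is the condition !cpu_softintr_p() that kmem_free()
asserts, and the fact that the trace reaches kmem_free() from
callout_softclock(), i.e. from soft interrupt context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for "which context is the CPU running in". */
enum model_ctx { MODEL_CTX_PROCESS, MODEL_CTX_SOFTINT };

static enum model_ctx model_current = MODEL_CTX_PROCESS;

/* Models the predicate named in the panic message, cpu_softintr_p(). */
static bool
model_softintr_p(void)
{
	return model_current == MODEL_CTX_SOFTINT;
}

/*
 * Models kmem_free(): refuses to run from soft interrupt context.
 * Returns false where a DIAGNOSTIC kernel would panic instead.
 */
static bool
model_kmem_free(void *p, size_t size)
{
	(void)size;
	assert(p != NULL);
	if (model_softintr_p())
		return false;	/* the kernel assertion fires here */
	free(p);
	return true;
}

/* Hypothetical placeholder for the per-owner sync structure. */
struct model_syncdata {
	int sig;
};

/* Usual path: the owner gives the display back via an ioctl. */
static bool
model_release_by_ioctl(struct model_syncdata *sd)
{
	model_current = MODEL_CTX_PROCESS;
	return model_kmem_free(sd, sizeof(*sd));
}

/* Dead-owner path: the release happens from a callout (softint). */
static bool
model_release_by_callout(struct model_syncdata *sd)
{
	bool ok;

	model_current = MODEL_CTX_SOFTINT;
	ok = model_kmem_free(sd, sizeof(*sd));
	model_current = MODEL_CTX_PROCESS;
	return ok;
}
```

In this model, model_release_by_ioctl() succeeds, while
model_release_by_callout() reports failure and leaves the memory
unfreed, which corresponds to the point where the real kernel panics.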
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, dholland@NetBSD.org
Cc:
Subject: Re: kern/53788: vt switching panic
Date: Sat, 15 Dec 2018 11:13:55 -0500
On Dec 15, 11:50am, mlelstv@serpens.de (Michael van Elst) wrote:
-- Subject: Re: kern/53788: vt switching panic
| The usl_syncdata structure is usually allocated by the program that
| tries to own the display and freed when it returns ownership after
| being sent a signal. Both operations are done by an ioctl in process
| context.
|
| But when the owner process has died and someone tries to switch
| the display with ctrl-alt-fX, the free operation is done by the
| keyboard interrupt handler.
|
| Usually you don't see this as the X server catches most signals,
| even segfaults, and tries to give back ownership before it exits.
|
So why don't we use a linked list of sync cookies to free or kmem_intr_alloc/
kmem_intr_free. This will always panic with DIAGNOSTIC when there is a timeout.
christos
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/53788: vt switching panic
Date: Sat, 15 Dec 2018 18:20:37 -0000 (UTC)
christos@zoulas.com (Christos Zoulas) writes:
>So why don't we use a linked list of sync cookies to free or kmem_intr_alloc/
>kmem_intr_free. This will always panic with DIAGNOSTIC when there is a timeout.
We can probably just replace the allocations with kmem_intr_alloc and
kmem_intr_free.
Or we could defer the switch operations to a process context, e.g. a workqueue.
--
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
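The deferral idea raised in this exchange (a list of sync cookies to
free, or handing the work to process context) can be sketched in
userland C as follows. This is an illustration under stated assumptions,
not the committed fix: all model_* names are hypothetical, the list is a
plain singly linked list, and the locking a real kernel would need
between the two contexts is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cookie; only the link field matters for the sketch. */
struct model_cookie {
	struct model_cookie *next;
	int payload;
};

static struct model_cookie *model_pending = NULL;
static bool model_in_softint = false;

/*
 * Softint side: link the cookie onto the pending list.
 * No allocator call is made here, so nothing can trip an
 * assertion like the one in the report.
 */
static void
model_defer_free(struct model_cookie *c)
{
	c->next = model_pending;
	model_pending = c;
}

/* Simulates the callout path releasing a dead owner's cookie. */
static void
model_softint_release(struct model_cookie *c)
{
	model_in_softint = true;
	model_defer_free(c);
	model_in_softint = false;
}

/*
 * Process-context side: free everything queued so far.
 * Returns the number of cookies freed.
 */
static size_t
model_drain(void)
{
	size_t n = 0;

	assert(!model_in_softint);	/* must not run from softint */
	while (model_pending != NULL) {
		struct model_cookie *c = model_pending;

		model_pending = c->next;
		free(c);
		n++;
	}
	return n;
}
```

The trade-off against the kmem_intr_alloc/kmem_intr_free approach is
visible even in the model: deferral needs a second code path that
eventually runs model_drain(), whereas an allocator variant that is
allowed in interrupt context needs no extra bookkeeping.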
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, dholland@NetBSD.org
Cc:
Subject: Re: kern/53788: vt switching panic
Date: Sat, 15 Dec 2018 13:53:31 -0500
On Dec 15, 6:25pm, mlelstv@serpens.de (Michael van Elst) wrote:
-- Subject: Re: kern/53788: vt switching panic
| We can probably just replace the allocations with kmem_intr_alloc and
| kmem_intr_free.
|
| Or we could defer the switch operations to a process context, e.g. a workqueue.
I'll go for the intr flavors for now.
Best,
christos
From: matthew green <mrg@eterna.com.au>
To: christos@zoulas.com (Christos Zoulas)
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, dholland@NetBSD.org
Subject: re: kern/53788: vt switching panic
Date: Sun, 16 Dec 2018 08:00:01 +1100
Christos Zoulas writes:
> On Dec 15, 6:25pm, mlelstv@serpens.de (Michael van Elst) wrote:
> -- Subject: Re: kern/53788: vt switching panic
>
> | We can probably just replace the allocations with kmem_intr_alloc and
> | kmem_intr_free.
> |
> | Or we could defer the switch operations to a process context, e.g. a workqueue.
>
> I'll go for the intr flavors for now.
i agree -- it's likely only one small allocation per X server,
and it's also very uncommon (only at X start/end/timeout),
so this is the simplest and clearest fix short of any sort of
work on the usl code itself (which seems warranted -- there are
other reports of vt switch issues.)
.mrg.