NetBSD Problem Report #56867
From www@netbsd.org Tue Jun 7 02:31:30 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 487B81A921F
for <gnats-bugs@gnats.NetBSD.org>; Tue, 7 Jun 2022 02:31:30 +0000 (UTC)
Message-Id: <20220607023128.9B3941A923C@mollari.NetBSD.org>
Date: Tue, 7 Jun 2022 02:31:28 +0000 (UTC)
From: tgl@sss.pgh.pa.us
Reply-To: tgl@sss.pgh.pa.us
To: gnats-bugs@NetBSD.org
Subject: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
X-Send-Pr-Version: www-1.0
>Number: 56867
>Category: port-hppa
>Synopsis: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-hppa-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jun 07 02:35:00 +0000 2022
>Closed-Date: Fri Jun 17 06:02:07 +0000 2022
>Last-Modified: Fri Jun 17 06:02:07 +0000 2022
>Originator: Tom Lane
>Release: HEAD/202206030100Z
>Organization:
PostgreSQL Global Development Group
>Environment:
NetBSD sss2.sss.pgh.pa.us 9.99.97 NetBSD 9.99.97 (SD2) #0: Fri Jun 3 12:30:06 EDT 2022 tgl@nuc1.sss.pgh.pa.us:/home/tgl/netbsd-H-202206030100Z/obj.hppa/sys/arch/hppa/compile/SD2 hppa
>Description:
After applying the fixes proposed in PRs 56864, 56865, 56866, I still see one class of failures in t_ptrace_wait and sibling test programs: the stepN and setstepN test cases frequently complain that they see SIGSEGV rather than SIGTRAP as the WSTOPSIG(status) result after an attempted step. The failure rate is near 100% if you do it via atf-run, but if you invoke these tests individually they frequently pass, so there's something nondeterministic in there.
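For readers unfamiliar with the check being described, here is a minimal sketch (not the actual ATF test code) of how a parent reads a child's stop signal with waitpid(2) and WSTOPSIG. It assumes a plain raise(SIGSTOP) in place of a ptrace single-step so that it runs on any POSIX system; the helper name stop_signal_of_child is hypothetical. In the failing test cases, the equivalent inspection after a step yields SIGSEGV where SIGTRAP is expected.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: fork a child that stops itself with SIGSTOP,
 * wait for the stop, and return WSTOPSIG(status).
 * Returns -1 if the child could not be created or was not seen stopped.
 * The child is killed and reaped before returning.
 */
static int
stop_signal_of_child(void)
{
	int status = 0;
	int sig;
	pid_t pid, w;

	pid = fork();
	if (pid == -1)
		return -1;
	if (pid == 0) {
		/* Child: stop ourselves, then exit if ever resumed. */
		raise(SIGSTOP);
		_exit(0);
	}

	/* WUNTRACED makes waitpid() report stopped children, not only exited ones. */
	w = waitpid(pid, &status, WUNTRACED);
	sig = (w == pid && WIFSTOPPED(status)) ? WSTOPSIG(status) : -1;

	/* Clean up: SIGKILL terminates even a stopped process. */
	kill(pid, SIGKILL);
	(void)waitpid(pid, &status, 0);
	return sig;
}
```

Calling stop_signal_of_child() from a small main() and comparing the result against SIGSTOP mirrors the comparison against SIGTRAP that the stepN test cases perform.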
>How-To-Repeat:
This way fails pretty reproducibly:
$ cd /usr/tests/
$ atf-run lib/libc/sys/t_ptrace_wait
This way succeeds more often than not for me, but sometimes fails with the same symptom:
$ /usr/tests/lib/libc/sys/t_ptrace_wait step1
(replace step1 with any related test case, same results)
>Fix:
I have not isolated the cause, and may not be able to because my lone HPPA machine has developed hardware issues. But I wanted to memorialize this issue just to clarify that the preceding PRs don't fully fix this test program.
Given the evident nondeterminism, the hypothesis that I was about to investigate when my machine suddenly started making weird noises is that if we get a TLB miss when trying to execute the single intended instruction, trap.c somehow misbehaves and reaches the place where it reports SIGSEGV while trying to handle the TLB miss trap. It might be something quite different though.
>Release-Note:
>Audit-Trail:
From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
Date: Thu, 09 Jun 2022 01:39:23 -0400
I've managed to resurrect my NetBSD/hppa installation, and resumed
investigating this issue. My theory that there's something wrong with
ITLBMISS processing seems to be backwards: after adding some hacky
instrumentation, I found that the test passes when an ITLBMISS trap
occurs upon trying to execute the modified instruction stream, while
it fails when one does not. That led me to guess that what's missing
is a TLB flush operation, and sure enough this quick-hack patch seems
to fix it:
Index: sys/arch/hppa/hppa/pmap.c
===================================================================
RCS file: /cvsroot/src/sys/arch/hppa/hppa/pmap.c,v
retrieving revision 1.117
diff -u -r1.117 pmap.c
--- sys/arch/hppa/hppa/pmap.c 26 May 2022 05:34:04 -0000 1.117
+++ sys/arch/hppa/hppa/pmap.c 9 Jun 2022 03:36:33 -0000
@@ -1874,9 +1874,12 @@
 pmap_procwr(struct proc *p, vaddr_t va, size_t len)
 {
 	pmap_t pmap = p->p_vmspace->vm_map.pmap;
+	pa_space_t sp = pmap->pm_space;
 
-	fdcache(pmap->pm_space, va, len);
-	ficache(pmap->pm_space, va, len);
+	fdcache(sp, va, len);
+	ficache(sp, va, len);
+	pdtlb(sp, va);
+	pitlb(sp, va);
 }
 
 static inline void
This is mainly based on observing that most other calls of ficache()
are associated with pitlb() calls. I found two exceptions:
kobj_machdep.c's kobj_machdep() does ficache() but lacks pitlb().
pmap.c's pmap_syncicache_page() the same.
Perhaps those are also wrong? I lack any evidence of actual problems
with them, but I'm wondering.
Another point is that some other places in pmap.c use
#if defined(HP8000_CPU) || defined(HP8200_CPU) || \
defined(HP8500_CPU) || defined(HP8600_CPU)
around pitlb() calls, though that is far from universal. Since
I'm testing on HP8500, my results prove nothing about whether
it'd be OK to use a similar #if in pmap_procwr().
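To make the question concrete, here is a userland sketch of what such a guarded variant might look like. It is not a proposed change, and nothing here says whether the guard would be correct in pmap_procwr(). The typedefs and the four routines are hypothetical stand-ins that only record the order in which they are called, HP8500_CPU is defined by hand purely for illustration, and placing the guard around pitlb() alone is an assumption based on the description above.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Assumption for this sketch only: pretend we are building for the HP8500. */
#define HP8500_CPU 1

/* Hypothetical userland stand-ins for the kernel types. */
typedef unsigned long pa_space_t;
typedef unsigned long vaddr_t;

/* Call log, so the order of operations can be inspected. */
static char call_log[128];

static void
log_call(const char *name)
{
	size_t used = strlen(call_log);

	snprintf(call_log + used, sizeof(call_log) - used, "%s%s",
	    used ? " " : "", name);
}

/* Stubs: the real routines flush cache lines or purge TLB entries. */
static void
fdcache(pa_space_t sp, vaddr_t va, size_t len)
{
	(void)sp; (void)va; (void)len;
	log_call("fdcache");
}

static void
ficache(pa_space_t sp, vaddr_t va, size_t len)
{
	(void)sp; (void)va; (void)len;
	log_call("ficache");
}

static void
pdtlb(pa_space_t sp, vaddr_t va)
{
	(void)sp; (void)va;
	log_call("pdtlb");
}

static void
pitlb(pa_space_t sp, vaddr_t va)
{
	(void)sp; (void)va;
	log_call("pitlb");
}

/* Sketch of the patched sequence with the guard around pitlb() only. */
static void
pmap_procwr_sketch(pa_space_t sp, vaddr_t va, size_t len)
{
	fdcache(sp, va, len);
	ficache(sp, va, len);
	pdtlb(sp, va);
#if defined(HP8000_CPU) || defined(HP8200_CPU) || \
    defined(HP8500_CPU) || defined(HP8600_CPU)
	pitlb(sp, va);
#endif
}
```

With none of the four CPU macros defined, the pitlb() call disappears at compile time, which is precisely the case that testing on an HP8500 cannot exercise.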
In combination with previous fixes, this brings me to a point
where t_ptrace_wait passes cleanly (modulo one expected failure).
Its siblings t_ptrace_wait* seem to have a residual problem or two.
For the record, this is with a kernel built from HEAD/202206081310Z,
but my userland is still from 202206030100Z.
regards, tom lane
From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
Date: Thu, 09 Jun 2022 16:28:03 -0400
I wrote:
> In combination with previous fixes, this brings me to a point
> where t_ptrace_wait passes cleanly (modulo one expected failure).
> Its siblings t_ptrace_wait* seem to have a residual problem or two.
Oh, that was pilot error --- I'd forgotten to update those executables
after correcting the PR 56865 issue, but that one affects all of them.
With 202206081310Z + corrected fix for 56865 + Nick's committed fix
for PR 56866 + this patch, I get a clean bill of health for ptrace:
$ sudo atf-run lib/libc/sys/t_ptrace_* | atf-report
...
Summary for 7 test programs:
2428 passed test cases.
0 failed test cases.
5 expected failed test cases.
41 skipped test cases.
regards, tom lane
From: Nick Hudson <nick.hudson@gmx.co.uk>
To: gnats-bugs@netbsd.org, port-hppa-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tgl@sss.pgh.pa.us
Cc:
Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in
t_ptrace_wait's stepN and setstepN test cases
Date: Fri, 10 Jun 2022 11:23:24 +0100
On 09/06/2022 06:40, Tom Lane wrote:
> The following reply was made to PR port-hppa/56867; it has been noted by GNATS.
>
> From: Tom Lane <tgl@sss.pgh.pa.us>
> To: gnats-bugs@netbsd.org
> Cc:
> Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
> Date: Thu, 09 Jun 2022 01:39:23 -0400
>
> I've managed to resurrect my NetBSD/hppa installation, and resumed
> investigating this issue.
YAY. :)
> My theory that there's something wrong with
> ITLBMISS processing seems to be backwards: after adding some hacky
> instrumentation, I found that the test passes when an ITLBMISS trap
> occurs upon trying to execute the modified instruction stream, while
> it fails when one does not. That led me to guess that what's missing
> is a TLB flush operation, and sure enough this quick-hack patch seems
> to fix it:
>
> Index: sys/arch/hppa/hppa/pmap.c
> ===================================================================
> RCS file: /cvsroot/src/sys/arch/hppa/hppa/pmap.c,v
> retrieving revision 1.117
> diff -u -r1.117 pmap.c
> --- sys/arch/hppa/hppa/pmap.c 26 May 2022 05:34:04 -0000 1.117
> +++ sys/arch/hppa/hppa/pmap.c 9 Jun 2022 03:36:33 -0000
> @@ -1874,9 +1874,12 @@
>  pmap_procwr(struct proc *p, vaddr_t va, size_t len)
>  {
>  	pmap_t pmap = p->p_vmspace->vm_map.pmap;
> +	pa_space_t sp = pmap->pm_space;
>
> -	fdcache(pmap->pm_space, va, len);
> -	ficache(pmap->pm_space, va, len);
> +	fdcache(sp, va, len);
> +	ficache(sp, va, len);
> +	pdtlb(sp, va);
> +	pitlb(sp, va);
>  }
>
I don't think this is right. The process mappings aren't changing. Will think
about it some more.
Thanks for poking at all of this.
Nick
From: Tom Lane <tgl@sss.pgh.pa.us>
To: Nick Hudson <nick.hudson@gmx.co.uk>
Cc: gnats-bugs@netbsd.org, port-hppa-maintainer@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
Date: Fri, 10 Jun 2022 07:49:21 -0400
Nick Hudson <nick.hudson@gmx.co.uk> writes:
> On 09/06/2022 06:40, Tom Lane wrote:
>> ... I found that the test passes when an ITLBMISS trap
>> occurs upon trying to execute the modified instruction stream, while
>> it fails when one does not. That led me to guess that what's missing
>> is a TLB flush operation, and sure enough this quick-hack patch seems
>> to fix it:
> I don't think this is right. The process mappings aren't changing. Will think
> about it some more.
Fair enough. One thing worth noting is that the place where the
breakpoint instructions are being dropped is inside libc (in
_lwp_kill, where the tracee process self-SIGSTOP'd). I have not
been able to figure out how it's okay at all to be modifying libc.
I guess that there's a copy-on-write happening somewhere so that
the traced process gets its own copy of this shared page, but
I couldn't find where that happens. If there isn't a copy, maybe
that is a bug in itself? If there is a copy, then the process
mapping *is* changing, although you'd think the COW code would
have dealt with it.
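For reference, the general copy-on-write behaviour being guessed at can be observed from userland with a MAP_PRIVATE file mapping, which is (as far as I know) how shared-library text is normally mapped. The sketch below is an illustration under that assumption; the helper name and the temporary file path are hypothetical, and it does not show where the kernel's ptrace write path triggers the copy, which is the part left open above.

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Write one byte through a MAP_PRIVATE mapping of a temporary file and
 * report whether the file itself still holds the original byte.
 * Returns 1 if the file is unchanged while the mapping shows the new byte,
 * 0 otherwise, and -1 on error.
 */
static int
private_write_leaves_file_intact(void)
{
	char path[] = "/tmp/cow-sketch-XXXXXX";
	char byte = 0;
	char *p;
	int result = -1;
	int fd;

	fd = mkstemp(path);
	if (fd == -1)
		return -1;
	unlink(path);		/* the file goes away once fd is closed */

	if (write(fd, "A", 1) != 1)
		goto out;

	p = mmap(NULL, 1, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
	if (p == MAP_FAILED)
		goto out;

	/* The first write to a private page gives this process its own copy. */
	p[0] = 'B';

	/* Read the file directly: the private write must not be visible here. */
	if (pread(fd, &byte, 1, 0) == 1)
		result = (byte == 'A' && p[0] == 'B');

	munmap(p, 1);
out:
	close(fd);
	return result;
}
```

A return value of 1 means the write landed in a private copy of the page, so the process's mapping of that address no longer refers to the shared file page.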
regards, tom lane
State-Changed-From-To: open->feedback
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Thu, 16 Jun 2022 06:24:29 +0000
State-Changed-Why:
Fix applied. OK to close?
From: "Nick Hudson" <skrll@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56867 CVS commit: src/sys/arch/hppa/hppa
Date: Thu, 16 Jun 2022 06:25:42 +0000
Module Name: src
Committed By: skrll
Date: Thu Jun 16 06:25:42 UTC 2022
Modified Files:
src/sys/arch/hppa/hppa: pmap.c
Log Message:
Re-reading the PA2.0 Cache Move-In rules tells me we do indeed need to
purge the translations from the TLBs in pmap_procwr.
PR/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
To generate a diff of this commit:
cvs rdiff -u -r1.119 -r1.120 src/sys/arch/hppa/hppa/pmap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: feedback->pending-pullups
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Thu, 16 Jun 2022 06:26:31 +0000
State-Changed-Why:
Actually, should do a pullup
[pullup-9 #1474]
From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org
Cc: port-hppa-maintainer@netbsd.org
Subject: Re: port-hppa/56867 (hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases)
Date: Thu, 16 Jun 2022 09:54:40 -0400
skrll@NetBSD.org writes:
> Fix applied. OK to close?
Committed patch looks OK to me, but as you say, need pullup first.
regards, tom lane
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56867 CVS commit: [netbsd-9] src/sys/arch/hppa/hppa
Date: Thu, 16 Jun 2022 14:22:02 +0000
Module Name: src
Committed By: martin
Date: Thu Jun 16 14:22:02 UTC 2022
Modified Files:
src/sys/arch/hppa/hppa [netbsd-9]: pmap.c
Log Message:
Pull up following revision(s) (requested by skrll in ticket #1474):
sys/arch/hppa/hppa/pmap.c: revision 1.120
Re-reading the PA2.0 Cache Move-In rules tells me we do indeed need to
purge the translations from the TLBs in pmap_procwr.
PR/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN
and setstepN test cases
To generate a diff of this commit:
cvs rdiff -u -r1.100.20.1 -r1.100.20.2 src/sys/arch/hppa/hppa/pmap.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Fri, 17 Jun 2022 06:02:07 +0000
State-Changed-Why:
pullup processed.
>Unformatted: