NetBSD Problem Report #56867

From www@netbsd.org  Tue Jun  7 02:31:30 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 487B81A921F
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  7 Jun 2022 02:31:30 +0000 (UTC)
Message-Id: <20220607023128.9B3941A923C@mollari.NetBSD.org>
Date: Tue,  7 Jun 2022 02:31:28 +0000 (UTC)
From: tgl@sss.pgh.pa.us
Reply-To: tgl@sss.pgh.pa.us
To: gnats-bugs@NetBSD.org
Subject: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
X-Send-Pr-Version: www-1.0

>Number:         56867
>Category:       port-hppa
>Synopsis:       hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-hppa-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 07 02:35:00 +0000 2022
>Closed-Date:    Fri Jun 17 06:02:07 +0000 2022
>Last-Modified:  Fri Jun 17 06:02:07 +0000 2022
>Originator:     Tom Lane
>Release:        HEAD/202206030100Z
>Organization:
PostgreSQL Global Development Group
>Environment:
NetBSD sss2.sss.pgh.pa.us 9.99.97 NetBSD 9.99.97 (SD2) #0: Fri Jun  3 12:30:06 EDT 2022  tgl@nuc1.sss.pgh.pa.us:/home/tgl/netbsd-H-202206030100Z/obj.hppa/sys/arch/hppa/compile/SD2 hppa
>Description:
After applying the fixes proposed in PRs 56864, 56865, and 56866, I still see one class of failures in t_ptrace_wait and its sibling test programs: the stepN and setstepN test cases frequently complain that they see SIGSEGV rather than SIGTRAP as the WSTOPSIG(status) result after an attempted step.  The failure rate is near 100% when the tests are run via atf-run, but when invoked individually they frequently pass, so something nondeterministic is going on.
>How-To-Repeat:
This way fails pretty reproducibly:

$ cd /usr/tests/
$ atf-run lib/libc/sys/t_ptrace_wait

This way succeeds more often than not for me, but sometimes fails with the same symptom:

$ /usr/tests/lib/libc/sys/t_ptrace_wait step1

(replace step1 with any related test case, same results)
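
To put a rough number on the intermittent failure rate, a loop along the following lines could be used. This is only a sketch: the count_failures helper and the run count are illustrative and not part of the original report, and it assumes the test program exits non-zero when the selected test case fails.

```shell
#!/bin/sh
# count_failures N CMD [ARGS...]
# Hypothetical helper: run CMD N times, discard its output, and print
# "failed/total" based on the exit status of each run.
count_failures() {
    runs=$1; shift
    fails=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # A non-zero exit status is counted as one failure.
        "$@" >/dev/null 2>&1 || fails=$((fails + 1))
        i=$((i + 1))
    done
    echo "$fails/$runs"
}
```

For example, `count_failures 50 /usr/tests/lib/libc/sys/t_ptrace_wait step1` would report how many of 50 direct invocations failed.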

>Fix:
I have not isolated the cause, and may not be able to because my lone HPPA machine has developed hardware issues.  But I wanted to memorialize this issue just to clarify that the preceding PRs don't fully fix this test program.

Given the evident nondeterminism, the hypothesis that I was about to investigate when my machine suddenly started making weird noises is that if we get a TLB miss when trying to execute the single intended instruction, trap.c somehow misbehaves and reaches the place where it reports SIGSEGV while trying to handle the TLB miss trap.  It might be something quite different though.

>Release-Note:

>Audit-Trail:
From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
Date: Thu, 09 Jun 2022 01:39:23 -0400

 I've managed to resurrect my NetBSD/hppa installation, and resumed
 investigating this issue.  My theory that there's something wrong with
 ITLBMISS processing seems to be backwards: after adding some hacky
 instrumentation, I found that the test passes when an ITLBMISS trap
 occurs upon trying to execute the modified instruction stream, while
 it fails when one does not.  That led me to guess that what's missing
 is a TLB flush operation, and sure enough this quick-hack patch seems
 to fix it:

 Index: sys/arch/hppa/hppa/pmap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/hppa/hppa/pmap.c,v
 retrieving revision 1.117
 diff -u -r1.117 pmap.c
 --- sys/arch/hppa/hppa/pmap.c   26 May 2022 05:34:04 -0000      1.117
 +++ sys/arch/hppa/hppa/pmap.c   9 Jun 2022 03:36:33 -0000
 @@ -1874,9 +1874,12 @@
  pmap_procwr(struct proc *p, vaddr_t va, size_t len)
  {
         pmap_t pmap = p->p_vmspace->vm_map.pmap;
 +       pa_space_t sp = pmap->pm_space;

 -       fdcache(pmap->pm_space, va, len);
 -       ficache(pmap->pm_space, va, len);
 +       fdcache(sp, va, len);
 +       ficache(sp, va, len);
 +       pdtlb(sp, va);
 +       pitlb(sp, va);
  }

  static inline void

 This is mainly based on observing that most other calls of ficache()
 are associated with pitlb() calls.  I found two exceptions:

 kobj_machdep.c's kobj_machdep() does ficache() but lacks pitlb().
 pmap.c's pmap_syncicache_page() the same.

 Perhaps those are also wrong?  I lack any evidence of actual problems
 with them, but I'm wondering.
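
The kind of survey described above (ficache() calls with no pitlb() nearby) could be approximated with a textual scan like the one below. It is a crude heuristic sketch, not the method actually used: the ficache_without_pitlb helper, the WINDOW variable, and the 5-line default are all assumptions made for illustration, and a match only means "worth a look", not "wrong".

```shell
#!/bin/sh
# ficache_without_pitlb FILE...
# Hypothetical helper: print every line containing "ficache(" that has no
# "pitlb(" on the same line or within the next WINDOW lines (default 5).
# Purely textual; it also matches definitions and comments.
ficache_without_pitlb() {
    window=${WINDOW:-5}
    for f in "$@"; do
        awk -v w="$window" -v file="$f" '
            { line[NR] = $0 }          # remember every line of the file
            END {
                for (i = 1; i <= NR; i++) {
                    if (line[i] !~ /ficache[(]/) continue
                    found = 0
                    # look ahead at most w lines for a pitlb( call
                    for (j = i; j <= i + w && j <= NR; j++)
                        if (line[j] ~ /pitlb[(]/) { found = 1; break }
                    if (!found) printf "%s:%d: %s\n", file, i, line[i]
                }
            }' "$f"
    done
}
```

It might be run from the top of a source tree as `ficache_without_pitlb sys/arch/hppa/hppa/*.c`; each reported line still needs manual review.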

 Another point is that some other places in pmap.c use

 #if defined(HP8000_CPU) || defined(HP8200_CPU) || \
     defined(HP8500_CPU) || defined(HP8600_CPU)

 around pitlb() calls, though that is far from universal.  Since
 I'm testing on HP8500, my results prove nothing about whether
 it'd be OK to use a similar #if in pmap_procwr().

 In combination with previous fixes, this brings me to a point
 where t_ptrace_wait passes cleanly (modulo one expected failure).
 Its siblings t_ptrace_wait* seem to have a residual problem or two.

 For the record, this is with a kernel built from HEAD/202206081310Z,
 but my userland is still from 202206030100Z.

 			regards, tom lane

From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
Date: Thu, 09 Jun 2022 16:28:03 -0400

 I wrote:
 > In combination with previous fixes, this brings me to a point
 > where t_ptrace_wait passes cleanly (modulo one expected failure).
 > Its siblings t_ptrace_wait* seem to have a residual problem or two.

 Oh, that was pilot error --- I'd forgotten to update those executables
 after correcting the PR 56865 issue, but that one affects all of them.

 With 202206081310Z + corrected fix for 56865 + Nick's committed fix
 for PR 56866 + this patch, I get a clean bill of health for ptrace:

 $ sudo atf-run lib/libc/sys/t_ptrace_* | atf-report
 ...
 Summary for 7 test programs:
     2428 passed test cases.
     0 failed test cases.
     5 expected failed test cases.
     41 skipped test cases.

 			regards, tom lane

From: Nick Hudson <nick.hudson@gmx.co.uk>
To: gnats-bugs@netbsd.org, port-hppa-maintainer@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tgl@sss.pgh.pa.us
Cc: 
Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in
 t_ptrace_wait's stepN and setstepN test cases
Date: Fri, 10 Jun 2022 11:23:24 +0100

 On 09/06/2022 06:40, Tom Lane wrote:
 > The following reply was made to PR port-hppa/56867; it has been noted by GNATS.
 >
 > From: Tom Lane <tgl@sss.pgh.pa.us>
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
 > Date: Thu, 09 Jun 2022 01:39:23 -0400
 >
 >   I've managed to resurrect my NetBSD/hppa installation, and resumed
 >   investigating this issue.

 YAY. :)


 >   My theory that there's something wrong with
 >   ITLBMISS processing seems to be backwards: after adding some hacky
 >   instrumentation, I found that the test passes when an ITLBMISS trap
 >   occurs upon trying to execute the modified instruction stream, while
 >   it fails when one does not.  That led me to guess that what's missing
 >   is a TLB flush operation, and sure enough this quick-hack patch seems
 >   to fix it:
 >
 >   Index: sys/arch/hppa/hppa/pmap.c
 >   ===================================================================
 >   RCS file: /cvsroot/src/sys/arch/hppa/hppa/pmap.c,v
 >   retrieving revision 1.117
 >   diff -u -r1.117 pmap.c
 >   --- sys/arch/hppa/hppa/pmap.c   26 May 2022 05:34:04 -0000      1.117
 >   +++ sys/arch/hppa/hppa/pmap.c   9 Jun 2022 03:36:33 -0000
 >   @@ -1874,9 +1874,12 @@
 >    pmap_procwr(struct proc *p, vaddr_t va, size_t len)
 >    {
 >           pmap_t pmap = p->p_vmspace->vm_map.pmap;
 >   +       pa_space_t sp = pmap->pm_space;
 >
 >   -       fdcache(pmap->pm_space, va, len);
 >   -       ficache(pmap->pm_space, va, len);
 >   +       fdcache(sp, va, len);
 >   +       ficache(sp, va, len);
 >   +       pdtlb(sp, va);
 >   +       pitlb(sp, va);
 >    }
 >

 I don't think this is right. The process mappings aren't changing. Will think
 about it some more.

 Thanks for poking at all of this.

 Nick


From: Tom Lane <tgl@sss.pgh.pa.us>
To: Nick Hudson <nick.hudson@gmx.co.uk>
Cc: gnats-bugs@netbsd.org, port-hppa-maintainer@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: port-hppa/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases
Date: Fri, 10 Jun 2022 07:49:21 -0400

 Nick Hudson <nick.hudson@gmx.co.uk> writes:
 > On 09/06/2022 06:40, Tom Lane wrote:
 >> ... I found that the test passes when an ITLBMISS trap
 >> occurs upon trying to execute the modified instruction stream, while
 >> it fails when one does not.  That led me to guess that what's missing
 >> is a TLB flush operation, and sure enough this quick-hack patch seems
 >> to fix it:

 > I don't think this is right. The process mappings aren't changing. Will think
 > about it some more.

 Fair enough.  One thing worth noting is that the place where the
 breakpoint instructions are being dropped is inside libc (in
 _lwp_kill, where the tracee process self-SIGSTOP'd).  I have not
 been able to figure out how it's okay at all to be modifying libc.
 I guess that there's a copy-on-write happening somewhere so that
 the traced process gets its own copy of this shared page, but
 I couldn't find where that happens.  If there isn't a copy, maybe
 that is a bug in itself?  If there is a copy, then the process
 mapping *is* changing, although you'd think the COW code would
 have dealt with it.

 			regards, tom lane

State-Changed-From-To: open->feedback
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Thu, 16 Jun 2022 06:24:29 +0000
State-Changed-Why:
Fix applied. OK to close?


From: "Nick Hudson" <skrll@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56867 CVS commit: src/sys/arch/hppa/hppa
Date: Thu, 16 Jun 2022 06:25:42 +0000

 Module Name:	src
 Committed By:	skrll
 Date:		Thu Jun 16 06:25:42 UTC 2022

 Modified Files:
 	src/sys/arch/hppa/hppa: pmap.c

 Log Message:
 Re-reading the PA2.0 Cache Move-In rules tells me we do indeed need to
 purge the translations from the TLBs in pmap_procwr.

 PR/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases


 To generate a diff of this commit:
 cvs rdiff -u -r1.119 -r1.120 src/sys/arch/hppa/hppa/pmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->pending-pullups
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Thu, 16 Jun 2022 06:26:31 +0000
State-Changed-Why:
Actually, should do a pullup
[pullup-9 #1474]


From: Tom Lane <tgl@sss.pgh.pa.us>
To: gnats-bugs@netbsd.org
Cc: port-hppa-maintainer@netbsd.org
Subject: Re: port-hppa/56867 (hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN and setstepN test cases)
Date: Thu, 16 Jun 2022 09:54:40 -0400

 skrll@NetBSD.org writes:
 > Fix applied. OK to close?

 Committed patch looks OK to me, but as you say, need pullup first.

 			regards, tom lane

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56867 CVS commit: [netbsd-9] src/sys/arch/hppa/hppa
Date: Thu, 16 Jun 2022 14:22:02 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Thu Jun 16 14:22:02 UTC 2022

 Modified Files:
 	src/sys/arch/hppa/hppa [netbsd-9]: pmap.c

 Log Message:
 Pull up following revision(s) (requested by skrll in ticket #1474):

 	sys/arch/hppa/hppa/pmap.c: revision 1.120

 Re-reading the PA2.0 Cache Move-In rules tells me we do indeed need to
 purge the translations from the TLBs in pmap_procwr.

 PR/56867: hppa: intermittent SIGSEGV reports in t_ptrace_wait's stepN
 and setstepN test cases


 To generate a diff of this commit:
 cvs rdiff -u -r1.100.20.1 -r1.100.20.2 src/sys/arch/hppa/hppa/pmap.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: skrll@NetBSD.org
State-Changed-When: Fri, 17 Jun 2022 06:02:07 +0000
State-Changed-Why:
pullup processed.


>Unformatted:
