NetBSD Problem Report #55233

From www@netbsd.org  Tue May  5 01:09:46 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D07591A9213
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  5 May 2020 01:09:46 +0000 (UTC)
Message-Id: <20200505010945.D38241A9219@mollari.NetBSD.org>
Date: Tue,  5 May 2020 01:09:45 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Process hangs indefinitely if not calling syscalls for a while
X-Send-Pr-Version: www-1.0

>Number:         55233
>Category:       port-amiga
>Synopsis:       Process hangs indefinitely if not calling syscalls for a while
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    ad
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue May 05 01:10:00 +0000 2020
>Closed-Date:    Thu Jun 11 18:57:35 +0000 2020
>Last-Modified:  Thu Jun 11 18:57:35 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.59
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD a1200 9.99.59 NetBSD 9.99.59 (A1200) #17: Mon May  4 22:58:45 JST 2020  rin@latipes:/build/work/work/sys/arch/amiga/compile/A1200 amiga
Amiga 1200 with 68060
>Description:
A process that makes no syscalls does not respond to ^C:

----
% cat loop.c
int main(void) {
	for (;;) continue;
	return 0;
}
% cc loop.c && ./a.out
^C^C^C^C
----

The system then stalls on this process, and the only thing I can do is
enter DDB from the console. The trace for that process is not informative:

----
~Stopped in pid 85.85 (a.out) at netbsd:cpu_Debugger+0x6:        unlk    a6
db> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
85   >  85 7   0     40000             bf7c40              a.out
...
db> trace/a bf7c40
trace: pid 85 lid 85 at 0xbbf3db4
?(0,0,10,4f8398,0) at a
?() at bbf3f2a
db>
----

A similar failure occurs for a process which does not call syscalls for
a while. For example, this one:

----
#include <signal.h> /* for signal */
#include <string.h> /* for strstr */
#include <stdlib.h> /* for malloc */
#include <unistd.h> /* for alarm */
static void quit (int sig) { _exit (sig + 128); }

int
main ()
{

    int result = 0;
    size_t m = 1000000;
    char *haystack = (char *) malloc (2 * m + 2);
    char *needle = (char *) malloc (m + 2);
    /* Failure to compile this test due to missing alarm is okay,
       since all such platforms (mingw) also have quadratic strstr.  */
    signal (SIGALRM, quit);
    alarm (5);
    /* Check for quadratic performance.  */
    if (haystack && needle)
      {
        memset (haystack, 'A', 2 * m);
        haystack[2 * m] = 'B';
        haystack[2 * m + 1] = 0;
        memset (needle, 'A', m);
        needle[m] = 'B';
        needle[m + 1] = 0;
        if (!strstr (haystack, needle))
          result |= 1;
      }
    return result;

  ;
  return 0;
}
----

This program, taken from the "strstr works in linear time" check in a
configure script, never completes and freezes the system.

Note that this is on my amiga with 68060:

- amiga (68060; Amiga 1200)

whereas the failure does not occur for other m68k ports:

- sun3 (68020; TME)
- mac68k (68040; Quadra 840AV)
>How-To-Repeat:
Described above.
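
The two cases in the description can be folded into one self-checking
reproducer. The following is only a minimal sketch and was not part of the
original report; the names on_alarm and spin_until_alarm are hypothetical.
It arms SIGALRM, then spins in user space without making any syscall until
the handler has run. It assumes a nonzero timeout, since alarm(0) would only
cancel a pending alarm.

```c
#include <signal.h>   /* for signal, sig_atomic_t */
#include <unistd.h>   /* for alarm */

/* Set asynchronously by the SIGALRM handler (hypothetical name). */
static volatile sig_atomic_t got_alarm;

static void
on_alarm(int sig)
{
	(void)sig;
	got_alarm = 1;
}

/*
 * Arm SIGALRM to fire `seconds` from now (must be nonzero), then spin
 * without making any syscall until the handler has run.
 * Returns 1 once the signal has been delivered, -1 if the handler
 * could not be installed.  If the signal is never delivered, this
 * function never returns.
 */
int
spin_until_alarm(unsigned int seconds)
{
	got_alarm = 0;
	if (signal(SIGALRM, on_alarm) == SIG_ERR)
		return -1;
	alarm(seconds);
	while (!got_alarm)
		continue;	/* pure user-space loop, no syscalls */
	return 1;
}
```

Calling spin_until_alarm(5) from main mirrors the alarm (5) in the strstr
test above. On a kernel that delivers the signal, the call should return
after roughly five seconds; on the affected kernel it would presumably spin
forever, as the first example does.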
>Fix:
Bisecting revealed that the failure starts with this commit:

http://www.nerv.org/netbsd/?q=id:20200326T201906Z.ba5eaa1d6fe1c81297a039db52867c4c67b65575

> Module Name:	src
> Committed By:	ad
> Date:		Thu Mar 26 20:19:06 UTC 2020
> 
> Modified Files:
> 	src/sys/kern: kern_lwp.c kern_softint.c
> 	src/sys/sys: intr.h userret.h
> 
> Log Message:
> softint_overlay() (slow case) gains ~nothing but creates potential headaches.
> In the interests of simplicity remove it and always use the kthreads.
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.229 -r1.230 src/sys/kern/kern_lwp.c
> cvs rdiff -u -r1.62 -r1.63 src/sys/kern/kern_softint.c
> cvs rdiff -u -r1.19 -r1.20 src/sys/sys/intr.h
> cvs rdiff -u -r1.32 -r1.33 src/sys/sys/userret.h

By reverting this commit, -current as of yesterday works fine without
this problem.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: port-amiga-maintainer->ad
Responsible-Changed-By: rin@NetBSD.org
Responsible-Changed-When: Tue, 05 May 2020 01:15:10 +0000
Responsible-Changed-Why:
Over to committer; Andrew, can you please take a look?


From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@netbsd.org, Andrew Doran <ad@netbsd.org>
Cc: 
Subject: Re: port-amiga/55233: Process hangs indefinitely if not calling
 syscalls for a while
Date: Tue, 5 May 2020 03:14:02 +0200

 I reported a very similar behavior of GDB. I was frequently forced to
 kill the debugger with SIGKILL.

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: port-amiga-maintainer@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: port-amiga/55233: Process hangs indefinitely if not calling
 syscalls for a while
Date: Mon, 4 May 2020 18:33:25 -0700

 > On May 4, 2020, at 6:10 PM, rokuyama.rk@gmail.com wrote:
 >
 > By reverting this commit, -current as of yesterday works fine without
 > this problem.

 I suspect the root cause of this is some divergence that the amiga port made
 from the other m68k ports long ago (perhaps it's not sharing some otherwise
 common m68k code?).  I am also going to take a guess that this impacts the
 Atari port, as well.

 -- thorpej

State-Changed-From-To: open->feedback
State-Changed-By: ad@NetBSD.org
State-Changed-When: Tue, 05 May 2020 21:26:58 +0000
State-Changed-Why:
Did that fix it?


From: "Andrew Doran" <ad@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55233 CVS commit: src/sys/arch/amiga/amiga
Date: Tue, 5 May 2020 21:22:48 +0000

 Module Name:	src
 Committed By:	ad
 Date:		Tue May  5 21:22:48 UTC 2020

 Modified Files:
 	src/sys/arch/amiga/amiga: machdep.c

 Log Message:
 PR port-amiga/55233 Process hangs indefinitely if not calling syscalls for a while

 cpu_intr_p() is broken on amiga, fix it.

 From code inspection it looks like amiga and other m68k ports check for ASTs
 with interrupts enabled in some cases, which is racy.  Not fixed.


 To generate a diff of this commit:
 cvs rdiff -u -r1.251 -r1.252 src/sys/arch/amiga/amiga/machdep.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org
Cc: Andrew Doran <ad@NetBSD.org>, Jason Thorpe <thorpej@me.com>,
 Kamil Rytarowski <n54@gmx.com>
Subject: Re: port-amiga/55233 (Process hangs indefinitely if not calling
 syscalls for a while)
Date: Thu, 14 May 2020 17:10:30 +0900

 I'm so sorry for the late reply.

 On 2020/05/06 6:26, ad@NetBSD.org wrote:
 > Did that fix it?

 Yes! Thank you very much for the rapid fix!

 On 2020/05/06 6:22, Andrew Doran wrote:
 >  From code inspection it looks like amiga and other m68k ports check for ASTs
 > with interrupts enabled in some cases, which is racy.  Not fixed.

 This needs further work (probably worth another PR?), so please
 feel free to close this PR.

 On 2020/05/05 10:33, Jason Thorpe wrote:
 > I suspect the root cause of this is some divergence that the amiga port made from the other m68k ports long ago (perhaps it's not sharing some otherwise common m68k code?).  I am also going to take a guess that this impacts the Atari port, as well.

 From code inspection, atari does not have this problem. However, yes,
 m68k ports, especially the old ones, have a lot of
 similar-but-slightly-different code that should be unified in the future
 (your work for 68040 was really nice!).

 On 2020/05/05 10:20, Kamil Rytarowski wrote:
 >   I reported a very similar behavior of GDB. I was frequently forced to
 >   kill the debugger with SIGKILL.

 Hmm, I do not observe such a case with GDB. This is probably also worth
 another PR...

 Thanks,
 rin

State-Changed-From-To: feedback->closed
State-Changed-By: ad@NetBSD.org
State-Changed-When: Thu, 11 Jun 2020 18:57:35 +0000
State-Changed-Why:
Fixed


>Unformatted:
