NetBSD Problem Report #55232

From www@netbsd.org  Tue May  5 00:45:29 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id F171B1A9213
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  5 May 2020 00:45:28 +0000 (UTC)
Message-Id: <20200505004527.E31D71A9219@mollari.NetBSD.org>
Date: Tue,  5 May 2020 00:45:27 +0000 (UTC)
From: rokuyama@rk.phys.keio.ac.jp
Reply-To: rokuyama@rk.phys.keio.ac.jp
To: gnats-bugs@NetBSD.org
Subject: Process hangs indefinitely if not calling syscalls for a while
X-Send-Pr-Version: www-1.0

>Number:         55232
>Category:       port-amiga
>Synopsis:       Process hangs indefinitely if not calling syscalls for a while
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-amiga-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue May 05 00:50:00 +0000 2020
>Closed-Date:    Tue May 05 01:13:20 +0000 2020
>Last-Modified:  Tue May 05 01:13:20 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.59
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD a1200 9.99.59 NetBSD 9.99.59 (A1200) #17: Mon May  4 22:58:45 JST 2020  rin@latipes:/build/work/work/sys/arch/amiga/compile/A1200 amiga
(Amiga 1200 with 68060)
>Description:
A process that makes no system calls does not respond to ^C:

----
% cat loop.c
int main(void) {
	for (;;) continue;
	return 0;
}
% cc loop.c && ./a.out
^C^C^C^C
----

The system then stalls in this process, and I cannot do anything except
enter DDB from the console. The trace for that process shows nothing interesting:

----
~Stopped in pid 85.85 (a.out) at netbsd:cpu_Debugger+0x6:        unlk    a6
db> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
85   >  85 7   0     40000             bf7c40              a.out
...
db> trace/a bf7c40
trace: pid 85 lid 85 at 0xbbf3db4
?(0,0,10,4f8398,0) at a
?() at bbf3f2a
db>
----

A similar failure occurs for a process that makes no system calls for
a while. For example, this one:

----
#include <signal.h> /* for signal */
#include <string.h> /* for strstr */
#include <stdlib.h> /* for malloc */
#include <unistd.h> /* for alarm */
static void quit (int sig) { _exit (sig + 128); }

int
main ()
{

    int result = 0;
    size_t m = 1000000;
    char *haystack = (char *) malloc (2 * m + 2);
    char *needle = (char *) malloc (m + 2);
    /* Failure to compile this test due to missing alarm is okay,
       since all such platforms (mingw) also have quadratic strstr.  */
    signal (SIGALRM, quit);
    alarm (5);
    /* Check for quadratic performance.  */
    if (haystack && needle)
      {
        memset (haystack, 'A', 2 * m);
        haystack[2 * m] = 'B';
        haystack[2 * m + 1] = 0;
        memset (needle, 'A', m);
        needle[m] = 'B';
        needle[m + 1] = 0;
        if (!strstr (haystack, needle))
          result |= 1;
      }
    return result;

  ;
  return 0;
}
----

taken from the "strstr works in linear time" check in a configure script,
never completes and freezes the system.

Note that this happens on my Amiga with a 68060:

- amiga (68060; Amiga 1200)

whereas the failure does not occur for other m68k ports:

- sun3 (68020; TME)
- mac68k (68040; Quadra 840AV)
>How-To-Repeat:
Described above.
>Fix:
Bisecting revealed that the failure starts with this commit:

http://www.nerv.org/netbsd/?q=id:20200326T201906Z.ba5eaa1d6fe1c81297a039db52867c4c67b65575

> Module Name:	src
> Committed By:	ad
> Date:		Thu Mar 26 20:19:06 UTC 2020
> 
> Modified Files:
> 	src/sys/kern: kern_lwp.c kern_softint.c
> 	src/sys/sys: intr.h userret.h
> 
> Log Message:
> softint_overlay() (slow case) gains ~nothing but creates potential headaches.
> In the interests of simplicity remove it and always use the kthreads.
> 
> To generate a diff of this commit:
> cvs rdiff -u -r1.229 -r1.230 src/sys/kern/kern_lwp.c
> cvs rdiff -u -r1.62 -r1.63 src/sys/kern/kern_softint.c
> cvs rdiff -u -r1.19 -r1.20 src/sys/sys/intr.h
> cvs rdiff -u -r1.32 -r1.33 src/sys/sys/userret.h

With this commit reverted, -current as of yesterday works fine without
this problem.

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Tue, 05 May 2020 01:13:20 +0000
State-Changed-Why:
Oops, duplicate with 55233. (Not delivered to netbsd-bugs?)


>Unformatted:

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.