NetBSD Problem Report #52886

From gson@gson.org  Mon Jan  1 12:37:10 2018
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B40D67A173
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  1 Jan 2018 12:37:10 +0000 (UTC)
Message-Id: <20180101123703.F2D4698C9EF@guava.gson.org>
Date: Mon,  1 Jan 2018 14:37:03 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: Tests hang at pk2_ffs_*
X-Send-Pr-Version: 3.95

>Number:         52886
>Category:       misc
>Synopsis:       Tests hang at pk2_ffs_*
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    msaitoh
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 01 12:40:00 +0000 2018
>Closed-Date:    Mon Jan 15 20:40:40 +0000 2018
>Last-Modified:  Mon Jan 22 12:35:00 +0000 2018
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source-date >= c. 2017-12-28
>Organization:

>Environment:
System: NetBSD
Architecture: sparc
Machine: sparc
>Description:

The sparc tests on the TNF testbed have hung in every test run for
the last few days.  First, the 2017.12.27.09.03.22 run hung in the
mtudisc_basic test case, and after that, every run has hung in a test
case whose name begins with "p2k_ffs_":

  2017.12.28.03.39.48    p2k_ffs_overwrite64k
  2017.12.28.07.46.34    p2k_ffs_extendfile
  2017.12.30.03.19.23    p2k_ffs_extendfile
  2017.12.31.11.43.42    p2k_ffs_read_fault
  2017.12.31.15.41.05    p2k_ffs_shrinkfile

There have also been similar hangs on i386 during the same time period, but
not in every test run:

  2017.12.28.17.51.49    p2k_ffs_fcntl_lock
  2017.12.28.18.41.33    p2k_ffs_tmount
  2017.12.29.14.47.09    p2k_ffs_create_nonalphanum
  2017.12.29.16.13.26    p2k_ffs_attrs
  2017.12.31.00.53.29    p2k_ffs_symlink_root
  2017.12.31.11.43.42    p2k_ffs_tfhinval

Here are links to the console output from two of the most recent hung runs:

  http://releng.netbsd.org/b5reports/i386/2017/2017.12.31.11.43.42/test.log
  http://releng.netbsd.org/b5reports/sparc/2017/2017.12.30.03.19.23/test.log

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/52886: Tests hang at pk2_ffs_*
Date: Wed, 3 Jan 2018 10:18:30 +0200

 This is also affecting amd64:

   http://releng.netbsd.org/b5reports/amd64/2017/2017.12.28.07.46.34/test.log
   http://releng.netbsd.org/b5reports/amd64/2017/2017.12.30.03.19.23/test.log
   http://releng.netbsd.org/b5reports/amd64/2018/2018.01.01.17.33.23/test.log

 -- 
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org, msaitoh@NetBSD.org
Cc: 
Subject: Re: misc/52886: Tests hang at pk2_ffs_*
Date: Wed, 3 Jan 2018 22:21:22 +0200

 I did some additional sparc and amd64 test runs to bisect this, and it
 looks like it started with the following commits:

   2017.12.28.03.39.48 msaitoh src/sys/kern/kern_softint.c 1.45
   2017.12.28.03.39.48 msaitoh src/sys/kern/subr_pserialize.c 1.10
   2017.12.28.03.39.48 msaitoh src/sys/kern/subr_psref.c 1.10

 The log output from the new runs can be found around

   http://releng.netbsd.org/b5reports/sparc/commits-2017.12.html#2017.12.28.03.39.48
   http://releng.netbsd.org/b5reports/amd64/commits-2017.12.html#2017.12.28.03.39.48
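
 The bisection described above can be sketched as a binary search over
 source dates. This is only an illustration, not the testbed's actual
 tooling: toy_run_hangs() and toy_bisect() are hypothetical names, the
 oracle is hard-coded instead of running a real test, and the search
 assumes the hang is deterministic (as on sparc, where every run hung),
 which does not hold for the intermittent i386 hangs.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical oracle standing in for "build this source date and see
 * whether the test run hangs".  Fixed-width YYYY.MM.DD.hh.mm.ss stamps
 * sort correctly under strcmp(), so a string comparison is enough here.
 */
static int
toy_run_hangs(const char *stamp)
{
	return strcmp(stamp, "2017.12.28.03.39.48") >= 0;
}

/*
 * Return the index of the first stamp whose run hangs, or n if none
 * does.  Assumes stamps[] is sorted and that once one run hangs, every
 * later run hangs too.
 */
static size_t
toy_bisect(const char *const stamps[], size_t n)
{
	size_t lo = 0, hi = n;

	while (lo < hi) {
		size_t mid = lo + (hi - lo) / 2;

		if (toy_run_hangs(stamps[mid]))
			hi = mid;	/* first bad run is at mid or earlier */
		else
			lo = mid + 1;	/* first bad run is after mid */
	}
	return lo;
}
```

 With a sorted array of source dates, toy_bisect() needs about log2(n)
 test runs instead of n to find the first hanging one.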

 -- 
 Andreas Gustafsson, gson@gson.org

Responsible-Changed-From-To: misc-bug-people->msaitoh
Responsible-Changed-By: gson@NetBSD.org
Responsible-Changed-When: Wed, 03 Jan 2018 20:32:20 +0000
Responsible-Changed-Why:
Over to committer.


From: Andreas Gustafsson <gson@gson.org>
To: msaitoh@NetBSD.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: misc/52886: Tests hang at pk2_ffs_*
Date: Sun, 7 Jan 2018 00:08:27 +0200

 This is also affecting i386, but only in some of the runs, for example:

   http://releng.netbsd.org/b5reports/i386/2018/2018.01.05.14.22.26/test.log

 The automated tests for both amd64 and sparc have now been inoperative,
 and the i386 ones partially inoperative, for more than a week because
 of this bug.

 msaitoh, please revert the following commits ASAP:

    2017.12.28.03.39.48 msaitoh src/sys/kern/kern_softint.c 1.45
    2017.12.28.03.39.48 msaitoh src/sys/kern/subr_pserialize.c 1.10
    2017.12.28.03.39.48 msaitoh src/sys/kern/subr_psref.c 1.10

 -- 
 Andreas Gustafsson, gson@gson.org

From: maya@netbsd.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/52886: Tests hang at pk2_ffs_*
Date: Tue, 9 Jan 2018 01:32:07 +0000

 This is a rump problem: rump never sets mp_online = true.

From: maya@netbsd.org
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: misc/52886: Tests hang at pk2_ffs_*
Date: Tue, 9 Jan 2018 01:38:53 +0000

 Does this help? (Not build-tested.)

 Index: ./librump/rumpkern/rump.c
 ===================================================================
 RCS file: /cvsroot/src/sys/rump/librump/rumpkern/rump.c,v
 retrieving revision 1.330
 diff -u -r1.330 rump.c
 --- ./librump/rumpkern/rump.c	21 Nov 2017 08:49:14 -0000	1.330
 +++ ./librump/rumpkern/rump.c	9 Jan 2018 01:38:05 -0000
 @@ -388,6 +388,7 @@
  		aprint_verbose("cpu%d at thinair0: rump virtual cpu\n", i);
  	}
  	ncpuonline = ncpu;
 +	mp_online = true;

  	/* Once all CPUs are detected, initialize the per-CPU cprng_fast.  */
  	cprng_fast_init();
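
 The effect of a flag that is never set can be shown with a small
 user-space model. This is a sketch only, not the kernel or rump code:
 toy_mp_online, toy_barrier_perform() and toy_boot() are hypothetical
 names, and the early return is an assumption based on the phrase
 "while mp_online == false" in the commit description, not on the
 actual source of the affected functions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's mp_online flag. */
static bool toy_mp_online = false;

/* Counts how often the (pretend) cross-CPU synchronization really ran. */
static int toy_sync_count = 0;

/*
 * Hypothetical synchronization primitive guarded by the flag: while
 * the flag is false it returns without doing any work.
 */
static void
toy_barrier_perform(void)
{
	if (!toy_mp_online)
		return;		/* treated as early boot: skip the work */
	toy_sync_count++;	/* stands in for the real cross-CPU work */
}

/*
 * Hypothetical boot routine.  Passing false models an environment that
 * never marks its CPUs as online.
 */
static void
toy_boot(bool set_flag)
{
	toy_mp_online = set_flag;
}
```

 After toy_boot(false), every call to toy_barrier_perform() is a
 silent no-op; after toy_boot(true), the call does its work.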

From: Masanobu SAITOH <msaitoh@execsw.org>
To: gnats-bugs@NetBSD.org, msaitoh@NetBSD.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, Andreas Gustafsson <gson@gson.org>
Cc: msaitoh@execsw.org
Subject: Re: misc/52886: Tests hang at pk2_ffs_*
Date: Tue, 9 Jan 2018 13:48:54 +0900

 On 2018/01/09 10:40, maya@netbsd.org wrote:
 > The following reply was made to PR misc/52886; it has been noted by GNATS.
 > 
 > From: maya@netbsd.org
 > To: gnats-bugs@netbsd.org
 > Cc:
 > Subject: Re: misc/52886: Tests hang at pk2_ffs_*
 > Date: Tue, 9 Jan 2018 01:38:53 +0000
 > 
 >   Does this help (not built tested)
 >   
 >   Index: ./librump/rumpkern/rump.c
 >   ===================================================================
 >   RCS file: /cvsroot/src/sys/rump/librump/rumpkern/rump.c,v
 >   retrieving revision 1.330
 >   diff -u -r1.330 rump.c
 >   --- ./librump/rumpkern/rump.c	21 Nov 2017 08:49:14 -0000	1.330
 >   +++ ./librump/rumpkern/rump.c	9 Jan 2018 01:38:05 -0000
 >   @@ -388,6 +388,7 @@
 >    		aprint_verbose("cpu%d at thinair0: rump virtual cpu\n", i);
 >    	}
 >    	ncpuonline = ncpu;
 >   +	mp_online = true;
 >    
 >    	/* Once all CPUs are detected, initialize the per-CPU cprng_fast.  */
 >    	cprng_fast_init();
 >   
 > 

 The location is not good. At the least, mp_online = true must be set after
 cprng_fast_init() to avoid some test failures. The initialization order
 differs greatly between init_main.c::main() and rump.c::rump_init(), and I
 don't know where the best location is.
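
 The ordering constraint can be modeled in a few lines. This is only a
 sketch under stated assumptions: toy_rng_init(), toy_init_good() and
 toy_init_bad() are hypothetical, and the rule that the initializer
 refuses to run once the flag is set is invented for illustration.
 This PR does not say why the earlier placement makes tests fail.

```c
#include <stdbool.h>

/* Hypothetical stand-ins; not the real kernel or rump symbols. */
static bool toy_online_flag;
static bool toy_rng_ready;

/*
 * Hypothetical initializer that must run before the flag is set.
 * Returns 0 on success, -1 if it is called too late.  The rule itself
 * is an assumption made for this example.
 */
static int
toy_rng_init(void)
{
	if (toy_online_flag)
		return -1;
	toy_rng_ready = true;
	return 0;
}

/* Flag set last, after the initializer has run. */
static int
toy_init_good(void)
{
	int rv;

	toy_online_flag = false;
	toy_rng_ready = false;
	rv = toy_rng_init();
	toy_online_flag = true;
	return rv;
}

/* Flag set first: the initializer then sees the flag and fails. */
static int
toy_init_bad(void)
{
	toy_online_flag = true;
	toy_rng_ready = false;
	return toy_rng_init();
}
```

 In this model toy_init_good() succeeds and toy_init_bad() fails, which
 is the kind of order dependence that makes the placement of the
 assignment matter.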

 -- 
 -----------------------------------------------
                  SAITOH Masanobu (msaitoh@execsw.org
                                   msaitoh@netbsd.org)

From: "SAITOH Masanobu" <msaitoh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52886 CVS commit: src/sys/rump/librump/rumpkern
Date: Tue, 9 Jan 2018 04:55:43 +0000

 Module Name:	src
 Committed By:	msaitoh
 Date:		Tue Jan  9 04:55:43 UTC 2018

 Modified Files:
 	src/sys/rump/librump/rumpkern: rump.c

 Log Message:
  Set mp_online = true. I don't know the "best" location to set it true.
 This change might fix PR#52886.


 To generate a diff of this commit:
 cvs rdiff -u -r1.330 -r1.331 src/sys/rump/librump/rumpkern/rump.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Mon, 15 Jan 2018 20:40:40 +0000
State-Changed-Why:
The tests no longer hang with src/sys/rump/librump/rumpkern/rump.c 1.331.
Thanks msaitoh.


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/52886 CVS commit: [netbsd-8] src/sys
Date: Mon, 22 Jan 2018 12:30:20 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Mon Jan 22 12:30:20 UTC 2018

 Modified Files:
 	src/sys/kern [netbsd-8]: kern_softint.c subr_pserialize.c subr_psref.c
 	src/sys/rump/librump/rumpkern [netbsd-8]: rump.c

 Log Message:
 Pull up following revision(s) (requested by jdolecek in ticket #506):
 	sys/kern/kern_softint.c: revision 1.45
 	sys/rump/librump/rumpkern/rump.c: revision 1.331
 	sys/kern/subr_pserialize.c: revision 1.10
 	sys/kern/subr_psref.c: revision 1.10
 Prevent panic or hangup in softint_disestablish(), pserialize_perform() or
 psref_target_destroy() while mp_online == false.
  See http://mail-index.netbsd.org/tech-kern/2017/12/25/msg022829.html
 Set mp_online = true. This change might fix PR#52886.


 To generate a diff of this commit:
 cvs rdiff -u -r1.43.10.1 -r1.43.10.2 src/sys/kern/kern_softint.c
 cvs rdiff -u -r1.8.10.1 -r1.8.10.2 src/sys/kern/subr_pserialize.c
 cvs rdiff -u -r1.7.2.1 -r1.7.2.2 src/sys/kern/subr_psref.c
 cvs rdiff -u -r1.329.10.1 -r1.329.10.2 src/sys/rump/librump/rumpkern/rump.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:
