NetBSD Problem Report #51412

From www@NetBSD.org  Fri Aug 12 05:39:15 2016
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id C3CA87A283
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 12 Aug 2016 05:39:14 +0000 (UTC)
Message-Id: <20160812053913.890897A2B6@mollari.NetBSD.org>
Date: Fri, 12 Aug 2016 05:39:13 +0000 (UTC)
From: ryan.brackenbury@gmail.com
Reply-To: ryan.brackenbury@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Syscall I/O race condition leads to deadlock and lost interrupts
X-Send-Pr-Version: www-1.0

>Number:         51412
>Category:       kern
>Synopsis:       Syscall I/O race condition leads to deadlock and lost interrupts
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 12 05:40:00 +0000 2016
>Last-Modified:  Tue Sep 06 01:00:00 +0000 2016
>Originator:     Ryan Brackenbury
>Release:        NetBSD-7.0.1
>Organization:
>Environment:
NetBSD 7.0.1 (GENERIC.201605221355Z) amd64
>Description:
Reading kernel parameters from two processes at the same time can deadlock some kernel I/O. Programs that try to read or write the same kernel parameters after that point block and hang on I/O indefinitely; they show 'D' in the state field of top and `ps aux`.

Control-C's are lost after this point, and other interrupts become unpredictable or unresponsive. If 'zombie' processes waiting on kernel I/O build up, the system eventually crashes.

In my case, two separate users running `envstat` simultaneously caused the kernel to enter this state and to drop Control-C's. The system became entirely unresponsive and required a hard reboot.
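
To pick the blocked processes out of a full process listing, it may help to keep only the rows whose state starts with 'D', together with the wait channel each one is sleeping on. A minimal sketch, assuming ps(1) accepts the `pid`, `stat`, `wchan` and `command` output keywords; the helper name `dstate_filter` is made up for illustration and is not part of the original report:

```shell
# Sketch only: dstate_filter is an illustrative name, not from the report.
# Reads ps-style output on stdin and keeps the header line plus every row
# whose second column (the state field) starts with D.
dstate_filter() {
    awk 'NR == 1 || $2 ~ /^D/'
}

# Assumed invocation (left commented out so the helper can be sourced alone):
# ps -ax -o pid,stat,wchan,command | dstate_filter
```

The wait channel column may hint at which kernel lock or resource the hung processes are stuck on, which is useful detail to attach to a report like this one.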
>How-To-Repeat:
Run two copies of a program that accesses kernel parameters at the same time; this will cause a deadlock. It may need to be repeated a number of times before the race occurs.

Worst case, e.g.: while [ 1 ]; do sysctl hw & envstat; done
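
The loop above runs forever and gives no sign of when the hang happens. A bounded variant can be sketched as follows; the function name `run_race`, the iteration count and the timeout are all assumptions made for illustration, and the sketch has not been run against the affected system:

```shell
#!/bin/sh
# Sketch only: run_race and its arguments are illustrative, not from the report.
#
# run_race N TIMEOUT CMD_A CMD_B
# Starts CMD_A and CMD_B concurrently, up to N times. If either command is
# still running after TIMEOUT seconds, prints the iteration and returns 1.
run_race() {
    n=$1; limit=$2; cmd_a=$3; cmd_b=$4
    i=1
    while [ "$i" -le "$n" ]; do
        sh -c "$cmd_a" >/dev/null 2>&1 &
        pid_a=$!
        sh -c "$cmd_b" >/dev/null 2>&1 &
        pid_b=$!
        waited=0
        # Poll once per second until both children are gone or time runs out.
        while kill -0 "$pid_a" 2>/dev/null || kill -0 "$pid_b" 2>/dev/null; do
            if [ "$waited" -ge "$limit" ]; then
                echo "iteration $i: still blocked after ${limit}s"
                return 1
            fi
            sleep 1
            waited=$((waited + 1))
        done
        i=$((i + 1))
    done
    echo "no hang in $n iterations"
    return 0
}
```

For example, `run_race 100 10 'sysctl hw' envstat` would correspond to the loop above, stopping at the first iteration where either command fails to finish within 10 seconds. The function does not try to kill the stuck commands, since processes blocked this way may not respond to signals.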

>Fix:
Sometimes running `sysctl -a` unblocks the kernel I/O (after a few tries), and the system returns to normal. This might point to a faulty lock in the kernel, but I have not investigated further.

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51412: Syscall I/O race condition leads to deadlock and
 lost interrupts
Date: Fri, 26 Aug 2016 21:52:56 +0000

 On Fri, Aug 12, 2016 at 05:40:00AM +0000, ryan.brackenbury@gmail.com wrote:
  > Making simultaneous reads of kernel parameters can cause deadlock
  > of some kernel I/O. Programs trying to read/write to the same
  > kernel parameters after this time are blocked, and hang on I/O
  > indefinitely - these show 'D' in the run status field in top/ps
  > aux.

 What are "kernel parameters"? Are you talking just about envstat? Or
 sysctl? Or the combination of the two? Or...?

  > Control-C's are lost after this point, and other interrupts become
  > unpredictable or unresponsive. If 'zombie' processes waiting on
  > kern I/O build up, this further causes a system crash.

 Those aren't zombies; zombies have state 'Z'.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Ryan Brackenbury <ryan.brackenbury@gmail.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51412: Syscall I/O race condition leads to deadlock and lost interrupts
Date: Sun, 28 Aug 2016 12:57:33 -0400

  > What are "kernel parameters"? Are you talking just about envstat? Or
  > sysctl? Or the combination of the two? Or...?
 My experience here comes from a sysadmin/hobbyist (not kernel hacker)
 perspective, so I might be using the wrong terminology. I am referring to
 the kernel values and variables set/retrieved via ioctls from sysctl and
 envstat - such as those in the MIB for sysctl. I am not sure how sysctl and
 envstat differ in how they interact with the kernel, so I can't claim a
 more in-depth understanding than the above.

 What I do know is that using one command can influence the execution of
 the other. For example: if retrieving values with envstat deadlocks on
 I/O, calling sysctl can be enough to unblock envstat (and vice versa). In
 my experience, running either program concurrently with another copy of
 itself has also caused these I/O blocks. So to answer your question -
 either envstat, or sysctl, or a combination of the two is sufficient.

 > Those aren't zombies; zombies have state 'Z'.
 Yes; it just seemed like an appropriate way to compare them, since they
 were un-killable but still using up system resources.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/51412: Syscall I/O race condition leads to deadlock and
 lost interrupts
Date: Mon, 5 Sep 2016 18:53:03 +0000

 On Sun, Aug 28, 2016 at 05:00:01PM +0000, Ryan Brackenbury wrote:
  >   >What are "kernel parameters"? Are you talking just about envstat? Or
  >   >sysctl? Or the combination of the two? Or...?
  >
  >  My experience here comes from a sysadmin/hobbyist (not kern hacker)
  >  perspective, so I might be using the wrong terminology.

 That's fine -- just need to be clear what we're talking about.

  >  I am referring to
  >  the kernel values and variables set/retrieved via ioctls from sysctl and
  >  envstat - such as those in the MIB for sysctl. I am not sure how sysctl and
  >  envstat vary about how they both interact with the kernel, so I can't
  >  pretend to have a more in-depth understanding than that above.

 They're quite different systems and it's strange that they should be
 coupled in this way. (though, given the code behind envstat, maybe
 it's not that surprising...)

  >  What I do know is that using one and/or the other command can influence the
  >  execution of the other. For example: if retrieving values with envstat
  >  deadlocks on I/O, calling sysctl can be enough to unblock envstat (and vice
  >  versa). In my experience, also running either program concurrent with
  >  another copy of the same has caused these I/O blocks. So to answer your
  >  question - either envstat, or sysctl, or a combination of the two is
  >  sufficient.

 This is odd, but we'll certainly look into it.

  >  > Those aren't zombies; zombies have state 'Z'.
  >
  >  Yes; it just seemed like an appropriate way to compare them, since they
  >  were un-killable but still using up system resources.

 Right, it's just confusing to reuse a technical term with a different
 specific meaning :-)

 -- 
 David A. Holland
 dholland@netbsd.org

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: Paul Goyette <paul@whooppee.com>
Subject: Re: kern/51412: Syscall I/O race condition leads to deadlock and lost interrupts
Date: Tue, 06 Sep 2016 07:58:04 +0700

     Date:        Mon,  5 Sep 2016 18:55:01 +0000 (UTC)
     From:        David Holland <dholland-bugs@netbsd.org>
     Message-ID:  <20160905185501.8A7777A283@mollari.NetBSD.org>


   |   >  What I do know is that using one and/or the other command can influence the
   |   >  execution of the other. 
   |  
   |  This is odd, but we'll certainly look into it.

 I wonder if this is related to the "blast from the past" message Paul
 just posted about (on tech-kern).   Locking problems with sysctl() seem
 to be related to both ... 

 See:
 	https://mail-index.netbsd.org/current-users/2015/10/27/msg028285.html
 	https://mail-index.netbsd.org/tech-kern/2016/09/05/msg021028.html
 	https://mail-index.netbsd.org/tech-kern/2016/09/05/msg021029.html

 kre

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.