NetBSD Problem Report #51412
From www@NetBSD.org Fri Aug 12 05:39:15 2016
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id C3CA87A283
for <gnats-bugs@gnats.NetBSD.org>; Fri, 12 Aug 2016 05:39:14 +0000 (UTC)
Message-Id: <20160812053913.890897A2B6@mollari.NetBSD.org>
Date: Fri, 12 Aug 2016 05:39:13 +0000 (UTC)
From: ryan.brackenbury@gmail.com
Reply-To: ryan.brackenbury@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Syscall I/O race condition leads to deadlock and lost interrupts
X-Send-Pr-Version: www-1.0
>Number: 51412
>Category: kern
>Synopsis: Syscall I/O race condition leads to deadlock and lost interrupts
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 12 05:40:00 +0000 2016
>Last-Modified: Tue Sep 06 01:00:00 +0000 2016
>Originator: Ryan Brackenbury
>Release: NetBSD-7.0.1
>Organization:
>Environment:
NetBSD 7.0.1 (GENERIC.201605221355Z) amd64
>Description:
Making simultaneous reads of kernel parameters can cause deadlock of some kernel I/O. Programs trying to read/write to the same kernel parameters after this time are blocked, and hang on I/O indefinitely - these show 'D' in the run status field in top/ps aux.
Control-C's are lost after this point, and other interrupts become unpredictable or unresponsive. If 'zombie' processes waiting on kern I/O build up, this further causes a system crash.
In my case, two separate users running `envstat` simultaneously caused the kernel to enter this state and to drop Control-C's. The system became entirely unresponsive and required a hard reboot.
>How-To-Repeat:
Run two copies of a program that accesses kernel parameters at the same time; this will cause a deadlock. It may take several attempts before the race is hit.
Worst case, e.g.: while [ 1 ]; do sysctl hw & envstat; done
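The worst-case loop above can also be run under a simple watchdog, so that a hang is reported instead of only being noticed later. The POSIX sh sketch below is illustrative only: the function names `run_pair` and `repro`, the marker files, and the default 10-second deadline are assumptions, not part of NetBSD or of the original report. A timeout only suggests a hang; a reader stuck in state 'D' cannot be killed from the script.

```sh
#!/bin/sh
# Hypothetical watchdog sketch for the race described in this PR.
# None of these function names come from NetBSD; they are illustrative.

# run_pair CMD_A CMD_B DEADLINE
#   Start both commands concurrently. Each runs in a subshell that
#   creates a marker file once its command exits. Poll once per second.
#   Returns 0 if both markers appear within DEADLINE seconds, 1 if not,
#   and 2 if the marker directory cannot be created.
run_pair() {
    dir=$(mktemp -d "${TMPDIR:-/tmp}/pr51412.XXXXXX") || return 2
    # $1 and $2 are unquoted on purpose so "sysctl hw" splits into
    # a command and its argument.
    ( $1 >/dev/null 2>&1; : > "$dir/a" ) &
    ( $2 >/dev/null 2>&1; : > "$dir/b" ) &
    waited=0
    while [ ! -e "$dir/a" ] || [ ! -e "$dir/b" ]; do
        if [ "$waited" -ge "$3" ]; then
            # Leave $dir in place: a stuck reader may still write to it.
            echo "reader still running after ${3}s (possible hang)" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    rm -rf "$dir"
    return 0
}

# repro [MAX_ITER] [DEADLINE]
#   Repeat run_pair on the two commands from the report until one of
#   them overruns the deadline or MAX_ITER iterations complete.
repro() {
    max=${1:-1000}
    i=0
    while [ "$i" -lt "$max" ]; do
        i=$((i + 1))
        run_pair "sysctl hw" "envstat" "${2:-10}"
        rc=$?
        if [ "$rc" -eq 1 ]; then
            echo "possible hang on iteration $i" >&2
            return 1
        elif [ "$rc" -ne 0 ]; then
            return "$rc"
        fi
    done
    echo "no hang seen in $max iterations"
}
```

For example, sourcing the file and calling `repro 500 10` would run up to 500 iterations with a 10-second deadline. Polling marker files, rather than probing the PIDs with `kill -0`, avoids mistaking an exited but not-yet-reaped child for one that is still running.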
>Fix:
Running `sysctl -a` sometimes unblocks the kernel I/O (after a few tries), and the system returns to normal. This might point to a faulty lock somewhere in the kernel, but I have not investigated further.
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51412: Syscall I/O race condition leads to deadlock and
lost interrupts
Date: Fri, 26 Aug 2016 21:52:56 +0000
On Fri, Aug 12, 2016 at 05:40:00AM +0000, ryan.brackenbury@gmail.com wrote:
> Making simultaneous reads of kernel parameters can cause deadlock
> of some kernel I/O. Programs trying to read/write to the same
> kernel parameters after this time are blocked, and hang on I/O
> indefinitely - these show 'D' in the run status field in top/ps
> aux.
What are "kernel parameters"? Are you talking just about envstat? Or
sysctl? Or the combination of the two? Or...?
> Control-C's are lost after this point, and other interrupts become
> unpredictable or unresponsive. If 'zombie' processes waiting on
> kern I/O build up, this further causes a system crash.
Those aren't zombies; zombies have state 'Z'.
--
David A. Holland
dholland@netbsd.org
From: Ryan Brackenbury <ryan.brackenbury@gmail.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/51412: Syscall I/O race condition leads to deadlock and lost interrupts
Date: Sun, 28 Aug 2016 12:57:33 -0400
>What are "kernel parameters"? Are you talking just about envstat? Or
>sysctl? Or the combination of the two? Or...?
My experience here comes from a sysadmin/hobbyist (not kern hacker)
perspective, so I might be using the wrong terminology. I am referring to
the kernel values and variables set/retrieved via ioctls from sysctl and
envstat - such as those in the MIB for sysctl. I am not sure how sysctl and
envstat vary about how they both interact with the kernel, so I can't
pretend to have a more in-depth understanding than that above.
What I do know is that using one and/or the other command can influence the
execution of the other. For example: if retrieving values with envstat
deadlocks on I/O, calling sysctl can be enough to unblock envstat (and vice
versa). In my experience, also running either program concurrent with
another copy of the same has caused these I/O blocks. So to answer your
question - either envstat, or sysctl, or a combination of the two is
sufficient.
> Those aren't zombies; zombies have state 'Z'.
Yes; it just seemed like an appropriate way to compare them, since they
were un-killable but still using up system resources.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/51412: Syscall I/O race condition leads to deadlock and
lost interrupts
Date: Mon, 5 Sep 2016 18:53:03 +0000
On Sun, Aug 28, 2016 at 05:00:01PM +0000, Ryan Brackenbury wrote:
> >What are "kernel parameters"? Are you talking just about envstat? Or
> >sysctl? Or the combination of the two? Or...?
>
> My experience here comes from a sysadmin/hobbyist (not kern hacker)
> perspective, so I might be using the wrong terminology.
That's fine -- just need to be clear what we're talking about.
> I am referring to
> the kernel values and variables set/retrieved via ioctls from sysctl and
> envstat - such as those in the MIB for sysctl. I am not sure how sysctl and
> envstat vary about how they both interact with the kernel, so I can't
> pretend to have a more in-depth understanding than that above.
They're quite different systems and it's strange that they should be
coupled in this way. (though, given the code behind envstat, maybe
it's not that surprising...)
> What I do know is that using one and/or the other command can influence the
> execution of the other. For example: if retrieving values with envstat
> deadlocks on I/O, calling sysctl can be enough to unblock envstat (and vice
> versa). In my experience, also running either program concurrent with
> another copy of the same has caused these I/O blocks. So to answer your
> question - either envstat, or sysctl, or a combination of the two is
> sufficient.
This is odd, but we'll certainly look into it.
> > Those aren't zombies; zombies have state 'Z'.
>
> Yes; it just seemed like an appropriate way to compare them, since they
> were un-killable but still using up system resources.
Right, it's just confusing to reuse a technical term with a different
specific meaning :-)
--
David A. Holland
dholland@netbsd.org
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@NetBSD.org
Cc: Paul Goyette <paul@whooppee.com>
Subject: Re: kern/51412: Syscall I/O race condition leads to deadlock and lost interrupts
Date: Tue, 06 Sep 2016 07:58:04 +0700
Date: Mon, 5 Sep 2016 18:55:01 +0000 (UTC)
From: David Holland <dholland-bugs@netbsd.org>
Message-ID: <20160905185501.8A7777A283@mollari.NetBSD.org>
| > What I do know is that using one and/or the other command can influence the
| > execution of the other.
|
| This is odd, but we'll certainly look into it.
I wonder if this is related to the "blast from the past" message Paul
just posted about (on tech-kern). Locking problems with sysctl() seem
to be related to both ...
See:
https://mail-index.netbsd.org/current-users/2015/10/27/msg028285.html
https://mail-index.netbsd.org/tech-kern/2016/09/05/msg021028.html
https://mail-index.netbsd.org/tech-kern/2016/09/05/msg021029.html
kre