NetBSD Problem Report #53576
From server@omniscient.com.au Thu Sep 6 06:24:19 2018
Return-Path: <server@omniscient.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 4CAF47A18A
for <gnats-bugs@gnats.NetBSD.org>; Thu, 6 Sep 2018 06:24:19 +0000 (UTC)
Message-Id: <20180906062414.5A2B74C7D90@dagonet.omniscient.local>
Date: Thu, 6 Sep 2018 16:24:14 +1000 (AEST)
From: michael@emte.net.au
Reply-To: michael@emte.net.au
To: gnats-bugs@NetBSD.org
Subject: lang/erlang 21.0nb1 freezes with rebar3
X-Send-Pr-Version: 3.95
>Number: 53576
>Category: kern
>Synopsis: EV_EOF is edge triggered instead of level triggered (was: lang/erlang 21.0nb1 freezes with rebar3)
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Sep 06 06:25:00 +0000 2018
>Last-Modified: Wed Sep 19 02:45:01 +0000 2018
>Originator: Michael Taylor
>Release: NetBSD 8.0
>Organization:
>Environment:
System: NetBSD dagonet.omniscient.local 8.0 NetBSD 8.0 (GENERIC) #0: Tue Jul 17 14:59:51 UTC 2018 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
This started with the Erlang tool rebar3 (http://www.rebar3.org/) freezing with
Erlang 21, where it did not with Erlang 20.3.
After investigation I have found this happens because Erlang 21's implementation
of ports on NetBSD's kqueue can fail to receive the exit_status of external
processes spawned/forked with erlang:open_port/2.
Since Erlang 21 already has a solution to the same problem for the
implementation of kqueue on OpenBSD, I have submitted a report to Erlang/OTP:
https://bugs.erlang.org/browse/ERL-725
(ports fail to send exit_status on NetBSD)
I am also posting here for two purposes:
1. Since this issue relates to NetBSD's implementation of kqueue, perhaps
someone with more knowledge can confirm the proposed solution is the most
appropriate for Erlang on NetBSD.
2. My proposed patch may be added to pkgsrc whilst the Erlang team works on the
issue from their end.
My understanding of Erlang's logic condensed to the bare essentials is:
- a child process is fork()ed and joined with pipe()s
- the child is marked as alive = 1
- the child process is monitored by SIGCHLD to obtain the exit status
- the output pipe of the child process is added to a kqueue()
- if the exit status arrives via SIGCHLD
  - the exit status is recorded
  - set alive = 0
- if read() initiated by an EVFILT_READ event returns 0 (EOF)
  - if alive == 0 then the eof/status pair are returned
  - if alive == 1 then re-add/re-enable output pipe to kqueue()
This logic allows the SIGCHLD and EOF to arrive in any order whilst having only
one completion path (returning the eof/status pair).
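The logic above can be sketched in C roughly as follows. This is a minimal
model of my understanding, not Erlang's actual code: the names port_state,
port_init, on_sigchld and on_read_eof are hypothetical, and re-arming the
kqueue is represented only by a return value.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-port state; the names are illustrative, not Erlang's. */
struct port_state {
    bool alive;        /* true until SIGCHLD has delivered the exit status */
    bool done;         /* true once the eof/status pair has been returned */
    int  exit_status;  /* meaningful only once alive is false */
};

/* What the caller should do after read() returned 0 (EOF). */
enum eof_action {
    EOF_COMPLETE,  /* status already known: return the eof/status pair */
    EOF_REARM      /* status not known yet: re-add/re-enable the fd */
};

void port_init(struct port_state *p)
{
    p->alive = true;
    p->done = false;
    p->exit_status = -1;
}

/* SIGCHLD path: record the exit status and clear the alive flag. */
void on_sigchld(struct port_state *p, int status)
{
    assert(p->alive);  /* assumption: a child is reaped only once */
    p->exit_status = status;
    p->alive = false;
}

/* EVFILT_READ path, called when read() on the output pipe returned 0. */
enum eof_action on_read_eof(struct port_state *p)
{
    if (!p->alive) {
        p->done = true;       /* the single completion path */
        return EOF_COMPLETE;
    }
    return EOF_REARM;         /* wait for SIGCHLD, then expect EOF again */
}
```

Note that the EOF_REARM branch only works if the kernel reports EOF again
after the fd has been re-added or re-enabled, which is the point of this PR.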
The above has two implementations: EV_DISPATCH and EV_ONESHOT. EV_DISPATCH is
used if available, except on OpenBSD. This PR suggests that EV_DISPATCH not
be used on NetBSD either. The two implementations are distinguished by:
EV_ONESHOT:
  - add fd to kqueue
      EV_SET(&ev, fd, EVFILT_READ, EV_ADD|EV_ONESHOT, 0, 0, 0);
  - re-add fd to kqueue
      EV_SET(&ev, fd, EVFILT_READ, EV_ADD|EV_ONESHOT, 0, 0, 0);
EV_DISPATCH:
  - add fd to kqueue
      EV_SET(&ev, fd, EVFILT_READ, EV_ADD|EV_ENABLE|EV_DISPATCH, 0, 0, 0);
  - re-enable fd in kqueue
      EV_SET(&ev, fd, EVFILT_READ, EV_ENABLE|EV_DISPATCH, 0, 0, 0);
In the EV_DISPATCH case on NetBSD, the EOF event is not returned a second time
after the EVFILT_READ is re-enabled, so if EOF is read before the SIGCHLD
arrives, the eof/status pair is never returned.
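To make the difference concrete, here is a small portable simulation of the
case where EOF is read before the SIGCHLD arrives. It is a toy model of the
behaviour described in this report, not of the NetBSD kernel and not a kqueue
program: sim_knote, sim_poll, sim_enable, sim_readd and port_completes are
all hypothetical names, and "edge" versus "level" is reduced to a single flag.

```c
#include <assert.h>
#include <stdbool.h>

enum eof_mode { EOF_LEVEL, EOF_EDGE };

/* Toy stand-in for one registered EVFILT_READ event on the output pipe. */
struct sim_knote {
    enum eof_mode mode;
    bool enabled;       /* event is armed */
    bool at_eof;        /* write end of the pipe has been closed */
    bool eof_reported;  /* EOF has already been delivered once */
};

/* Deliver EOF if due; delivery disarms the event, as both flags do. */
bool sim_poll(struct sim_knote *kn)
{
    if (!kn->enabled || !kn->at_eof)
        return false;
    if (kn->mode == EOF_EDGE && kn->eof_reported)
        return false;   /* edge triggered: EOF is not reported again */
    kn->eof_reported = true;
    kn->enabled = false;
    return true;
}

/* Models re-enabling with EV_ENABLE|EV_DISPATCH: same event, re-armed. */
void sim_enable(struct sim_knote *kn)
{
    kn->enabled = true;
}

/* Models re-adding with EV_ADD|EV_ONESHOT: treated as a fresh event. */
void sim_readd(struct sim_knote *kn)
{
    kn->enabled = true;
    kn->eof_reported = false;
}

/* EOF arrives first, SIGCHLD second. Returns true if the port completes. */
bool port_completes(enum eof_mode mode, bool rearm_by_readd)
{
    struct sim_knote kn = { mode, true, false, false };
    bool alive = true;

    kn.at_eof = true;              /* child closes its end of the pipe */
    bool first = sim_poll(&kn);
    assert(first);                 /* the first EOF is seen in every mode */
    if (rearm_by_readd)            /* alive == 1, so wait for EOF again */
        sim_readd(&kn);
    else
        sim_enable(&kn);

    alive = false;                 /* SIGCHLD records the exit status */
    return sim_poll(&kn) && !alive;
}
```

In this model port_completes(EOF_EDGE, false) is the freeze reported here,
while port_completes(EOF_EDGE, true) corresponds to the EV_ONESHOT workaround.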
>How-To-Repeat:
Build and install lang/erlang 21.0nb1; you may need to apply the fixes from
pkg/53567 (toolchain/53567) to do this.
Create the following file: erl-725.escript
NOTE: This escript is a reduction of the logic in rebar3
----
#!escript
main(_) ->
    Opts = [exit_status, {line, 16384}, use_stdio, stderr_to_stdout, hide, eof],
    Exec = {spawn, "/bin/echo hello"},
    Port = erlang:open_port(Exec, Opts),
    data(Port),
    erlang:port_close(Port).

data(Port) ->
    receive
        {Port, {data, Data}} ->
            io:format("data: ~p~n", [Data]),
            data(Port);
        {Port, eof} ->
            exit_status(Port)
    end.

exit_status(Port) ->
    receive
        {Port, {exit_status, ExitStatus}} ->
            io:format("exit status: ~p~n", [ExitStatus])
    end.
----
Execute the escript:
----
$ escript erl-725.escript
data: {eol,"hello"}
----
The escript will freeze.
This compares with a working system (Erlang 20.3 from pkgsrc) that executes
and completes almost immediately:
----
$ escript erl-725.escript
data: {eol,"hello"}
exit status: 0
$
----
>Fix:
The following patch was suggested in the ERL-725 bug report:
----
--- erts/emulator/sys/common/erl_poll.c.orig 2018-09-04 19:31:46.151738848 +1000
+++ erts/emulator/sys/common/erl_poll.c 2018-09-04 19:32:37.383828393 +1000
@@ -803,8 +803,8 @@
struct kevent evts[2];
struct timespec ts = {0, 0};
-#if defined(EV_DISPATCH) && !defined(__OpenBSD__)
- /* If we have EV_DISPATCH we use it, unless we are on OpenBSD as the
+#if defined(EV_DISPATCH) && !(defined(__OpenBSD__) || defined(__NetBSD__))
+ /* If we have EV_DISPATCH we use it, unless we are on Open/NetBSD as the
behavior of EV_EOF seems to be edge triggered there and we need it
to be level triggered.
----
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: pkg-manager->kern-bug-people
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Tue, 18 Sep 2018 04:14:37 +0000
Responsible-Changed-Why:
Erlang now uses the workaround, now we need to consider how to modify the kernel side for this problem.
From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53576 CVS commit: pkgsrc/lang/erlang
Date: Tue, 18 Sep 2018 04:12:04 +0000
Module Name: pkgsrc
Committed By: maya
Date: Tue Sep 18 04:12:04 UTC 2018
Modified Files:
pkgsrc/lang/erlang: distinfo
Added Files:
pkgsrc/lang/erlang/patches: patch-erts_emulator_sys_common_erl__poll.c
Log Message:
erlang: Use existing workaround to deal with netbsd's kqueue
implementation limitation.
From Michael Taylor in PR pkg/53576, also in upstream ERL-725
To generate a diff of this commit:
cvs rdiff -u -r1.62 -r1.63 pkgsrc/lang/erlang/distinfo
cvs rdiff -u -r0 -r1.3 \
pkgsrc/lang/erlang/patches/patch-erts_emulator_sys_common_erl__poll.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53576 (lang/erlang 21.0nb1 freezes with rebar3)
Date: Tue, 18 Sep 2018 09:34:23 +0200
On Tue, Sep 18, 2018 at 04:14:37AM +0000, maya@NetBSD.org wrote:
> Erlang now uses the workaround, now we need to consider how to modify
> the kernel side for this problem.
What exactly are you talking about here?
Maybe it would be better to file a new PR and describe the problem there?
Martin
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53576 (lang/erlang 21.0nb1 freezes with rebar3)
Date: Tue, 18 Sep 2018 11:32:42 +0200
Oops, I twiddled some digits when looking up the PR. This is fine,
but a C example showing the effect (and usable as a basis for a later
test case) would be great.
Martin
From: Michael Taylor <michael@emte.net.au>
To: gnats-bugs@netbsd.org, martin@duskware.de
Cc:
Subject: Re: kern/53576 (lang/erlang 21.0nb1 freezes with rebar3)
Date: Wed, 19 Sep 2018 12:41:56 +1000
Martin,
I prepared some self-contained examples representing the use of
EV_DISPATCH and EV_ONESHOT by Erlang. You can find them attached as
dispatch.c and oneshot.c at
https://bugs.erlang.org/browse/ERL-725
The dispatch example never terminates and oneshot shows the ability to
receive an EOF event a second time after waiting for the SIGCHLD.
Michael.
>Unformatted:
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.