NetBSD Problem Report #53576

From server@omniscient.com.au  Thu Sep  6 06:24:19 2018
Return-Path: <server@omniscient.com.au>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4CAF47A18A
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  6 Sep 2018 06:24:19 +0000 (UTC)
Message-Id: <20180906062414.5A2B74C7D90@dagonet.omniscient.local>
Date: Thu,  6 Sep 2018 16:24:14 +1000 (AEST)
From: michael@emte.net.au
Reply-To: michael@emte.net.au
To: gnats-bugs@NetBSD.org
Subject: lang/erlang 21.0nb1 freezes with rebar3
X-Send-Pr-Version: 3.95

>Number:         53576
>Category:       kern
>Synopsis:       EV_EOF is edge triggered instead of level triggered (was: lang/erlang 21.0nb1 freezes with rebar3)
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 06 06:25:00 +0000 2018
>Last-Modified:  Wed Sep 19 02:45:01 +0000 2018
>Originator:     Michael Taylor
>Release:        NetBSD 8.0
>Organization:
>Environment:
System: NetBSD dagonet.omniscient.local 8.0 NetBSD 8.0 (GENERIC) #0: Tue Jul 17 14:59:51 UTC 2018 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64

>Description:

This started with the Erlang tool rebar3 (http://www.rebar3.org/) freezing
with Erlang 21, where it did not with Erlang 20.3.

After investigation I have found that this happens because Erlang 21's
implementation of ports on NetBSD's kqueue can fail to receive the exit_status
of external processes spawned/forked with erlang:open_port/2.

Since Erlang 21 seems to already have an existing solution to the problem for
the implementation of kqueue on OpenBSD, I have submitted a report to Erlang/OTP:
  https://bugs.erlang.org/browse/ERL-725
  (ports fail to send exit_status on NetBSD)

I am also posting here for two purposes:

1. Since this issue relates to NetBSD's implementation of kqueue, perhaps
   someone with more knowledge can confirm the proposed solution is the most
   appropriate for Erlang on NetBSD.

2. My proposed patch may be added to pkgsrc whilst the Erlang team works on the
   issue from their end.

My understanding of Erlang's logic condensed to the bare essentials is:
- a child process is fork()ed and joined with pipe()s
- the child is marked as alive = 1
- the child process is monitored by SIGCHLD to obtain the exit status
- the output pipe of the child process is added to a kqueue()
- if the exit status arrives via SIGCHLD
  - the exit status is recorded
  - set alive = 0
- if read() initiated by an EVFILT_READ event returns 0 (EOF)
  - if alive == 0 then the eof/status pair are returned
  - if alive == 1 then re-add/re-enable output pipe to kqueue()

This logic allows the SIGCHLD and EOF to arrive in any order whilst having only
one completion path (returning the eof/status pair).
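
As an illustration only, the logic above can be modelled in C roughly as
follows. This is a sketch, not Erlang's code: the names port_state,
port_on_sigchld and port_on_eof are hypothetical, and the kqueue calls are
left out so that only the ordering logic is shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the logic listed above; not taken from Erlang. */
enum port_action {
    PORT_WAIT,     /* nothing to return yet, keep waiting */
    PORT_REARM,    /* re-add/re-enable the output pipe in the kqueue */
    PORT_COMPLETE  /* return the eof/status pair */
};

struct port_state {
    bool alive;       /* true until SIGCHLD has delivered the exit status */
    int  exit_status; /* only meaningful once alive is false */
};

static void port_init(struct port_state *p)
{
    assert(p != NULL);
    p->alive = true;
    p->exit_status = -1;
}

/* SIGCHLD path: record the status and mark the child as dead.
 * This path never completes on its own; completion needs an EOF event. */
static enum port_action port_on_sigchld(struct port_state *p, int status)
{
    assert(p != NULL);
    p->exit_status = status;
    p->alive = false;
    return PORT_WAIT;
}

/* EVFILT_READ path, called when read() returned 0 (EOF). */
static enum port_action port_on_eof(const struct port_state *p)
{
    assert(p != NULL);
    return p->alive ? PORT_REARM : PORT_COMPLETE;
}
```

If the EOF is seen first, port_on_eof() asks for a re-arm and completion
depends on a second EOF event arriving after the SIGCHLD. If the SIGCHLD is
seen first, the first EOF completes immediately.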

The above has two implementations: EV_DISPATCH and EV_ONESHOT. EV_DISPATCH is
used if available, except on OpenBSD. This PR suggests that EV_DISPATCH not
be used on NetBSD either. The two implementations are distinguished by:

EV_ONESHOT:
- add fd to kqueue
  EV_SET(&ev, fd, EVFILT_READ, EV_ADD|EV_ONESHOT, 0, 0, 0);
- re-add fd to kqueue
  EV_SET(&ev, fd, EVFILT_READ, EV_ADD|EV_ONESHOT, 0, 0, 0);

EV_DISPATCH:
- add fd to kqueue
  EV_SET(&ev, fd, EVFILT_READ, EV_ADD|EV_ENABLE|EV_DISPATCH, 0, 0, 0);
- re-enable fd in kqueue
  EV_SET(&ev, fd, EVFILT_READ, EV_ENABLE|EV_DISPATCH, 0, 0, 0);

In the EV_DISPATCH case, an EOF event is not returned a second time after
re-enabling the EVFILT_READ.
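
To make the difference concrete, here is a small toy model in C of level
triggered versus edge triggered EOF reporting. It is only a sketch of the
observed behaviour and makes no claim about how the kernel implements it.
All names (eof_source, source_poll, count_eof_events) are hypothetical, and
no kqueue calls are made.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only; hypothetical names, not kernel or Erlang code. */
enum trigger { LEVEL_TRIGGERED, EDGE_TRIGGERED };

struct eof_source {
    enum trigger mode;
    bool at_eof;     /* the writer has closed its end of the pipe */
    bool enabled;    /* the filter is currently enabled */
    bool edge_seen;  /* the EOF transition has already been reported */
};

static void source_init(struct eof_source *s, enum trigger mode)
{
    s->mode = mode;
    s->at_eof = false;
    s->enabled = false;
    s->edge_seen = false;
}

static void source_enable(struct eof_source *s) { s->enabled = true; }
static void source_close_writer(struct eof_source *s) { s->at_eof = true; }

/* Returns true if an EOF event is delivered. Delivery disables the filter
 * until it is enabled again, loosely modelling EV_DISPATCH. */
static bool source_poll(struct eof_source *s)
{
    if (!s->enabled || !s->at_eof)
        return false;
    if (s->mode == EDGE_TRIGGERED && s->edge_seen)
        return false;   /* condition still holds, but is not reported again */
    s->edge_seen = true;
    s->enabled = false;
    return true;
}

/* Counts the EOF events seen over `rounds` poll + re-enable cycles. */
static int count_eof_events(enum trigger mode, int rounds)
{
    struct eof_source s;
    int seen = 0;

    assert(rounds >= 0);
    source_init(&s, mode);
    source_enable(&s);
    source_close_writer(&s);
    for (int i = 0; i < rounds; i++) {
        if (source_poll(&s))
            seen++;
        source_enable(&s);  /* re-enable after every poll */
    }
    return seen;
}
```

In this model count_eof_events(LEVEL_TRIGGERED, 3) sees three events while
count_eof_events(EDGE_TRIGGERED, 3) sees only one. Erlang's logic needs the
former whenever the EOF arrives before the SIGCHLD.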

>How-To-Repeat:

Build and install lang/erlang 21.0nb1. You may need to apply the solutions
from pkg/53567 (toolchain/53567) to do this.

Create the following file: erl-725.escript
NOTE: This escript is a reduction of the logic in rebar3
----
#!escript

main(_) ->
    Opts = [exit_status, {line, 16384}, use_stdio, stderr_to_stdout, hide, eof],
    Exec = {spawn, "/bin/echo hello"},
    Port = erlang:open_port(Exec, Opts),
    data(Port),
    erlang:port_close(Port).

data(Port) ->
    receive
        {Port, {data, Data}} ->
            io:format("data: ~p~n", [Data]),
            data(Port);
        {Port, eof} ->
            exit_status(Port)
    end.

exit_status(Port) ->
    receive
        {Port, {exit_status, ExitStatus}} ->
            io:format("exit status: ~p~n", [ExitStatus])
    end.
----

Execute the escript:
----
$ escript erl-725.escript
data: {eol,"hello"}
----
The escript will freeze.

This compares with a working system (Erlang 20.3 from pkgsrc) that executes
and completes almost immediately:
----
$ escript erl-725.escript
data: {eol,"hello"}
exit status: 0
$
----

>Fix:

The following patch was suggested in the ERL-725 bug report:
----
--- erts/emulator/sys/common/erl_poll.c.orig    2018-09-04 19:31:46.151738848 +1000
+++ erts/emulator/sys/common/erl_poll.c 2018-09-04 19:32:37.383828393 +1000
@@ -803,8 +803,8 @@
     struct kevent evts[2];
     struct timespec ts = {0, 0};

-#if defined(EV_DISPATCH) && !defined(__OpenBSD__)
-    /* If we have EV_DISPATCH we use it, unless we are on OpenBSD as the
+#if defined(EV_DISPATCH) && !(defined(__OpenBSD__) || defined(__NetBSD__))
+    /* If we have EV_DISPATCH we use it, unless we are on Open/NetBSD as the
        behavior of EV_EOF seems to be edge triggered there and we need it
        to be level triggered.

----

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: pkg-manager->kern-bug-people
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Tue, 18 Sep 2018 04:14:37 +0000
Responsible-Changed-Why:
Erlang now uses the workaround, now we need to consider how to modify the kernel side for this problem.


From: "Maya Rashish" <maya@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53576 CVS commit: pkgsrc/lang/erlang
Date: Tue, 18 Sep 2018 04:12:04 +0000

 Module Name:	pkgsrc
 Committed By:	maya
 Date:		Tue Sep 18 04:12:04 UTC 2018

 Modified Files:
 	pkgsrc/lang/erlang: distinfo
 Added Files:
 	pkgsrc/lang/erlang/patches: patch-erts_emulator_sys_common_erl__poll.c

 Log Message:
 erlang: Use existing workaround to deal with netbsd's kqueue
 implementation limitation.

 From Michael Taylor in PR pkg/53576, also in upstream ERL-725


 To generate a diff of this commit:
 cvs rdiff -u -r1.62 -r1.63 pkgsrc/lang/erlang/distinfo
 cvs rdiff -u -r0 -r1.3 \
     pkgsrc/lang/erlang/patches/patch-erts_emulator_sys_common_erl__poll.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53576 (lang/erlang 21.0nb1 freezes with rebar3)
Date: Tue, 18 Sep 2018 09:34:23 +0200

 On Tue, Sep 18, 2018 at 04:14:37AM +0000, maya@NetBSD.org wrote:
 > Erlang now uses the workaround, now we need to consider how to modify
 > the kernel side for this problem.

 What exactly are you talking about here?
 Maybe it would be better to file a new PR and describe the problem there?

 Martin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53576 (lang/erlang 21.0nb1 freezes with rebar3)
Date: Tue, 18 Sep 2018 11:32:42 +0200

 Oops, I twiddled some digits when looking up the PR. This is fine,
 but a C example showing the effect (and usable as a basis for a later
 test case) would be great.

 Martin

From: Michael Taylor <michael@emte.net.au>
To: gnats-bugs@netbsd.org, martin@duskware.de
Cc: 
Subject: Re: kern/53576 (lang/erlang 21.0nb1 freezes with rebar3)
Date: Wed, 19 Sep 2018 12:41:56 +1000

 Martin,

 I prepared some self-contained examples representing the use of EV_DISPATCH
 and EV_ONESHOT by Erlang. You can find them attached as dispatch.c and
 oneshot.c at
    https://bugs.erlang.org/browse/ERL-725

 The dispatch example never terminates, and oneshot shows the ability to
 receive an EOF event a second time after waiting for the SIGCHLD.

 Michael.

>Unformatted:
