NetBSD Problem Report #55704

From www@netbsd.org  Thu Oct  8 09:53:16 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 879EC1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  8 Oct 2020 09:53:16 +0000 (UTC)
Message-Id: <20201008095315.7AC0A1A923A@mollari.NetBSD.org>
Date: Thu,  8 Oct 2020 09:53:15 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: multi-threaded applications for earmv[45]{,hf} freeze on COMPAT_NETBSD32 of aarch64
X-Send-Pr-Version: www-1.0

>Number:         55704
>Category:       port-arm
>Synopsis:       multi-threaded applications for earmv[45]{,hf} freeze on COMPAT_NETBSD32 of aarch64
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 08 09:55:00 +0000 2020
>Last-Modified:  Fri Oct 09 23:40:01 +0000 2020
>Originator:     Rin Okuyama
>Release:        9.99.73
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD rpi 9.99.73 NetBSD 9.99.73 (GENERIC64) #38: Wed Oct  7 17:28:52 JST 2020  rin@latipes:/sys/arch/evbarm/compile/GENERIC64 evbarm aarch64
>Description:
Multi-threaded applications built for the earmv[45]{,hf} userland freeze
indefinitely under COMPAT_NETBSD32 on aarch64 if more than one CPU
core is online. For example, ctfmerge(1) freezes almost every time
during the build of pkgsrc/pkgtools/cwrappers:

----
# uname -p
aarch64
# file /emul/netbsd32/bin/sh
/bin/sh: ELF 32-bit LSB pie executable, ARM, EABI5 version 1 (SYSV), dynamically linked, interpreter /libexec/ld.elf_so, for NetBSD 9.99.73, compiled for: earmv5, not stripped
# chroot /emul/netbsd32 su -
# cd /usr/pkgsrc/pkgtools/cwrappers && make MAKE_JOBS=1
...
ctfmerge -t -g -L VERSION -o c++-wrapper alloc.o cleanup-cc.o common.o reorder-cc.o generic-transform-cc.o normalise-cc.o c++-wrapper.o transform-cc.o
(then stalls here eternally)
----

GDB shows that it is sleeping in lwp_park(2):

----
# fg
make MAKE_JOBS=1
^Z[1] + Suspended               make MAKE_JOBS=1
# bg
[1] make MAKE_JOBS=1
# gdb -p `pgrep ctfmerge`
...
Thread 1 "" received signal SIGCONT, Continued.
[Switching to LWP 3419 of process 3245]
0xf3a3c4c4 in ___lwp_park60 () from /usr/libexec/ld.elf_so
(gdb) bt
#0  0xf3a3c4c4 in ___lwp_park60 () from /usr/libexec/ld.elf_so
#1  0xf3a31e6c in _rtld_exclusive_enter (mask=mask@entry=0xf73fff90)
    at /usr/src/libexec/ld.elf_so/rtld.c:1766
#2  0xf3a39e60 in _rtld_tls_get_addr (tls=0xf796f000, idx=2, offset=0)
    at /usr/src/libexec/ld.elf_so/tls.c:68
#3  0xf7ac9e48 in __cxa_thread_run_atexit ()
    at /usr/src/lib/libc/stdlib/cxa_thread_atexit.c:55
#4  0xf7c1bc1c in pthread_exit (retval=0x0)
    at /usr/src/lib/libpthread/pthread.c:629
#5  0xf7c1bd18 in pthread__create_tramp (cookie=0xf7b79000)
    at /usr/src/lib/libpthread/pthread.c:562
#6  0xf7af99f4 in __mknod50 () from /usr/lib/libc.so.12
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb)
----

If only one CPU core is left online via cpuctl(8), ctfmerge(1) works without
problems under COMPAT_NETBSD32. This strongly suggests a problem with the
earmv[45]{,hf} userland on multi-processor machines.
>How-To-Repeat:
Described above.
>Fix:
I'm not sure whether we can fix this problem without modifying userland
binaries for earmv[45]{,hf}. While arm variants prior to v6 implement
atomic_ops(3) with the swp instruction (which we emulate for COMPAT_NETBSD32),
they do not have a working membar_ops(3), since they were not intended for
multi-processor machines. Indeed, you can see that our membar_ops(3) are
no-ops for arm processors prior to v6:

https://nxr.netbsd.org/xref/src/common/lib/libc/arch/arm/atomic/membar_ops.S#33
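
As a rough illustration of why missing barriers matter on SMP, here is a
minimal C11 sketch of a lock built from an atomic exchange (the role swp
plays on pre-v6 arm) plus explicit fences (the role membar_ops(3) would
play). The names sketch_lock_t, sketch_lock_init, sketch_trylock and
sketch_unlock are hypothetical and are not taken from libc or ld.elf_so;
C11 <stdatomic.h> is used only to keep the example portable.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type for illustration only; not NetBSD source. */
typedef struct {
	atomic_uint word;	/* 0 = unlocked, 1 = locked */
} sketch_lock_t;

void
sketch_lock_init(sketch_lock_t *l)
{
	atomic_init(&l->word, 0u);
}

/* Try to take the lock once; returns true on success. */
bool
sketch_trylock(sketch_lock_t *l)
{
	/* Atomic swap with no ordering of its own, comparable to swp. */
	unsigned int old = atomic_exchange_explicit(&l->word, 1u,
	    memory_order_relaxed);

	if (old != 0u)
		return false;

	/*
	 * Acquire fence: later loads and stores must not be observed
	 * before the lock was taken. This is the step that is a no-op
	 * when the barrier routine compiles to nothing.
	 */
	atomic_thread_fence(memory_order_acquire);
	return true;
}

void
sketch_unlock(sketch_lock_t *l)
{
	/* Release fence: publish protected stores before unlocking. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&l->word, 0u, memory_order_relaxed);
}
```

If the two atomic_thread_fence() calls produced no barrier instruction, the
exchange itself would still be atomic, but another core could observe the
lock word and the protected data in an inconsistent order. Whether this is
exactly what happens in _rtld_exclusive_enter() has not been verified here.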

>Audit-Trail:
From: Tobias Nygren <tnn@NetBSD.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-arm/55704: multi-threaded applications for earmv[45]{,hf}
 freeze on COMPAT_NETBSD32 of aarch64
Date: Thu, 8 Oct 2020 13:00:19 +0200

 > I'm not sure whether we can fix this problem without modifying userland
 > binaries for earmv[45]{,hf}. While arm variants prior to v6 realize
 > atomic_ops(3) by swp instruction (we emulate it for COMPAT_NETBSD32),
 > they do not have membar_ops(3), since they are not intended for
 > multi-processor machines. Actually, you can see our membar_ops(3) are
 > no-op for arm processors prior to v6:

 There were, in another context, discussions about the need to pin
 processes to the CPU they were launched on in some situations. There
 was a Samsung big.LITTLE SoC (I don't recall the model) that only
 implements the crypto instruction set extensions on the big cores.

 Can the compat layer detect at exec time if the binary is v[45] or v6+?
 For a purely userland solution you might be able to pin the process
 in ld.elf_so or with an LD_PRELOAD library.
 The problem with that is that only root has permission to use pset(3).

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: Tobias Nygren <tnn@NetBSD.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: port-arm/55704: multi-threaded applications for earmv[45]{,hf}
 freeze on COMPAT_NETBSD32 of aarch64
Date: Sat, 10 Oct 2020 08:36:58 +0900

 On 2020/10/08 20:05, Tobias Nygren wrote:
 >   > I'm not sure whether we can fix this problem without modifying userland
 >   > binaries for earmv[45]{,hf}. While arm variants prior to v6 realize
 >   > atomic_ops(3) by swp instruction (we emulate it for COMPAT_NETBSD32),
 >   > they do not have membar_ops(3), since they are not intended for
 >   > multi-processor machines. Actually, you can see our membar_ops(3) are
 >   > no-op for arm processors prior to v6:
 >   
 >   There were, in another context, discussions about the need to pin
 >   processes to the CPU they were launched on in some situations. There
 >   was a Samsung big.LITTLE SoC (I don't recall the model) that only
 >   implements the crypto instruction set extensions on the big cores.
 >   
 >   Can the compat layer detect at exec time if the binary is v[45] or v6+?
 >   For a purely userland solution you might be able to pin the process
 >   in ld.elf_so or with an LD_PRELOAD library.
 >   The problem with that is that only root has permission to use pset(3).

 Yes, the kernel recognizes which architecture a binary was built for,
 from the ELF note embedded in it.

 	https://nxr.netbsd.org/xref/src/sys/kern/exec_elf.c#1047

 The kernel saves it in an MD field of struct proc,

 	https://nxr.netbsd.org/xref/src/sys/arch/aarch64/aarch64/exec_machdep.c#89

 in order to export it as the hw.machine sysctl node.

 	https://nxr.netbsd.org/xref/src/sys/arch/aarch64/include/netbsd32_machdep.h#122
 	https://nxr.netbsd.org/xref/src/sys/compat/netbsd32/netbsd32_sysctl.c#107

 Therefore the kernel can, in principle, detect a v[45] process and pin
 it to a single CPU core, without userland having to deal with the
 permissions required for pset(3) or sched_setaffinity(2).
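
The detection step could look roughly like the following sketch. It
assumes the saved architecture is a string such as "earmv5" or "earmv6hf",
as in the file(1) output earlier in this PR. The function name
sketch_is_pre_v6_arm is hypothetical; this is not existing kernel code,
and the actual pinning is not shown.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical helper: returns true if the machine arch string names
 * an arm variant prior to v6, i.e. earmv4* or earmv5* (with or
 * without a suffix such as "hf").
 */
bool
sketch_is_pre_v6_arm(const char *march)
{
	static const char prefix[] = "earmv";
	const size_t n = sizeof(prefix) - 1;

	if (march == NULL || strncmp(march, prefix, n) != 0)
		return false;

	/* The character right after "earmv" is the architecture digit. */
	return march[n] == '4' || march[n] == '5';
}
```

A caller would run this check once at exec time and restrict the process
to one CPU only when it returns true, so v6+ binaries are left alone.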

 Unfortunately, problems still remain. (1) Scheduling for v[45]
 processes would become poor. (2) Memory coherence is still not
 guaranteed between threads belonging to *different* processes;
 consider the case where they share a memory region through a
 MAP_SHARED mapping.

 (2) may be a pretty rare case, though...

 Thanks,
 rin

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.