NetBSD Problem Report #57437
From www@netbsd.org Thu May 25 07:43:58 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id E2BB11A9238
for <gnats-bugs@gnats.NetBSD.org>; Thu, 25 May 2023 07:43:57 +0000 (UTC)
Message-Id: <20230525074326.B40A81A9239@mollari.NetBSD.org>
Date: Thu, 25 May 2023 07:43:26 +0000 (UTC)
From: Vasily.Dybala@kaspersky.com
Reply-To: Vasily.Dybala@kaspersky.com
To: gnats-bugs@NetBSD.org
Subject: arm/aarch64: pthread_mutex is not efficient on spinlock
X-Send-Pr-Version: www-1.0
>Number: 57437
>Category: port-arm
>Synopsis: arm/aarch64: pthread_mutex is not efficient on spinlock
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: port-arm-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu May 25 07:45:00 +0000 2023
>Closed-Date: Sat Aug 05 16:29:33 +0000 2023
>Last-Modified: Sat Aug 05 16:29:33 +0000 2023
>Originator: Vasily Dybala
>Release: all versions
>Organization:
OOO Kaspersky
>Environment:
>Description:
pthread_mutex_lock() tries some optimizations before going to sleep in the kernel: it briefly spins in userspace, in the hope that the mutex is only held for a short time and the current thread can acquire it faster than a switch into the kernel and back to userspace.
That is OK; this optimization is used by many implementations on different architectures and operating systems.
Let's dig into the spinlock implementation from:
http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libpthread/pthread_mutex.c?annotate=1.83&only_with_tag=MAIN
253: NOINLINE static void *
254: pthread__mutex_spin(pthread_mutex_t *ptm, pthread_t owner)
255: {
256: 	pthread_t thread;
257: 	unsigned int count, i;
258:
259: 	for (count = 2;; owner = ptm->ptm_owner) {
260: 		thread = (pthread_t)MUTEX_OWNER(owner);
261: 		if (thread == NULL)
262: 			break;
263: 		if (thread->pt_lwpctl->lc_curcpu == LWPCTL_CPU_NONE)
264: 			break;
265: 		if (count < 128)
266: 			count += count;
267: 		for (i = count; i != 0; i--)
268: 			pthread__mutex_pause();
269: 	}
270:
271: 	return owner;
272: }
Look at lines 265-266 and the loop at lines 267-268:
this means that if we fail to acquire the mutex, we loop over pthread__mutex_pause(), and the longer we keep failing, the more iterations we spin (2, 4, 8, 16, ...).
On most architectures this is fine, because we "sleep" on a NOP-like pause instruction: we wait some CPU cycles and then check the mutex state again.
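The doubling-with-cap backoff can be modeled as a small pure function. This is a hypothetical helper, not code from libpthread; it only illustrates how the pause count grows on each failed check:

```c
#include <assert.h>

/* Hypothetical model of the backoff in pthread__mutex_spin():
 * the pause count doubles on every failed owner check, capped at 128. */
static unsigned int
backoff_next(unsigned int count)
{
	if (count < 128)
		count += count;	/* 2 -> 4 -> 8 -> ... -> 128, then stays */
	return count;
}
```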
But on arm/aarch64 architectures it works in a completely different way:
pthread__mutex_pause() expands to the WFE instruction
http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libpthread/arch/aarch64/pthread_md.h?annotate=1.1&only_with_tag=MAIN
which has a different meaning: it puts the CPU core to sleep until somebody executes a SEV instruction.
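To make the WFE/SEV pairing concrete, here is a toy C11 model of the event register, purely illustrative and not how the hardware is implemented: a sev sets the flag, and a wfe wakes only if the flag is set, consuming it on wakeup.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy model of one CPU's single-bit event register (illustrative only). */
static atomic_bool event_register = false;

static void
model_sev(void)
{
	/* sev: set the event register (on real hardware, on every CPU) */
	atomic_store(&event_register, true);
}

static bool
model_wfe_poll(void)
{
	/* wfe: if the event register is set, consume it and wake;
	 * otherwise the core would stay asleep in low-power mode. */
	bool expected = true;
	return atomic_compare_exchange_strong(&event_register, &expected, false);
}
```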
Let's look at the loop again:
267: 		for (i = count; i != 0; i--)
268: 			pthread__mutex_pause();
So we will sleep on the WFE instruction 2, 4, 8, 16, ... times in a row, and the next mutex status check only happens after somebody has executed the SEV instruction (pthread__smt_wake()) that many times.
That is completely wrong: we need to check the mutex status after each smt_wake() call.
Another issue is that WFE can "sleep" for an unbounded time, which breaks the case where the mutex must be acquired within a timeout.
>How-To-Repeat:
1) Create a main thread that does some work while holding the lock:
pthread_mutex_lock()
for (int i = 1; i < 100000; i++)
	do busy work (NOP instructions), without sleeping in the kernel
pthread_mutex_unlock()
2) Launch other threads with the same implementation:
pthread_mutex_lock()
for (int i = 1; i < 100000; i++)
	do busy work (NOP instructions), without sleeping in the kernel
pthread_mutex_unlock()
3) Check that all of the threads stay in userspace and nobody sleeps in the kernel.
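The steps above can be sketched as a compilable repro. The thread count and iteration count here are arbitrary (kept small so it finishes quickly; raise NITER toward ~100000 to observe the stall on affected systems), and run_repro() is a hypothetical wrapper, not part of any test suite:

```c
#include <pthread.h>

#define NTHREADS 4
#define NITER    1000	/* small here; ~100000 in the original report */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static volatile unsigned long work;

static void *
worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&lock);
	for (int i = 1; i < NITER; i++)
		work++;	/* busy work, no sleeping in the kernel */
	pthread_mutex_unlock(&lock);
	return NULL;
}

/* Spawn contending threads and wait for them all; returns 0 on success. */
int
run_repro(void)
{
	pthread_t t[NTHREADS];

	for (int i = 0; i < NTHREADS; i++)
		if (pthread_create(&t[i], NULL, worker, NULL) != 0)
			return -1;
	for (int i = 0; i < NTHREADS; i++)
		pthread_join(t[i], NULL);
	return 0;
}
```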
>Fix:
My suggestion is to replace pthread__smt_pause/wake() on arm/aarch64 with a NOP instruction.
>Release-Note:
>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57437 CVS commit: src/lib/libpthread
Date: Thu, 25 May 2023 14:29:45 +0000
Module Name: src
Committed By: riastradh
Date: Thu May 25 14:29:45 UTC 2023
Modified Files:
src/lib/libpthread: pthread_int.h
src/lib/libpthread/arch/arm: pthread_md.h
Log Message:
libpthread: Use __nothing, not /* nothing */, for empty macros.
No functional change intended -- just safer to do it this way in case
the macros are used in if branches or comma expressions.
PR port-arm/57437 (pthread__smt_pause/wake issue)
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.109 -r1.110 src/lib/libpthread/pthread_int.h
cvs rdiff -u -r1.11 -r1.12 src/lib/libpthread/arch/arm/pthread_md.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Thu, 25 May 2023 14:30:58 +0000
State-Changed-Why:
Candidate fix committed -- can you try it now?
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57437 CVS commit: src/lib/libpthread
Date: Thu, 25 May 2023 14:30:03 +0000
Module Name: src
Committed By: riastradh
Date: Thu May 25 14:30:03 UTC 2023
Modified Files:
src/lib/libpthread: pthread_int.h pthread_spin.c
src/lib/libpthread/arch/aarch64: pthread_md.h
src/lib/libpthread/arch/arm: pthread_md.h
src/lib/libpthread/arch/i386: pthread_md.h
src/lib/libpthread/arch/x86_64: pthread_md.h
Log Message:
libpthread: New pthread__smt_wait to put CPU in low power for spin.
This is now distinct from pthread__smt_pause, which is for spin lock
backoff with no paired wakeup.
On Arm, there is a single-bit event register per CPU, and there are two
instructions to manage it:
- wfe, wait for event -- if event register is clear, enter low power
mode and wait until event register is set; then exit low power mode
and clear event register
- sev, signal event -- sets event register on all CPUs (other
circumstances like interrupts also set the event register and cause
wfe to wake)
These can be used to reduce the power consumption of spinning for a
lock, but only if they are actually paired -- if there's no sev, wfe
might hang indefinitely. Currently only pthread_spin(3) actually
pairs them; the other lock primitives (internal lock, mutex, rwlock)
do not -- they have spin lock backoff loops, but no corresponding
wakeup to cancel a wfe.
It may be worthwhile to teach the other lock primitives to pair
wfe/sev, but that requires some performance measurement to verify
it's actually worthwhile. So for now, we just make sure not to use
wfe when there's no sev, and keep everything else the same -- this
should fix severe performance degredation in libpthread on Arm
without hurting anything else.
No change in the generated code on amd64 and i386. No change in the
generated code for pthread_spin.c on arm and aarch64 -- changes only
the generated code for pthread_lock.c, pthread_mutex.c, and
pthread_rwlock.c, as intended.
PR port-arm/57437
XXX pullup-10
To generate a diff of this commit:
cvs rdiff -u -r1.110 -r1.111 src/lib/libpthread/pthread_int.h
cvs rdiff -u -r1.10 -r1.11 src/lib/libpthread/pthread_spin.c
cvs rdiff -u -r1.1 -r1.2 src/lib/libpthread/arch/aarch64/pthread_md.h
cvs rdiff -u -r1.12 -r1.13 src/lib/libpthread/arch/arm/pthread_md.h
cvs rdiff -u -r1.20 -r1.21 src/lib/libpthread/arch/i386/pthread_md.h
cvs rdiff -u -r1.12 -r1.13 src/lib/libpthread/arch/x86_64/pthread_md.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Vasily Dybala <Vasily.Dybala@kaspersky.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>,
"port-arm-maintainer@netbsd.org" <port-arm-maintainer@netbsd.org>,
"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org"
<netbsd-bugs@netbsd.org>
Cc:
Subject: RE: PR/57437 CVS commit: src/lib/libpthread
Date: Fri, 26 May 2023 08:03:18 +0000
Hello,
Thanks for the quick fix.
The fix looks OK, but the pthread__smt_wake() call from pthread_mutex_unlock() will execute a SEV instruction which does not have a paired WFE in pthread_mutex_lock().
I will check the patch during next week.
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Vasily Dybala <Vasily.Dybala@kaspersky.com>
Cc: gnats-bugs@netbsd.org,
port-arm-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/57437 CVS commit: src/lib/libpthread
Date: Fri, 26 May 2023 10:29:54 +0000
> Date: Fri, 26 May 2023 08:03:18 +0000
> From: Vasily Dybala <Vasily.Dybala@kaspersky.com>
>
> Fix looks ok, but pthread__smt_wake() call from
> pthread_mutex_unlock() will call SEV instruction which does not have
> paired WFE in pthread_mutex_lock().
We could add a call to pthread__smt_wake in pthread_mutex_unlock to
match a pthread__smt_wait in pthread__mutex_pause, but that's not
enough on its own, because currently pthread__mutex_spin calls
pthread__mutex_pause in a loop and might just WFE again without
realizing it can make progress after it has been woken by SEV.
So we would need to restructure pthread__mutex_spin so that it calls
pthread__smt_wait exactly once after it examines the owner and
determines the owner is still running on a CPU.
That may be worthwhile, but it's a little more involved to implement,
because the surrounding logic has to be conditionalized on which style
of spin loop is in use. So I picked the less-risky and less-involved option to
commit now. We can try the WFE/SEV approach for pthread_mutex (and
rwlock and, if it's still relevant, simple lock) later.
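The restructuring described above can be sketched in isolation. The stub_-prefixed names below are hypothetical stand-ins for the owner check and the WFE hook; only the loop shape mirrors the proposal (examine the owner, and if it is still running, wait exactly once before re-checking):

```c
#include <stdbool.h>

/* Stubbed environment: the "owner" releases the lock after 3 wakeups. */
static int stub_waits;

static bool
stub_owner_running(void)
{
	return stub_waits < 3;
}

static void
stub_smt_wait(void)
{
	/* Stands in for wfe: one wait per owner check, so each sev
	 * (wakeup) is followed by a fresh look at the mutex state. */
	stub_waits++;
}

/* Shape of the proposed spin loop: wait exactly once per owner check. */
static int
spin_until_released(void)
{
	while (stub_owner_running())
		stub_smt_wait();
	return stub_waits;
}
```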
> I will check patch during next week.
Great, thanks!
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Vasily Dybala <Vasily.Dybala@kaspersky.com>
Cc: gnats-bugs@netbsd.org, port-arm-maintainer@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/57437 CVS commit: src/lib/libpthread
Date: Fri, 26 May 2023 11:07:54 +0000
> Date: Fri, 26 May 2023 10:29:54 +0000
> From: Taylor R Campbell <riastradh@NetBSD.org>
>
> We could add a call to pthread__smt_wake in pthread_mutex_unlock to
> match a pthread__smt_wait in pthread__mutex_pause, but that's not
> enough on its own, because currently pthread__mutex_spin calls
> pthread__mutex_pause in a loop and might just WFE again without
> realizing it can make progress after it has been woken by SEV.
>
> So we would need to restructure pthread__mutex_spin so that it calls
> pthread__smt_wait exactly once after it examines the owner and
> determines the owner is still running on a CPU.
The attached patch does just this for Arm, and leaves non-Arm alone.
It can be tested independently for pthread_mutex and pthread_rwlock by
defining or undefining the appropriate macro in pthread_md.h. I don't
have the time to do performance measurements right now, but if you
feel so inclined, we would certainly consider it!
[Attachment: pr57437-mutexrwlockwaitwake.patch]
From 81aff4df7bf94b3fc8a88db755de2451711ed040 Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Fri, 26 May 2023 10:56:00 +0000
Subject: [PATCH] WIP: Use WFE/SEV judiciously in pthread_mutex and
pthread_rwlock.
Can be turned on or off by defining or undefining the appropriate
macro in pthread_md.h:
PTHREAD__MUTEX_SPIN_WAIT_WAKE
PTHREAD__RWLOCK_SPIN_WAIT_WAKE
PR port-arm/57437
XXX pullup-10
---
lib/libpthread/arch/aarch64/pthread_md.h | 3 +++
lib/libpthread/arch/arm/pthread_md.h | 6 +++++-
lib/libpthread/pthread_mutex.c | 17 +++++++++++++++--
lib/libpthread/pthread_rwlock.c | 22 ++++++++++++++++++++++
4 files changed, 45 insertions(+), 3 deletions(-)
diff --git a/lib/libpthread/arch/aarch64/pthread_md.h b/lib/libpthread/arch=
/aarch64/pthread_md.h
index 7b79e6d347ea..c8c0b4265211 100644
--- a/lib/libpthread/arch/aarch64/pthread_md.h
+++ b/lib/libpthread/arch/aarch64/pthread_md.h
@@ -42,6 +42,9 @@ pthread__sp(void)
return ret;
}
=20
+#define PTHREAD__MUTEX_SPIN_WAIT_WAKE
+#define PTHREAD__RWLOCK_SPIN_WAIT_WAKE
+
#define pthread__smt_wait() __asm __volatile("wfe") /* wfe */
#define pthread__smt_wake() __asm __volatile("sev") /* sev */
=20
diff --git a/lib/libpthread/arch/arm/pthread_md.h b/lib/libpthread/arch/arm=
/pthread_md.h
index 1ed6a4914cae..cc04ed4a788d 100644
--- a/lib/libpthread/arch/arm/pthread_md.h
+++ b/lib/libpthread/arch/arm/pthread_md.h
@@ -50,12 +50,16 @@ pthread__sp(void)
}
=20
#if defined(__thumb__) && defined(_ARM_ARCH_6)
+#define PTHREAD__MUTEX_SPIN_WAIT_WAKE
+#define PTHREAD__RWLOCK_SPIN_WAIT_WAKE
#define pthread__smt_wait() __asm __volatile(".inst.n 0xbf20") /* wfe */
#define pthread__smt_wake() __asm __volatile(".inst.n 0xbf40") /* sev */
#elif !defined(__thumb__)
+#define PTHREAD__MUTEX_SPIN_WAIT_WAKE
+#define PTHREAD__RWLOCK_SPIN_WAIT_WAKE
#define pthread__smt_wait() __asm __volatile(".inst 0xe320f002") /* wfe */
#define pthread__smt_wake() __asm __volatile(".inst 0xe320f004") /* sev */
-#else
+#else /* !defined(__thumb__) && !defined(_ARM_ARCH_6) */
#define pthread__smt_wait() __nothing
#define pthread__smt_wake() __nothing
#endif
diff --git a/lib/libpthread/pthread_mutex.c b/lib/libpthread/pthread_mutex.c
index 00005c9eac6b..77f402ff1d12 100644
--- a/lib/libpthread/pthread_mutex.c
+++ b/lib/libpthread/pthread_mutex.c
@@ -103,7 +103,9 @@ struct waiter {
 static void	pthread__mutex_wakeup(pthread_t, struct pthread__waiter *);
 static int	pthread__mutex_lock_slow(pthread_mutex_t *,
     const struct timespec *);
+#ifndef PTHREAD__MUTEX_SPIN_WAIT_WAKE
 static void	pthread__mutex_pause(void);
+#endif
 
 int	_pthread_mutex_held_np(pthread_mutex_t *);
 pthread_t	_pthread_mutex_owner_np(pthread_mutex_t *);
@@ -238,6 +240,7 @@ pthread_mutex_timedlock(pthread_mutex_t* ptm, const struct timespec *ts)
 	return pthread__mutex_lock_slow(ptm, ts);
 }
 
+#ifndef PTHREAD__MUTEX_SPIN_WAIT_WAKE
 /* We want function call overhead. */
 NOINLINE static void
 pthread__mutex_pause(void)
@@ -245,6 +248,7 @@ pthread__mutex_pause(void)
 
 	pthread__smt_pause();
 }
+#endif
 
 /*
  * Spin while the holder is running.  'lwpctl' gives us the true
@@ -254,18 +258,24 @@ NOINLINE static void *
 pthread__mutex_spin(pthread_mutex_t *ptm, pthread_t owner)
 {
 	pthread_t thread;
-	unsigned int count, i;
+#ifndef PTHREAD__MUTEX_SPIN_WAIT_WAKE
+	unsigned int count = 2, i;
+#endif
 
-	for (count = 2;; owner = ptm->ptm_owner) {
+	for (;; owner = ptm->ptm_owner) {
 		thread = (pthread_t)MUTEX_OWNER(owner);
 		if (thread == NULL)
 			break;
 		if (thread->pt_lwpctl->lc_curcpu == LWPCTL_CPU_NONE)
 			break;
+#ifdef PTHREAD__MUTEX_SPIN_WAIT_WAKE
+		pthread__smt_wait();
+#else
 		if (count < 128)
 			count += count;
 		for (i = count; i != 0; i--)
 			pthread__mutex_pause();
+#endif
 	}
 
 	return owner;
@@ -501,6 +511,9 @@ pthread_mutex_unlock(pthread_mutex_t *ptm)
 	 */
 #ifndef PTHREAD__ATOMIC_IS_MEMBAR
 	membar_enter();
+#endif
+#ifdef PTHREAD__MUTEX_SPIN_WAIT_WAKE
+	pthread__smt_wake();
 #endif
 	if (MUTEX_OWNER(newval) == 0 && ptm->ptm_waiters != NULL) {
 		pthread__mutex_wakeup(self,
diff --git a/lib/libpthread/pthread_rwlock.c b/lib/libpthread/pthread_rwloc=
k.c
index 90e48a14f3ab..a5c112bbdf5c 100644
--- a/lib/libpthread/pthread_rwlock.c
+++ b/lib/libpthread/pthread_rwlock.c
@@ -127,6 +127,7 @@ pthread_rwlock_destroy(pthread_rwlock_t *ptr)
return 0;
}
=20
+#ifndef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
/* We want function call overhead. */
NOINLINE static void
pthread__rwlock_pause(void)
@@ -134,12 +135,15 @@ pthread__rwlock_pause(void)
=20
pthread__smt_pause();
}
+#endif
=20
NOINLINE static int
pthread__rwlock_spin(uintptr_t owner)
{
pthread_t thread;
+#ifndef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
unsigned int i;
+#endif
=20
if ((owner & ~RW_THREAD) !=3D RW_WRITE_LOCKED)
return 0;
@@ -149,8 +153,12 @@ pthread__rwlock_spin(uintptr_t owner)
thread->pt_lwpctl->lc_curcpu =3D=3D LWPCTL_CPU_NONE)
return 0;
=20
+#ifdef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
+ pthread__smt_wait();
+#else
for (i =3D 128; i !=3D 0; i--)
pthread__rwlock_pause();
+#endif
return 1;
}
=20
@@ -487,6 +495,9 @@ pthread_rwlock_unlock(pthread_rwlock_t *ptr)
next =3D rw_cas(ptr, owner, new);
if (owner =3D=3D next) {
/* Released! */
+#ifdef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
+ pthread__smt_wake();
+#endif
return 0;
}
continue;
@@ -559,6 +570,17 @@ pthread_rwlock_unlock(pthread_rwlock_t *ptr)
ptr->ptr_nreaders =3D 0;
pthread__unpark_all(&ptr->ptr_rblocked, self,
interlock);
+
+ /*
+ * If we're doing CPU wait/wake instructions
+ * for spin loops, there may also be other CPUs
+ * spinning for a read lock -- wake them
+ * promptly too; no reason to make them hang
+ * when they could be making progress.
+ */
+#ifdef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
+ pthread__smt_wake();
+#endif
}
pthread_mutex_unlock(interlock);
=20
From: Vasily Dybala <Vasily.Dybala@kaspersky.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>,
"port-arm-maintainer@netbsd.org" <port-arm-maintainer@netbsd.org>,
"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org"
<netbsd-bugs@netbsd.org>
Cc:
Subject: RE: PR/57437 CVS commit: src/lib/libpthread
Date: Tue, 30 May 2023 08:03:24 +0000
Hello,
I checked the fix.
I confirm the issue is fixed: no more extra delays.
State-Changed-From-To: feedback->needs-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 05 Jun 2023 05:39:19 +0000
State-Changed-Why:
confirmed fixed, needs to get at least into -10
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57437 CVS commit: [netbsd-10] src/lib/libpthread
Date: Tue, 1 Aug 2023 16:21:07 +0000
Module Name: src
Committed By: martin
Date: Tue Aug 1 16:21:07 UTC 2023
Modified Files:
src/lib/libpthread [netbsd-10]: pthread_int.h pthread_spin.c
src/lib/libpthread/arch/aarch64 [netbsd-10]: pthread_md.h
src/lib/libpthread/arch/arm [netbsd-10]: pthread_md.h
src/lib/libpthread/arch/i386 [netbsd-10]: pthread_md.h
src/lib/libpthread/arch/x86_64 [netbsd-10]: pthread_md.h
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #296):
lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13
lib/libpthread/pthread_int.h: revision 1.110
lib/libpthread/pthread_int.h: revision 1.111
lib/libpthread/arch/i386/pthread_md.h: revision 1.21
lib/libpthread/arch/arm/pthread_md.h: revision 1.12
lib/libpthread/arch/arm/pthread_md.h: revision 1.13
lib/libpthread/pthread_spin.c: revision 1.11
lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2
libpthread: Use __nothing, not /* nothing */, for empty macros.
No functional change intended -- just safer to do it this way in case
the macros are used in if branches or comma expressions.
PR port-arm/57437 (pthread__smt_pause/wake issue)
libpthread: New pthread__smt_wait to put CPU in low power for spin.
This is now distinct from pthread__smt_pause, which is for spin lock
backoff with no paired wakeup.
On Arm, there is a single-bit event register per CPU, and there are two
instructions to manage it:
- wfe, wait for event -- if event register is clear, enter low power
mode and wait until event register is set; then exit low power mode
and clear event register
- sev, signal event -- sets event register on all CPUs (other
circumstances like interrupts also set the event register and cause
wfe to wake)
These can be used to reduce the power consumption of spinning for a
lock, but only if they are actually paired -- if there's no sev, wfe
might hang indefinitely. Currently only pthread_spin(3) actually
pairs them; the other lock primitives (internal lock, mutex, rwlock)
do not -- they have spin lock backoff loops, but no corresponding
wakeup to cancel a wfe.
It may be worthwhile to teach the other lock primitives to pair
wfe/sev, but that requires some performance measurement to verify
it's actually worthwhile. So for now, we just make sure not to use
wfe when there's no sev, and keep everything else the same -- this
should fix severe performance degredation in libpthread on Arm
without hurting anything else.
No change in the generated code on amd64 and i386. No change in the
generated code for pthread_spin.c on arm and aarch64 -- changes only
the generated code for pthread_lock.c, pthread_mutex.c, and
pthread_rwlock.c, as intended.
PR port-arm/57437
To generate a diff of this commit:
cvs rdiff -u -r1.109 -r1.109.2.1 src/lib/libpthread/pthread_int.h
cvs rdiff -u -r1.10 -r1.10.2.1 src/lib/libpthread/pthread_spin.c
cvs rdiff -u -r1.1 -r1.1.36.1 src/lib/libpthread/arch/aarch64/pthread_md.h
cvs rdiff -u -r1.11 -r1.11.10.1 src/lib/libpthread/arch/arm/pthread_md.h
cvs rdiff -u -r1.20 -r1.20.42.1 src/lib/libpthread/arch/i386/pthread_md.h
cvs rdiff -u -r1.12 -r1.12.54.1 src/lib/libpthread/arch/x86_64/pthread_md.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 02 Aug 2023 13:32:59 +0000
State-Changed-Why:
pullup-10 #296, pullup-9 #1700, pullup-8 #something
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57437 CVS commit: [netbsd-9] src/lib/libpthread
Date: Fri, 4 Aug 2023 13:04:04 +0000
Module Name: src
Committed By: martin
Date: Fri Aug 4 13:04:04 UTC 2023
Modified Files:
src/lib/libpthread [netbsd-9]: pthread_int.h pthread_spin.c
src/lib/libpthread/arch/aarch64 [netbsd-9]: pthread_md.h
src/lib/libpthread/arch/arm [netbsd-9]: pthread_md.h
src/lib/libpthread/arch/i386 [netbsd-9]: pthread_md.h
src/lib/libpthread/arch/x86_64 [netbsd-9]: pthread_md.h
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1700):
lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13
lib/libpthread/pthread_int.h: revision 1.110
lib/libpthread/pthread_int.h: revision 1.111
lib/libpthread/arch/i386/pthread_md.h: revision 1.21
lib/libpthread/arch/arm/pthread_md.h: revision 1.12
lib/libpthread/arch/arm/pthread_md.h: revision 1.13
lib/libpthread/pthread_spin.c: revision 1.11
lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2
libpthread: Use __nothing, not /* nothing */, for empty macros.
No functional change intended -- just safer to do it this way in case
the macros are used in if branches or comma expressions.
PR port-arm/57437 (pthread__smt_pause/wake issue)
libpthread: New pthread__smt_wait to put CPU in low power for spin.
This is now distinct from pthread__smt_pause, which is for spin lock
backoff with no paired wakeup.
On Arm, there is a single-bit event register per CPU, and there are two
instructions to manage it:
- wfe, wait for event -- if event register is clear, enter low power
mode and wait until event register is set; then exit low power mode
and clear event register
- sev, signal event -- sets event register on all CPUs (other
circumstances like interrupts also set the event register and cause
wfe to wake)
These can be used to reduce the power consumption of spinning for a
lock, but only if they are actually paired -- if there's no sev, wfe
might hang indefinitely. Currently only pthread_spin(3) actually
pairs them; the other lock primitives (internal lock, mutex, rwlock)
do not -- they have spin lock backoff loops, but no corresponding
wakeup to cancel a wfe.
It may be worthwhile to teach the other lock primitives to pair
wfe/sev, but that requires some performance measurement to verify
it's actually worthwhile. So for now, we just make sure not to use
wfe when there's no sev, and keep everything else the same -- this
should fix severe performance degredation in libpthread on Arm
without hurting anything else.
No change in the generated code on amd64 and i386. No change in the
generated code for pthread_spin.c on arm and aarch64 -- changes only
the generated code for pthread_lock.c, pthread_mutex.c, and
pthread_rwlock.c, as intended.
PR port-arm/57437
To generate a diff of this commit:
cvs rdiff -u -r1.95.2.2 -r1.95.2.3 src/lib/libpthread/pthread_int.h
cvs rdiff -u -r1.6 -r1.6.34.1 src/lib/libpthread/pthread_spin.c
cvs rdiff -u -r1.1 -r1.1.28.1 src/lib/libpthread/arch/aarch64/pthread_md.h
cvs rdiff -u -r1.11 -r1.11.2.1 src/lib/libpthread/arch/arm/pthread_md.h
cvs rdiff -u -r1.20 -r1.20.34.1 src/lib/libpthread/arch/i386/pthread_md.h
cvs rdiff -u -r1.12 -r1.12.46.1 src/lib/libpthread/arch/x86_64/pthread_md.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57437 CVS commit: [netbsd-8] src/lib/libpthread
Date: Fri, 4 Aug 2023 13:06:59 +0000
Module Name: src
Committed By: martin
Date: Fri Aug 4 13:06:59 UTC 2023
Modified Files:
src/lib/libpthread [netbsd-8]: pthread_int.h pthread_spin.c
src/lib/libpthread/arch/aarch64 [netbsd-8]: pthread_md.h
src/lib/libpthread/arch/arm [netbsd-8]: pthread_md.h
src/lib/libpthread/arch/i386 [netbsd-8]: pthread_md.h
src/lib/libpthread/arch/x86_64 [netbsd-8]: pthread_md.h
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #1878):
lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13
lib/libpthread/pthread_int.h: revision 1.110
lib/libpthread/pthread_int.h: revision 1.111
lib/libpthread/arch/i386/pthread_md.h: revision 1.21
lib/libpthread/arch/arm/pthread_md.h: revision 1.12
lib/libpthread/arch/arm/pthread_md.h: revision 1.13
lib/libpthread/pthread_spin.c: revision 1.11
lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2
libpthread: Use __nothing, not /* nothing */, for empty macros.
No functional change intended -- just safer to do it this way in case
the macros are used in if branches or comma expressions.
PR port-arm/57437 (pthread__smt_pause/wake issue)
libpthread: New pthread__smt_wait to put CPU in low power for spin.
This is now distinct from pthread__smt_pause, which is for spin lock
backoff with no paired wakeup.
On Arm, there is a single-bit event register per CPU, and there are two
instructions to manage it:
- wfe, wait for event -- if event register is clear, enter low power
mode and wait until event register is set; then exit low power mode
and clear event register
- sev, signal event -- sets event register on all CPUs (other
circumstances like interrupts also set the event register and cause
wfe to wake)
These can be used to reduce the power consumption of spinning for a
lock, but only if they are actually paired -- if there's no sev, wfe
might hang indefinitely. Currently only pthread_spin(3) actually
pairs them; the other lock primitives (internal lock, mutex, rwlock)
do not -- they have spin lock backoff loops, but no corresponding
wakeup to cancel a wfe.
It may be worthwhile to teach the other lock primitives to pair
wfe/sev, but that requires some performance measurement to verify
it's actually worthwhile. So for now, we just make sure not to use
wfe when there's no sev, and keep everything else the same -- this
should fix severe performance degredation in libpthread on Arm
without hurting anything else.
No change in the generated code on amd64 and i386. No change in the
generated code for pthread_spin.c on arm and aarch64 -- changes only
the generated code for pthread_lock.c, pthread_mutex.c, and
pthread_rwlock.c, as intended.
PR port-arm/57437
To generate a diff of this commit:
cvs rdiff -u -r1.93.4.1 -r1.93.4.2 src/lib/libpthread/pthread_int.h
cvs rdiff -u -r1.6 -r1.6.24.1 src/lib/libpthread/pthread_spin.c
cvs rdiff -u -r1.1 -r1.1.18.1 src/lib/libpthread/arch/aarch64/pthread_md.h
cvs rdiff -u -r1.9 -r1.9.18.1 src/lib/libpthread/arch/arm/pthread_md.h
cvs rdiff -u -r1.20 -r1.20.24.1 src/lib/libpthread/arch/i386/pthread_md.h
cvs rdiff -u -r1.12 -r1.12.36.1 src/lib/libpthread/arch/x86_64/pthread_md.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sat, 05 Aug 2023 16:29:33 +0000
State-Changed-Why:
fixed in HEAD, pulled up to 10 and 9
(follow up if this matters in 8)
>Unformatted: