NetBSD Problem Report #57437

From www@netbsd.org  Thu May 25 07:43:58 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E2BB11A9238
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 25 May 2023 07:43:57 +0000 (UTC)
Message-Id: <20230525074326.B40A81A9239@mollari.NetBSD.org>
Date: Thu, 25 May 2023 07:43:26 +0000 (UTC)
From: Vasily.Dybala@kaspersky.com
Reply-To: Vasily.Dybala@kaspersky.com
To: gnats-bugs@NetBSD.org
Subject: arm/aarch64: pthread_mutex is not efficient on spinlock
X-Send-Pr-Version: www-1.0

>Number:         57437
>Category:       port-arm
>Synopsis:       arm/aarch64: pthread_mutex is not efficient on spinlock
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu May 25 07:45:00 +0000 2023
>Closed-Date:    Sat Aug 05 16:29:33 +0000 2023
>Last-Modified:  Sat Aug 05 16:29:33 +0000 2023
>Originator:     Vasily Dybala
>Release:        all versions
>Organization:
OOO Kaspersky
>Environment:
>Description:
pthread_mutex_lock() tries a few optimizations before going to sleep in the kernel.

It spins briefly, in the hope that the mutex is held only for a short time and the waiting thread can acquire it faster than a switch into the kernel and back to userspace would take.

That is fine; this optimization is used by many implementations across different architectures and operating systems.

Let's dig into the spin implementation from:
http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libpthread/pthread_mutex.c?annotate=1.83&only_with_tag=MAIN

NOINLINE static void *
254: pthread__mutex_spin(pthread_mutex_t *ptm, pthread_t owner)
255: {
256:        pthread_t thread;
257:        unsigned int count, i;
258:
259:        for (count = 2;; owner = ptm->ptm_owner) {
260:                thread = (pthread_t)MUTEX_OWNER(owner);
261:                if (thread == NULL)
262:                        break;
263:                if (thread->pt_lwpctl->lc_curcpu == LWPCTL_CPU_NONE)
264:                        break;
265:                if (count < 128)
266:                        count += count;
267:                for (i = count; i != 0; i--)
268:                        pthread__mutex_pause();
269:        }
270:
271:        return owner;
272: }

Look at lines 265-266 and the loop at lines 267-268.

This means that if we are unable to acquire the mutex, we loop over pthread__mutex_pause(), and the longer we keep failing, the more iterations we spin each time around (2, 4, 8, 16, ...).

On most architectures this is fine, because we "sleep" on a NOP-like pause instruction: we wait a few CPU cycles and then check the mutex state again.

But on arm/aarch64 it works completely differently: pthread__mutex_pause() expands to the WFE instruction
http://cvsweb.netbsd.org/bsdweb.cgi/src/lib/libpthread/arch/aarch64/pthread_md.h?annotate=1.1&only_with_tag=MAIN

which has a different meaning: it puts the CPU core to sleep until somebody issues an SEV instruction.

Let's look at the loop again:

                    267:                for (i = count; i != 0; i--)
                    268:                        pthread__mutex_pause();

So we sleep on the WFE instruction 2, 4, 8, 16, ... times per pass, and the next check of the mutex state only happens after somebody has issued that many SEV instructions (i.e. called pthread__smt_wake() 2, 4, 8, ... times).

That is completely wrong: the mutex state needs to be re-checked after each wakeup.

Another issue is that WFE can "sleep" for an unbounded time, which breaks the case where the mutex must be acquired within a timeout.
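
Roughly, a WFE-based spin would have to wait at most once between owner checks, with the unlock path issuing the paired SEV. The following is only an illustrative sketch of that shape, reusing the names from the listing above; it is not actual libpthread code:

/* waiter side: re-check the owner after every wakeup */
for (;; owner = ptm->ptm_owner) {
        thread = (pthread_t)MUTEX_OWNER(owner);
        if (thread == NULL)
                break;                  /* mutex was released */
        if (thread->pt_lwpctl->lc_curcpu == LWPCTL_CPU_NONE)
                break;                  /* owner not running; stop spinning */
        pthread__smt_wait();            /* single wfe, woken by the next sev */
}

/* unlock side: pthread_mutex_unlock() issues the paired wakeup */
pthread__smt_wake();                    /* sev */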

>How-To-Repeat:
1) Create a main thread that grabs the mutex and does some work while holding it:

pthread_mutex_lock()

for (int i = 1; i < 100000; i++)
  do busy work (NOPs) without sleeping in the kernel

pthread_mutex_unlock()


2) Launch other threads with the same body:

pthread_mutex_lock()

for (int i = 1; i < 100000; i++)
  do busy work (NOPs) without sleeping in the kernel

pthread_mutex_unlock()


3) Check that all threads stay in userspace and nobody sleeps in the kernel (a self-contained sketch of this repro follows below).
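
For illustration, a self-contained version of this repro could look like the following; the thread count, round count, and compile command are arbitrary choices for the sketch, not part of the original report:

#include <pthread.h>
#include <stdio.h>

static pthread_mutex_t mtx = PTHREAD_MUTEX_INITIALIZER;
static volatile unsigned long work;

/*
 * Each thread repeatedly takes the mutex and busy-works while holding it,
 * so waiters should spin in userspace instead of sleeping in the kernel.
 */
static void *
worker(void *arg)
{
	(void)arg;
	for (int round = 0; round < 1000; round++) {
		pthread_mutex_lock(&mtx);
		for (int i = 1; i < 100000; i++)
			work++;		/* busy work, no kernel sleep */
		pthread_mutex_unlock(&mtx);
	}
	return NULL;
}

int
main(void)
{
	pthread_t t[4];

	for (int i = 0; i < 4; i++)
		pthread_create(&t[i], NULL, worker, NULL);
	for (int i = 0; i < 4; i++)
		pthread_join(t[i], NULL);
	printf("done, work = %lu\n", work);
	return 0;
}

Build it with something like: cc -O2 -o mutex-spin mutex-spin.c -lpthread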
>Fix:
My suggestion is to replace pthread__smt_pause()/pthread__smt_wake() on arm/aarch64 with a NOP instruction (see the sketch below).
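
For concreteness, on aarch64 that suggestion would amount to a pthread_md.h change along these lines. This is only an illustrative sketch of the proposal; the fix that was eventually committed (see the audit trail below) takes a different approach:

/* sketch: plain spin hint instead of wfe/sev, so the backoff loop can
 * never stall waiting for an event that is never signalled */
#define	pthread__smt_pause()	__asm __volatile("nop")
#define	pthread__smt_wake()	__nothing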

>Release-Note:

>Audit-Trail:
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57437 CVS commit: src/lib/libpthread
Date: Thu, 25 May 2023 14:29:45 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu May 25 14:29:45 UTC 2023

 Modified Files:
 	src/lib/libpthread: pthread_int.h
 	src/lib/libpthread/arch/arm: pthread_md.h

 Log Message:
 libpthread: Use __nothing, not /* nothing */, for empty macros.

 No functional change intended -- just safer to do it this way in case
 the macros are used in if branches or comma expressions.

 PR port-arm/57437 (pthread__smt_pause/wake issue)

 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.109 -r1.110 src/lib/libpthread/pthread_int.h
 cvs rdiff -u -r1.11 -r1.12 src/lib/libpthread/arch/arm/pthread_md.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Thu, 25 May 2023 14:30:58 +0000
State-Changed-Why:
Candidate fix committed -- can you try it now?


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57437 CVS commit: src/lib/libpthread
Date: Thu, 25 May 2023 14:30:03 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu May 25 14:30:03 UTC 2023

 Modified Files:
 	src/lib/libpthread: pthread_int.h pthread_spin.c
 	src/lib/libpthread/arch/aarch64: pthread_md.h
 	src/lib/libpthread/arch/arm: pthread_md.h
 	src/lib/libpthread/arch/i386: pthread_md.h
 	src/lib/libpthread/arch/x86_64: pthread_md.h

 Log Message:
 libpthread: New pthread__smt_wait to put CPU in low power for spin.

 This is now distinct from pthread__smt_pause, which is for spin lock
 backoff with no paired wakeup.

 On Arm, there is a single-bit event register per CPU, and there are two
 instructions to manage it:

 - wfe, wait for event -- if event register is clear, enter low power
   mode and wait until event register is set; then exit low power mode
   and clear event register

 - sev, signal event -- sets event register on all CPUs (other
   circumstances like interrupts also set the event register and cause
   wfe to wake)

 These can be used to reduce the power consumption of spinning for a
 lock, but only if they are actually paired -- if there's no sev, wfe
 might hang indefinitely.  Currently only pthread_spin(3) actually
 pairs them; the other lock primitives (internal lock, mutex, rwlock)
 do not -- they have spin lock backoff loops, but no corresponding
 wakeup to cancel a wfe.

 It may be worthwhile to teach the other lock primitives to pair
 wfe/sev, but that requires some performance measurement to verify
 it's actually worthwhile.  So for now, we just make sure not to use
 wfe when there's no sev, and keep everything else the same -- this
 should fix severe performance degredation in libpthread on Arm
 without hurting anything else.

 No change in the generated code on amd64 and i386.  No change in the
 generated code for pthread_spin.c on arm and aarch64 -- changes only
 the generated code for pthread_lock.c, pthread_mutex.c, and
 pthread_rwlock.c, as intended.

 PR port-arm/57437

 XXX pullup-10


 To generate a diff of this commit:
 cvs rdiff -u -r1.110 -r1.111 src/lib/libpthread/pthread_int.h
 cvs rdiff -u -r1.10 -r1.11 src/lib/libpthread/pthread_spin.c
 cvs rdiff -u -r1.1 -r1.2 src/lib/libpthread/arch/aarch64/pthread_md.h
 cvs rdiff -u -r1.12 -r1.13 src/lib/libpthread/arch/arm/pthread_md.h
 cvs rdiff -u -r1.20 -r1.21 src/lib/libpthread/arch/i386/pthread_md.h
 cvs rdiff -u -r1.12 -r1.13 src/lib/libpthread/arch/x86_64/pthread_md.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Vasily Dybala <Vasily.Dybala@kaspersky.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>,
	"port-arm-maintainer@netbsd.org" <port-arm-maintainer@netbsd.org>,
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org"
	<netbsd-bugs@netbsd.org>
Cc: 
Subject: RE: PR/57437 CVS commit: src/lib/libpthread
Date: Fri, 26 May 2023 08:03:18 +0000

 Hello,
 Thanks for the quick fix.
 The fix looks OK, but the pthread__smt_wake() call from pthread_mutex_unlock() will issue an SEV instruction which does not have a paired WFE in pthread_mutex_lock().

 I will check patch during next week.



From: Taylor R Campbell <riastradh@NetBSD.org>
To: Vasily Dybala <Vasily.Dybala@kaspersky.com>
Cc: gnats-bugs@netbsd.org,
	port-arm-maintainer@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/57437 CVS commit: src/lib/libpthread
Date: Fri, 26 May 2023 10:29:54 +0000

 > Date: Fri, 26 May 2023 08:03:18 +0000
 > From: Vasily Dybala <Vasily.Dybala@kaspersky.com>
 > 
 > Fix looks ok, but pthread__smt_wake() call from
 > pthread_mutex_unlock() will call SEV instruction which does not have
 > paired WFE in pthread_mutex_lock().

 We could add a call to pthread__smt_wake in pthread_mutex_unlock to
 match a pthread__smt_wait in pthread__mutex_pause, but that's not
 enough on its own, because currently pthread__mutex_spin calls
 pthread__mutex_pause in a loop and might just WFE again without
 realizing it can make progress after it has been woken by SEV.

 So we would need to restructure pthread__mutex_spin so that it calls
 pthread__smt_wait exactly once after it examines the owner and
 determines the owner is still running on a CPU.

 That may be worthwhile, but it's a little more involved to implement,
 because the surrounding logic has to be conditionalized on which style
 of spin loop.  So I picked the less-risky and less-involved option to
 commit now.  We can try the WFE/SEV approach for pthread_mutex (and
 rwlock and, if it's still relevant, simple lock) later.

 > I will check patch during next week.

 Great, thanks!

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Vasily Dybala <Vasily.Dybala@kaspersky.com>
Cc: gnats-bugs@netbsd.org, port-arm-maintainer@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/57437 CVS commit: src/lib/libpthread
Date: Fri, 26 May 2023 11:07:54 +0000


 > Date: Fri, 26 May 2023 10:29:54 +0000
 > From: Taylor R Campbell <riastradh@NetBSD.org>
 > 
 > We could add a call to pthread__smt_wake in pthread_mutex_unlock to
 > match a pthread__smt_wait in pthread__mutex_pause, but that's not
 > enough on its own, because currently pthread__mutex_spin calls
 > pthread__mutex_pause in a loop and might just WFE again without
 > realizing it can make progress after it has been woken by SEV.
 > 
 > So we would need to restructure pthread__mutex_spin so that it calls
 > pthread__smt_wait exactly once after it examines the owner and
 > determines the owner is still running on a CPU.

 The attached patch does just this for Arm, and leaves non-Arm alone.
 It can be tested independently for pthread_mutex and pthread_rwlock by
 defining or undefining the appropriate macro in pthread_md.h.  I don't
 have the time to do performance measurements right now, but if you
 feel so inclined, we would certainly consider it!

 Content-Disposition: attachment; filename="pr57437-mutexrwlockwaitwake.patch"

 From 81aff4df7bf94b3fc8a88db755de2451711ed040 Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh@NetBSD.org>
 Date: Fri, 26 May 2023 10:56:00 +0000
 Subject: [PATCH] WIP: Use WFE/SEV judiciously in pthread_mutex and
  pthread_rwlock.

 Can be turned on or off by defining or undefining the appropriate
 macro in pthread_md.h:

 PTHREAD__MUTEX_SPIN_WAIT_WAKE
 PTHREAD__RWLOCK_SPIN_WAIT_WAKE

 PR port-arm/57437

 XXX pullup-10
 ---
  lib/libpthread/arch/aarch64/pthread_md.h |  3 +++
  lib/libpthread/arch/arm/pthread_md.h     |  6 +++++-
  lib/libpthread/pthread_mutex.c           | 17 +++++++++++++++--
  lib/libpthread/pthread_rwlock.c          | 22 ++++++++++++++++++++++
  4 files changed, 45 insertions(+), 3 deletions(-)

 diff --git a/lib/libpthread/arch/aarch64/pthread_md.h b/lib/libpthread/arch/aarch64/pthread_md.h
 index 7b79e6d347ea..c8c0b4265211 100644
 --- a/lib/libpthread/arch/aarch64/pthread_md.h
 +++ b/lib/libpthread/arch/aarch64/pthread_md.h
 @@ -42,6 +42,9 @@ pthread__sp(void)
  	return ret;
  }
  
 +#define	PTHREAD__MUTEX_SPIN_WAIT_WAKE
 +#define	PTHREAD__RWLOCK_SPIN_WAIT_WAKE
 +
  #define pthread__smt_wait()	__asm __volatile("wfe") /* wfe */
  #define pthread__smt_wake()	__asm __volatile("sev") /* sev */
  
 diff --git a/lib/libpthread/arch/arm/pthread_md.h b/lib/libpthread/arch/arm/pthread_md.h
 index 1ed6a4914cae..cc04ed4a788d 100644
 --- a/lib/libpthread/arch/arm/pthread_md.h
 +++ b/lib/libpthread/arch/arm/pthread_md.h
 @@ -50,12 +50,16 @@ pthread__sp(void)
  }
  
  #if defined(__thumb__) && defined(_ARM_ARCH_6)
 +#define	PTHREAD__MUTEX_SPIN_WAIT_WAKE
 +#define	PTHREAD__RWLOCK_SPIN_WAIT_WAKE
  #define pthread__smt_wait()	__asm __volatile(".inst.n 0xbf20") /* wfe */
  #define pthread__smt_wake()	__asm __volatile(".inst.n 0xbf40") /* sev */
  #elif !defined(__thumb__)
 +#define	PTHREAD__MUTEX_SPIN_WAIT_WAKE
 +#define	PTHREAD__RWLOCK_SPIN_WAIT_WAKE
  #define pthread__smt_wait()	__asm __volatile(".inst 0xe320f002") /* wfe */
  #define pthread__smt_wake()	__asm __volatile(".inst 0xe320f004") /* sev */
 -#else
 +#else  /* !defined(__thumb__) && !defined(_ARM_ARCH_6) */
  #define pthread__smt_wait()	__nothing
  #define pthread__smt_wake()	__nothing
  #endif
 diff --git a/lib/libpthread/pthread_mutex.c b/lib/libpthread/pthread_mutex.c
 index 00005c9eac6b..77f402ff1d12 100644
 --- a/lib/libpthread/pthread_mutex.c
 +++ b/lib/libpthread/pthread_mutex.c
 @@ -103,7 +103,9 @@ struct waiter {
  static void	pthread__mutex_wakeup(pthread_t, struct pthread__waiter *);
  static int	pthread__mutex_lock_slow(pthread_mutex_t *,
      const struct timespec *);
 +#ifndef PTHREAD__MUTEX_SPIN_WAIT_WAKE
  static void	pthread__mutex_pause(void);
 +#endif
  
  int		_pthread_mutex_held_np(pthread_mutex_t *);
  pthread_t	_pthread_mutex_owner_np(pthread_mutex_t *);
 @@ -238,6 +240,7 @@ pthread_mutex_timedlock(pthread_mutex_t* ptm, const struct timespec *ts)
  	return pthread__mutex_lock_slow(ptm, ts);
  }
  
 +#ifndef PTHREAD__MUTEX_SPIN_WAIT_WAKE
  /* We want function call overhead. */
  NOINLINE static void
  pthread__mutex_pause(void)
 @@ -245,6 +248,7 @@ pthread__mutex_pause(void)
  
  	pthread__smt_pause();
  }
 +#endif
  
  /*
   * Spin while the holder is running.  'lwpctl' gives us the true
 @@ -254,18 +258,24 @@ NOINLINE static void *
  pthread__mutex_spin(pthread_mutex_t *ptm, pthread_t owner)
  {
  	pthread_t thread;
 -	unsigned int count, i;
 +#ifndef PTHREAD__MUTEX_SPIN_WAIT_WAKE
 +	unsigned int count = 2, i;
 +#endif
  
 -	for (count = 2;; owner = ptm->ptm_owner) {
 +	for (;; owner = ptm->ptm_owner) {
  		thread = (pthread_t)MUTEX_OWNER(owner);
  		if (thread == NULL)
  			break;
  		if (thread->pt_lwpctl->lc_curcpu == LWPCTL_CPU_NONE)
  			break;
 +#ifdef PTHREAD__MUTEX_SPIN_WAIT_WAKE
 +		pthread__smt_wait();
 +#else
  		if (count < 128)
  			count += count;
  		for (i = count; i != 0; i--)
  			pthread__mutex_pause();
 +#endif
  	}
  
  	return owner;
 @@ -501,6 +511,9 @@ pthread_mutex_unlock(pthread_mutex_t *ptm)
  	 */
  #ifndef PTHREAD__ATOMIC_IS_MEMBAR
  	membar_enter();
 +#endif
 +#ifdef PTHREAD__MUTEX_SPIN_WAIT_WAKE
 +	pthread__smt_wake();
 +#endif
  	if (MUTEX_OWNER(newval) == 0 && ptm->ptm_waiters != NULL) {
  		pthread__mutex_wakeup(self,
 diff --git a/lib/libpthread/pthread_rwlock.c b/lib/libpthread/pthread_rwlock.c
 index 90e48a14f3ab..a5c112bbdf5c 100644
 --- a/lib/libpthread/pthread_rwlock.c
 +++ b/lib/libpthread/pthread_rwlock.c
 @@ -127,6 +127,7 @@ pthread_rwlock_destroy(pthread_rwlock_t *ptr)
  	return 0;
  }
  
 +#ifndef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
  /* We want function call overhead. */
  NOINLINE static void
  pthread__rwlock_pause(void)
 @@ -134,12 +135,15 @@ pthread__rwlock_pause(void)
  
  	pthread__smt_pause();
  }
 +#endif
  
  NOINLINE static int
  pthread__rwlock_spin(uintptr_t owner)
  {
  	pthread_t thread;
 +#ifndef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
  	unsigned int i;
 +#endif
  
  	if ((owner & ~RW_THREAD) != RW_WRITE_LOCKED)
  		return 0;
 @@ -149,8 +153,12 @@ pthread__rwlock_spin(uintptr_t owner)
  	    thread->pt_lwpctl->lc_curcpu == LWPCTL_CPU_NONE)
  		return 0;
  
 +#ifdef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
 +	pthread__smt_wait();
 +#else
  	for (i = 128; i != 0; i--)
  		pthread__rwlock_pause();
 +#endif
  	return 1;
  }
  
 @@ -487,6 +495,9 @@ pthread_rwlock_unlock(pthread_rwlock_t *ptr)
  			next = rw_cas(ptr, owner, new);
  			if (owner == next) {
  				/* Released! */
 +#ifdef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
 +				pthread__smt_wake();
 +#endif
  				return 0;
  			}
  			continue;
 @@ -559,6 +570,17 @@ pthread_rwlock_unlock(pthread_rwlock_t *ptr)
  			ptr->ptr_nreaders = 0;
  			pthread__unpark_all(&ptr->ptr_rblocked, self,
  			    interlock);
 +
 +			/*
 +			 * If we're doing CPU wait/wake instructions
 +			 * for spin loops, there may also be other CPUs
 +			 * spinning for a read lock -- wake them
 +			 * promptly too; no reason to make them hang
 +			 * when they could be making progress.
 +			 */
 +#ifdef PTHREAD__RWLOCK_SPIN_WAIT_WAKE
 +			pthread__smt_wake();
 +#endif
  		}
  		pthread_mutex_unlock(interlock);
  


From: Vasily Dybala <Vasily.Dybala@kaspersky.com>
To: "gnats-bugs@netbsd.org" <gnats-bugs@netbsd.org>,
	"port-arm-maintainer@netbsd.org" <port-arm-maintainer@netbsd.org>,
	"gnats-admin@netbsd.org" <gnats-admin@netbsd.org>, "netbsd-bugs@netbsd.org"
	<netbsd-bugs@netbsd.org>
Cc: 
Subject: RE: PR/57437 CVS commit: src/lib/libpthread
Date: Tue, 30 May 2023 08:03:24 +0000

 Hello,
 I checked the fix.

 I confirm the issue is fixed. No more extra delays.


State-Changed-From-To: feedback->needs-pullups
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 05 Jun 2023 05:39:19 +0000
State-Changed-Why:
confirmed fixed, needs to get at least into -10


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57437 CVS commit: [netbsd-10] src/lib/libpthread
Date: Tue, 1 Aug 2023 16:21:07 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Tue Aug  1 16:21:07 UTC 2023

 Modified Files:
 	src/lib/libpthread [netbsd-10]: pthread_int.h pthread_spin.c
 	src/lib/libpthread/arch/aarch64 [netbsd-10]: pthread_md.h
 	src/lib/libpthread/arch/arm [netbsd-10]: pthread_md.h
 	src/lib/libpthread/arch/i386 [netbsd-10]: pthread_md.h
 	src/lib/libpthread/arch/x86_64 [netbsd-10]: pthread_md.h

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #296):

 	lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13
 	lib/libpthread/pthread_int.h: revision 1.110
 	lib/libpthread/pthread_int.h: revision 1.111
 	lib/libpthread/arch/i386/pthread_md.h: revision 1.21
 	lib/libpthread/arch/arm/pthread_md.h: revision 1.12
 	lib/libpthread/arch/arm/pthread_md.h: revision 1.13
 	lib/libpthread/pthread_spin.c: revision 1.11
 	lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2

 libpthread: Use __nothing, not /* nothing */, for empty macros.

 No functional change intended -- just safer to do it this way in case
 the macros are used in if branches or comma expressions.

 PR port-arm/57437 (pthread__smt_pause/wake issue)

 libpthread: New pthread__smt_wait to put CPU in low power for spin.

 This is now distinct from pthread__smt_pause, which is for spin lock
 backoff with no paired wakeup.

 On Arm, there is a single-bit event register per CPU, and there are two
 instructions to manage it:
 - wfe, wait for event -- if event register is clear, enter low power
   mode and wait until event register is set; then exit low power mode
   and clear event register
 - sev, signal event -- sets event register on all CPUs (other
   circumstances like interrupts also set the event register and cause
   wfe to wake)

 These can be used to reduce the power consumption of spinning for a
 lock, but only if they are actually paired -- if there's no sev, wfe
 might hang indefinitely.  Currently only pthread_spin(3) actually
 pairs them; the other lock primitives (internal lock, mutex, rwlock)
 do not -- they have spin lock backoff loops, but no corresponding
 wakeup to cancel a wfe.

 It may be worthwhile to teach the other lock primitives to pair
 wfe/sev, but that requires some performance measurement to verify
 it's actually worthwhile.  So for now, we just make sure not to use
 wfe when there's no sev, and keep everything else the same -- this
 should fix severe performance degredation in libpthread on Arm
 without hurting anything else.

 No change in the generated code on amd64 and i386.  No change in the
 generated code for pthread_spin.c on arm and aarch64 -- changes only
 the generated code for pthread_lock.c, pthread_mutex.c, and
 pthread_rwlock.c, as intended.

 PR port-arm/57437


 To generate a diff of this commit:
 cvs rdiff -u -r1.109 -r1.109.2.1 src/lib/libpthread/pthread_int.h
 cvs rdiff -u -r1.10 -r1.10.2.1 src/lib/libpthread/pthread_spin.c
 cvs rdiff -u -r1.1 -r1.1.36.1 src/lib/libpthread/arch/aarch64/pthread_md.h
 cvs rdiff -u -r1.11 -r1.11.10.1 src/lib/libpthread/arch/arm/pthread_md.h
 cvs rdiff -u -r1.20 -r1.20.42.1 src/lib/libpthread/arch/i386/pthread_md.h
 cvs rdiff -u -r1.12 -r1.12.54.1 src/lib/libpthread/arch/x86_64/pthread_md.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: needs-pullups->pending-pullups
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 02 Aug 2023 13:32:59 +0000
State-Changed-Why:
pullup-10 #296, pullup-9 #1700, pullup-8 #something


From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57437 CVS commit: [netbsd-9] src/lib/libpthread
Date: Fri, 4 Aug 2023 13:04:04 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Fri Aug  4 13:04:04 UTC 2023

 Modified Files:
 	src/lib/libpthread [netbsd-9]: pthread_int.h pthread_spin.c
 	src/lib/libpthread/arch/aarch64 [netbsd-9]: pthread_md.h
 	src/lib/libpthread/arch/arm [netbsd-9]: pthread_md.h
 	src/lib/libpthread/arch/i386 [netbsd-9]: pthread_md.h
 	src/lib/libpthread/arch/x86_64 [netbsd-9]: pthread_md.h

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1700):

 	lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13
 	lib/libpthread/pthread_int.h: revision 1.110
 	lib/libpthread/pthread_int.h: revision 1.111
 	lib/libpthread/arch/i386/pthread_md.h: revision 1.21
 	lib/libpthread/arch/arm/pthread_md.h: revision 1.12
 	lib/libpthread/arch/arm/pthread_md.h: revision 1.13
 	lib/libpthread/pthread_spin.c: revision 1.11
 	lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2

 libpthread: Use __nothing, not /* nothing */, for empty macros.

 No functional change intended -- just safer to do it this way in case
 the macros are used in if branches or comma expressions.
 PR port-arm/57437 (pthread__smt_pause/wake issue)

 libpthread: New pthread__smt_wait to put CPU in low power for spin.

 This is now distinct from pthread__smt_pause, which is for spin lock
 backoff with no paired wakeup.

 On Arm, there is a single-bit event register per CPU, and there are two
 instructions to manage it:
 - wfe, wait for event -- if event register is clear, enter low power
   mode and wait until event register is set; then exit low power mode
   and clear event register
 - sev, signal event -- sets event register on all CPUs (other
   circumstances like interrupts also set the event register and cause
   wfe to wake)

 These can be used to reduce the power consumption of spinning for a
 lock, but only if they are actually paired -- if there's no sev, wfe
 might hang indefinitely.  Currently only pthread_spin(3) actually
 pairs them; the other lock primitives (internal lock, mutex, rwlock)
 do not -- they have spin lock backoff loops, but no corresponding
 wakeup to cancel a wfe.

 It may be worthwhile to teach the other lock primitives to pair
 wfe/sev, but that requires some performance measurement to verify
 it's actually worthwhile.  So for now, we just make sure not to use
 wfe when there's no sev, and keep everything else the same -- this
 should fix severe performance degredation in libpthread on Arm
 without hurting anything else.

 No change in the generated code on amd64 and i386.  No change in the
 generated code for pthread_spin.c on arm and aarch64 -- changes only
 the generated code for pthread_lock.c, pthread_mutex.c, and
 pthread_rwlock.c, as intended.
 PR port-arm/57437


 To generate a diff of this commit:
 cvs rdiff -u -r1.95.2.2 -r1.95.2.3 src/lib/libpthread/pthread_int.h
 cvs rdiff -u -r1.6 -r1.6.34.1 src/lib/libpthread/pthread_spin.c
 cvs rdiff -u -r1.1 -r1.1.28.1 src/lib/libpthread/arch/aarch64/pthread_md.h
 cvs rdiff -u -r1.11 -r1.11.2.1 src/lib/libpthread/arch/arm/pthread_md.h
 cvs rdiff -u -r1.20 -r1.20.34.1 src/lib/libpthread/arch/i386/pthread_md.h
 cvs rdiff -u -r1.12 -r1.12.46.1 src/lib/libpthread/arch/x86_64/pthread_md.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57437 CVS commit: [netbsd-8] src/lib/libpthread
Date: Fri, 4 Aug 2023 13:06:59 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Fri Aug  4 13:06:59 UTC 2023

 Modified Files:
 	src/lib/libpthread [netbsd-8]: pthread_int.h pthread_spin.c
 	src/lib/libpthread/arch/aarch64 [netbsd-8]: pthread_md.h
 	src/lib/libpthread/arch/arm [netbsd-8]: pthread_md.h
 	src/lib/libpthread/arch/i386 [netbsd-8]: pthread_md.h
 	src/lib/libpthread/arch/x86_64 [netbsd-8]: pthread_md.h

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #1878):

 	lib/libpthread/arch/x86_64/pthread_md.h: revision 1.13
 	lib/libpthread/pthread_int.h: revision 1.110
 	lib/libpthread/pthread_int.h: revision 1.111
 	lib/libpthread/arch/i386/pthread_md.h: revision 1.21
 	lib/libpthread/arch/arm/pthread_md.h: revision 1.12
 	lib/libpthread/arch/arm/pthread_md.h: revision 1.13
 	lib/libpthread/pthread_spin.c: revision 1.11
 	lib/libpthread/arch/aarch64/pthread_md.h: revision 1.2

 libpthread: Use __nothing, not /* nothing */, for empty macros.

 No functional change intended -- just safer to do it this way in case
 the macros are used in if branches or comma expressions.
 PR port-arm/57437 (pthread__smt_pause/wake issue)

 libpthread: New pthread__smt_wait to put CPU in low power for spin.

 This is now distinct from pthread__smt_pause, which is for spin lock
 backoff with no paired wakeup.

 On Arm, there is a single-bit event register per CPU, and there are two
 instructions to manage it:
 - wfe, wait for event -- if event register is clear, enter low power
   mode and wait until event register is set; then exit low power mode
   and clear event register
 - sev, signal event -- sets event register on all CPUs (other
   circumstances like interrupts also set the event register and cause
   wfe to wake)

 These can be used to reduce the power consumption of spinning for a
 lock, but only if they are actually paired -- if there's no sev, wfe
 might hang indefinitely.  Currently only pthread_spin(3) actually
 pairs them; the other lock primitives (internal lock, mutex, rwlock)
 do not -- they have spin lock backoff loops, but no corresponding
 wakeup to cancel a wfe.

 It may be worthwhile to teach the other lock primitives to pair
 wfe/sev, but that requires some performance measurement to verify
 it's actually worthwhile.  So for now, we just make sure not to use
 wfe when there's no sev, and keep everything else the same -- this
 should fix severe performance degredation in libpthread on Arm
 without hurting anything else.

 No change in the generated code on amd64 and i386.  No change in the
 generated code for pthread_spin.c on arm and aarch64 -- changes only
 the generated code for pthread_lock.c, pthread_mutex.c, and
 pthread_rwlock.c, as intended.
 PR port-arm/57437


 To generate a diff of this commit:
 cvs rdiff -u -r1.93.4.1 -r1.93.4.2 src/lib/libpthread/pthread_int.h
 cvs rdiff -u -r1.6 -r1.6.24.1 src/lib/libpthread/pthread_spin.c
 cvs rdiff -u -r1.1 -r1.1.18.1 src/lib/libpthread/arch/aarch64/pthread_md.h
 cvs rdiff -u -r1.9 -r1.9.18.1 src/lib/libpthread/arch/arm/pthread_md.h
 cvs rdiff -u -r1.20 -r1.20.24.1 src/lib/libpthread/arch/i386/pthread_md.h
 cvs rdiff -u -r1.12 -r1.12.36.1 src/lib/libpthread/arch/x86_64/pthread_md.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sat, 05 Aug 2023 16:29:33 +0000
State-Changed-Why:
fixed in HEAD, pulled up to 10 and 9
(follow up if this matters in 8)


>Unformatted:
