NetBSD Problem Report #49816
From yamt@NetBSD.org Mon Apr 6 09:16:32 2015
Return-Path: <yamt@NetBSD.org>
Received: by mollari.NetBSD.org (Postfix, from userid 1270)
id E68FBA654B; Mon, 6 Apr 2015 09:16:32 +0000 (UTC)
Message-Id: <20150406091632.E68FBA654B@mollari.NetBSD.org>
Date: Mon, 6 Apr 2015 09:16:32 +0000 (UTC)
From: yamt@NetBSD.org
Reply-To: yamt@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: rtld internal lock vs fork
X-Send-Pr-Version: 3.95
>Number: 49816
>Category: lib
>Synopsis: rtld internal lock vs fork
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people
>State: feedback
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Apr 06 09:20:00 +0000 2015
>Closed-Date:
>Last-Modified: Mon Jul 04 16:17:42 +0000 2022
>Originator: YAMAMOTO Takashi
>Release: NetBSD current
>Organization:
>Environment:
>Description:
when a thread does fork(2), some other thread might hold _rtld_mutex.
in that case, the child process will likely deadlock soon because
_rtld_mutex stays non-zero there: the thread that would release it is
not duplicated by fork(2). i've observed the problem with Open vSwitch.
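A minimal sketch of the situation described above, assuming a plain atomic word as a stand-in for _rtld_mutex. The names demo_mutex, demo_release, demo_holder and demo_child_sees_lock_held are hypothetical and are not taken from rtld.c. It only shows that the child inherits a set lock word with no thread left to clear it; it does not reproduce the actual lockup in ld.elf_so.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-ins; these are not the real rtld variables. */
static atomic_uint demo_mutex;		/* plays the role of _rtld_mutex */
static atomic_int demo_release;		/* parent tells the holder to unlock */

/* Second thread: take the lock word and keep it until told to let go. */
static void *
demo_holder(void *arg)
{
	(void)arg;
	atomic_store(&demo_mutex, 1);
	while (!atomic_load(&demo_release))
		sched_yield();
	atomic_store(&demo_mutex, 0);
	return NULL;
}

/*
 * Fork while demo_holder owns the lock word.
 * Returns 1 if the child saw the word still set, 0 if it saw it clear,
 * -1 on error.
 */
int
demo_child_sees_lock_held(void)
{
	pthread_t t;
	pid_t pid;
	int status = 0, result = -1;

	atomic_store(&demo_mutex, 0);
	atomic_store(&demo_release, 0);
	if (pthread_create(&t, NULL, demo_holder, NULL) != 0)
		return -1;
	while (atomic_load(&demo_mutex) == 0)
		sched_yield();		/* wait until the holder owns the lock */

	pid = fork();
	if (pid == 0) {
		/*
		 * Child: only the forking thread was copied, so nobody
		 * here will ever clear the inherited lock word.
		 */
		_exit(atomic_load(&demo_mutex) != 0 ? 1 : 0);
	}
	if (pid > 0 && waitpid(pid, &status, 0) == pid && WIFEXITED(status))
		result = WEXITSTATUS(status);

	atomic_store(&demo_release, 1);	/* let the parent's holder finish */
	pthread_join(t, NULL);
	return result;
}
```

In the parent the holder eventually stores 0 and everything proceeds. In the child the same word stays set forever, so any code path that waits for it to clear would never return.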
>How-To-Repeat:
configure OVS master with --enable-shared and run "gmake -j32 check".
>Fix:
>Release-Note:
>Audit-Trail:
From: "YAMAMOTO Takashi" <yamt@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Mon, 6 Apr 2015 09:34:15 +0000
Module Name: src
Committed By: yamt
Date: Mon Apr 6 09:34:15 UTC 2015
Modified Files:
src/libexec/ld.elf_so: rtld.c
Log Message:
Fix membars around rtld internal mutex.
This fixes the most of lockups i observed with Open vSwitch
on NetBSD/amd64. ("most of" because it still occasionally
locks up because of other problems. see PR/49816)
To generate a diff of this commit:
cvs rdiff -u -r1.176 -r1.177 src/libexec/ld.elf_so/rtld.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/49816: rtld internal lock vs fork
Date: Mon, 6 Apr 2015 11:57:09 +0200
Out of curiosity: which async-signal-safe functions is the child calling
that involve ld.elf_so internal actions at that stage?
Martin
From: yamt@netbsd.org (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
yamt@NetBSD.org
Subject: Re: lib/49816: rtld internal lock vs fork
Date: Mon, 6 Apr 2015 11:27:22 +0000 (UTC)
> Out of curiosity: which async-signal-safe functions is the child calling
> that involve ld.elf_so internal actions at that stage?
>
> Martin
it seems that what's stuck in the child process is
an ordinary _rtld_bind_start.
YAMAMOTO Takashi
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Mon, 6 Apr 2015 13:33:30 +0200
On Mon, Apr 06, 2015 at 09:35:00AM +0000, YAMAMOTO Takashi wrote:
> Log Message:
> Fix membars around rtld internal mutex.
>
> This fixes the most of lockups i observed with Open vSwitch
> on NetBSD/amd64. ("most of" because it still occasionally
> locks up because of other problems. see PR/49816)
None of those should matter on amd64? CAS has an implicit total memory
barrier, so this seems to just add a lot of overhead for no reason.
Joerg
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: lib/49816: rtld internal lock vs fork
Date: Mon, 6 Apr 2015 15:50:34 +0200
On Mon, Apr 06, 2015 at 09:20:00AM +0000, yamt@NetBSD.org wrote:
> >Description:
> when a thread does fork(2), some other thread might hold _rtld_mutex.
> in that case, the child process will likely deadlock soon because
> of non-zero _rtld_mutex. i've observed the problem with open vswitch.
Non-zero _rtld_mutex by itself should not be a problem. The problem exists
if another thread requires the exclusive lock. I do plan to rewrite rtld
at some point to never require an exclusive lock for symbol lookup, but
that's far from trivial. In the old world before locking, you would just
hit race conditions or not.
Joerg
From: yamt@netbsd.org (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
yamt@NetBSD.org
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Mon, 6 Apr 2015 16:09:29 +0000 (UTC)
> On Mon, Apr 06, 2015 at 09:35:00AM +0000, YAMAMOTO Takashi wrote:
> > Log Message:
> > Fix membars around rtld internal mutex.
> >
> > This fixes the most of lockups i observed with Open vSwitch
> > on NetBSD/amd64. ("most of" because it still occasionally
> > locks up because of other problems. see PR/49816)
>
> None of those should matter on amd64? CAS has an implicit total memory
> barrier, so this seems to just add a lot of overhead for no reason.
except that:
* _rtld_exclusive_exit doesn't use CAS
* this code is MI
i agree that something like PTHREAD__ATOMIC_IS_MEMBAR
would be a nice optimization, though.
YAMAMOTO Takashi
>
> Joerg
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Mon, 6 Apr 2015 18:22:21 +0200
On Mon, Apr 06, 2015 at 04:10:02PM +0000, YAMAMOTO Takashi wrote:
> except that:
> * _rtld_exclusive_exit doesn't use CAS
> * this code is MI
>
> i agree that something like PTHREAD__ATOMIC_IS_MEMBAR
> would be a nice optimization, though.
So which platform are you worried about that doesn't have TSO and
doesn't have implicit membars for CAS? I'm asking because the only reason
those changes should help your problem is if they massively penalize the
operation.
Joerg
From: yamt@netbsd.org (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
yamt@NetBSD.org
Subject: Re: lib/49816: rtld internal lock vs fork
Date: Mon, 6 Apr 2015 16:23:38 +0000 (UTC)
> > when a thread does fork(2), some other thread might hold _rtld_mutex.
> > in that case, the child process will likely deadlock soon because
> > of non-zero _rtld_mutex. i've observed the problem with open vswitch.
>
> Non-zero _rtld_mutex by itself should not be problem. The problem exists
> if another thread requires the exclusive lock. I do plan to rewrite rtld
> at some point to never require an exclusive lock for symbol look up, but
> that's far from trivial. In the old world before locking, you would just
> hit race conditions or not.
i agree on all points.
and, at least in my case, "another thread requires the exclusive lock"
actually happens.
YAMAMOTO Takashi
From: yamt@netbsd.org (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
yamt@NetBSD.org
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Mon, 6 Apr 2015 16:41:01 +0000 (UTC)
> On Mon, Apr 06, 2015 at 04:10:02PM +0000, YAMAMOTO Takashi wrote:
> > except that:
> > * _rtld_exclusive_exit doesn't use CAS
> > * this code is MI
> >
> > i agree that something like PTHREAD__ATOMIC_IS_MEMBAR
> > would be a nice optimization, though.
>
> So which platform are you worried about that doesn't have TSO and
> doesn't implicit membars for CAS? I'm asking because the only reason
> those changes should help your problem is if they massively penalize the
> operation.
well, can you explain why _rtld_exclusive_exit is safe
without cas or barrier?
YAMAMOTO Takashi
>
> Joerg
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Mon, 6 Apr 2015 20:16:22 +0200
On Mon, Apr 06, 2015 at 04:45:01PM +0000, YAMAMOTO Takashi wrote:
> > On Mon, Apr 06, 2015 at 04:10:02PM +0000, YAMAMOTO Takashi wrote:
> > > except that:
> > > * _rtld_exclusive_exit doesn't use CAS
> > > * this code is MI
> > >
> > > i agree that something like PTHREAD__ATOMIC_IS_MEMBAR
> > > would be a nice optimization, though.
> >
> > So which platform are you worried about that doesn't have TSO and
> > doesn't implicit membars for CAS? I'm asking because the only reason
> > those changes should help your problem is if they massively penalize the
> > operation.
>
> well, can you explain why _rtld_exclusive_exit is safe
> without cas or barrier?
All sane MP platforms at least implement Total Store Ordering. So all
unrelated stores are visible no later than the reset of the mutex.
That's why I am surprised that it changes anything at all for you.
Joerg
From: yamt@netbsd.org (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
yamt@NetBSD.org
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Tue, 7 Apr 2015 01:51:39 +0000 (UTC)
> On Mon, Apr 06, 2015 at 04:45:01PM +0000, YAMAMOTO Takashi wrote:
> > > On Mon, Apr 06, 2015 at 04:10:02PM +0000, YAMAMOTO Takashi wrote:
> > > > except that:
> > > > * _rtld_exclusive_exit doesn't use CAS
> > > > * this code is MI
> > > >
> > > > i agree that something like PTHREAD__ATOMIC_IS_MEMBAR
> > > > would be a nice optimization, though.
> > >
> > > So which platform are you worried about that doesn't have TSO and
> > > doesn't implicit membars for CAS? I'm asking because the only reason
> > > those changes should help your problem is if they massively penalize the
> > > operation.
> >
> > well, can you explain why _rtld_exclusive_exit is safe
> > without cas or barrier?
>
> All sane MP platforms at least implement Total Store Ordering. So all
> unrelated stores are visible no later than the reset of the mutex.
> That's why I am surprised that it changes anything at all for you.
>
> Joerg
the intel manual (SDM vol. 3A, section 8.2.2) says:
    Reads may be reordered with older writes to different locations
    but not with older writes to the same location.
see also 8.2.3.4 Example 8-3.
isn't that the case for _rtld_exclusive_exit?
YAMAMOTO Takashi
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Tue, 7 Apr 2015 13:03:14 +0200
On Tue, Apr 07, 2015 at 01:55:01AM +0000, YAMAMOTO Takashi wrote:
> the intel's manual says: (8.2.2)
>
> Reads may be reordered with older writes to different locations
> but not with older writes to the same location.
>
> see also 8.2.3.4 Example 8-3.
>
> isn't it the case for _rtld_exclusive_exit?
I don't think that's relevant, this is about global visibility of
changes. E.g. if another thread tries to get the mutex, it must see all
changes from the current thread once it has successfully acquired the mutex.
Joerg
From: Masao Uebayashi <uebayasi@gmail.com>
To: gnats-bugs@netbsd.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
YAMAMOTO Takashi <yamt@netbsd.org>
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Wed, 8 Apr 2015 12:42:28 +0900
> I don't think that's relevant, this is about global visiblity of
> changes. E.g. if another thread tries to get the mutex, it must see all
> changes from the current thread when successfully acquired the mutex.
>
> Joerg
I can't follow... Your first message was:
> None of those should matter on amd64? CAS has an implicit total memory
> barrier, so this seems to just add a lot of overhead for no reason.
If you're talking about atomic vs. memory barrier, isn't it something
addressed by the new C11 atomic API? Kernel mutex code has
MUTEX_RECEIVE()/MUTEX_GIVE() for that purpose (IIUC).
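As a rough illustration of what the C11 atomic API mentioned above offers, here is a minimal sketch of a lock word with acquire ordering on entry and release ordering on exit. The names sketch_lock_t, sketch_try_enter and sketch_exit are hypothetical; this is neither the rtld code nor the kernel's MUTEX_RECEIVE()/MUTEX_GIVE(), only the same acquire/release idea expressed with <stdatomic.h>.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical lock type: 0 means free, 1 means held. */
typedef struct {
	atomic_uint word;
} sketch_lock_t;

/*
 * Try to take the lock once.  On success the CAS has acquire
 * semantics, so accesses in the critical section cannot be moved
 * before it.  Returns true if the lock was taken.
 */
bool
sketch_try_enter(sketch_lock_t *l)
{
	unsigned expected = 0;

	return atomic_compare_exchange_strong_explicit(&l->word,
	    &expected, 1, memory_order_acquire, memory_order_relaxed);
}

/*
 * Drop the lock.  The release store makes every write done inside
 * the critical section visible to the next thread that acquires it.
 */
void
sketch_exit(sketch_lock_t *l)
{
	atomic_store_explicit(&l->word, 0, memory_order_release);
}
```

Note that a release store alone does not order the store against later loads by the same thread, which is the separate question raised about _rtld_exclusive_exit in this PR.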
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, yamt@NetBSD.org
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Wed, 8 Apr 2015 11:39:06 +0200
On Wed, Apr 08, 2015 at 03:45:01AM +0000, Masao Uebayashi wrote:
> > I don't think that's relevant, this is about global visiblity of
> > changes. E.g. if another thread tries to get the mutex, it must see all
> > changes from the current thread when successfully acquired the mutex.
> >
> > Joerg
>
> I can't follow... Your first message was:
The diff committed by yamt-san should not make any difference on
architectures with Total Store Ordering, including x86, which he says he
is running.
> > None of those should matter on amd64? CAS has an implicit total memory
> > barrier, so this seems to just add a lot of overhead for no reason.
>
> If you're talking about atomic vs. memory barrier, isn't it something
> addressed by the new C11 atomic API? Kernel mutex code has
> MUTEX_RECEIVE()/MUTEX_GIVE() for that purpose (IIUC).
Yes, except we don't have it available. I'm not sure if there even is
support for it in gcc-4.8. For Clang, it would be just a question of
installing stdatomic.h somewhere.
Joerg
From: yamt@netbsd.org (YAMAMOTO Takashi)
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
yamt@NetBSD.org
Subject: Re: PR/49816 CVS commit: src/libexec/ld.elf_so
Date: Wed, 29 Apr 2015 01:45:51 +0000 (UTC)
> On Tue, Apr 07, 2015 at 01:55:01AM +0000, YAMAMOTO Takashi wrote:
> > the intel's manual says: (8.2.2)
> >
> > Reads may be reordered with older writes to different locations
> > but not with older writes to the same location.
> >
> > see also 8.2.3.4 Example 8-3.
> >
> > isn't it the case for _rtld_exclusive_exit?
>
> I don't think that's relevant, this is about global visiblity of
> changes. E.g. if another thread tries to get the mutex, it must see all
> changes from the current thread when successfully acquired the mutex.
>
> Joerg
sorry for the late reply.
_rtld_exclusive_exit:
    _rtld_mutex = 0;                  // older write to a different location
    waiter = _rtld_waiter_exclusive;  // read
my interpretation of the manual text is that these two accesses
can be reordered.
if that happens, the read can return a stale "no waiter" value and the
wakeup is skipped, so _rtld_exclusive_enter can block on a now-unlocked
mutex.
YAMAMOTO Takashi
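The store-then-load pattern described above can be sketched with C11 atomics as follows. This is a hedged illustration, not the code in rtld.c: sketch_mutex and sketch_waiters are hypothetical stand-ins for _rtld_mutex and _rtld_waiter_exclusive, and the two functions only model the ordering, not the parking and unparking of LWPs. The assumption is that both sides need a full (store-to-load) fence between their own store and their read of the other side's variable.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for _rtld_mutex and _rtld_waiter_exclusive. */
static atomic_uint sketch_mutex;
static atomic_uint sketch_waiters;

/*
 * Unlock side: clear the mutex, then look for a waiter.
 * Without the fence, the load of sketch_waiters may be satisfied
 * before the store to sketch_mutex becomes visible, even on x86.
 * Returns the waiter id that was observed (0 means nobody to wake).
 */
unsigned
sketch_exclusive_exit(void)
{
	atomic_store_explicit(&sketch_mutex, 0, memory_order_release);
	atomic_thread_fence(memory_order_seq_cst);	/* store -> load */
	return atomic_load_explicit(&sketch_waiters, memory_order_relaxed);
}

/*
 * Waiter side: announce ourselves, then re-check the mutex.
 * Returns 1 if the mutex still looks held and the caller should
 * sleep, 0 if it is already free.
 */
int
sketch_waiter_should_sleep(unsigned self)
{
	atomic_store_explicit(&sketch_waiters, self, memory_order_relaxed);
	atomic_thread_fence(memory_order_seq_cst);	/* store -> load */
	return atomic_load_explicit(&sketch_mutex,
	    memory_order_relaxed) != 0;
}
```

With a sequentially consistent fence on both sides, at least one thread must observe the other's store: either the exiting thread sees the waiter and wakes it, or the waiter sees the mutex already cleared and does not sleep. Dropping either fence allows the case where both read stale values and the waiter sleeps with nobody left to wake it.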
State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 04 Jul 2022 16:17:42 +0000
State-Changed-Why:
Was this addressed by the following commits?
https://mail-index.netbsd.org/source-changes/2020/04/16/msg116256.html
https://mail-index.netbsd.org/source-changes/2020/04/19/msg116337.html
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.