NetBSD Problem Report #56412

From www@netbsd.org  Mon Sep 20 15:22:16 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 691FA1A921F
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 20 Sep 2021 15:22:16 +0000 (UTC)
Message-Id: <20210920152215.3AC9A1A923A@mollari.NetBSD.org>
Date: Mon, 20 Sep 2021 15:22:15 +0000 (UTC)
From: thorpej@me.com
Reply-To: thorpej@me.com
To: gnats-bugs@NetBSD.org
Subject: lwp_dtor() causes cross-call storm
X-Send-Pr-Version: www-1.0

>Number:         56412
>Category:       kern
>Synopsis:       lwp_dtor() causes cross-call storm
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Sep 20 15:25:00 +0000 2021
>Originator:     Jason Thorpe
>Release:        NetBSD 9.99.82 (and many releases prior)
>Organization:
RISCy Business
>Environment:
NetBSD the-ripe-vessel 9.99.82 NetBSD 9.99.82 (GENERIC) #0: Tue May 18 17:05:45 UTC 2021  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
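The pool cache infrastructure provides a weak type-stability guarantee for the objects that use it: memory that backs an object of a given type keeps holding an object of that type until the backing page is released to the system. kern_mutex.c and kern_rwlock.c rely on this guarantee to check whether a lock owner is currently running on a CPU.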

To ensure that mutex_oncpu() and rw_oncpu() are no longer referencing an LWP object that is about to be freed back to the system (and thus lose its type-stable property), lwp_dtor() performs an xc_barrier(0), which broadcasts a cross-call to every CPU and waits for all of them to complete it.

The problem is that lwp_dtor() is called once for each LWP in a page that is being released back to the system. Calling the destructor per object is necessary to tear down each LWP properly, but a per-object barrier is NOT needed to preserve the type stability relied upon by mutex_oncpu() and rw_oncpu(); only **one** xc_barrier() is needed, issued before the destructor is called for any of the LWP objects on that page.
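The difference can be illustrated with a small user-space model. This is a hypothetical sketch, not kernel code: xc_barrier_model(), lwp_dtor_model(), lwp_dtor_nobarrier() and release_page() are illustrative names, and the "barrier" only counts how often xc_barrier(0) would be issued while the destructors run for one page's worth of objects.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical model, not kernel code.  xc_barrier_model() stands in
 * for xc_barrier(0); it only counts how many times a broadcast
 * cross-call barrier would be issued.
 */
static unsigned long barrier_calls;

static void
xc_barrier_model(void)
{
	barrier_calls++;
}

/* Current pattern: the barrier lives inside the per-object destructor. */
static void
lwp_dtor_model(void *obj)
{
	(void)obj;
	xc_barrier_model();
	/* ... per-object teardown would follow here ... */
}

/* Same destructor with the barrier removed: teardown only. */
static void
lwp_dtor_nobarrier(void *obj)
{
	(void)obj;
	/* ... per-object teardown ... */
}

/*
 * Destroy the `nobjs` objects backed by one page being released and
 * return the number of barriers issued.  If `hoist` is nonzero, a
 * single barrier is issued before any destructor runs, and the
 * barrier-free destructor is used for every object.
 */
static unsigned long
release_page(void **objs, size_t nobjs, int hoist)
{
	assert(objs != NULL || nobjs == 0);
	barrier_calls = 0;
	if (hoist && nobjs > 0)
		xc_barrier_model();
	for (size_t i = 0; i < nobjs; i++) {
		if (hoist)
			lwp_dtor_nobarrier(objs[i]);
		else
			lwp_dtor_model(objs[i]);
	}
	return barrier_calls;
}
```

With a page backing 8 LWPs, release_page(objs, 8, 0) issues 8 barriers, while release_page(objs, 8, 1) issues 1; the per-object teardown work is the same in both cases.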

The upshot of the current implementation is that freeing a page that happens to back LWP objects causes a brief cross-call storm: one broadcast barrier for every LWP on that page. On systems with a small number of CPUs, this is probably not very noticeable. However, on a system with a large number of CPUs, it could become an intermittent performance problem whenever the system comes under even slight memory pressure.
>How-To-Repeat:
This was noticed during code inspection; constructing a reproducer is left as an exercise for the reader.
>Fix:
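Provide a mechanism to register a pre-DTOR hook that the pool cache layer invokes once before it calls the destructor on the objects of a page being released. lwp_dtor() could then drop its xc_barrier(0), with the single barrier performed by the hook instead.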
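One possible shape for such a hook is sketched below as a user-space model. All names here (struct cache_model, cache_model_set_predtor(), cache_model_destruct_batch(), and the example hooks) are hypothetical and do not describe any existing NetBSD pool_cache interface; the pre-destructor hook merely counts where a single xc_barrier(0) would go.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified cache: NOT the NetBSD pool_cache(9) API. */
struct cache_model {
	void (*dtor)(void *arg, void *obj);	/* per-object destructor */
	void (*pre_dtor)(void *arg);		/* optional, once per batch */
	void *arg;				/* passed to both callbacks */
};

/* Register (or clear, with NULL) the pre-destructor hook. */
static void
cache_model_set_predtor(struct cache_model *pc, void (*fn)(void *))
{
	assert(pc != NULL);
	pc->pre_dtor = fn;
}

/*
 * Destroy the objects backed by one page being released: run the
 * pre-destructor hook once, then the destructor for each object.
 */
static void
cache_model_destruct_batch(struct cache_model *pc, void **objs, size_t n)
{
	if (n == 0)
		return;
	if (pc->pre_dtor != NULL)
		pc->pre_dtor(pc->arg);
	for (size_t i = 0; i < n; i++)
		pc->dtor(pc->arg, objs[i]);
}

/* Counters updated by the example hooks, for illustration only. */
struct counts {
	unsigned long barriers;
	unsigned long dtors;
};

/* Example hook: stands in for a single xc_barrier(0). */
static void
lwp_pre_dtor_model(void *arg)
{
	((struct counts *)arg)->barriers++;
}

/* Example destructor: per-object teardown only, no barrier. */
static void
lwp_dtor_model(void *arg, void *obj)
{
	(void)obj;
	((struct counts *)arg)->dtors++;
}
```

Keeping the hook separate from the destructor leaves the per-object teardown untouched while moving the cross-call out of the per-object path, so releasing a page costs one barrier regardless of how many LWPs it backs.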
