NetBSD Problem Report #51632

From www@NetBSD.org  Thu Nov 17 01:48:39 2016
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3B3D77A2F1
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 17 Nov 2016 01:48:39 +0000 (UTC)
Message-Id: <20161117014837.99BFB7A2F3@mollari.NetBSD.org>
Date: Thu, 17 Nov 2016 01:48:37 +0000 (UTC)
From: ozaki-r@netbsd.org
Reply-To: ozaki-r@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: Fix a race condition of low priority xcall
X-Send-Pr-Version: www-1.0

>Number:         51632
>Category:       kern
>Synopsis:       Fix a race condition of low priority xcall
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Nov 17 01:50:00 +0000 2016
>Closed-Date:    Fri Jul 07 00:55:24 +0000 2017
>Last-Modified:  Fri Jul 07 00:55:24 +0000 2017
>Originator:     Ryota Ozaki
>Release:        6, 7, -current (since xcall appeared)
>Organization:
IIJ
>Environment:
NetBSD kvm 7.99.42 NetBSD 7.99.42 (KVM) #456: Wed Nov 16 17:57:19 JST 2016  ozaki-r@rangeley:(hidden) amd64
>Description:
xc_lowpri and xc_thread are racy: xc_wait may return before (or while) all
xcall callbacks are executed, resulting in a kernel panic at worst.

xc_lowpri serializes multiple jobs with a mutex and a condition variable. Once
all xcall callbacks of a job are done, xc_wait returns and xc_lowpri accepts
the next job.

The problem is that the counter of finished xcall callbacks is incremented
*before* an xcall callback is actually executed (see xc_tailp++ in xc_thread).
So xc_lowpri accepts the next job before all xcall callbacks of the current
job complete, and the next job begins to run its xcall callbacks.

Even worse, the counter is global and shared between jobs, so if an xcall
callback of the next job completes, the shared counter is incremented. That
makes xc_wait of the previous job conclude that all of its xcall callbacks
are done, and xc_wait of the previous job returns before (or while) its
xcall callbacks are executed.

In the psref_target_destroy case, the arguments of the xcall callback are
local variables of the function that calls xc_broadcast and xc_wait. An early
return of xc_wait (auto-)deallocates those variables, which leads to dangling
dereferences by the xcall callback, resulting in, say, a kernel panic.

One example of such a kernel panic is:
  panic: kernel diagnostic assertion "(target->prt_class == class)" failed: file "(hidden)/sys/kern/subr_psref.c", line 485 mismatched psref target class: 0x0 (ref) != 0x2 (expected)

>How-To-Repeat:
I encountered the issue with a modified kernel that introduces
psref_target_destroy, which uses a low priority xcall, for rtentries. That
allows parallel executions of psref_target_destroy for an rtentry and an
ifaddr.

Nonetheless, the issue can theoretically happen whenever users of low
priority xcall, for any targets, run in parallel.

I can reproduce the issue by letting destructions of an rtentry and an
ifaddr occur in parallel, for example by the following steps:
- boot a kernel (modified) with NET_MPSAFE enabled
- setup IP forwarding
- send traffic over the forwarding path
- repeat assigning and deassigning IP addresses on the interfaces
- wait for several minutes
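The address churn in the steps above can be sketched as a small loop. The
interface name and address below are assumptions; a real run needs root, a
NET_MPSAFE kernel, and forwarding traffic already flowing. With DRYRUN set
(the default here) the commands are only printed, not executed.

```shell
#!/bin/sh
# Hypothetical repro sketch: repeatedly assign and remove an address to
# race rtentry and ifaddr teardown against each other.
DRYRUN=${DRYRUN-1}
IF=${IF:-wm0}
ADDR=${ADDR:-192.168.10.1/24}

run() { if [ -n "$DRYRUN" ]; then echo "$@"; else "$@"; fi; }

churn() {
	n=$1
	while [ "$n" -gt 0 ]; do
		run ifconfig "$IF" inet "$ADDR"
		run ifconfig "$IF" inet "${ADDR%/*}" delete
		n=$((n - 1))
	done
}

churn 2
```

In a real reproduction the loop would run unbounded (while :) for several
minutes rather than a fixed count.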

>Fix:
There are two counters that count the number of finished xcall callbacks for
low priority xcall, for historical reasons (I guess): xc_tailp and
xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly, while
xc_tailp is incremented wrongly, i.e., before executing an xcall callback.

We can fix the issue by dropping xc_tailp and using only xc_low_pri.xc_donep.

diff --git a/sys/kern/subr_xcall.c b/sys/kern/subr_xcall.c
index fb4630f..77996d6 100644
--- a/sys/kern/subr_xcall.c
+++ b/sys/kern/subr_xcall.c
@@ -105,7 +105,6 @@ typedef struct {

 /* Low priority xcall structures. */
 static xc_state_t	xc_low_pri	__cacheline_aligned;
-static uint64_t		xc_tailp	__cacheline_aligned;

 /* High priority xcall structures. */
 static xc_state_t	xc_high_pri	__cacheline_aligned;
@@ -134,7 +133,6 @@ xc_init(void)
 	memset(xclo, 0, sizeof(xc_state_t));
 	mutex_init(&xclo->xc_lock, MUTEX_DEFAULT, IPL_NONE);
 	cv_init(&xclo->xc_busy, "xclocv");
-	xc_tailp = 0;

 	memset(xchi, 0, sizeof(xc_state_t));
 	mutex_init(&xchi->xc_lock, MUTEX_DEFAULT, IPL_SOFTSERIAL);
@@ -256,7 +254,7 @@ xc_lowpri(xcfunc_t func, void *arg1, void *arg2, struct cpu_info *ci)
 	uint64_t where;

 	mutex_enter(&xc->xc_lock);
-	while (xc->xc_headp != xc_tailp) {
+	while (xc->xc_headp != xc->xc_donep) {
 		cv_wait(&xc->xc_busy, &xc->xc_lock);
 	}
 	xc->xc_arg1 = arg1;
@@ -277,7 +275,7 @@ xc_lowpri(xcfunc_t func, void *arg1, void *arg2, struct cpu_info *ci)
 		ci->ci_data.cpu_xcall_pending = true;
 		cv_signal(&ci->ci_data.cpu_xcall);
 	}
-	KASSERT(xc_tailp < xc->xc_headp);
+	KASSERT(xc->xc_donep < xc->xc_headp);
 	where = xc->xc_headp;
 	mutex_exit(&xc->xc_lock);

@@ -302,7 +300,7 @@ xc_thread(void *cookie)
 	mutex_enter(&xc->xc_lock);
 	for (;;) {
 		while (!ci->ci_data.cpu_xcall_pending) {
-			if (xc->xc_headp == xc_tailp) {
+			if (xc->xc_headp == xc->xc_donep) {
 				cv_broadcast(&xc->xc_busy);
 			}
 			cv_wait(&ci->ci_data.cpu_xcall, &xc->xc_lock);
@@ -312,7 +310,6 @@ xc_thread(void *cookie)
 		func = xc->xc_func;
 		arg1 = xc->xc_arg1;
 		arg2 = xc->xc_arg2;
-		xc_tailp++;
 		mutex_exit(&xc->xc_lock);

 		KASSERT(func != NULL);

>Release-Note:

>Audit-Trail:
From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51632 CVS commit: src/sys/kern
Date: Mon, 21 Nov 2016 00:54:21 +0000

 Module Name:	src
 Committed By:	ozaki-r
 Date:		Mon Nov 21 00:54:21 UTC 2016

 Modified Files:
 	src/sys/kern: subr_xcall.c

 Log Message:
 Fix a race condition of low priority xcall

 xc_lowpri and xc_thread are racy and xc_wait may return during/before
 executing all xcall callbacks, resulting in a kernel panic at worst.

 xc_lowpri serializes multiple jobs by a mutex and a cv. If all xcall
 callbacks are done, xc_wait returns and also xc_lowpri accepts a next job.

 The problem is that a counter that counts the number of finished xcall
 callbacks is incremented *before* actually executing a xcall callback
 (see xc_tailp++ in xc_thread). So xc_lowpri accepts a next job before
 all xcall callbacks complete and a next job begins to run its xcall callbacks.

 Even worse the counter is global and shared between jobs, so if a xcall
 callback of the next job completes, the shared counter is incremented,
 which confuses xc_wait of the previous job as all xcall callbacks of the
 previous job are done and xc_wait of the previous job returns during/before
 executing its xcall callbacks.

 How to fix: there are actually two counters that count the number of finished
 xcall callbacks for low priority xcall for historical reasons (I guess):
 xc_tailp and xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly
 while xc_tailp is incremented wrongly, i.e., before executing a xcall callback.
 We can fix the issue by dropping xc_tailp and using only xc_low_pri.xc_donep.

 PR kern/51632


 To generate a diff of this commit:
 cvs rdiff -u -r1.18 -r1.19 src/sys/kern/subr_xcall.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51632 CVS commit: [netbsd-7] src/sys/kern
Date: Mon, 12 Dec 2016 07:29:16 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Mon Dec 12 07:29:16 UTC 2016

 Modified Files:
 	src/sys/kern [netbsd-7]: subr_xcall.c

 Log Message:
 Pull up following revision(s) (requested by ozaki-r in ticket #1306):
 	sys/kern/subr_xcall.c: revision 1.19
 Fix a race condition of low priority xcall
 xc_lowpri and xc_thread are racy and xc_wait may return during/before
 executing all xcall callbacks, resulting in a kernel panic at worst.
 xc_lowpri serializes multiple jobs by a mutex and a cv. If all xcall
 callbacks are done, xc_wait returns and also xc_lowpri accepts a next job.
 The problem is that a counter that counts the number of finished xcall
 callbacks is incremented *before* actually executing a xcall callback
 (see xc_tailp++ in xc_thread). So xc_lowpri accepts a next job before
 all xcall callbacks complete and a next job begins to run its xcall callbacks.
 Even worse the counter is global and shared between jobs, so if a xcall
 callback of the next job completes, the shared counter is incremented,
 which confuses xc_wait of the previous job as all xcall callbacks of the
 previous job are done and xc_wait of the previous job returns during/before
 executing its xcall callbacks.
 How to fix: there are actually two counters that count the number of finished
 xcall callbacks for low priority xcall for historical reasons (I guess):
 xc_tailp and xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly
 while xc_tailp is incremented wrongly, i.e., before executing a xcall callback.
 We can fix the issue by dropping xc_tailp and using only xc_low_pri.xc_donep.
 PR kern/51632


 To generate a diff of this commit:
 cvs rdiff -u -r1.18 -r1.18.4.1 src/sys/kern/subr_xcall.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51632 CVS commit: [netbsd-7-0] src/sys/kern
Date: Mon, 12 Dec 2016 07:30:20 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Mon Dec 12 07:30:20 UTC 2016

 Modified Files:
 	src/sys/kern [netbsd-7-0]: subr_xcall.c

 Log Message:
 Pull up following revision(s) (requested by ozaki-r in ticket #1306):
 	sys/kern/subr_xcall.c: revision 1.19
 Fix a race condition of low priority xcall
 xc_lowpri and xc_thread are racy and xc_wait may return during/before
 executing all xcall callbacks, resulting in a kernel panic at worst.
 xc_lowpri serializes multiple jobs by a mutex and a cv. If all xcall
 callbacks are done, xc_wait returns and also xc_lowpri accepts a next job.
 The problem is that a counter that counts the number of finished xcall
 callbacks is incremented *before* actually executing a xcall callback
 (see xc_tailp++ in xc_thread). So xc_lowpri accepts a next job before
 all xcall callbacks complete and a next job begins to run its xcall callbacks.
 Even worse the counter is global and shared between jobs, so if a xcall
 callback of the next job completes, the shared counter is incremented,
 which confuses xc_wait of the previous job as all xcall callbacks of the
 previous job are done and xc_wait of the previous job returns during/before
 executing its xcall callbacks.
 How to fix: there are actually two counters that count the number of finished
 xcall callbacks for low priority xcall for historical reasons (I guess):
 xc_tailp and xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly
 while xc_tailp is incremented wrongly, i.e., before executing a xcall callback.
 We can fix the issue by dropping xc_tailp and using only xc_low_pri.xc_donep.
 PR kern/51632


 To generate a diff of this commit:
 cvs rdiff -u -r1.18 -r1.18.8.1 src/sys/kern/subr_xcall.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->pending-pullups
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Mon, 12 Dec 2016 07:40:12 +0000
State-Changed-Why:
pullup-7 #1306, pullup-6 #1419


From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51632 CVS commit: [netbsd-6-0] src/sys/kern
Date: Thu, 6 Jul 2017 15:18:23 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Thu Jul  6 15:18:23 UTC 2017

 Modified Files:
 	src/sys/kern [netbsd-6-0]: subr_xcall.c

 Log Message:
 Pull up following revision(s) (requested by ozaki-r in ticket #1419):
 	sys/kern/subr_xcall.c: revision 1.19
 Fix a race condition of low priority xcall
 xc_lowpri and xc_thread are racy and xc_wait may return during/before
 executing all xcall callbacks, resulting in a kernel panic at worst.
 xc_lowpri serializes multiple jobs by a mutex and a cv. If all xcall
 callbacks are done, xc_wait returns and also xc_lowpri accepts a next job.
 The problem is that a counter that counts the number of finished xcall
 callbacks is incremented *before* actually executing a xcall callback
 (see xc_tailp++ in xc_thread). So xc_lowpri accepts a next job before
 all xcall callbacks complete and a next job begins to run its xcall callbacks.
 Even worse the counter is global and shared between jobs, so if a xcall
 callback of the next job completes, the shared counter is incremented,
 which confuses xc_wait of the previous job as all xcall callbacks of the
 previous job are done and xc_wait of the previous job returns during/before
 executing its xcall callbacks.
 How to fix: there are actually two counters that count the number of finished
 xcall callbacks for low priority xcall for historical reasons (I guess):
 xc_tailp and xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly
 while xc_tailp is incremented wrongly, i.e., before executing a xcall callback.
 We can fix the issue by dropping xc_tailp and using only xc_low_pri.xc_donep.
 PR kern/51632


 To generate a diff of this commit:
 cvs rdiff -u -r1.13.16.1 -r1.13.16.2 src/sys/kern/subr_xcall.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51632 CVS commit: [netbsd-6-1] src/sys/kern
Date: Thu, 6 Jul 2017 15:19:02 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Thu Jul  6 15:19:01 UTC 2017

 Modified Files:
 	src/sys/kern [netbsd-6-1]: subr_xcall.c

 Log Message:
 Pull up following revision(s) (requested by ozaki-r in ticket #1419):
 	sys/kern/subr_xcall.c: revision 1.19
 Fix a race condition of low priority xcall
 xc_lowpri and xc_thread are racy and xc_wait may return during/before
 executing all xcall callbacks, resulting in a kernel panic at worst.
 xc_lowpri serializes multiple jobs by a mutex and a cv. If all xcall
 callbacks are done, xc_wait returns and also xc_lowpri accepts a next job.
 The problem is that a counter that counts the number of finished xcall
 callbacks is incremented *before* actually executing a xcall callback
 (see xc_tailp++ in xc_thread). So xc_lowpri accepts a next job before
 all xcall callbacks complete and a next job begins to run its xcall callbacks.
 Even worse the counter is global and shared between jobs, so if a xcall
 callback of the next job completes, the shared counter is incremented,
 which confuses xc_wait of the previous job as all xcall callbacks of the
 previous job are done and xc_wait of the previous job returns during/before
 executing its xcall callbacks.
 How to fix: there are actually two counters that count the number of finished
 xcall callbacks for low priority xcall for historical reasons (I guess):
 xc_tailp and xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly
 while xc_tailp is incremented wrongly, i.e., before executing a xcall callback.
 We can fix the issue by dropping xc_tailp and using only xc_low_pri.xc_donep.
 PR kern/51632


 To generate a diff of this commit:
 cvs rdiff -u -r1.13.10.1 -r1.13.10.1.2.1 src/sys/kern/subr_xcall.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Soren Jacobsen" <snj@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51632 CVS commit: [netbsd-6] src/sys/kern
Date: Thu, 6 Jul 2017 15:20:00 +0000

 Module Name:	src
 Committed By:	snj
 Date:		Thu Jul  6 15:20:00 UTC 2017

 Modified Files:
 	src/sys/kern [netbsd-6]: subr_xcall.c

 Log Message:
 Pull up following revision(s) (requested by ozaki-r in ticket #1419):
 	sys/kern/subr_xcall.c: revision 1.19
 Fix a race condition of low priority xcall
 xc_lowpri and xc_thread are racy and xc_wait may return during/before
 executing all xcall callbacks, resulting in a kernel panic at worst.
 xc_lowpri serializes multiple jobs by a mutex and a cv. If all xcall
 callbacks are done, xc_wait returns and also xc_lowpri accepts a next job.
 The problem is that a counter that counts the number of finished xcall
 callbacks is incremented *before* actually executing a xcall callback
 (see xc_tailp++ in xc_thread). So xc_lowpri accepts a next job before
 all xcall callbacks complete and a next job begins to run its xcall callbacks.
 Even worse the counter is global and shared between jobs, so if a xcall
 callback of the next job completes, the shared counter is incremented,
 which confuses xc_wait of the previous job as all xcall callbacks of the
 previous job are done and xc_wait of the previous job returns during/before
 executing its xcall callbacks.
 How to fix: there are actually two counters that count the number of finished
 xcall callbacks for low priority xcall for historical reasons (I guess):
 xc_tailp and xc_low_pri.xc_donep. xc_low_pri.xc_donep is incremented correctly
 while xc_tailp is incremented wrongly, i.e., before executing a xcall callback.
 We can fix the issue by dropping xc_tailp and using only xc_low_pri.xc_donep.
 PR kern/51632


 To generate a diff of this commit:
 cvs rdiff -u -r1.13.10.1 -r1.13.10.2 src/sys/kern/subr_xcall.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: ozaki-r@NetBSD.org
State-Changed-When: Fri, 07 Jul 2017 00:55:24 +0000
State-Changed-Why:
pullup done


>Unformatted:
