NetBSD Problem Report #56592
From www@netbsd.org Fri Dec 31 02:05:20 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 8E91F1A9239
for <gnats-bugs@gnats.NetBSD.org>; Fri, 31 Dec 2021 02:05:20 +0000 (UTC)
Message-Id: <20211231020519.3C0211A923C@mollari.NetBSD.org>
Date: Fri, 31 Dec 2021 02:05:19 +0000 (UTC)
From: baijiaju1990@gmail.com
Reply-To: baijiaju1990@gmail.com
To: gnats-bugs@NetBSD.org
Subject: sys: dev: sysmon: possible ABBA deadlock in sme_events_worker() and sme_events_check()
X-Send-Pr-Version: www-1.0
>Number: 56592
>Category: kern
>Synopsis: sys: dev: sysmon: possible ABBA deadlock in sme_events_worker() and sme_events_check()
>Confidential: no
>Severity: critical
>Priority: medium
>Responsible: pgoyette
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Dec 31 02:10:00 +0000 2021
>Closed-Date: Fri Dec 31 15:13:37 +0000 2021
>Last-Modified: Fri Dec 31 15:13:37 +0000 2021
>Originator: Jia-Ju Bai
>Release: 9.2
>Organization:
Tsinghua University
>Environment:
NetBSD localhost 9.2 NetBSD 9.2 (GENERIC) #0: Wed May 12 13:15:55 UTC 2021 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Hello,
My static analysis tool reports a possible ABBA deadlock in the sysmon module in NetBSD-9.2:
sme_events_worker()
mutex_enter(&sme->sme_mtx); --> Line 775 (Lock A)
mutex_enter(&sme->sme_work_mtx); --> Line 833 (Lock B)
sme_events_check()
mutex_enter(&sme->sme_work_mtx); --> Line 739 (Lock B)
mutex_enter(&sme->sme_mtx); --> Line 746 (Lock A)
When sme_events_worker() and sme_events_check() are concurrently executed, the deadlocks can occur.
I am not quite sure whether this possible deadlock is real and how to fix it if it is real.
Any feedback would be appreciated, thanks :)
Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>
Best wishes,
Jia-Ju Bai
>How-To-Repeat:
I think we could execute sme_events_worker() and sme_events_check() concurrently to trigger this possible deadlock.
>Fix:
A possible way is to exchange mutex_enter(&sme->sme_work_mtx) and mutex_enter(&sme->sme_mtx) in sme_events_check().
>Release-Note:
>Audit-Trail:
From: Paul Goyette <paul@whooppee.com>
To: gnats@netbsd.org
Cc: baijiaju1990@gmail.com
Subject: Re: kern/56592 - sys: dev: sysmon: possible ABBA deadlock in
sme_events_worker() and sme_events_check()
Date: Thu, 30 Dec 2021 19:37:30 -0800 (PST)
Yeah, that looks like a possible deadlock. The following patch to
sme_events_check() should re-order the mutex_enter() calls to avoid
this. (Note, I have not actually confirmed that the deadlock will
ever occur, and the patch has only been compile-tested!)
Index: sysmon_envsys_events.c
===================================================================
RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys_events.c,v
retrieving revision 1.121
diff -u -p -r1.121 sysmon_envsys_events.c
--- sysmon_envsys_events.c 11 Sep 2017 06:02:09 -0000 1.121
+++ sysmon_envsys_events.c 31 Dec 2021 03:32:31 -0000
@@ -736,14 +736,15 @@ sme_events_check(void *arg)
KASSERT(sme != NULL);
+ mutex_enter(&sme->sme_mtx);
mutex_enter(&sme->sme_work_mtx);
if (sme->sme_busy > 0) {
log(LOG_WARNING, "%s: workqueue busy: updates stopped\n",
sme->sme_name);
mutex_exit(&sme->sme_work_mtx);
+ mutex_exit(&sme->sme_mtx);
return;
}
- mutex_enter(&sme->sme_mtx);
LIST_FOREACH(see, &sme->sme_events_list, see_list) {
workqueue_enqueue(sme->sme_wq, &see->see_wk, NULL);
see->see_edata->flags |= ENVSYS_FNEED_REFRESH;
@@ -751,8 +752,8 @@ sme_events_check(void *arg)
}
if (!sysmon_low_power)
sme_schedule_callout(sme);
- mutex_exit(&sme->sme_mtx);
mutex_exit(&sme->sme_work_mtx);
+ mutex_exit(&sme->sme_mtx);
}
/*
+--------------------+--------------------------+----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org |
| & Network Engineer | | pgoyette99@gmail.com |
+--------------------+--------------------------+----------------------+
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/56592 CVS commit: src/sys/dev/sysmon
Date: Fri, 31 Dec 2021 14:30:05 +0000
Module Name: src
Committed By: riastradh
Date: Fri Dec 31 14:30:04 UTC 2021
Modified Files:
src/sys/dev/sysmon: sysmon_envsys.c sysmon_envsys_events.c sysmonvar.h
Log Message:
sysmon(9): Fix callout/thread synchronization.
Callout may ONLY take sme_work_mtx, at IPL_SOFTCLOCK; MUST NOT touch
sme_mtx at IPL_NONE. All state the callout needs is serialized by
sme_work_mtx now:
- calls to sme_schedule_callout
- calls to sme_schedule_halt
- struct sysmon_envsys::sme_events_timeout
- struct sysmon_envsys::sme_events_list
- struct sysmon_envsys::sme_callout_state
- struct envsys_data::flags
=> yes, this is a little silly -- used for ENVSYS_FNEED_REFRESH
=> should maybe separate the static driver-defined features from
the state flags needed by sysmon_envsys but not important now
Sleeping under sme_work_mtx (except on other adaptive locks at
IPL_SOFTCLOCK) is forbidden. Calling out to the driver under
sme_work_mtx is forbidden.
This should properly fix:
https://mail-index.netbsd.org/tech-kern/2015/10/14/msg019511.html
PR kern/56592
To generate a diff of this commit:
cvs rdiff -u -r1.149 -r1.150 src/sys/dev/sysmon/sysmon_envsys.c
cvs rdiff -u -r1.122 -r1.123 src/sys/dev/sysmon/sysmon_envsys_events.c
cvs rdiff -u -r1.51 -r1.52 src/sys/dev/sysmon/sysmonvar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->pgoyette
Responsible-Changed-By: pgoyette@NetBSD.org
Responsible-Changed-When: Fri, 31 Dec 2021 15:13:37 +0000
Responsible-Changed-Why:
grab
State-Changed-From-To: open->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Fri, 31 Dec 2021 15:13:37 +0000
State-Changed-Why:
riasatradh@ committed a fix (along with several others)
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.