NetBSD Problem Report #56592

From www@netbsd.org  Fri Dec 31 02:05:20 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 8E91F1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 31 Dec 2021 02:05:20 +0000 (UTC)
Message-Id: <20211231020519.3C0211A923C@mollari.NetBSD.org>
Date: Fri, 31 Dec 2021 02:05:19 +0000 (UTC)
From: baijiaju1990@gmail.com
Reply-To: baijiaju1990@gmail.com
To: gnats-bugs@NetBSD.org
Subject: sys: dev: sysmon: possible ABBA deadlock in sme_events_worker() and sme_events_check()
X-Send-Pr-Version: www-1.0

>Number:         56592
>Category:       kern
>Synopsis:       sys: dev: sysmon: possible ABBA deadlock in sme_events_worker() and sme_events_check()
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    pgoyette
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Dec 31 02:10:00 +0000 2021
>Closed-Date:    Fri Dec 31 15:13:37 +0000 2021
>Last-Modified:  Fri Dec 31 15:13:37 +0000 2021
>Originator:     Jia-Ju Bai
>Release:        9.2
>Organization:
Tsinghua University
>Environment:
NetBSD localhost 9.2 NetBSD 9.2 (GENERIC) #0: Wed May 12 13:15:55 UTC 2021  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Hello,

My static analysis tool reports a possible ABBA deadlock in the sysmon module in NetBSD-9.2:

sme_events_worker()
  mutex_enter(&sme->sme_mtx); --> Line 775 (Lock A)
  mutex_enter(&sme->sme_work_mtx); --> Line 833 (Lock B)

sme_events_check()
  mutex_enter(&sme->sme_work_mtx); --> Line 739 (Lock B)
  mutex_enter(&sme->sme_mtx); --> Line 746 (Lock A)

When sme_events_worker() and sme_events_check() are concurrently executed, the deadlocks can occur.

I am not quite sure whether this possible deadlock is real and how to fix it if it is real.
Any feedback would be appreciated, thanks :)

Reported-by: TOTE Robot <oslab@tsinghua.edu.cn>


Best wishes,
Jia-Ju Bai 
>How-To-Repeat:
I think we could execute sme_events_worker() and sme_events_check() concurrently to trigger this possible deadlock.
>Fix:
A possible way is to exchange mutex_enter(&sme->sme_work_mtx) and mutex_enter(&sme->sme_mtx) in sme_events_check().

>Release-Note:

>Audit-Trail:
From: Paul Goyette <paul@whooppee.com>
To: gnats@netbsd.org
Cc: baijiaju1990@gmail.com
Subject: Re: kern/56592 - sys: dev: sysmon: possible ABBA deadlock in
 sme_events_worker() and sme_events_check()
Date: Thu, 30 Dec 2021 19:37:30 -0800 (PST)

 Yeah, that looks like a possible deadlock.  The following patch to
 sme_events_check() should re-order the mutex_enter() calls to avoid
 this.  (Note, I have not actually confirmed that the deadlock will
 ever occur, and the patch has only been compile-tested!)


 Index: sysmon_envsys_events.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/sysmon/sysmon_envsys_events.c,v
 retrieving revision 1.121
 diff -u -p -r1.121 sysmon_envsys_events.c
 --- sysmon_envsys_events.c	11 Sep 2017 06:02:09 -0000	1.121
 +++ sysmon_envsys_events.c	31 Dec 2021 03:32:31 -0000
 @@ -736,14 +736,15 @@ sme_events_check(void *arg)

   	KASSERT(sme != NULL);

 +	mutex_enter(&sme->sme_mtx);
   	mutex_enter(&sme->sme_work_mtx);
   	if (sme->sme_busy > 0) {
   		log(LOG_WARNING, "%s: workqueue busy: updates stopped\n",
   		    sme->sme_name);
   		mutex_exit(&sme->sme_work_mtx);
 +		mutex_exit(&sme->sme_mtx);
   		return;
   	}
 -	mutex_enter(&sme->sme_mtx);
   	LIST_FOREACH(see, &sme->sme_events_list, see_list) {
   		workqueue_enqueue(sme->sme_wq, &see->see_wk, NULL);
   		see->see_edata->flags |= ENVSYS_FNEED_REFRESH;
 @@ -751,8 +752,8 @@ sme_events_check(void *arg)
   	}
   	if (!sysmon_low_power)
   		sme_schedule_callout(sme);
 -	mutex_exit(&sme->sme_mtx);
   	mutex_exit(&sme->sme_work_mtx);
 +	mutex_exit(&sme->sme_mtx);
   }

   /*



 +--------------------+--------------------------+----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:    |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com    |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org  |
 | & Network Engineer |                          | pgoyette99@gmail.com |
 +--------------------+--------------------------+----------------------+

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56592 CVS commit: src/sys/dev/sysmon
Date: Fri, 31 Dec 2021 14:30:05 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Fri Dec 31 14:30:04 UTC 2021

 Modified Files:
 	src/sys/dev/sysmon: sysmon_envsys.c sysmon_envsys_events.c sysmonvar.h

 Log Message:
 sysmon(9): Fix callout/thread synchronization.

 Callout may ONLY take sme_work_mtx, at IPL_SOFTCLOCK; MUST NOT touch
 sme_mtx at IPL_NONE.  All state the callout needs is serialized by
 sme_work_mtx now:

 - calls to sme_schedule_callout
 - calls to sme_schedule_halt
 - struct sysmon_envsys::sme_events_timeout
 - struct sysmon_envsys::sme_events_list
 - struct sysmon_envsys::sme_callout_state
 - struct envsys_data::flags
   => yes, this is a little silly -- used for ENVSYS_FNEED_REFRESH
   => should maybe separate the static driver-defined features from
      the state flags needed by sysmon_envsys but not important now

 Sleeping under sme_work_mtx (except on other adaptive locks at
 IPL_SOFTCLOCK) is forbidden.  Calling out to the driver under
 sme_work_mtx is forbidden.

 This should properly fix:

 https://mail-index.netbsd.org/tech-kern/2015/10/14/msg019511.html
 PR kern/56592


 To generate a diff of this commit:
 cvs rdiff -u -r1.149 -r1.150 src/sys/dev/sysmon/sysmon_envsys.c
 cvs rdiff -u -r1.122 -r1.123 src/sys/dev/sysmon/sysmon_envsys_events.c
 cvs rdiff -u -r1.51 -r1.52 src/sys/dev/sysmon/sysmonvar.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Responsible-Changed-From-To: kern-bug-people->pgoyette
Responsible-Changed-By: pgoyette@NetBSD.org
Responsible-Changed-When: Fri, 31 Dec 2021 15:13:37 +0000
Responsible-Changed-Why:
grab


State-Changed-From-To: open->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Fri, 31 Dec 2021 15:13:37 +0000
State-Changed-Why:
riasatradh@ committed a fix (along with several others)


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.