NetBSD Problem Report #54189

From ozaki-r@netbsd.org  Fri May 10 09:22:58 2019
Return-Path: <ozaki-r@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 0ADBB7A16F
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 10 May 2019 09:22:58 +0000 (UTC)
Message-Id: <20190510092256.B03457A1D2@mollari.NetBSD.org>
Date: Fri, 10 May 2019 09:22:56 +0000 (UTC)
From: ozaki-r@netbsd.org
Reply-To: ozaki-r@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: IFNET_LOCK for if_mcast_op can cause a deadlock
X-Send-Pr-Version: 3.95

>Number:         54189
>Category:       kern
>Synopsis:       IFNET_LOCK for if_mcast_op can cause a deadlock
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 10 09:25:00 +0000 2019
>Closed-Date:    Wed Apr 17 21:56:42 +0000 2024
>Last-Modified:  Wed Apr 17 21:56:42 +0000 2024
>Originator:     Ryota Ozaki
>Release:        current and 8
>Organization:
>Environment:
System: NetBSD rangeley 8.99.38 NetBSD 8.99.38 (RANGELEY) #2: Fri May 10 13:37:40 JST 2019  ozaki-r@rangeley:(hidden) amd64
Architecture: x86_64
Machine: amd64
>Description:
From src/doc/TODO.smpnet:

  To avoid data race on if_flags it should be protected by a lock (currently it's
  IFNET_LOCK).  (snip)

  IFF_ALLMULTI can be set/unset via if_mcast_op.  To protect updates of the flag,
  we had added IFNET_LOCK around if_mcast_op.  However that was not a good
  approach because if_mcast_op is typically called in the middle of a call path
  and holding IFNET_LOCK such places is problematic.  Actually a deadlock is
  observed.  Probably we should remove IFNET_LOCK and manage IFF_ALLMULTI
  somewhere other than if_flags, for example ethercom or driver itself (or a
  common driver framework once it appears).  Such a change is feasible because
  IFF_ALLMULTI is only set/unset by a driver and not accessed from any common
  components such as network protocols.

The resource dependency graph of a deadlock I encountered is like this:
  Thread A: IFNET_LOCK => psref_target_destroy => softint
  Thread B: softint => softnet_lock
  Thread C: softnet_lock => IFNET_LOCK (for if_mcast_op) 

Note that the deadlock doesn't occur if NET_MPSAFE is NOT enabled because
psref_target_destroy is called only if NET_MPSAFE is enabled.
>How-To-Repeat:
Run userspace processes below concurrently while receive some packets destinating to the host, then wait several hours:
- a process adding/removing an IP address on an interface, and
- a process configuring/unconfiguring multicast groups on the same interface.
>Fix:
One solution is to remove IFNET_LOCK for if_mcat_op as described above and
protect operations of IFF_ALLMULTI in another way.

The below patch introduces a new field to struct ethercom (ec_flags) and store
the IFF_ALLMULTI flag to it (as ETHER_F_ALLMULTI).  So we can use ETHER_LOCK
to protect operations of the flag instead of IFNET_LOCK.

https://www.netbsd.org/~ozaki-r/allmulti.diff

One known issue of the approach is that it can't be applied to netbsd-8 because
the change breaks ABI IIUC.

>Release-Note:

>Audit-Trail:
From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54189 CVS commit: src/sys/dev/pci
Date: Tue, 14 May 2019 09:43:55 +0000

 Module Name:	src
 Committed By:	ozaki-r
 Date:		Tue May 14 09:43:55 UTC 2019

 Modified Files:
 	src/sys/dev/pci: if_wm.c
 	src/sys/dev/pci/ixgbe: ixgbe.c

 Log Message:
 Remove unnecessary checks of IFF_ALLMULTI

 IFF_ALLMULTI is changed by only driver itself, so we don't need to check its
 change on ec_ifflags_cb.

 This is part of PR kern/54189. NFCI.


 To generate a diff of this commit:
 cvs rdiff -u -r1.634 -r1.635 src/sys/dev/pci/if_wm.c
 cvs rdiff -u -r1.181 -r1.182 src/sys/dev/pci/ixgbe/ixgbe.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54189 CVS commit: src/sys
Date: Wed, 15 May 2019 02:56:48 +0000

 Module Name:	src
 Committed By:	ozaki-r
 Date:		Wed May 15 02:56:48 UTC 2019

 Modified Files:
 	src/sys/dev/ic: dwc_gmac.c
 	src/sys/dev/pci: if_wm.c
 	src/sys/dev/pci/ixgbe: ixgbe.c
 	src/sys/net: if.c if_ether.h if_ethersubr.c

 Log Message:
 Store IFF_ALLMULTI in ec_flags instead of if_flags to avoid data races

 IFF_ALLMULTI is set/unset to if_flags via if_mcast_op.  To avoid data races on
 if_flags, IFNET_LOCK was added for if_mcast_op.  Unfortunately it produces
 a deadlock so we want to remove added IFNET_LOCK by avoiding the data races by
 another approach.

 This fix introduces ec_flags to struct ethercom and stores IFF_ALLMULTI to it.
 ec_flags is protected by ETHER_LOCK and thus IFNET_LOCK is no longer necessary
 for if_mcast_op.  Note that the fix is applied only to MP-safe drivers that
 the data races matter.

 In the kernel, IFF_ALLMULTI is set by a driver and used by the driver itself.
 So changing the storing place doesn't break anything.  One exception is
 ioctl(SIOCGIFFLAGS); we have to include IFF_ALLMULTI in a result if needed to
 export the flag as well as before.

 A upcoming commit will remove IFNET_LOCK.

 PR kern/54189


 To generate a diff of this commit:
 cvs rdiff -u -r1.59 -r1.60 src/sys/dev/ic/dwc_gmac.c
 cvs rdiff -u -r1.635 -r1.636 src/sys/dev/pci/if_wm.c
 cvs rdiff -u -r1.182 -r1.183 src/sys/dev/pci/ixgbe/ixgbe.c
 cvs rdiff -u -r1.451 -r1.452 src/sys/net/if.c
 cvs rdiff -u -r1.77 -r1.78 src/sys/net/if_ether.h
 cvs rdiff -u -r1.273 -r1.274 src/sys/net/if_ethersubr.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Ryota Ozaki" <ozaki-r@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/54189 CVS commit: src/sys
Date: Wed, 15 May 2019 02:59:19 +0000

 Module Name:	src
 Committed By:	ozaki-r
 Date:		Wed May 15 02:59:19 UTC 2019

 Modified Files:
 	src/sys/net: if_vlan.c
 	src/sys/netinet: in_pcb.c ip_output.c
 	src/sys/netinet6: in6_pcb.c ip6_output.c

 Log Message:
 Get rid of IFNET_LOCK for if_mcast_op to avoid a deadlock

 The IFNET_LOCK was added to avoid data races on if_flags for IFF_ALLMULTI.
 Unfortunatetly it caused a deadlock instead.  A known scenario causing a
 deadlock is to occur the following two operations concurrently: (a) a removal of
 an IP adddres assigned to an interface and (b) a manipulation of multicast
 groups to the interface.  The resource dependency graph is like this:
   softnet_lock => IFNET_LOCK => psref_target_destroy => softint => softnet_lock

 Thanks to the previous commit that avoids data races on if_flags for
 IFF_ALLMULTI by another approach, we can remove IFNET_LOCK and defuse the
 deadlock.

 PR kern/54189


 To generate a diff of this commit:
 cvs rdiff -u -r1.135 -r1.136 src/sys/net/if_vlan.c
 cvs rdiff -u -r1.182 -r1.183 src/sys/netinet/in_pcb.c
 cvs rdiff -u -r1.311 -r1.312 src/sys/netinet/ip_output.c
 cvs rdiff -u -r1.165 -r1.166 src/sys/netinet6/in6_pcb.c
 cvs rdiff -u -r1.219 -r1.220 src/sys/netinet6/ip6_output.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 17 Apr 2024 21:56:42 +0000
State-Changed-Why:
Unless you want to pull this up to 8, in some way that doesn't break
kernel ABI, I think we can consider this closed?

(If there are other drivers that need updates, let's file separate PRs
for each driver to track pullups.)


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.