NetBSD Problem Report #45093

From bouyer@asim.lip6.fr  Tue Jun 21 16:41:20 2011
Return-Path: <bouyer@asim.lip6.fr>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 72D4D63BA51
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Jun 2011 16:41:20 +0000 (UTC)
Message-Id: <20110621164116.6A70F34C41@armandeche.soc.lip6.fr>
Date: Tue, 21 Jun 2011 18:41:16 +0200 (MEST)
From: bouyer@asim.lip6.fr
Reply-To: bouyer@asim.lip6.fr
To: gnats-bugs@gnats.NetBSD.org
Subject: kernel deadlock between TCP and UVM involving callouts
X-Send-Pr-Version: 3.95

>Number:         45093
>Category:       kern
>Synopsis:       kernel deadlock between TCP and UVM involving callouts
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 21 16:45:00 +0000 2011
>Last-Modified:  Wed Jul 20 07:10:02 +0000 2011
>Originator:     Manuel Bouyer
>Release:        NetBSD 5.1
>Organization:
>Environment:
System: NetBSD armandeche.soc.lip6.fr 5.1 NetBSD 5.1 (GENERIC) #0: Sun Nov 7 14:39:56 UTC 2010 builds@b6.netbsd.org:/home/builds/ab/netbsd-5-1-RELEASE/i386/201011061943Z-obj/home/builds/ab/netbsd-5-1-RELEASE/src/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:
	A deadlock condition exists in the NFS server; it is easy to reproduce
	on my server here.
	When the NFS server closes a socket, it calls uvm_unloanpage()
	(through soclose->sodisconnect->sodopendfree->sodopendfreel)
	with softnet_lock held, and uvm_unloanpage() can then kpause().
	If a network callout (e.g. a TCP timer) fires while the nfsd
	thread is paused, the callout blocks trying to acquire
	softnet_lock, which puts the softclock thread to sleep. Since the
	kpause() timeout is itself delivered by the softclock thread, the
	paused thread is never woken up, and we have a deadlock:
	the softclock thread waits for softnet_lock, and the thread holding
	softnet_lock waits to be woken up by the softclock thread.
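	The wait cycle can be made concrete with a small deterministic
	model. This is a userland C sketch, not kernel code: the enum
	values and softclock_wakes_sleeper() are hypothetical names, and
	it assumes, as described above, that the softclock thread runs
	pending callouts one at a time, so a callout blocked on
	softnet_lock stops every callout queued behind it, including the
	one that would end the kpause().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (hypothetical names, not NetBSD code): the softclock
 * thread runs queued callouts strictly in order.
 */
enum callout_kind {
	CO_TCP_TIMER,		/* handler needs softnet_lock */
	CO_KPAUSE_WAKEUP	/* timeout that ends the nfsd's kpause() */
};

/*
 * Returns true if the sleeping nfsd thread is eventually woken,
 * false if the softclock thread gets stuck first.
 * nfsd_holds_softnet: did the sleeper go to sleep holding softnet_lock?
 */
static bool
softclock_wakes_sleeper(const enum callout_kind *queue, size_t n,
    bool nfsd_holds_softnet)
{
	assert(queue != NULL || n == 0);
	for (size_t i = 0; i < n; i++) {
		switch (queue[i]) {
		case CO_TCP_TIMER:
			/*
			 * The TCP handler blocks on softnet_lock; if the
			 * sleeper owns it, no later callout ever runs.
			 */
			if (nfsd_holds_softnet)
				return false;
			break;
		case CO_KPAUSE_WAKEUP:
			return true;
		}
	}
	return false;	/* wakeup callout not queued yet */
}
```

	With a TCP timer queued ahead of the wakeup and softnet_lock held
	by the sleeper, the model reports that the sleeper is never woken;
	the same queue completes if the sleeper does not hold the lock.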

	More details and stack trace in
	http://mail-index.netbsd.org/tech-kern/2011/06/17/msg010734.html


>How-To-Repeat:
	Have an NFS server with some NFS activity, some local activity
	(so that there is contention on vnode locks and uvm_unloanpage()
	has to sleep), and enough network activity to have TCP callouts
	pending.
>Fix:
	Workaround: either disable page loaning in the NFS server, or
	change uvm_unloanpage() to use yield() instead of kpause()
	(the latter has been confirmed to work around the issue).
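	A sketch of why yield() sidesteps the hang, assuming that the
	sleep sits in a retry loop around a lock that uvm_unloanpage()
	can only try-acquire (lock ordering forbids blocking on it). This
	userland version uses pthread_mutex_trylock() and sched_yield();
	acquire_with_yield() is a hypothetical name. Yielding lets other
	threads run without arming a timeout, so, unlike kpause(), the
	retry does not depend on the softclock thread making progress.
	The thread still holds softnet_lock while it retries, which is
	why this is only a workaround.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>

/*
 * Hypothetical retry loop modeled on the yield() workaround.
 * Assumption: obj_lock may only be try-acquired here. Backing off
 * with sched_yield() needs no timer callout to resume, whereas a
 * timed sleep would need one to fire.
 * Returns the number of failed attempts; caller then owns obj_lock.
 */
static unsigned long
acquire_with_yield(pthread_mutex_t *obj_lock)
{
	unsigned long retries = 0;
	int error;

	while ((error = pthread_mutex_trylock(obj_lock)) == EBUSY) {
		retries++;
		sched_yield();	/* let the current holder run */
	}
	assert(error == 0);
	return retries;
}
```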

	A longer-term fix is to avoid threads sleeping for long periods
	while holding softnet_lock. For this specific case, maybe
	sodopendfree could be moved to another thread, or the socket's
	lock (which is softnet_lock for TCP sockets) could be dropped
	before calling sodopendfree and re-acquired afterwards.
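	The second option, dropping the socket lock around the call, is
	a common drop-and-relock pattern. A minimal userland sketch,
	assuming a pthread mutex stands in for softnet_lock and
	process_pendfree_unlocked() stands in for sodopendfree(); both
	names and the socket_lock_held flag are hypothetical, the flag
	existing only so the lock state can be asserted.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-ins: socket_lock plays the role of softnet_lock. */
static pthread_mutex_t socket_lock = PTHREAD_MUTEX_INITIALIZER;
static bool socket_lock_held;	/* tracked only for this illustration */

/* Stand-in for sodopendfree(): may sleep, so must run unlocked. */
static void
process_pendfree_unlocked(void)
{
	assert(!socket_lock_held);
	/* ... release loaned pages; blocking here no longer stalls
	 * callouts that need the socket lock ... */
}

/* Entered with socket_lock held; returns with it held again. */
static void
close_path_drop_and_relock(void)
{
	assert(socket_lock_held);
	socket_lock_held = false;
	pthread_mutex_unlock(&socket_lock);

	process_pendfree_unlocked();

	pthread_mutex_lock(&socket_lock);
	socket_lock_held = true;
	/* Socket state may have changed while unlocked: re-validate. */
}
```

	The catch is that every caller must tolerate socket state changing
	while the lock is dropped, and each call site has to know whether
	it holds the lock; the commit below cites the latter as the reason
	this approach was not chosen.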

	sokva_reclaim_callback() and sokvareserve() may have the same issue,
	if called with the socket locked.

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/45093: kernel deadlock between TCP and UVM involving
 callouts
Date: Wed, 20 Jul 2011 07:07:45 +0000

 For some reason this didn't go to gnats even though it contains the PR
 number in what I think is supposed to be a recognized format.

    ------

 From: Manuel Bouyer <bouyer@netbsd.org>
 To: source-changes@NetBSD.org
 Subject: CVS commit: src/sys
 Date: Sat, 2 Jul 2011 17:53:51 +0000
 Mail-Followup-To: source-changes-d@NetBSD.org

 Module Name:	src
 Committed By:	bouyer
 Date:		Sat Jul  2 17:53:51 UTC 2011

 Modified Files:
 	src/sys/kern: init_main.c uipc_socket.c
 	src/sys/sys: socketvar.h

 Log Message:
 Fix kern/45093 as discussed on tech-kern@:
 http://mail-index.netbsd.org/tech-kern/2011/06/17/msg010734.html

 The cause of the problem is that the so_pendfree list is processed with
 the softnet_lock held at one point, and processing the list
 calls sodoloanfree(), which may kpause(). As the thread sleeps with
 softnet_lock held, it ultimately causes a deadlock (see the PR or tech-kern
 thread for details).
 Although it should be possible to call sodopendfree() after releasing
 the socket lock, it's not easy to know where the socket lock is held and
 where it's not, so we may hit the issue again later.
 Add a kernel thread to handle the so_pendfree list, and wake up this
 thread when adding mbufs to this list. Get rid of the various sodopendfree()
 calls, hopefully fixing the problem definitively.


 To generate a diff of this commit:
 cvs rdiff -u -r1.432 -r1.433 src/sys/kern/init_main.c
 cvs rdiff -u -r1.204 -r1.205 src/sys/kern/uipc_socket.c
 cvs rdiff -u -r1.125 -r1.126 src/sys/sys/socketvar.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
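 The approach taken in the commit (a dedicated kernel thread that drains
 so_pendfree and is woken when mbufs are added) is a standard
 producer/consumer arrangement. The following userland sketch shows its
 shape with a mutex and a condition variable; struct pendfree,
 pendfree_add(), pendfree_thread() and pendfree_shutdown() are
 hypothetical names, not the kernel's, and the shutdown path exists only
 so the example can terminate.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical names; a sketch of the pattern, not NetBSD code. */
struct pendfree {
	struct pendfree *next;
};

static pthread_mutex_t pendfree_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pendfree_cv = PTHREAD_COND_INITIALIZER;
static struct pendfree *pendfree_list;
static bool pendfree_stop;
static unsigned long pendfree_freed;	/* items released by the worker */

/* Producer: only links the item and signals; never does the freeing. */
static void
pendfree_add(struct pendfree *pf)
{
	assert(pf != NULL);
	pthread_mutex_lock(&pendfree_lock);
	pf->next = pendfree_list;
	pendfree_list = pf;
	pthread_cond_signal(&pendfree_cv);
	pthread_mutex_unlock(&pendfree_lock);
}

/* Worker: sleeps until woken, then frees a detached batch unlocked. */
static void *
pendfree_thread(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pendfree_lock);
	for (;;) {
		while (pendfree_list == NULL && !pendfree_stop)
			pthread_cond_wait(&pendfree_cv, &pendfree_lock);
		if (pendfree_list == NULL)
			break;	/* stop requested and nothing left */
		struct pendfree *batch = pendfree_list;
		pendfree_list = NULL;
		pthread_mutex_unlock(&pendfree_lock);

		/* Freeing may block; no producer's lock is held here. */
		unsigned long n = 0;
		while (batch != NULL) {
			struct pendfree *next = batch->next;
			free(batch);
			batch = next;
			n++;
		}

		pthread_mutex_lock(&pendfree_lock);
		pendfree_freed += n;
	}
	pthread_mutex_unlock(&pendfree_lock);
	return NULL;
}

/* Ask the worker to drain what is left, then wait for it to exit. */
static void
pendfree_shutdown(pthread_t tid)
{
	pthread_mutex_lock(&pendfree_lock);
	pendfree_stop = true;
	pthread_cond_signal(&pendfree_cv);
	pthread_mutex_unlock(&pendfree_lock);
	pthread_join(tid, NULL);
}
```

 The important property is that pendfree_add() only links an item and
 signals, so it is safe to call with other locks held; any sleeping
 happens in the worker, which holds none of the callers' locks while
 freeing.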

Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.