NetBSD Problem Report #45093
From email@example.com Tue Jun 21 16:41:20 2011
Received: from mail.netbsd.org (mail.netbsd.org [18.104.22.168])
by www.NetBSD.org (Postfix) with ESMTP id 72D4D63BA51
for <gnats-bugs@gnats.NetBSD.org>; Tue, 21 Jun 2011 16:41:20 +0000 (UTC)
Date: Tue, 21 Jun 2011 18:41:16 +0200 (MEST)
Subject: kernel deadlock between TCP and UVM involving callouts
>Synopsis: kernel deadlock between TCP and UVM involving callouts
>Arrival-Date: Tue Jun 21 16:45:00 +0000 2011
>Last-Modified: Wed Jul 20 07:10:02 +0000 2011
>Originator: Manuel Bouyer
>Release: NetBSD 5.1
System: NetBSD armandeche.soc.lip6.fr 5.1 NetBSD 5.1 (GENERIC) #0: Sun Nov 7 14:39:56 UTC 2010 firstname.lastname@example.org:/home/builds/ab/netbsd-5-1-RELEASE/i386/201011061943Z-obj/home/builds/ab/netbsd-5-1-RELEASE/src/sys/arch/i386/compile/GENERIC i386
A deadlock condition exists in the NFS server; it is easy to reproduce
on my server here:
The NFS server, when closing a socket, calls uvm_unloanpage()
with softnet_lock held. uvm_unloanpage() can then kpause().
If a network callout (e.g. a TCP timer) fires while the nfsd thread is
paused, the callout blocks trying to get softnet_lock and the softclock
thread goes to sleep. As a result the kpause is never woken up, so we
have a deadlock: the softclock thread waits for softnet_lock, and the
thread holding softnet_lock waits to be woken up by the softclock thread.
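The cycle described above can be written down as a tiny wait-for graph.
This is only an illustrative userland sketch, not kernel code: the thread
ids, the waits_for[] array and the function names are all made up for the
example. It models "nfsd waits for softclock to end its kpause" and
"softclock waits for nfsd to release softnet_lock", and checks whether
following the chain from nfsd leads back to nfsd.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical thread ids, used only for this illustration. */
enum { NFSD = 0, SOFTCLOCK = 1, NTHREADS = 2, NOBODY = -1 };

/*
 * waits_for[t] is the thread that must make progress before t can run,
 * or NOBODY if t is runnable. Returns true if following the chain from
 * 'start' leads back to 'start' within nthreads hops.
 */
static bool
has_wait_cycle(const int waits_for[], int nthreads, int start)
{
	int t = start;

	assert(start >= 0 && start < nthreads);
	for (int hops = 0; hops < nthreads; hops++) {
		t = waits_for[t];
		if (t == NOBODY)
			return false;	/* chain ends at a runnable thread */
		if (t == start)
			return true;	/* back where we started: deadlock */
	}
	return false;			/* no cycle through 'start' */
}

/* The state described in the report: each thread waits on the other. */
static bool
report_scenario_deadlocks(void)
{
	int waits_for[NTHREADS];

	/* nfsd sleeps in kpause() with softnet_lock held; softclock must wake it. */
	waits_for[NFSD] = SOFTCLOCK;
	/* softclock runs a TCP callout that blocks on softnet_lock, held by nfsd. */
	waits_for[SOFTCLOCK] = NFSD;
	return has_wait_cycle(waits_for, NTHREADS, NFSD);
}

/* Assumed model of a yielding nfsd: it stays runnable, so it waits on nobody. */
static bool
yielding_scenario_deadlocks(void)
{
	int waits_for[NTHREADS];

	waits_for[NFSD] = NOBODY;
	waits_for[SOFTCLOCK] = NFSD;
	return has_wait_cycle(waits_for, NTHREADS, NFSD);
}
```

In this model report_scenario_deadlocks() finds the cycle, while
yielding_scenario_deadlocks() does not, because a thread that merely
yields does not depend on the softclock thread to resume.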
More details and stack trace in
Have an NFS server with some NFS activity, some local activity
(so there is contention on vnode locks and uvm_unloanpage will
have to sleep) and enough network activity to have TCP callouts
Workaround: either disable page loaning in the NFS server, or
change uvm_unloanpage() to use yield() instead of kpause()
(the latter has been confirmed to work around the issue).
A longer-term fix is to avoid threads sleeping for a long time with
softnet_lock held.
For this specific case, maybe sodopendfree can be transferred to
another thread, or the socket's lock (which is softnet_lock for
TCP sockets) can be dropped before calling sodopendfree and re-locked
sokva_reclaim_callback() and sokvareserve() may have the same issue,
if called with the socket locked.
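The "drop the lock before the sleeping call" idea can be sketched in
userland with a pthread mutex. This is an assumption-laden analog, not
the kernel code: demo_lock stands in for softnet_lock,
callout_can_run() stands in for a TCP callout trying to take it, and
the point where the callout is probed stands in for the moment the
holder would be asleep. All names are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for softnet_lock. */
static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for a callout: it can only run if it can take the lock. */
static bool
callout_can_run(void)
{
	if (pthread_mutex_trylock(&demo_lock) != 0)
		return false;		/* lock is held: callout would block */
	pthread_mutex_unlock(&demo_lock);
	return true;
}

/* Problem pattern: the "sleep" happens while the lock is still held. */
static bool
sleep_with_lock_held(void)
{
	bool ran;

	pthread_mutex_lock(&demo_lock);
	ran = callout_can_run();	/* probed while the holder "sleeps" */
	pthread_mutex_unlock(&demo_lock);
	return ran;
}

/* Proposed pattern: drop the lock around the sleeping call, then re-lock. */
static bool
sleep_with_lock_dropped(void)
{
	bool ran;

	pthread_mutex_lock(&demo_lock);
	pthread_mutex_unlock(&demo_lock);	/* drop before sleeping */
	ran = callout_can_run();		/* callout is free to proceed */
	pthread_mutex_lock(&demo_lock);		/* re-lock afterwards */
	pthread_mutex_unlock(&demo_lock);
	return ran;
}
```

The sketch relies on a default (non-recursive) mutex, for which
pthread_mutex_trylock() fails when the lock is already held. Note that
in real code any state protected by the lock has to be revalidated
after re-locking, since it may have changed in between.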
From: David Holland <email@example.com>
Subject: Re: kern/45093: kernel deadlock between TCP and UVM involving
Date: Wed, 20 Jul 2011 07:07:45 +0000
For some reason this didn't go to gnats even though it contains the PR
number in what I think is supposed to be a recognized format.
From: Manuel Bouyer <firstname.lastname@example.org>
Subject: CVS commit: src/sys
Date: Sat, 2 Jul 2011 17:53:51 +0000
Module Name: src
Committed By: bouyer
Date: Sat Jul 2 17:53:51 UTC 2011
src/sys/kern: init_main.c uipc_socket.c
Fix kern/45093 as discussed on tech-kern@:
The cause of the problem is that the so_pendfree is processed with
the softnet_lock held at one point, and processing the list
calls sodoloanfree() which may kpause(). As the thread sleeps with
softnet_lock held, it ultimately causes a deadlock (see the PR or tech-kern
thread for details).
Although it should be possible to call sodopendfree() after releasing
the socket lock, it's not so easy to know where the socket lock is held and
where it's not, so we may hit the issue again later.
Add a kernel thread to handle the so_pendfree list, and wake up this
thread when adding mbufs to this list. Get rid of the various sodopendfree()
calls, hopefully fixing the problem definitively.
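The approach described in this commit message (a dedicated thread that
owns the pending-free list and is woken whenever something is added)
can be sketched in userland as follows. This is not the committed
kernel code: it uses pthreads instead of kernel threads and condvars,
and struct pending, pend_enqueue(), pend_worker() and pend_demo() are
hypothetical names chosen for the example. The key property is that
the producer only queues and signals, while the potentially sleeping
free happens in the worker with no other lock held.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for an mbuf awaiting a deferred free. */
struct pending {
	struct pending *next;
};

static pthread_mutex_t pend_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t pend_cv = PTHREAD_COND_INITIALIZER;
static struct pending *pend_head;	/* protected by pend_lock */
static bool pend_stop;			/* protected by pend_lock */
static int pend_freed;			/* protected by pend_lock */

/* Producer side: queue the item and wake the worker; no freeing here. */
static void
pend_enqueue(struct pending *p)
{
	pthread_mutex_lock(&pend_lock);
	p->next = pend_head;
	pend_head = p;
	pthread_cond_signal(&pend_cv);
	pthread_mutex_unlock(&pend_lock);
}

/* Worker: the only place where the potentially sleeping free happens. */
static void *
pend_worker(void *arg)
{
	(void)arg;
	pthread_mutex_lock(&pend_lock);
	for (;;) {
		while (pend_head == NULL && !pend_stop)
			pthread_cond_wait(&pend_cv, &pend_lock);
		if (pend_head == NULL)
			break;		/* stop requested and list drained */

		/* Detach the whole list, then drop the lock before freeing. */
		struct pending *list = pend_head;
		pend_head = NULL;
		pthread_mutex_unlock(&pend_lock);

		int n = 0;
		while (list != NULL) {
			struct pending *next = list->next;
			free(list);	/* may block; pend_lock is not held */
			list = next;
			n++;
		}

		pthread_mutex_lock(&pend_lock);
		pend_freed += n;
	}
	pthread_mutex_unlock(&pend_lock);
	return NULL;
}

/* Usage: queue 'count' items, stop the worker, return how many it freed. */
static int
pend_demo(int count)
{
	pthread_t worker;
	int rc;

	pend_freed = 0;
	pend_stop = false;
	rc = pthread_create(&worker, NULL, pend_worker, NULL);
	assert(rc == 0);

	for (int i = 0; i < count; i++) {
		struct pending *p = malloc(sizeof(*p));
		assert(p != NULL);
		pend_enqueue(p);
	}

	/* Ask the worker to exit once the list is empty. */
	pthread_mutex_lock(&pend_lock);
	pend_stop = true;
	pthread_cond_signal(&pend_cv);
	pthread_mutex_unlock(&pend_lock);

	pthread_join(worker, NULL);
	return pend_freed;	/* safe to read: the worker has exited */
}
```

Calling pend_demo(5) queues five items and returns once the worker has
freed all of them. The design choice mirrors the reasoning in the
commit message: callers no longer need to know whether they hold the
socket lock, because they never perform the sleeping operation
themselves.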
To generate a diff of this commit:
cvs rdiff -u -r1.432 -r1.433 src/sys/kern/init_main.c
cvs rdiff -u -r1.204 -r1.205 src/sys/kern/uipc_socket.c
cvs rdiff -u -r1.125 -r1.126 src/sys/sys/socketvar.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.