NetBSD Problem Report #47506

From www@NetBSD.org  Mon Jan 28 15:49:25 2013
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 14D2063BA5D
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 28 Jan 2013 15:49:25 +0000 (UTC)
Message-Id: <20130128154924.6235963BA5D@www.NetBSD.org>
Date: Mon, 28 Jan 2013 15:49:24 +0000 (UTC)
From: uwe@NetBSD.org
Reply-To: uwe@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: tap(4) gets stuck in OACTIVE
X-Send-Pr-Version: www-1.0

>Number:         47506
>Category:       kern
>Synopsis:       tap(4) gets stuck in OACTIVE
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 28 15:50:09 +0000 2013
>Closed-Date:    Tue Jun 03 17:00:20 +0000 2014
>Last-Modified:  Tue Jun 03 17:00:20 +0000 2014
>Originator:     Valery Ushakov
>Release:        NetBSD 6
>Organization:
>Environment:
NetBSD amd64 6.0_STABLE NetBSD 6.0_STABLE (GENERIC) #0: Sun Nov 18 04:21:07 MSK 2012  uwe@amd64:/home/uwe/work/netbsd/cvs/src-release-6/sys/arch/amd64/compile/GENERIC amd64

>Description:
It seems that under load tap(4) gets stuck in a state where it has the
OACTIVE flag set, but poll(2) on the tap's fd doesn't return POLLIN.

>How-To-Repeat:
I'm playing with the lwIP TCP/IP stack.  It uses tap(4) to talk to the
Ethernet:

# Create tap(4) interface for lwIP
ifconfig tap1 create
ifconfig tap1 up

# Bridge it to the network
ifconfig bridge0 create
brconfig bridge0 add tap1 add wm1
brconfig bridge0 up

The code to read from tap(4) does something along these lines:

for (;;) {
  poll( [{ tapfd, POLLIN }] );
  read(tapfd, packet);
  post packet to tcp/ip thread;
}
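
A minimal C sketch of one iteration of that loop, for reference.  It is
not the actual lwIP code: the helper name read_one_frame() and the
timeout parameter are made up for illustration, and the fd is assumed
to be an already opened tap(4) device.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: wait for fd to become readable, then read from it.
 * Returns the number of bytes read, 0 if poll(2) timed out, -1 on error.
 * timeout_ms follows poll(2) conventions; -1 blocks indefinitely, which
 * is what the loop in this report does.
 */
static ssize_t
read_one_frame(int fd, void *buf, size_t len, int timeout_ms)
{
	struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
	int n;

	assert(buf != NULL);

	/* Retry if a signal interrupts the wait. */
	do {
		n = poll(&pfd, 1, timeout_ms);
	} while (n == -1 && errno == EINTR);

	if (n <= 0)
		return n;	/* 0: timed out, -1: poll(2) failed */

	if ((pfd.revents & POLLIN) == 0) {
		/* Woken for POLLERR/POLLHUP/POLLNVAL only, no data. */
		errno = EIO;
		return -1;
	}

	/* With tap(4), each read(2) is expected to return one frame. */
	return read(fd, buf, len);
}
```

Calling read_one_frame(tapfd, packet, sizeof(packet), -1) in a for (;;)
loop corresponds to the pseudocode above.  Passing a finite timeout is
one way to notice that poll(2) has stopped reporting POLLIN.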

If I throw enough incoming traffic load at it (benchmarks/netperf),
the loop above gets stuck.  It sits in poll(2) and never returns.
Meanwhile tap(4) has the OACTIVE flag set, and bridge(4) just enqueues
new frames and doesn't call if_start (see the very end of the
bridge_enqueue() function).
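
The effect of OACTIVE on the enqueue path can be illustrated with a toy
userland model.  This is not the kernel code: struct toy_if and the
toy_*() functions are invented names, and the assumption that a
successful read clears the busy flag is mine, not something stated in
this report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the few pieces of interface state that matter here. */
struct toy_if {
	bool oactive;		/* models the OACTIVE flag */
	int  queued;		/* models the length of the send queue */
	int  start_calls;	/* how many times the start routine ran */
};

/* Models the driver start routine: marks the interface busy. */
static void
toy_start(struct toy_if *ifp)
{
	ifp->start_calls++;
	ifp->oactive = true;
}

/* Models the tail of the enqueue path: queue, kick start only if idle. */
static void
toy_enqueue(struct toy_if *ifp)
{
	assert(ifp != NULL);
	ifp->queued++;
	if (!ifp->oactive)
		toy_start(ifp);
}

/* Models a reader taking one frame (assumed to clear the busy flag). */
static bool
toy_read(struct toy_if *ifp)
{
	assert(ifp != NULL);
	if (ifp->queued == 0)
		return false;
	ifp->queued--;
	ifp->oactive = false;
	return true;
}
```

In this model, once the flag is set every further toy_enqueue() only
grows the queue, so progress depends entirely on the reader being woken
up and calling toy_read().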

Since poll(2) is redundant here (the read(2) is blocking anyway), I can
work around this problem by just dropping the poll(2).  In that case
read(2) does complete successfully and the loop is not stuck.  However,
in a situation where poll(2) is actually required by the structure of
the code, the bug would be impossible to avoid.

>Fix:

>Release-Note:

>Audit-Trail:

From: Quentin Garnier <cube@cubidou.net>
To: gnats-bugs@NetBSD.org
Cc: tech-net@netbsd.org
Subject: re: kern/47506
Date: Fri, 2 May 2014 03:16:43 +0000

 Hi,

 It seems to me there is a race between tap_start() and tap_dev_poll().
 The splx() call in tap_dev_poll() needs to be moved after the call to
 selrecord().

 However, I'm not entirely certain how that interacts with the spin
 mutex that protects the selrecord() call.

 The kqueue(2) code path seems fine.

 Would someone less rusty than me care to comment?

 -- 
 Quentin Garnier - cube@cubidou.net
 "See the look on my face from staying too long in one place
 [...] every time the morning breaks I know I'm closer to falling"
 KT Tunstall, Saving My Face, Drastic Fantastic, 2007.

From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: Quentin Garnier <cube@cubidou.net>, kern-bug-people@netbsd.org, 
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, uwe@NetBSD.org
Subject: Re: kern/47506
Date: Fri, 02 May 2014 08:12:54 +0100

 On 05/02/14 05:35, Quentin Garnier wrote:
 > The following reply was made to PR kern/47506; it has been noted by GNATS.
 >
 > From: Quentin Garnier <cube@cubidou.net>
 > To: gnats-bugs@NetBSD.org
 > Cc: tech-net@netbsd.org
 > Subject: re: kern/47506
 > Date: Fri, 2 May 2014 03:16:43 +0000
 >
 >   Hi,
 >   
 >   It seems to me there is a race between tap_start() and tap_dev_poll().
 >   The splx() call in tap_dev_poll() needs to be moved after the call to
 >   selrecord().
 >   
 >   However, I'm not entirely certain how that interacts with the spin
 >   mutex that protects the selrecord() call.

 mutex(9) and spl(9) interact appropriately :)

 The first spin mutex_enter saves the spl and the last spin mutex_exit
 restores that spl.

 Nick
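
The ordering problem discussed in the messages above can be sketched
with a deterministic toy model.  This is not the tap(4) source: struct
toy_tap, toy_produce() and toy_poll() are invented names, and the `gap'
callback merely stands in for the start routine running between the
queue check and the selrecord() call once the spl has been lowered.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_tap {
	int  queued;	/* frames waiting for the reader */
	bool recorded;	/* poller registered interest (models selrecord) */
	bool woken;	/* a notification reached a registered poller */
};

/* Producer side: queue a frame, notify whoever is registered right now. */
static void
toy_produce(struct toy_tap *t)
{
	assert(t != NULL);
	t->queued++;
	if (t->recorded)
		t->woken = true;
}

/*
 * Poll side.  If gap is non-NULL it runs between the emptiness check and
 * the registration, i.e. in the window opened by restoring the spl too
 * early.  Returns true if data was seen as ready.
 */
static bool
toy_poll(struct toy_tap *t, void (*gap)(struct toy_tap *))
{
	bool empty;

	assert(t != NULL);
	empty = (t->queued == 0);
	if (gap != NULL)
		gap(t);			/* producer sneaks in here */
	if (empty)
		t->recorded = true;	/* registration happens too late */
	return !empty;
}
```

With toy_poll(&t, toy_produce) the poller sees an empty queue, the frame
arrives before registration, and the notification is lost: the frame is
queued but `woken' stays false.  With toy_poll(&t, NULL) followed by
toy_produce(&t), which models check and registration done without a
window in between, the notification is delivered.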

State-Changed-From-To: open->closed
State-Changed-By: cube@NetBSD.org
State-Changed-When: Tue, 03 Jun 2014 17:00:20 +0000
State-Changed-Why:
Fixed and pulled up to -6.


>Unformatted:
