NetBSD Problem Report #37915

From dholland@eecs.harvard.edu  Wed Jan 30 01:27:30 2008
Return-Path: <dholland@eecs.harvard.edu>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 3A7C363B101
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 30 Jan 2008 01:27:30 +0000 (UTC)
Message-Id: <20080130012855.BAC44F905@tanaqui.eecs.harvard.edu>
Date: Tue, 29 Jan 2008 20:28:55 -0500 (EST)
From: dholland@eecs.harvard.edu
Reply-To: dholland@eecs.harvard.edu
To: gnats-bugs@gnats.NetBSD.org
Subject: vt100 + wscons + tty vmlocking changes = panic
X-Send-Pr-Version: 3.95

>Number:         37915
>Category:       kern
>Synopsis:       vt100 + wscons + tty vmlocking changes = panic
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jan 30 01:30:00 +0000 2008
>Closed-Date:    
>Last-Modified:  Mon Apr 09 05:58:46 +0000 2012
>Originator:     David A. Holland <dholland@eecs.harvard.edu>
>Release:        NetBSD 4.99.50 (20080127)
>Organization:
>Environment:
System: NetBSD valkyrie 4.99.50 NetBSD 4.99.50 (VALKYRIE) #3: Sun Jan 27 14:54:46 EST 2008 root@valkyrie:/usr/src/sys/arch/amd64/compile/VALKYRIE amd64
Architecture: x86_64
Machine: amd64
>Description:

Sending some VT100 control sequences to wscons will panic the kernel.

The problem is that ttwrite() holds tty_lock while calling ttstart();
this makes its way down to wsemul_vt100_handle_csi(), which for
several sequences that report stuff back calls wsdisplay_emulinput(),
which calls ttyinput(), which tries to get tty_lock again.

>How-To-Repeat:

Run some program that sends such a sequence (e.g. tset) or echo one
manually, while logged in on the console.

>Fix:

One could make the lock recursive, but that's gross. Probably the best
bet would be to make another entry in linesw (l_rintlocked?) for when
one already holds the tty lock, and use that in wsdisplay_emulinput().

Or just ("just") hunt down all (other) callers of l_rint and make them
get the lock first. The comment above tty_input suggests this is the
eventual plan anyway...

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->ad
Responsible-Changed-By: ad@narn.netbsd.org
Responsible-Changed-When: Wed, 30 Jan 2008 08:40:32 +0000
Responsible-Changed-Why:
take


State-Changed-From-To: open->analyzed
State-Changed-By: ad@NetBSD.org
State-Changed-When: Fri, 30 May 2008 16:28:28 +0000
State-Changed-Why:
'resize' is a good test case.

I'm fixing all the tty locking so this is no longer a problem. All the linesw
methods will need to be called with tty_lock held.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/37915: vt100 + wscons + tty vmlocking changes = panic
Date: Mon, 8 Dec 2008 17:36:47 +0000

 On Wed, Jan 30, 2008 at 01:30:00AM +0000, dholland@eecs.harvard.edu wrote:
  > >Synopsis:       vt100 + wscons + tty vmlocking changes = panic
  >
  > Sending some VT100 control sequences to wscons will panic the kernel.
  > 
  > The problem is that ttwrite() holds tty_lock while calling ttstart();
  > this makes its way down to wsemul_vt100_handle_csi(), which for
  > several sequences that report stuff back calls wsdisplay_emulinput(),
  > which calls ttyinput(), which tries to get tty_lock again.

 This still happens with up-to-the-minute current.

 -- 
 David A. Holland
 dholland@netbsd.org

From: David Holland <dholland-bugs@netbsd.org>
To: dholland@eecs.harvard.edu
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/37915: vt100 + wscons + tty vmlocking changes = panic
Date: Fri, 2 Jan 2009 09:07:32 +0000

 On Wed, Jan 30, 2008 at 01:30:00AM +0000, dholland@eecs.harvard.edu wrote:
  > The problem is that ttwrite() holds tty_lock while calling ttstart();
  > this makes its way down to wsemul_vt100_handle_csi(), which for
  > several sequences that report stuff back calls wsdisplay_emulinput(),
  > which calls ttyinput(), which tries to get tty_lock again.

 Making a small and noninvasive patch for this (as ad@ requested) turns
 out to be not so trivial, since it turns out that spin mutexes don't
 track who's holding them. (And for a while I thought tty_lock was
 being released and reacquired somewhere inside ttinput, which would
 have made it just about impossible, but that turns out to have been a
 red herring.)

 Thus, while the following patch could be considered a workaround, I
 can't with a straight face call it a fix.

 Index: tty.c
 ===================================================================
 RCS file: /cvsroot/src/sys/kern/tty.c,v
 retrieving revision 1.228
 diff -u -p -r1.228 tty.c
 --- tty.c	19 Nov 2008 18:36:07 -0000	1.228
 +++ tty.c	2 Jan 2009 09:01:35 -0000
 @@ -202,6 +202,10 @@ kmutex_t tty_lock;
  krwlock_t ttcompat_lock;
  int (*ttcompatvec)(struct tty *, u_long, void *, int, struct lwp *);

 +/* XXX: names the cpu that's currently doing ttstart(). See PR 37915. */
 +static struct cpu_info *in_ttstart_hack_cpu;
 +static unsigned in_ttstart_hack_depth;
 +
  uint64_t tk_cancc;
  uint64_t tk_nin;
  uint64_t tk_nout;
 @@ -702,11 +706,20 @@ ttyinput_wlock(int c, struct tty *tp)
   *
   * XXX - this is a hack, all drivers must changed to acquire the
   *	 lock before calling linesw->l_rint()
 + *
 + * XXX - it is a bigger hack; wscons sometimes already has the lock.
 + *	 We record which cpu is in ttstart (and how many times) to
 + *	 avoid re-entering the lock when we come back here via wscons.
 + *	 This is a hack for a specific known case, not a general fix.
 + *	 Note that we're also misusing mutex_owned, but the specific
 + *	 usage ought to work and a real fix is planned for later.
 + *	 PR 37915.
   */
  int
  ttyinput(int c, struct tty *tp)
  {
  	int error;
 +	int dolock = 1;

  	/*
  	 * Unless the receiver is enabled, drop incoming data.
 @@ -714,9 +727,16 @@ ttyinput(int c, struct tty *tp)
  	if (!ISSET(tp->t_cflag, CREAD))
  		return (0);

 -	mutex_spin_enter(&tty_lock);
 +	/* The bigger hack. */
 +	if (mutex_owned(&tty_lock) && in_ttstart_hack_cpu == curcpu()) {
 +		dolock = 0;
 +	}
 +
 +	if (dolock)
 +		mutex_spin_enter(&tty_lock);
  	error = ttyinput_wlock(c, tp);
 -	mutex_spin_exit(&tty_lock);
 +	if (dolock)
 +		mutex_spin_exit(&tty_lock);

  	return (error);
  }
 @@ -1551,9 +1571,39 @@ ttrstrt(void *tp_arg)
  int
  ttstart(struct tty *tp)
  {
 +	struct cpu_info *mycpu;
 +
 +	mycpu = curcpu();
 +	KASSERT(mutex_owned(&tty_lock));
 +
 +	/*
 +	 * Track who's in ttstart. See comment above ttyinput(), and
 +	 * PR 37915. If someone else was still in ttstart, overwrite
 +	 * them. This appears to be necessary because something
 +	 * releases and reacquires tty_lock inside ->t_oproc; I'm
 +	 * hoping it doesn't call ttyinput() again after that.
 +	 */
 +	if (in_ttstart_hack_cpu == mycpu) {
 +		in_ttstart_hack_depth++;
 +	} else if (in_ttstart_hack_cpu == NULL) {
 +		in_ttstart_hack_cpu = mycpu;
 +		in_ttstart_hack_depth = 1;
 +	} else {
 +		panic("ttstart: unexpected reentrant call "
 +		      "(tty_lock released inside t_oproc?)\n");
 +	}

  	if (tp->t_oproc != NULL)	/* XXX: Kludge for pty. */
  		(*tp->t_oproc)(tp);
 +
 +	KASSERT(mycpu == curcpu());
 +	KASSERT(in_ttstart_hack_cpu == mycpu);
 +	KASSERT(in_ttstart_hack_depth > 0);
 +	in_ttstart_hack_depth--;
 +	if (in_ttstart_hack_depth == 0) {
 +		in_ttstart_hack_cpu = NULL;
 +	}
 +
  	return (0);
  }



 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: analyzed->feedback
State-Changed-By: drochner@NetBSD.org
State-Changed-When: Thu, 22 Jan 2009 20:47:56 +0000
State-Changed-Why:
applied a workaround which is as small and noninvasive I can think of,
surely it needs a fix in the tty framework, but the assumptions
this is based on are not unrealistic, so this might get a pullup candidate


State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 07 Feb 2009 07:09:55 +0000
State-Changed-Why:
drochner@'s workaround is committed and pulled up to netbsd-5; however, we
should leave this open until we get some real fixes done. I'll adjust
severity.


Responsible-Changed-From-To: ad->kern-bug-people
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Mon, 09 Apr 2012 05:58:46 +0000
Responsible-Changed-Why:
ad resigned, should not own PRs any more


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.