NetBSD Problem Report #46780

From www@NetBSD.org  Tue Aug  7 17:14:22 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 37E5063B882
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  7 Aug 2012 17:14:22 +0000 (UTC)
Message-Id: <20120807171420.9445A63B85F@www.NetBSD.org>
Date: Tue,  7 Aug 2012 17:14:20 +0000 (UTC)
From: dennis@mistimed.com
Reply-To: dennis@mistimed.com
To: gnats-bugs@NetBSD.org
Subject: tty TIOCSQSIZE can free clist buffers which drivers are still using for output
X-Send-Pr-Version: www-1.0

>Number:         46780
>Category:       kern
>Synopsis:       tty TIOCSQSIZE can free clist buffers which drivers are still using for output
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Aug 07 17:15:00 +0000 2012
>Closed-Date:    Wed Aug 22 06:36:00 +0000 2012
>Last-Modified:  Wed Aug 22 06:36:00 +0000 2012
>Originator:     Dennis Ferguson
>Release:        6.99.10
>Organization:
>Environment:
NetBSD ts1.to.mistimed.ca 6.99.10 NetBSD 6.99.10 (GENERIC) #0: Tue Jul 31 13:49:09 EDT 2012  dennis@ts1.to.mistimed.ca:/usr/obj/sys/arch/amd64/compile/GENERIC amd64
>Description:
It is common for serial device drivers to cache a pointer into the tty clist buffer along with a character count in their private softc structure for use in the hardware transmit interrupt routine.  Here is a typical example from sys/dev/ic/com.c, in comstart():

        {
                u_char *tba;
                int tbc;

                tba = tp->t_outq.c_cf;
                tbc = ndqb(&tp->t_outq, 0);

                mutex_spin_enter(&sc->sc_lock);

                sc->sc_tba = tba;
                sc->sc_tbc = tbc;
        }

Many other drivers have similar code.

The tty TIOCSQSIZE ioctl() allows the size of this clist buffer to be changed.  It does this, in sys/kern/tty.c, by calling tty_set_qsize(), which ultimately kmem_zalloc()'s a buffer of the requested new size, points the tty outq at the new buffer and then kmem_free()'s the old buffer.

The problem with this seems to be that if there is data present in old buffer when it is freed it will often be the case that the device driver will still have cached a pointer into the old buffer and will continue to transmit data from the now-free buffer until it exhausts the character count.  In addition, when the operation does complete, drivers seem to signal that the characters have now been consumed by calling ndflush() with a character count computed by pointer math which is now going to be a bit off.
>How-To-Repeat:
Do an output-buffer-sized write() to a (slower-is-better) serial port, then immediately make a TIOCSQSIZE call to change the buffer.
>Fix:
I don't think this can be fixed in the serial device drivers, they have good reasons to work the way they do.

I think what TIOCSQSIZE does with the input clists is okay, only the output has a problem.  An approach to fixing this might be to wait until the character count in the old output clist goes to zero before swapping in the new one.

>Release-Note:

>Audit-Trail:
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drivers are still using for output
Date: Sat, 11 Aug 2012 04:45:55 -0400

 On Aug 7,  5:15pm, dennis@mistimed.com (dennis@mistimed.com) wrote:
 -- Subject: kern/46780: tty TIOCSQSIZE can free clist buffers which drivers a

 | I don't think this can be fixed in the serial device drivers, they have good reasons to work the way they do.

 How about http://www.netbsd.org/~christos/tty.diff ?

 christos

From: Dennis Ferguson <dennis@mistimed.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drivers are still using for output
Date: Sat, 11 Aug 2012 17:39:05 -0400

 On 11 Aug, 2012, at 04:50 , Christos Zoulas wrote:
 > | I don't think this can be fixed in the serial device drivers, they =
 have good reasons to work the way they do.
 >=20
 > How about http://www.netbsd.org/~christos/tty.diff ?

 I'd like it if it could work that way, but I'm pretty sure that may be
 dangerous.  The usual reason the drivers cache that pointer is that
 they are sending those bytes from their interrupt service routine, and
 they don't what the ISR touching anything in the struct tty (that would
 require locking that the ISR isn't doing).  In your _flush() routines
 you don't have the interrupt blocked and the update isn't atomic (i.e.
 you write two things, the pointer and the byte count) so a concurrent
 interrupt may see an inconsistent pair of values.  And worse is that
 the ISR usually updates those values as it sends stuff, so if the ISR
 is running concurrently on another core when _flush() writes those
 values the ISR might rewrite one or both back to the old values,
 slightly adjusted.

 For the _flush() routine to work reliably it needs to lock out the ISR,
 but how that is done is driver-specific; e.g. in the com driver it
 requires taking sc->sc_lock.  And while it might be possible to fix
 the drivers which exist to do that this is not guaranteed to be cheap
 and easy.  For the driver I am working on, which caused me to notice
 the problem in the first place, I was planning on not taking a spin
 lock in the ISR (and saving that overhead) since for that device it
 looks possible to produce an mpsafe driver while running the ISR
 lockless; supporting the _flush() entry would likely add the cost of
 the lock back in.  And this wouldn't fix, e.g., the VAX dhu driver
 which does DMA output directly from the output clist buffer (i.e.
 the pointer is cached inside the hardware rather than in the softc
 structure).

 While I still think it is possible to do a _flush() entry point I
 think it is going to take more driver-specific code then you have.
 The alternative of deferring the free of a non-empty output buffer's
 memory (the bookkeeping needn't be kept, just the ring buffer memory
 allocation) until the driver indicates that the hardware/interrupt
 is inactive by calling ndqb() or ndflush() is probably easier even
 if it does require some checks to be added to the 'normal' code
 path.

 Dennis Ferguson=

From: christos@zoulas.com (Christos Zoulas)
To: Dennis Ferguson <dennis@mistimed.com>, gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org
Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drivers are still using for output
Date: Sun, 12 Aug 2012 03:24:37 -0400

 On Aug 11,  5:39pm, dennis@mistimed.com (Dennis Ferguson) wrote:
 -- Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drive

 | I'd like it if it could work that way, but I'm pretty sure that may be
 | dangerous.  The usual reason the drivers cache that pointer is that
 | they are sending those bytes from their interrupt service routine, and
 | they don't what the ISR touching anything in the struct tty (that would
 | require locking that the ISR isn't doing).  In your _flush() routines
 | you don't have the interrupt blocked and the update isn't atomic (i.e.
 | you write two things, the pointer and the byte count) so a concurrent
 | interrupt may see an inconsistent pair of values.  And worse is that
 | the ISR usually updates those values as it sends stuff, so if the ISR
 | is running concurrently on another core when _flush() writes those
 | values the ISR might rewrite one or both back to the old values,
 | slightly adjusted.
 | 
 | For the _flush() routine to work reliably it needs to lock out the ISR,
 | but how that is done is driver-specific; e.g. in the com driver it
 | requires taking sc->sc_lock.  And while it might be possible to fix
 | the drivers which exist to do that this is not guaranteed to be cheap
 | and easy.  For the driver I am working on, which caused me to notice
 | the problem in the first place, I was planning on not taking a spin
 | lock in the ISR (and saving that overhead) since for that device it
 | looks possible to produce an mpsafe driver while running the ISR
 | lockless; supporting the _flush() entry would likely add the cost of
 | the lock back in.  And this wouldn't fix, e.g., the VAX dhu driver
 | which does DMA output directly from the output clist buffer (i.e.
 | the pointer is cached inside the hardware rather than in the softc
 | structure).

 I agree, the point here is that locking is too expensive. And changing
 the order of assignment is also unsafe.

 | While I still think it is possible to do a _flush() entry point I
 | think it is going to take more driver-specific code then you have.
 | The alternative of deferring the free of a non-empty output buffer's
 | memory (the bookkeeping needn't be kept, just the ring buffer memory
 | allocation) until the driver indicates that the hardware/interrupt
 | is inactive by calling ndqb() or ndflush() is probably easier even
 | if it does require some checks to be added to the 'normal' code
 | path.

 Well, we could always wait for ndflush() but I will need to verify that
 this happens. We could also make the drivers voluntarily call a function
 to get the new pointers once they consume all they bytes from the old ones.
 Or we can make the ioctl fail with ebysy if the buffer is not empty.

 christos

From: Dennis Ferguson <dennis@mistimed.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drivers are still using for output
Date: Sun, 12 Aug 2012 14:15:25 +0100

 On 12 Aug, 2012, at 08:25 , Christos Zoulas wrote:

 > | While I still think it is possible to do a _flush() entry point I
 > | think it is going to take more driver-specific code then you have.
 > | The alternative of deferring the free of a non-empty output buffer's
 > | memory (the bookkeeping needn't be kept, just the ring buffer memory
 > | allocation) until the driver indicates that the hardware/interrupt
 > | is inactive by calling ndqb() or ndflush() is probably easier even
 > | if it does require some checks to be added to the 'normal' code
 > | path.
 >=20
 > Well, we could always wait for ndflush() but I will need to verify =
 that
 > this happens. We could also make the drivers voluntarily call a =
 function
 > to get the new pointers once they consume all they bytes from the old =
 ones.
 > Or we can make the ioctl fail with ebysy if the buffer is not empty.

 I only know this because I've been working on that driver, but the
 protocol most drivers seem to follow works like this:

 - call ndqb() to get the number of bytes available in the clist.  If
   there are some, start the tx interrupt/DMA to send those bytes.

 - when the above completes, call ndflush() to remove the bytes sent
   from the clist.

 When the driver calls ndqb() or ndflush(), or when the clist byte
 count is zero, it should be quite safe to assume there is no tx output
 in progress.

 Given this I'm pretty sure this might work:

 (1) Change all drivers which call ndqb() to copy the tp->t_outq.c_cf
     pointer only after the ndqb() call (or, better, make ndqb() return
     the pointer as well as the count, or something).  This avoids a
     potential race when changing what tp->t_outq.c_cf points to.

 (2) When replacing the output queue memory, if the byte count in the =
 clist
     is non-zero save a reference to the old buffer memory (no need to =
 keep the
     bookkeeping, just the buffer itself) rather than freeing it.

 (3) If the driver calls ndqb() check to see if an old buffer is present =
 and,
     if it is, free it (before returning information from the current =
 clist,
     as normal).  If the driver is calling ndqb() it is safe to dump the =
 old
     memory.

 (4) If the driver calls ndflush() when an old buffer is present, free =
 the old
     buffer and do not flush anything from the current buffer (i.e. =
 ignore the
     byte count).  If the driver is calling ndflush() whatever output it =
 was
     doing has completed, so it is safe to free the old buffer, but if =
 the old
     buffer is present the byte count it is giving you must be from the
     previous buffer and shouldn't be subtracted from the current clist.

 (5) While most drivers use the ndqb()/ndflush(), there may be a single
     character _get() routine as well (I'm doing this from memory since
     I'm traveling and away from my sources).  Any code which reads bytes
     from the outq should similarly check for and free the old buffer.

 If that's too much work, however, returning EBUSY when the output queue
 has bytes in it should also be safe.

 Dennis Ferguson=

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/46780 CVS commit: src/sys/kern
Date: Sun, 12 Aug 2012 10:45:45 -0400

 Module Name:	src
 Committed By:	christos
 Date:		Sun Aug 12 14:45:45 UTC 2012

 Modified Files:
 	src/sys/kern: tty.c

 Log Message:
 PR/46780: Dennis Ferguson: Take the easy way out and return EBUSY when changing
 the queue size if the output queue is not empty. Other solutions seemed too
 complex/fragile.


 To generate a diff of this commit:
 cvs rdiff -u -r1.250 -r1.251 src/sys/kern/tty.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Matthew Mondor <mm_lists@pulsar-zone.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drivers
 are still using for output
Date: Sun, 19 Aug 2012 19:37:47 -0400

 Addig to this PR other relevant information:

 From: christos@zoulas.com (Christos Zoulas)
 To: paul@whooppee.com, current-users@netbsd.org
 Subject: Re: Regression! kern/t_pty fails - cannot set pty's queue size
 Date: Fri, 17 Aug 2012 12:14:53 -0400
 Sender: current-users-owner@NetBSD.org
 X-Mailer: Mail User's Shell (7.2.6 beta(4.pl1)+dynamic 20000103)
 Organization: Astron Software

 On Aug 17,  5:56am, paul@whooppee.com (Paul Goyette) wrote:
 -- Subject: Re: Regression! kern/t_pty fails - cannot set pty's queue size

 | It is definitely not obvious to me how outq.c_cc can have any useful 
 | value at this point - outq has only just been allocated on the stack.

 Fixed, thanks!

 christos


 ---


 Should this be pulled up to netbsd-6?  The feature of being able to
 change the queue size (via ioctl and sysctl) appeared on netbsd-6.

 Thanks,
 -- 
 Matt

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, dennis@mistimed.com
Cc: 
Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drivers are still using for output
Date: Mon, 20 Aug 2012 03:59:54 -0400

 On Aug 19, 11:40pm, mm_lists@pulsar-zone.net (Matthew Mondor) wrote:
 -- Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drive

 I will ask for it. Thanks.

 christos

 | The following reply was made to PR kern/46780; it has been noted by GNATS.
 | 
 | From: Matthew Mondor <mm_lists@pulsar-zone.net>
 | To: gnats-bugs@NetBSD.org
 | Cc: 
 | Subject: Re: kern/46780: tty TIOCSQSIZE can free clist buffers which drivers
 |  are still using for output
 | Date: Sun, 19 Aug 2012 19:37:47 -0400
 | 
 |  Addig to this PR other relevant information:
 |  
 |  From: christos@zoulas.com (Christos Zoulas)
 |  To: paul@whooppee.com, current-users@netbsd.org
 |  Subject: Re: Regression! kern/t_pty fails - cannot set pty's queue size
 |  Date: Fri, 17 Aug 2012 12:14:53 -0400
 |  Sender: current-users-owner@NetBSD.org
 |  X-Mailer: Mail User's Shell (7.2.6 beta(4.pl1)+dynamic 20000103)
 |  Organization: Astron Software
 |  
 |  On Aug 17,  5:56am, paul@whooppee.com (Paul Goyette) wrote:
 |  -- Subject: Re: Regression! kern/t_pty fails - cannot set pty's queue size
 |  
 |  | It is definitely not obvious to me how outq.c_cc can have any useful 
 |  | value at this point - outq has only just been allocated on the stack.
 |  
 |  Fixed, thanks!
 |  
 |  christos
 |  
 |  
 |  ---
 |  
 |  
 |  Should this be pulled up to netbsd-6?  The feature of being able to
 |  change the queue size (via ioctl and sysctl) appeared on netbsd-6.
 |  
 |  Thanks,
 |  -- 
 |  Matt
 |  
 -- End of excerpt from Matthew Mondor


From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/46780 CVS commit: [netbsd-6] src/sys/kern
Date: Mon, 20 Aug 2012 19:15:37 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Mon Aug 20 19:15:36 UTC 2012

 Modified Files:
 	src/sys/kern [netbsd-6]: tty.c

 Log Message:
 Pull up following revision(s) (requested by christos in ticket #516):
 	sys/kern/tty.c: revision 1.251
 	sys/kern/tty.c: revision 1.252
 	sys/kern/tty.c: revision 1.253
 Better (not racy fix) from Paul Goyette.
 Use the queue of the tty not garbage from the stack (Paul Goyette)
 PR/46780: Dennis Ferguson: Take the easy way out and return EBUSY when changing
 the queue size if the output queue is not empty. Other solutions seemed too
 complex/fragile.


 To generate a diff of this commit:
 cvs rdiff -u -r1.249.8.1 -r1.249.8.2 src/sys/kern/tty.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 22 Aug 2012 06:36:00 +0000
State-Changed-Why:
Fixed and pulled up to netbsd-6. Thanks.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.