NetBSD Problem Report #36864

From Wolfgang.Stukenbrock@nagler-company.com  Thu Aug 30 10:04:48 2007
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by narn.NetBSD.org (Postfix) with ESMTP id 79A6B63B8B5
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 30 Aug 2007 10:04:48 +0000 (UTC)
Message-Id: <200708301004.l7UA4eog012731@test-s0.nagler-company.com>
Date: Thu, 30 Aug 2007 12:04:40 +0200 (CEST)
From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@NetBSD.org
Subject: opencrypto - deflate failed to decompress large packets
X-Send-Pr-Version: 3.95

>Number:         36864
>Category:       kern
>Synopsis:       opencrypto - deflate failed to decompress large packets
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Aug 30 10:05:00 +0000 2007
>Closed-Date:    Tue Mar 08 16:58:05 +0000 2011
>Last-Modified:  Tue Mar 08 16:58:05 +0000 2011
>Originator:     Wolfgang Stukenbrock
>Release:        NetBSD 3.1
>Organization:
Dr. Nagler & Company GmbH

>Environment:


System: NetBSD test-s0 3.1 NetBSD 3.1 (test-s0) #0: Tue Apr 3 11:33:43 CEST 2007 root@test-s0:/usr/src/sys/arch/i386/compile/test-s0 i386
Architecture: i386
Machine: i386
>Description:
	When a packet that contains e.g. all zeroes gets compressed, the
	compression ratio is very good, but the current implementation fails
	to decompress it again.
	This can happen if you run NFS through an ipsec tunnel with compression.
	The main problem is that the decompression code uses the compressed
	size as an estimate for the resulting size, with a limit of 8
	additional malloc() expansions.
	As a result, you cannot use ipcomp on tunnels that carry NFS traffic.
	The buggy code is located in /usr/src/sys/opencrypto/deflate.c.
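	To illustrate the ratio involved, here is a minimal user-level sketch
	(an assumption on my part - it uses zlib's one-shot compress2() and
	uncompress() from userland, not the kernel code in deflate.c): a
	1448-byte all-zero buffer compresses to a few dozen bytes, so any
	output estimate derived from the compressed size is far too small.

/*
 * ratio.c - user-level illustration of the expansion factor (zlib assumed).
 * Not the kernel code; build with: cc ratio.c -lz
 */
#include <stdio.h>
#include <string.h>
#include <zlib.h>

int
main(void)
{
	unsigned char in[1448];			/* a typical full NFS/TCP payload */
	unsigned char comp[2048], out[2048];
	uLongf clen = sizeof(comp), olen = sizeof(out);

	memset(in, 0, sizeof(in));

	if (compress2(comp, &clen, in, sizeof(in), 9) != Z_OK)
		return 1;
	printf("compressed %zu -> %lu bytes\n", sizeof(in), (unsigned long)clen);

	/*
	 * Decompression needs the full 1448 bytes of output space - far
	 * more than a small multiple of the compressed length.
	 */
	if (uncompress(out, &olen, comp, clen) != Z_OK)
		return 1;
	printf("decompressed back to %lu bytes\n", (unsigned long)olen);
	return 0;
}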
>How-To-Repeat:
	Set up an ipcomp connection and try to send the first 10k of /dev/zero
	through it ...
>Fix:
	This fix is not the best one, but it solves the problem.
	The idea is to enforce a minimum allocation size, even for very
	small packets. I've chosen 512 bytes as the minimum so that all
	normal packets can be decompressed into a single block, which also
	speeds up the whole thing.

*** deflate.c   2007/08/30 09:51:30     1.1
--- deflate.c   2007/08/30 09:54:40
***************
*** 104,109 ****
--- 104,117 ----
                i++;
        } else {
                /*
+                * Work around a problem with very good compression:
+                * e.g. an NFS packet with all zeroes gets compressed to
+                * around 65 bytes, but it will expand back to 1448 bytes.
+                * This code is only used for ipsec processing, so speed
+                * things up by increasing size to at least 512 bytes.
+                */
+               if (size < 512) size = 512;
+               /*
                 * Choose a buffer with 4x the size of the input buffer
                 * for the size of the output buffer in the case of
                 * decompression. If it's not sufficient, it will need to be


>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: drochner@NetBSD.org
State-Changed-When: Fri, 18 Feb 2011 22:06:47 +0000
State-Changed-Why:
deflate.c now grows the output buffer exponentially, which should
limit the number of allocations to a small one-digit number.
I'd add a larger initial allocation, but I'd like to have some
real-world statistics first.


From: "Matthias Drochner" <drochner@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/36864 CVS commit: src/sys/opencrypto
Date: Fri, 18 Feb 2011 22:02:10 +0000

 Module Name:	src
 Committed By:	drochner
 Date:		Fri Feb 18 22:02:09 UTC 2011

 Modified Files:
 	src/sys/opencrypto: deflate.c

 Log Message:
 redo result buffer allocation, to avoid dynamic allocations:
 -use exponentially growing buffer sizes instead of just linear extension
 -drop the dynamic allocation of buffer metadata introduced in rev.1.8 --
  if the initial array is not sufficient something is wrong
 -apply some (arbitrary, heuristic) limit so that compressed data
  which extract into insane amounts of constant data don't kill the system
 This addresses PR kern/36864 by Wolfgang Stukenbrock. Some tuning
 might be useful, but hopefully this is an improvement already.


 To generate a diff of this commit:
 cvs rdiff -u -r1.17 -r1.18 src/sys/opencrypto/deflate.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
        Wolfgang.Stukenbrock@nagler-company.com
Subject: Re: PR/36864 CVS commit: src/sys/opencrypto
Date: Mon, 21 Feb 2011 15:49:31 +0100

 Hi,

 I've had a look at it.

 In general it is a good idea - see conclusion below.

 Let's do some calculations ...

 I've compressed /dev/zero with gzip -9 - result:
     size     output
      1024 ->   29  -> factor 35
     10240 ->   45  -> factor 227
    102400 ->  133  -> factor 769
   1024000 -> 1026  -> factor 998
  10240000 -> 9981  -> factor 1025

 remark: 1024000000 -> 995308 -> factor 1028; it looks like the factor
 won't get much better than this.

 ZBUF is 10 at the moment, so we get the following size factors,
 aggregated size factors and buffer sizes (assuming the last one is 1000000):

     4 -    4 -    1950
     8 -   12 -    3901
    16 -   28 -    7812
    32 -   60 -   15625
    64 -  124 -   31250
   128 -  292 -   62500
   256 -  548 -  125000
   512 - 1060 -  230000
  1024 - 2084 -  500000
  2048 - 4132 - 1000000 -> max buffer 1978038 (near 2 MB)

 remark: If we never get a factor much better than 1030, ZBUF = 8 would
 be large enough. My test above seems to point to that.

 There are two use-cases for the opencrypto stuff - as far as I know:
 1. IPSEC
 2. /dev/crypto

 Let's look at 1 first:
 The reasonable maximum size for network packets on common network types
 is 1500 bytes, or 9k for jumbo frames.
 So let's assume 10k. This leads to a maximum factor of 227 when
 compressing only zero bytes. That means you have 45 bytes of input,
 resulting in a 180-byte start buffer. To do the decompression you need
 6 allocations (180+360+720+1440+2880+5760 = 11340).
 This is much better than 10240/180 = 57 buffer allocations.

 In my PR, I mentioned a minimum start size for the algorithm. I fixed
 it to 2k with the idea that a power of two is a good choice and that
 all normal network packets will fit into one buffer when decompressed.
 I still think it would be a good idea to create a sysctl variable that
 defines a minimum start buffer size, to be used if 4*size is smaller.
 The variable should have a default value of 0 - no minimum size.
 This allows the admin to set it to e.g. 1500 (or 9k if jumbo frames are
 used for compressed traffic) so that "normal" network packets will fit
 into one buffer regardless of the compression factor.
 For 10k packets, 1500 leads to only 3 allocations (1500, 3000, 6000).
 I think this would be an even better performance boost for IPSEC,
 because if we can get everything into one buffer, we avoid copying
 everything again at the end of decompression ...
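
 To make the arithmetic explicit, here is a small stand-alone sketch (my
 own illustration, not the deflate.c code) that counts the allocations
 needed with doubling buffers, starting at 4x the compressed size, and
 compares that with the old linear extension:

 /* alloc_count.c - doubling vs. linear buffer growth (illustration only) */
 #include <stddef.h>
 #include <stdio.h>

 static int
 allocs_doubling(size_t compressed, size_t decompressed)
 {
 	size_t buf = 4 * compressed, total = 0;
 	int n = 0;

 	while (total < decompressed) {
 		total += buf;		/* each new buffer is twice the previous */
 		buf *= 2;
 		n++;
 	}
 	return n;
 }

 static int
 allocs_linear(size_t compressed, size_t decompressed)
 {
 	size_t step = 4 * compressed;

 	/* linear extension: every step adds another 4*compressed bytes */
 	return (int)((decompressed + step - 1) / step);
 }

 int
 main(void)
 {
 	/* 10k of zeroes compresses to about 45 bytes (factor ~227) */
 	printf("doubling: %d allocations\n", allocs_doubling(45, 10240));
 	printf("linear:   %d allocations\n", allocs_linear(45, 10240));
 	return 0;
 }

 This prints 6 for the doubling scheme and 57 for the linear one, the
 same numbers as above.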


 Let's look at 2:
 In general we cannot assume anything about the size of the data that
 gets compressed or decompressed.
 The limit of 1000000 bytes maximum chunk size may be a very hard
 restriction here. (Sorry, I do not use /dev/crypto, so I don't know
 anything about really reasonable sizes.)
 In general, I think for /dev/crypto it would be better to allow any
 size for compression/decompression and do the copyin/copyout in parts,
 as in the sketch below. (There must be a reason why all compression
 library interfaces are designed like this ...)
 Again, I do not know whether this would break some semantics of the
 /dev/crypto interface with overlapping in-/out-buffers.
 And I have no idea whether all hardware devices would be able to
 handle this.
 Perhaps there is already an interface mode for /dev/crypto that allows
 partial work and continuations - I've never looked for this until now.
 Remark: that may conflict with the change done a few PRs ago that sets
 Z_FINISH in compression - that would need to become "tunable" to
 support this ...
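
 As an illustration of what "in parts" could mean, here is a minimal
 user-level sketch (an assumption on my side: plain zlib streaming with
 inflate(); the real /dev/crypto plumbing would of course look
 different). The output is produced into a fixed-size chunk buffer and
 handed on piece by piece, so no buffer for the full decompressed size
 is ever needed:

 /* stream_inflate.c - decompress from one fd to another in fixed chunks */
 #include <stdio.h>
 #include <string.h>
 #include <unistd.h>
 #include <zlib.h>

 #define CHUNK 4096

 int
 inflate_fd(int in_fd, int out_fd)
 {
 	unsigned char in[CHUNK], out[CHUNK];
 	z_stream zs;
 	ssize_t n;
 	int ret = Z_OK;

 	memset(&zs, 0, sizeof(zs));	/* zalloc/zfree/opaque = defaults */
 	if (inflateInit(&zs) != Z_OK)
 		return -1;

 	while (ret != Z_STREAM_END && (n = read(in_fd, in, sizeof(in))) > 0) {
 		zs.next_in = in;
 		zs.avail_in = (uInt)n;

 		/* drain all output produced for this piece of input */
 		do {
 			zs.next_out = out;
 			zs.avail_out = sizeof(out);
 			ret = inflate(&zs, Z_NO_FLUSH);
 			if (ret == Z_STREAM_ERROR || ret == Z_NEED_DICT ||
 			    ret == Z_DATA_ERROR || ret == Z_MEM_ERROR) {
 				inflateEnd(&zs);
 				return -1;
 			}
 			(void)write(out_fd, out, sizeof(out) - zs.avail_out);
 		} while (zs.avail_out == 0);
 	}

 	inflateEnd(&zs);
 	return ret == Z_STREAM_END ? 0 : -1;
 }

 Whether /dev/crypto could expose such a chunked mode without breaking
 its existing semantics is exactly the open question above.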

 The current patch will limit decompression to something a little bit
 less than 2MB. This may be OK, but I would be surprised if it is,
 because 2MB is not very much for user-level work.


 Conclusion:
 For IPSEC this change is a great step in the right direction.
 But I recommend adding a sysctl variable for a minimum start size too.
 Since I do not use /dev/crypto, I cannot judge whether the limit of
 about 2MB for decompression is OK or not.


 Last, some ideas from my side for /dev/crypto:

 For /dev/crypto usage I think that the internal interfaces of
 opencrypto are not optimal.
 The approach of packing everything into "objects" and operating on
 them, creating new objects, sounds like a commonly used approach in
 object-oriented programming.
 This approach is not optimal for data-stream processing, because it
 assumes that we have endless memory when processing larger amounts of
 data - it will not scale. It is not a good idea to do this at user
 level, and inside the kernel such an approach has no place - in my
 opinion.
 The current patch will speed up /dev/crypto work, so it is a step in
 the right direction for /dev/crypto usage too.
 In the /dev/crypto interface the user needs to reserve enough
 user-level memory to copy back everything.
 So the user process knows something about the expected size of the
 result. (Or it makes a dummy call just to get the required size
 information without any data transfer.)
 Perhaps this can be used for even better heuristics when allocating
 buffer space. If there are limit checks at the /dev/crypto syscall
 level, the size used there should be used here too. (At least a
 comment that references these values is missing here.)
 Perhaps a maximum result size parameter makes sense in the calling
 interface of the deflate/gzip routines, to avoid useless decompression
 that cannot be copied back to user level for size reasons. Or the
 "rest" gets decompressed into a static garbage buffer just to obtain
 the result size, if required.
 The "best solution" would be (from my point of view) to change the
 whole internal interface so that partial data transfer can happen
 directly from/to user level and the result buffer gets allocated (and
 expanded if needed) by the caller.
 Of course this would break interchangeability with other
 implementations, and I'm not sure whether it really makes sense.
 I also simply do not know what kind of "productive" user-level
 programs are "designed" to use compression via /dev/crypto.
 All network data-stream related things should be inside the kernel -
 in my opinion - for performance reasons. Sorry, I'm no friend of
 placing kernel functionality (mainly data-stream or network-packet
 processing) in user-level programs.
 (racoon (IKE) and IPSEC are a good example here: (fast) data-stream
 processing inside the kernel; key exchange, access validation and
 other complex things in a user-level program that configures the
 kernel IPSEC parts.)
 For testing and development /dev/crypto might be a very nice thing, of
 course.

 Best regards

 W. Stukenbrock


From: Wolfgang Stukenbrock <Wolfgang.Stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
        Wolfgang.Stukenbrock@nagler-company.com
Subject: Re: PR/36864 CVS commit: src/sys/opencrypto
Date: Mon, 07 Mar 2011 10:27:11 +0100

 Hi again,

 Since this report is still in feedback state, I received another
 reminder mail.

 A fresh look at the current implementation shows that a size hint is
 now passed from the caller to the decompression code (the o_len field
 is used for that).

 This makes an additional sysctl variable - as suggested by me earlier
 - obsolete. The use of MCLBYTES is a good idea and avoids any
 additional allocation and/or copy operation for normal MTU sizes. For
 jumbo frames, additional allocations are only required for very good
 (better than factor 4) compression.
 So from the IPSEC point of view I think this PR can be closed.
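
 Just to spell out how I read that initial-size choice (the names below
 are my own illustration, not necessarily the identifiers used in
 deflate.c): the first output buffer is at least one mbuf cluster, or
 the caller's o_len hint if that is larger, so a normal MTU-sized
 packet decompresses without any further allocation.

 /*
  * Hypothetical sketch of picking the first decompression buffer size.
  * "size" is the compressed input length, "hint" the caller's o_len.
  * The real deflate.c code may differ in detail.
  */
 #include <sys/param.h>		/* MCLBYTES */
 #include <sys/types.h>

 static size_t
 first_buf_size(size_t size, size_t hint)
 {
 	size_t want = 4 * size;		/* old heuristic: 4x the input */

 	if (want < hint)
 		want = hint;		/* caller knows the expected output size */
 	if (want < MCLBYTES)
 		want = MCLBYTES;	/* one mbuf cluster covers normal MTUs */
 	return want;
 }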

 I have not looked at the /dev/crypto part again, so I cannot decide
 whether this PR can be closed from the /dev/crypto point of view too.

 But in any case, it should be changed from feedback to something else.

 Thanks

 W. Stukenbrock


 -- 


 Dr. Nagler & Company GmbH
 Hauptstraße 9
 92253 Schnaittenbach

 Tel. +49 9622/71 97-42
 Fax +49 9622/71 97-50

 Wolfgang.Stukenbrock@nagler-company.com
 http://www.nagler-company.com


 Headquarters: Schnaittenbach
 Commercial register: Amberg HRB
 Place of jurisdiction: Amberg
 Tax number: 201/118/51825
 VAT ID number: DE 273143997
 Managing directors: Dr. Martin Nagler, Dr. Dr. Karl-Kuno Kunze


State-Changed-From-To: feedback->closed
State-Changed-By: drochner@NetBSD.org
State-Changed-When: Tue, 08 Mar 2011 16:58:05 +0000
State-Changed-Why:
submitter agrees that the problem is fixed
(performance improvements for userland cryptodev access are not done
yet, but decompression doesn't fail afaict)


>Unformatted:
