NetBSD Problem Report #50638

From tsutsui@ceres.dti.ne.jp  Sun Jan 10 17:36:50 2016
Return-Path: <tsutsui@ceres.dti.ne.jp>
Received: from mail.netbsd.org (mail.NetBSD.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 6463F7A21A
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 10 Jan 2016 17:36:50 +0000 (UTC)
Message-Id: <201601101736.u0AHajgX010846@mirage.localdomain>
Date: Mon, 11 Jan 2016 02:36:45 +0900 (JST)
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Reply-To: tsutsui@ceres.dti.ne.jp
To: gnats-bugs@NetBSD.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: Extreme slowness on loading gzipped kernels on old CPUs
X-Send-Pr-Version: 3.95

>Number:         50638
>Category:       bin
>Synopsis:       Extreme slowness on loading gzipped kernels on old CPUs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    bin-bug-people
>State:          closed
>Class:          change-request
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 10 17:40:00 +0000 2016
>Closed-Date:    Sun Jan 17 08:20:18 +0000 2016
>Last-Modified:  Sun Jan 17 23:25:01 +0000 2016
>Originator:     Izumi Tsutsui
>Release:        NetBSD 7.0
>Organization:
>Environment:
System: NetBSD 7.0
Architecture: all, but especially m68k and vax etc.
Machine: ditto
>Description:
As mentioned in the following post, loading gzipped kernels
via libsa based bootloader much slower since NetBSD 5.x
than 4.x and prior:
 http://mail-index.netbsd.org/port-vax/2014/05/25/msg002214.html

This post is about vax, but loading gzipped kernel on x68k is
also extremely slower than gunzipped kernels.

This seems caused by crc32() function changes in the following commits,
which added modified crc32() was added into libkern based on zlib one:
 http://mail-index.netbsd.org/source-changes/2009/03/25/msg218803.html
 http://mail-index.netbsd.org/source-changes/2009/03/25/msg218817.html

In the 4.x and prior, _STANDALONE programs used 32 bit tables
with "DYNAMIC_CRC_TABLE" option (which generated CRC tables runtime)
for crc32() calculations:
 http://nxr.netbsd.org/xref/src/common/dist/zlib/crc32.c?r=1.4#84

However, after the above "crc32() in libkern" changes,
_STANDALONE programs were changed to use "dumb" CRC functions
based on its definition:
 http://cvsweb.netbsd.org/cgi-bin/cvsweb.cgi/src/sys/lib/libsa/cread.c.diff?r1=1.22&r2=1.23&f=u

As the comment claims this was smaller than the table version,
but actually it's also extremely slower on ancient CPUs
(requires additional several minutes).

>How-To-Repeat:
Boot gzipped kernels from any CD or HDD etc.

>Fix:

* Approach:

There are three approach to handle this issue:

(1) leave as is and use the dumb crc32() function (for small binary)
(2) pull "DYNAMIC_CRC_TABLE" implementation from zlib into libkern/crc32.c
(3) completely disable gunzip crc32() calculation in cread() function

* Pros. and cons:

(1) lowest risk? (slow but still works)
(2) better compromise?
(3) it looks CRC is not used in most case;
    the CRC can be confirmed when the whole file contents are read,
    but our libsa loadfile() finction loads only text, data, bss,
    and symbol table, which could also be padded by ldscript.

* Sample implementation:

I have confirmed the following dumb change for these three cases.

- "-DSMALL_CRC32" specifies (1)
- the default (no additional macro) specifies (2)
  (note: actual CRC calculation results are not confirmed, nor kernels)
- "-DCREAD_NOCRC" specifies (3)

The size of boot binaries and time of loading the netbsd-INSTALL.gz
kernel on LUNA-II with NetBSD/luna68k 7.0 are:

(1) -DSMALL_CRC32 (dumb crc32(), as current implementation)

size:
section      size      addr
.text       69384   7340032
.data        2432   7416896
.bss        17952   7419328

time:
9 min 22 sec

(2) with DYNAMIC_CRC_TABLE crc32() (as past default)

size:
section      size      addr
.text       70416   7340032
.data        2436   7417936
.bss        26144   7420376

time:
4 min 21 sec

(3) -DCREAD_NOCRC

size:
section      size      addr
.text       69288   7340032
.data        2432   7416800
.bss        17952   7419232

time:
3 min 59 sec

---

Index: sys/arch/luna68k/stand/boot/Makefile
===================================================================
RCS file: /cvsroot/src/sys/arch/luna68k/stand/boot/Makefile,v
retrieving revision 1.11
diff -u -p -d -r1.11 Makefile
--- sys/arch/luna68k/stand/boot/Makefile	16 Jan 2014 01:15:34 -0000	1.11
+++ sys/arch/luna68k/stand/boot/Makefile	10 Jan 2016 14:53:17 -0000
@@ -19,6 +19,8 @@ CPPFLAGS+=	-DSUPPORT_DHCP -DSUPPORT_BOOT
 #CPPFLAGS+=	-DRPC_DEBUG -DRARP_DEBUG -DNET_DEBUG -DDEBUG -DPARANOID
 CPPFLAGS+=	-DLIBSA_ENABLE_LS_OP
 CPPFLAGS+=	-DLIBSA_PRINTF_WIDTH_SUPPORT
+CPPFLAGS+=	-DCREAD_NOCRC32
+#CPPFLAGS+=	-DSMALL_CRC32

 CFLAGS=		-Os -msoft-float
 CFLAGS+=	-ffreestanding
Index: sys/lib/libkern/crc32.c
===================================================================
RCS file: /cvsroot/src/sys/lib/libkern/crc32.c,v
retrieving revision 1.4
diff -u -p -d -r1.4 crc32.c
--- sys/lib/libkern/crc32.c	26 Mar 2009 22:18:14 -0000	1.4
+++ sys/lib/libkern/crc32.c	10 Jan 2016 14:53:24 -0000
@@ -16,6 +16,10 @@

 /* @(#) Id */

+#if defined(_STANDALONE)
+#define DYNAMIC_CRC_TABLE
+#endif
+
 #include <sys/param.h>
 #include <machine/endian.h>

@@ -29,7 +33,91 @@ typedef uint32_t u4;
  * Tables of CRC-32s of all single-byte values, made by make_crc_table().
  */
 #include <lib/libkern/libkern.h>
+
+#ifdef DYNAMIC_CRC_TABLE
+
+static volatile bool crc_table_empty = true;
+static uint32_t crc_table[8][256];
+static void make_crc_table(void);
+
+/*
+  Generate tables for a byte-wise 32-bit CRC calculation on the polynomial:
+  x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1.
+
+  Polynomials over GF(2) are represented in binary, one bit per coefficient,
+  with the lowest powers in the most significant bit.  Then adding polynomials
+  is just exclusive-or, and multiplying a polynomial by x is a right shift by
+  one.  If we call the above polynomial p, and represent a byte as the
+  polynomial q, also with the lowest power in the most significant bit (so the
+  byte 0xb1 is the polynomial x^7+x^3+x+1), then the CRC is (q*x^32) mod p,
+  where a mod b means the remainder after dividing a by b.
+
+  This calculation is done using the shift-register method of multiplying and
+  taking the remainder.  The register is initialized to zero, and for each
+  incoming bit, x^32 is added mod p to the register if the bit is a one (where
+  x^32 mod p is p+x^32 = x^26+...+1), and the register is multiplied mod p by
+  x (which is shifting right by one and adding x^32 mod p if the bit shifted
+  out is a one).  We start with the highest power (least significant bit) of
+  q and repeat for all eight bits of q.
+
+  The first table is simply the CRC of all possible eight bit values.  This is
+  all the information needed to generate CRCs on data a byte at a time for all
+  combinations of CRC register values and incoming bytes.  The remaining tables
+  allow for word-at-a-time CRC calculation for both big-endian and little-
+  endian machines, where a word is four bytes.
+*/
+static void make_crc_table(void)
+{
+    uint32_t c;
+    int n, k;
+    uint32_t poly;                      /* polynomial exclusive-or pattern */
+    /* terms of polynomial defining this crc (except x^32): */
+    static volatile bool first = true;  /* flag to limit concurrent making */
+    static const unsigned char p[] = {0,1,2,4,5,7,8,10,11,12,16,22,23,26};
+
+    /* See if another task is already doing this (not thread-safe, but better
+       than nothing -- significantly reduces duration of vulnerability in
+       case the advice about DYNAMIC_CRC_TABLE is ignored) */
+    if (first) {
+        first = false;
+
+        /* make exclusive-or pattern from polynomial (0xedb88320UL) */
+        poly = 0;
+        for (n = 0; n < sizeof(p)/sizeof(unsigned char); n++)
+            poly |= 1 << (31 - p[n]);
+
+        /* generate a crc for every 8-bit value */
+        for (n = 0; n < 256; n++) {
+            c = (uint32_t)n;
+            for (k = 0; k < 8; k++)
+                c = c & 1 ? poly ^ (c >> 1) : c >> 1;
+            crc_table[0][n] = c;
+        }
+
+        /* generate crc for each value followed by one, two, and three zeros,
+           and then the byte reversal of those as well as the first table */
+        for (n = 0; n < 256; n++) {
+            c = crc_table[0][n];
+            crc_table[4][n] = REV(c);
+            for (k = 1; k < 4; k++) {
+                c = crc_table[0][c & 0xff] ^ (c >> 8);
+                crc_table[k][n] = c;
+                crc_table[k + 4][n] = REV(c);
+            }
+        }
+
+        crc_table_empty = false;
+    }
+    else {      /* not first */
+        /* wait for the other guy to finish (not efficient, but rare) */
+        while (crc_table_empty)
+            ;
+    }
+}
+
+#else /* !DYNAMIC_CRC_TABLE */
 #include "crc32.h"
+#endif

 #if BYTE_ORDER == LITTLE_ENDIAN
 /* ========================================================================= */
@@ -46,6 +134,11 @@ uint32_t crc32(uint32_t crc, const uint8

     if (buf == NULL) return 0UL;

+#ifdef DYNAMIC_CRC_TABLE
+    if (crc_table_empty)
+        make_crc_table();
+#endif /* DYNAMIC_CRC_TABLE */
+
     c = (u4)crc;
     c = ~c;
     while (len && ((uintptr_t)buf & 3)) {
@@ -87,6 +180,11 @@ uint32_t crc32(uint32_t crc, const uint8

     if (buf == NULL) return 0UL;

+#ifdef DYNAMIC_CRC_TABLE
+    if (crc_table_empty)
+        make_crc_table();
+#endif /* DYNAMIC_CRC_TABLE */
+
     c = REV((u4)crc);
     c = ~c;
     while (len && ((uintptr_t)buf & 3)) {
Index: sys/lib/libsa/cread.c
===================================================================
RCS file: /cvsroot/src/sys/lib/libsa/cread.c,v
retrieving revision 1.26
diff -u -p -d -r1.26 cread.c
--- sys/lib/libsa/cread.c	13 Oct 2013 20:09:02 -0000	1.26
+++ sys/lib/libsa/cread.c	10 Jan 2016 14:53:24 -0000
@@ -86,11 +86,11 @@ void	*zcalloc(void *, unsigned int, unsi
 void	zcfree(void *, void *);
 void	zmemcpy(unsigned char *, unsigned char *, unsigned int);

+#if defined(SMALL_CRC32) || defined(CREAD_NOCRC32)
 /*
- * The libkern version of this function uses an 8K set of tables.
  * This is the double-loop version of LE CRC32 from if_ethersubr,
- * lightly modified -- it is 200 bytes smaller than the version using
- * a 4-bit table and at least 8K smaller than the libkern version.
+ * lightly modified -- it is ~1KB smaller than libkern version with
+ * DYNAMIC_CRC_TABLE but too much slower especially on ancient poor CPUs.
  */
 #ifndef ETHER_CRC_POLY_LE
 #define ETHER_CRC_POLY_LE	0xedb88320
@@ -98,6 +98,7 @@ void	zmemcpy(unsigned char *, unsigned c
 uint32_t
 crc32(uint32_t crc, const uint8_t *const buf, size_t len)
 {
+#if !defined(CREAD_NOCRC)
 	uint32_t c, carry;
 	size_t i, j;

@@ -114,7 +115,9 @@ crc32(uint32_t crc, const uint8_t *const
 	    }
 	}
 	return (crc ^ 0xffffffffU);
+#endif /* defined(CREAD_NOCRC) */
 }
+#endif /* defined(SMALL_CRC32) || defined(CREAD_NOCRC) */

 /*
  * compression utilities
@@ -317,7 +320,9 @@ ssize_t
 read(int fd, void *buf, size_t len)
 {
 	struct sd *s;
+#if !defined(CREAD_NOCRC32)
 	unsigned char *start = buf; /* starting point for crc computation */
+#endif

 	s = ss[fd];

@@ -373,13 +378,24 @@ read(int fd, void *buf, size_t len)
 		s->z_err = inflate(&(s->stream), Z_NO_FLUSH);

 		if (s->z_err == Z_STREAM_END) {
+			uint32_t total_out;
+#if !defined(CREAD_NOCRC32)
+			uint32_t crc;
 			/* Check CRC and original size */
 			s->crc = crc32(s->crc, start, (unsigned int)
 					(s->stream.next_out - start));
 			start = s->stream.next_out;
+			crc = getLong(s);
+#else
+			(void)getLong(s);
+#endif
+			total_out = getLong(s);

-			if (getLong(s) != s->crc ||
-			    getLong(s) != s->stream.total_out) {
+			if (total_out != s->stream.total_out
+#if !defined(CREAD_NOCRC32)
+			    || crc != s->crc
+#endif
+			    ) {

 				s->z_err = Z_DATA_ERROR;
 			} else {
@@ -387,7 +403,9 @@ read(int fd, void *buf, size_t len)
 				check_header(s);
 				if (s->z_err == Z_OK) {
 					inflateReset(&(s->stream));
+#if !defined(CREAD_NOCRC32)
 					s->crc = crc32(0L, Z_NULL, 0);
+#endif
 				}
 			}
 		}
@@ -395,8 +413,10 @@ read(int fd, void *buf, size_t len)
 			break;
 	}

+#if !defined(CREAD_NOCRC32)
 	s->crc = crc32(s->crc, start,
 	               (unsigned int)(s->stream.next_out - start));
+#endif

 	return (int)(len - s->stream.avail_out);
 }

---

Comments?

---
Izumi Tsutsui

>Release-Note:

>Audit-Trail:
From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on old
 CPUs
Date: Sun, 10 Jan 2016 19:06:39 +0100

 On Sun, Jan 10, 2016 at 05:40:00PM +0000, Izumi Tsutsui wrote:
 > (1) leave as is and use the dumb crc32() function (for small binary)
 > (2) pull "DYNAMIC_CRC_TABLE" implementation from zlib into libkern/crc32.c
 > (3) completely disable gunzip crc32() calculation in cread() function

 Another option would be to use a version of what libarchive is using in
 src/external/bsd/libarchive/dist/libarchive/archive_crc32.h. That
 version is still pretty small.

 Joerg

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on old CPUs
Date: Sun, 10 Jan 2016 19:07:52 +0100

 On Sun, Jan 10, 2016 at 05:40:00PM +0000, Izumi Tsutsui wrote:
 > (3) completely disable gunzip crc32() calculation in cread() function

 My vote is on (3).

 Thanks for dealing with this!

 Martin

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on old
 CPUs
Date: Sun, 10 Jan 2016 19:22:19 +0100

 On Sun, Jan 10, 2016 at 06:10:01PM +0000, Martin Husemann wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on old CPUs
 > Date: Sun, 10 Jan 2016 19:07:52 +0100
 > 
 >  On Sun, Jan 10, 2016 at 05:40:00PM +0000, Izumi Tsutsui wrote:
 >  > (3) completely disable gunzip crc32() calculation in cread() function
 >  
 >  My vote is on (3).
 >  
 >  Thanks for dealing with this!

 Can you try the faster crc32 from libarchive? It doesn't add much code
 and IMO having the verification is quite useful.

 Joerg

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on old
 CPUs
Date: Sun, 10 Jan 2016 22:18:27 +0100

 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA256

 On 10.01.2016 19:06, Joerg Sonnenberger wrote:
 > On Sun, Jan 10, 2016 at 05:40:00PM +0000, Izumi Tsutsui wrote:
 >> (1) leave as is and use the dumb crc32() function (for small
 >> binary) (2) pull "DYNAMIC_CRC_TABLE" implementation from zlib
 >> into libkern/crc32.c (3) completely disable gunzip crc32()
 >> calculation in cread() function
 > 
 > Another option would be to use a version of what libarchive is
 > using in 
 > src/external/bsd/libarchive/dist/libarchive/archive_crc32.h. That 
 > version is still pretty small.
 > 

 Just a note. Recent x86_64 and optionally armv8 ships with intrinsic
 functions implemented in their cores for CRC32 functions.

 According to my tests the lookup algorithm is 3-4x slower on x86_64
 and armv8 than a function reusing the dedicated CPU instruction.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2

 iQIcBAEBCAAGBQJWksqhAAoJEEuzCOmwLnZst2UP/0FuXiM1/mhU1G0hawqEAdeh
 /POd5Od0sZpkHGmeK6lJRSWUN3ZmXg5oyIAH9EExbpCH2IueZMOYnSWRLS1MsTbB
 /iAe+MNWk7li/VlM3i947wZ2Tzsz8gs5Ar/E9OEsF2Vno6HDs0djp6tIOeB1OUS2
 V/XYmzbomGuIrkjT5aluIz5aTq/0yVWOOcL4YQw3nWfJWJWzThZ0tr7u3teylEgZ
 OVnE7PLJQjyGysZ7jzTaPoByjuOINVmM3Bn924X1X4UmuEPkOSRTql40C8yac+Nk
 bkl636JR6og+VDrbKnOBkkXMDfQlKO01sefWj4YjkWjJmNr7jmgYzlR1d1QaXvKr
 3prHt9pIjLM9Y+7fy6OFMMLXVk/euvaHbuw2G+yfvoMt/6ll+CXtIAvh1DHQRcu0
 Jw+czWYMWoYMRBc79zCiXbgDDHZVXob+DQD9QlhK7IRuHxOpfnwAm1mJhLB03MHq
 wJyHBxwvlqBelLi10ZnKOIIOebVA/n3HYpMBN5QyLs2aYT1v1IhtOceP3yko699w
 E0PHxVksWekatid4YSBV6rqS/krrsYg154hwOYVvLQzRe5voMHZo6FIzvSYTkeL3
 3zbLtH5opYA+iA+R94h+RJ2k+Kl9icrND3Fr7GNmDNHiswkNkGchVMC2cR5Rcyl4
 d9FQbFAi68/Q0BWcRFzo
 =e7dy
 -----END PGP SIGNATURE-----

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on old
 CPUs
Date: Sun, 10 Jan 2016 22:56:12 +0100

 On Sun, Jan 10, 2016 at 09:20:01PM +0000, Kamil Rytarowski wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: Kamil Rytarowski <n54@gmx.com>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on old
 >  CPUs
 > Date: Sun, 10 Jan 2016 22:18:27 +0100
 > 
 >  -----BEGIN PGP SIGNED MESSAGE-----
 >  Hash: SHA256
 >  
 >  On 10.01.2016 19:06, Joerg Sonnenberger wrote:
 >  > On Sun, Jan 10, 2016 at 05:40:00PM +0000, Izumi Tsutsui wrote:
 >  >> (1) leave as is and use the dumb crc32() function (for small
 >  >> binary) (2) pull "DYNAMIC_CRC_TABLE" implementation from zlib
 >  >> into libkern/crc32.c (3) completely disable gunzip crc32()
 >  >> calculation in cread() function
 >  > 
 >  > Another option would be to use a version of what libarchive is
 >  > using in 
 >  > src/external/bsd/libarchive/dist/libarchive/archive_crc32.h. That 
 >  > version is still pretty small.
 >  > 
 >  
 >  Just a note. Recent x86_64 and optionally armv8 ships with intrinsic
 >  functions implemented in their cores for CRC32 functions.
 >  
 >  According to my tests the lookup algorithm is 3-4x slower on x86_64
 >  and armv8 than a function reusing the dedicated CPU instruction.

 The point is not that there are faster ways to do, just to do it fast
 enough and in small enough executable footprint to not hurt loading a
 kernel.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 10:56:51 +0900

 > >  > (3) completely disable gunzip crc32() calculation in cread() function
 > >  
 > >  My vote is on (3).
 > >  
 > >  Thanks for dealing with this!
 > 
 > Can you try the faster crc32 from libarchive? It doesn't add much code
 > and IMO having the verification is quite useful.

 It looks like Armchair Detective's comments.

 What's your comment about the following sentence in the original PR?

 >> * Pros. and cons:
  :
 >> (3) it looks CRC is not used in most case;
 >>     the CRC can be confirmed when the whole file contents are read,
 >>     but our libsa loadfile() finction loads only text, data, bss,
 >>     and symbol table, which could also be padded by ldscript.

 How can your suggestion be useful and appropriate for implementation cost?

 ---
 Izumi Tsutsui

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 11:10:16 +0100

 On Mon, Jan 11, 2016 at 02:00:01AM +0000, Izumi Tsutsui wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
 > To: joerg@britannica.bec.de
 > Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
 > Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
 > Date: Mon, 11 Jan 2016 10:56:51 +0900
 > 
 >  > >  > (3) completely disable gunzip crc32() calculation in cread() function
 >  > >  
 >  > >  My vote is on (3).
 >  > >  
 >  > >  Thanks for dealing with this!
 >  > 
 >  > Can you try the faster crc32 from libarchive? It doesn't add much code
 >  > and IMO having the verification is quite useful.
 >  
 >  It looks like Armchair Detective's comments.

 Can you please try to avoid calling people names without even looking at
 the suggestions? We have been running into a situation very similar in
 libarchive, where the original dumb crc32 implementation created
 significant performance impact. The new fallback version is only
 slightly larger than the dumb version, both in terms of number of source
 lines and code size. It was fast enough for libarchive's purpose and it
 should be fast enough for input the size of the ramdisk kernels on
 slower systems.

 >  What's your comment about the following sentence in the original PR?
 >  
 >  >> * Pros. and cons:
 >   :
 >  >> (3) it looks CRC is not used in most case;
 >  >>     the CRC can be confirmed when the whole file contents are read,
 >  >>     but our libsa loadfile() finction loads only text, data, bss,
 >  >>     and symbol table, which could also be padded by ldscript.
 >  
 >  How can your suggestion be useful and appropriate for implementation cost?

 Even if it is not used in some cases, we can load full compressed files
 e.g. for external ramdisks. Because trying to castrate a functionality,
 it much better to look if there is a trivial fix. The suggestion does
 exactly that.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 19:45:26 +0900

 joerg@ wrote:

 > >  > >  > (3) completely disable gunzip crc32() calculation in cread() function
 > >  > >  
 > >  > >  My vote is on (3).
 > >  > >  
 > >  > >  Thanks for dealing with this!
 > >  > 
 > >  > Can you try the faster crc32 from libarchive? It doesn't add much code
 > >  > and IMO having the verification is quite useful.
 > >  
 > >  It looks like Armchair Detective's comments.
 > 
 > Can you please try to avoid calling people names without even looking at
 > the suggestions? We have been running into a situation very similar in
 > libarchive, where the original dumb crc32 implementation created
 > significant performance impact. The new fallback version is only
 > slightly larger than the dumb version, both in terms of number of source
 > lines and code size. It was fast enough for libarchive's purpose and it
 > should be fast enough for input the size of the ramdisk kernels on
 > slower systems.

 Consider paticular use and background of libsa cread.
 You are claiming the perfect world without consideration
 of cost vs benefit. Unfortunately we have very limited resoures.

 On i386 and amd64, gzipped kernels are not used at all.
 Most evbarm gadgets don't use NetBSD's native loader.
 On most other tier II ports, there are a few users and developers
 but we see the slowness just on installation, unlike libarchive
 in pkgsrc etc.

 Using existing code (including zlib DYNAMIC_CRC_TABLE version, which
 was used by 4.0 and prior) or removing CRC caluclation is acceptable
 compromise for me, but I don't have extra motivation to pull
 whole new infrastructures in this case. That's why I proposed
 three approach in the PR.

 > >  What's your comment about the following sentence in the original PR?
 > >  
 > >  >> * Pros. and cons:
 > >   :
 > >  >> (3) it looks CRC is not used in most case;
 > >  >>     the CRC can be confirmed when the whole file contents are read,
 > >  >>     but our libsa loadfile() finction loads only text, data, bss,
 > >  >>     and symbol table, which could also be padded by ldscript.
 > >  
 > >  How can your suggestion be useful and appropriate for implementation cost?
 > 
 > Even if it is not used in some cases, we can load full compressed files
 > e.g. for external ramdisks. Because trying to castrate a functionality,
 > it much better to look if there is a trivial fix. The suggestion does
 > exactly that.

 I still don't see actual benefit with measured numbers.
 Furthermore, currently there is no use of external ramdisks.
 No one will bother implement code without visible benefit
 (or just technical interests) so it can't be an alternative
 if there is no working code.  That's all.

 Of course, if you claim it's trival and you will work on it,
 I won't object it at all.

 ---
 Izumi Tsutsui

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 12:36:11 +0100

 On Mon, Jan 11, 2016 at 10:50:01AM +0000, Izumi Tsutsui wrote:
 >  [...]
 >  On i386 and amd64, gzipped kernels are not used at all.

 that's not true. I use gzipped kernels and modules for tftp boot

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 12:44:51 +0100

 On Mon, Jan 11, 2016 at 10:50:01AM +0000, Izumi Tsutsui wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
 > To: joerg@britannica.bec.de
 > Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
 > Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
 > Date: Mon, 11 Jan 2016 19:45:26 +0900
 > 
 >  joerg@ wrote:
 >  
 >  > >  > >  > (3) completely disable gunzip crc32() calculation in cread() function
 >  > >  > >  
 >  > >  > >  My vote is on (3).
 >  > >  > >  
 >  > >  > >  Thanks for dealing with this!
 >  > >  > 
 >  > >  > Can you try the faster crc32 from libarchive? It doesn't add much code
 >  > >  > and IMO having the verification is quite useful.
 >  > >  
 >  > >  It looks like Armchair Detective's comments.
 >  > 
 >  > Can you please try to avoid calling people names without even looking at
 >  > the suggestions? We have been running into a situation very similar in
 >  > libarchive, where the original dumb crc32 implementation created
 >  > significant performance impact. The new fallback version is only
 >  > slightly larger than the dumb version, both in terms of number of source
 >  > lines and code size. It was fast enough for libarchive's purpose and it
 >  > should be fast enough for input the size of the ramdisk kernels on
 >  > slower systems.
 >  
 >  Consider paticular use and background of libsa cread.
 >  You are claiming the perfect world without consideration
 >  of cost vs benefit. Unfortunately we have very limited resoures.

 Let me rerpeat: please stop calling people names. I have explicitly
 asked about *measuring* things because cutting it out. You obviously
 still haven't bothered looking at the suggestion since it even has some
 hard numbers. I am quite aware of the limited resources, both CPU time
 and space. That is exactly why I offered a version that is known to be
 almost the same size and significantly faster.

 >  On i386 and amd64, gzipped kernels are not used at all.

 They are not used by default, that doesn't mean it is not used at all.
 For booting from ISO images, it would likely be a net gain in terms of
 boot time, but doesn't matter in terms of size win.

 >  Using existing code (including zlib DYNAMIC_CRC_TABLE version, which
 >  was used by 4.0 and prior) or removing CRC caluclation is acceptable
 >  compromise for me, but I don't have extra motivation to pull
 >  whole new infrastructures in this case. That's why I proposed
 >  three approach in the PR.

 Have you spend even 5 seconds to look at the file I pointed to? There is
 no new infrastructure. It is pretty much a drop-in replacement of crc32
 with a dynamic single byte table computation.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 23:56:33 +0900

 joerg@ wrote:

 > Let me rerpeat: please stop calling people names. I have explicitly
 > asked about *measuring* things because cutting it out.

 Sorry, no motivation or benefit for me to measure it.

 > I am quite aware of the limited resources, both CPU time
 > and space.

 I meant limited human resources. If you have enough spare time
 to try it, it's fine and no reason to ask other guys.

 > They are not used by default, that doesn't mean it is not used at all.

 If someone complains slowness on x86, others might look at it.
 That's all.

 > Have you spend even 5 seconds to look at the file I pointed to? There is
 > no new infrastructure. It is pretty much a drop-in replacement of crc32
 > with a dynamic single byte table computation.

 The fact is not so simple.

 For example, persons who added crc32() into libkern didn't
 change the function name but did change an argument type,
 then it caused a conflict against the original zlib one.
 I cannot simply use zlib/crc32.c and I have to copy code
 from zlib/crc32.c to slightly modified libkern/crc32.c.

 Looking at code might take only five seconds, but
 implementation is often annoying and boring.

 ---
 Izumi Tsutsui

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 16:53:10 +0100

 On Mon, Jan 11, 2016 at 03:00:01PM +0000, Izumi Tsutsui wrote:
 >  > I am quite aware of the limited resources, both CPU time
 >  > and space.
 >  
 >  I meant limited human resources. If you have enough spare time
 >  to try it, it's fine and no reason to ask other guys.

 I haven't hit the problems and I don't have hardware easily at hand that
 is slow enough that it matters.

 >  > They are not used by default, that doesn't mean it is not used at all.
 >  
 >  If someone complains slowness on x86, others might look at it.
 >  That's all.

 It is not noticable on semi-modern hardware. Even the slow version takes
 less than 1s, IO is still worse.

 >  > Have you spend even 5 seconds to look at the file I pointed to? There is
 >  > no new infrastructure. It is pretty much a drop-in replacement of crc32
 >  > with a dynamic single byte table computation.
 >  
 >  The fact is not so simple.

 Yes, the fact is that simple.

 >  For example, persons who added crc32() into libkern didn't
 >  change the function name but did change an argument type,
 >  then it caused a conflict against the original zlib one.
 >  I cannot simply use zlib/crc32.c and I have to copy code
 >  from zlib/crc32.c to slightly modified libkern/crc32.c.

 I mentioned the code specifically because it is much easier to use than
 the copy & pasting from zlib's dynamic table compution, which is might
 be somewhat faster for larger input, but also is larger in terms of code
 size. If you had looked, it would be have been quite clear.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Tue, 12 Jan 2016 01:27:52 +0900

 joerg@ wrote:

 > >  If someone complains slowness on x86, others might look at it.
 > >  That's all.
 > 
 > It is not noticable on semi-modern hardware. Even the slow version takes
 > less than 1s, IO is still worse.

 Then there is no reason to try a possibly faster version
 which will do nothing on loading kernels anyway.
 I will add an option to simply disable CRC calculation in
 the libsa cread() function to solve the main issue of this PR.

 You can still suggest kernel crypto guys to replace current zlib based
 crc32() function in libkern with your libarchive one in another PR.

 > >  For example, persons who added crc32() into libkern didn't
 > >  change the function name but did change an argument type,
 > >  then it caused a conflict against the original zlib one.
 > >  I cannot simply use zlib/crc32.c and I have to copy code
 > >  from zlib/crc32.c to slightly modified libkern/crc32.c.
 > 
 > I mentioned the code specifically because it is much easier to use than
 > the copy & pasting from zlib's dynamic table compution, which is might
 > be somewhat faster for larger input, but also is larger in terms of code
 > size. If you had looked, it would be have been quite clear.

 I wish someone who has enough spare time will try it.

 ---
 Izumi Tsutsui

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: gnats-bugs@NetBSD.org
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 17:50:08 +0100

 On Tue, Jan 12, 2016 at 01:27:52AM +0900, Izumi Tsutsui wrote:
 > joerg@ wrote:
 > 
 > > >  If someone complains slowness on x86, others might look at it.
 > > >  That's all.
 > > 
 > > It is not noticable on semi-modern hardware. Even the slow version takes
 > > less than 1s, IO is still worse.
 > 
 > Then there is no reason to try a possibly faster version
 > which will do nothing on loading kernels anyway.
 > I will add an option to simply disable CRC calculation in
 > the libsa cread() function to solve the main issue of this PR.
 > 
 > You can still suggest kernel crypto guys to replace current zlib based
 > crc32() function in libkern with your libarchive one in another PR.

 You asked about possible options. You missed a valid one. Now you don't
 want to hear that your preferred choice may not be the best approach.
 Seriously, why bother asking in first place?

 > 
 > > >  For example, persons who added crc32() into libkern didn't
 > > >  change the function name but did change an argument type,
 > > >  then it caused a conflict against the original zlib one.
 > > >  I cannot simply use zlib/crc32.c and I have to copy code
 > > >  from zlib/crc32.c to slightly modified libkern/crc32.c.
 > > 
 > > I mentioned the code specifically because it is much easier to use than
 > > the copy & pasting from zlib's dynamic table compution, which is might
 > > be somewhat faster for larger input, but also is larger in terms of code
 > > size. If you had looked, it would be have been quite clear.
 > 
 > I wish someone who has enough spare time will try it.

 The emails from you have already wasted way more time than copying the
 function and measuring is ever going to do.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Tue, 12 Jan 2016 02:44:23 +0900

 joerg@ wrote:

 > You asked about possible options. You missed a valid one. Now you don't
 > want to hear that your preferred choice may not be the best approach.
 > Seriously, why bother asking in first place?

 Because there were three options. martin@ agreed the third one.

 You thought your forth one is valid and best and
 I didn't think so because there is no working implementation
 and actual benefit.

 > The emails from you have already wasted way more time than copying the
 > function and measuring is ever going to do.

 Me too.

 ---
 Izumi Tsutsui

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 19:58:44 +0100

 --zhXaljGHf11kAtnf
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 On Mon, Jan 11, 2016 at 11:40:01AM +0000, Manuel Bouyer wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
 > Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
 > Date: Mon, 11 Jan 2016 12:36:11 +0100
 > 
 >  On Mon, Jan 11, 2016 at 10:50:01AM +0000, Izumi Tsutsui wrote:
 >  >  [...]
 >  >  On i386 and amd64, gzipped kernels are not used at all.
 >  
 >  that's not true. I use gzipped kernels and modules for tftp boot

 It's even better. The install ISO image uses gzipped kernels and also
 clearly shows that CRC32 verification is done. Ah well, let's not bother
 anymore. Attached is a working patch that needed 2min of work and
 slightly more time to test...

 Joerg

 --zhXaljGHf11kAtnf
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="cread.c.diff"

 Index: cread.c
 ===================================================================
 RCS file: /home/joerg/repo/netbsd/src/sys/lib/libsa/cread.c,v
 retrieving revision 1.27
 diff -u -p -r1.27 cread.c
 --- cread.c	25 Jul 2015 07:06:11 -0000	1.27
 +++ cread.c	11 Jan 2016 18:55:47 -0000
 @@ -93,26 +93,32 @@ void	zmemcpy(unsigned char *, unsigned c
   * a 4-bit table and at least 8K smaller than the libkern version.
   */
  #ifndef ETHER_CRC_POLY_LE
 -#define ETHER_CRC_POLY_LE	0xedb88320
 +#define ETHER_CRC_POLY_LE	0xedb88320U
  #endif
  uint32_t
 -crc32(uint32_t crc, const uint8_t *const buf, size_t len)
 +crc32(uint32_t crc, const uint8_t *buf, size_t len)
  {
 -	uint32_t c, carry;
 -	size_t i, j;
 +	static volatile int crc_tbl_inited = 0;
 +	static uint32_t crc_tbl[256];

 -	crc = 0xffffffffU ^ crc;
 -	for (i = 0; i < len; i++) {
 -		c = buf[i];
 -		for (j = 0; j < 8; j++) {
 -			carry = ((crc & 0x01) ? 1 : 0) ^ (c & 0x01);
 -			crc >>= 1;
 -			c >>= 1;
 -			if (carry) {
 -				crc = (crc ^ ETHER_CRC_POLY_LE);
 +	if (!crc_tbl_inited) {
 +		uint32_t crc2, b, i;
 +		for (b = 0; b < 256; ++b) {
 +			crc2 = b;
 +			for (i = 8; i > 0; --i) {
 +				if (crc2 & 1)
 +					crc2 = (crc2 >> 1) ^ ETHER_CRC_POLY_LE;
 +				else    
 +					crc2 = (crc2 >> 1);
  			}
 +			crc_tbl[b] = crc2;
  		}
 +		crc_tbl_inited = 1;
  	}
 +
 +	crc = crc ^ 0xffffffffU;
 +	while (len--)
 +		crc = crc_tbl[(crc ^ *buf++) & 0xff] ^ (crc >> 8);
  	return (crc ^ 0xffffffffU);
  }


 --zhXaljGHf11kAtnf--

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 22:49:00 +0100

 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA256

 On 11.01.2016 19:58, Joerg Sonnenberger wrote:
 > On Mon, Jan 11, 2016 at 11:40:01AM +0000, Manuel Bouyer wrote:
 >> The following reply was made to PR bin/50638; it has been noted
 >> by GNATS.
 >> 
 >> From: Manuel Bouyer <bouyer@antioche.eu.org> To:
 >> gnats-bugs@NetBSD.org Cc: gnats-admin@netbsd.org,
 >> netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp Subject: Re:
 >> bin/50638: Extreme slowness on loading gzipped kernels on
 >> oldCPUs Date: Mon, 11 Jan 2016 12:36:11 +0100
 >> 
 >> On Mon, Jan 11, 2016 at 10:50:01AM +0000, Izumi Tsutsui wrote:
 >>> [...] On i386 and amd64, gzipped kernels are not used at all.
 >> 
 >> that's not true. I use gzipped kernels and modules for tftp boot
 > 
 > It's even better. The install ISO image uses gzipped kernels and
 > also clearly shows that CRC32 verification is done. Ah well, let's
 > not bother anymore. Attached is a working patch that needed 2min of
 > work and slightly more time to test...
 > 

 Is it space cheaper to generate lookup table instead of using a
 pregenerated one?

 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2

 iQIcBAEBCAAGBQJWlCNLAAoJEEuzCOmwLnZsyPQP/00C5jbNhV3zV2skjAkKKRRn
 yGV8L57SnbbjcRksOfg2Ev74j5fGG2g3MmfTYajJMsf7w6F2e38/Itmz6GFzzLWc
 2f7Dnrsw74ubB4PiQBSy5q/B3BxcJEIeYCia7jsqc8hcpXA2VhbeL7xParMW4bq+
 eiXSBoZgZgJxIeQFJz2BsRhQxBkvu2kfdRTts8RBmNWlpNEYYrBb6POuGOg5bOEh
 mdVfFDz/HnAtbUA15XlJq32tAlKRym0NhMm7ckX1CtH2qxaiFUxAofUS0vpwRU2L
 ONUy9b5gNdM+vXkJV2Y/Z/L03eR8OnKbzkK8XW2HLoFdC0bcz//sVGnPXkvgAnkW
 d32GN0ZmY5skjgXcVC9R1dpYNKx0QnIcJx9zXn+HppJJd9K7ngAvsLY1eurbrddj
 C0dssQUF0ZlfmremJwKdPEfx0FWDSAaLTBCwFVN5kGZJqmxSWMaPB/GxNk4grcpQ
 C4yHEHsOrkwCEW8HDvqoOukgUpJnjwRCaUhIzN1jxpXaPvyDeWqyjvWc8gucSMiy
 KgAUvQNCpRw1HGqitwb7s92Pi3l0t3Spppa7LkT0dR3j26cI+IKVAInukMycZhiu
 3ZT/V2RBEd0kDwmhmFs3yGuOjP8OvquSNGylteFmMZRQlqMmVsAe6WBok1Jo24fm
 qiAdppKWJuDOWSUGtD6w
 =oDdF
 -----END PGP SIGNATURE-----

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Mon, 11 Jan 2016 23:37:02 +0100

 On Mon, Jan 11, 2016 at 09:50:00PM +0000, Kamil Rytarowski wrote:
 >  Is it space cheaper to generate lookup table instead of using a
 >  pregenerated one?

 See the posted patch. It's a bit difficult to say, but it requires less
 than 100 Bytes for x86's boot.

 Joerg

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Tue, 12 Jan 2016 08:10:47 +0100

 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA256

 On 11.01.2016 23:37, Joerg Sonnenberger wrote:
 > On Mon, Jan 11, 2016 at 09:50:00PM +0000, Kamil Rytarowski wrote:
 >> Is it space cheaper to generate lookup table instead of using a 
 >> pregenerated one?
 > 
 > See the posted patch. It's a bit difficult to say, but it requires
 > less than 100 Bytes for x86's boot.
 > 
 > Joerg
 > 

 If it doesn't matter that it's semireentrant it's OK from me.

 IIRC it should be ~10x quicker than the previous one.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2

 iQIcBAEBCAAGBQJWlKbxAAoJEEuzCOmwLnZsWFoP/iPWEerlRtJ9dYv2Sr+9X1sX
 jYWbtq9qPoDh7Ko73mOce0oHFYB9psmmL2culvLP+VpzYkTBfRX3E208xbN5EpAw
 hwnG3KBd2Kws+SIqA1VEsMk/ksY9lxdoGBVBZJ7PULpNJxnPkw4Sjh0kqZRC+hpA
 4NDrkLqF5vn4GclBuduLxgQ/gLSPehch+gRqEr7kyxPBPfOCrRuFcb63njxhFnI0
 1wwszgvE+c655ujnxLluM9KLoKNbAB0/UgsgwznQ7AEn7O4zWt92OXk7GoPzFIb/
 xA2cUfkUZSo59PtiCOim8NVsZpVIXWu2ONqxe4VyUFaIvDuSkrP5kDXqaYUwuHkb
 cpUvS4U0u9KHcpzL4bTgXCwNsErCQXbHjPdqvgcNa8qofZf20SptsFDF52CBreWo
 QYeJdLf8anedEpGP25c8RtIa2YKlrsc+7yUfKrQSFn+tsRD2SPUkM8ZGka1b+4Qg
 tZIhzllvoSO3IL8yus3fntNa4sMYN9BGAH2y8zFITn0c/w3T1A3agtRblpZexA8T
 ay9HVgyjEycU3oq1EN2szLnXubfGdoqDbvSynh5AupVThfVHoQVjndIyDRA0tGNP
 k0qaBkClCotp9PDCktmSkcdEZe226Y2/HRtpgfh3KMZ16ZJuR2m+4jesKYcOkwjc
 cTDXsnvOugkjtexIBZnN
 =1dGf
 -----END PGP SIGNATURE-----

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Tue, 12 Jan 2016 10:01:57 +0100

 On Tue, Jan 12, 2016 at 07:15:01AM +0000, Kamil Rytarowski wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: Kamil Rytarowski <n54@gmx.com>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
 > Date: Tue, 12 Jan 2016 08:10:47 +0100
 > 
 >  -----BEGIN PGP SIGNED MESSAGE-----
 >  Hash: SHA256
 >  
 >  On 11.01.2016 23:37, Joerg Sonnenberger wrote:
 >  > On Mon, Jan 11, 2016 at 09:50:00PM +0000, Kamil Rytarowski wrote:
 >  >> Is it space cheaper to generate lookup table instead of using a 
 >  >> pregenerated one?
 >  > 
 >  > See the posted patch. It's a bit difficult to say, but it requires
 >  > less than 100 Bytes for x86's boot.
 >  > 
 >  > Joerg
 >  > 
 >  
 >  If it doesn't matter that it's semireentrant it's OK from me.

 It is reentrant. It doesn't have to be fully thread-safe, but will
 be for any platform providing TSO. Given the constraints libsa operates
 under, that's more than enough.

 Joerg

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Tue, 12 Jan 2016 12:41:24 +0100

 --k1lZvvs/B4yU6o8G
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 I measured four variants by netbooting the slowest VAX I have available
 right now with a hacked version of script(1), using the -r option to record
 timestamps and a hack to display them (seconds:nanoseconds: $output).
 I removed the twidling and size output. What looks like command input in
 the first line is actually output from the VAX boot when it starts trying
 a new kernel name, so no user interaction involved here anywhere.

 non compressed kernel:

   1452589100:570546000: > boot netbsd
   1452589150:429907000: Copyright(c)...
   = 49.8594s

 gzip'd kernel with -current code in cread.c:

   1452589566:425603000: > boot netbsd
   1452590438:634284000: Copyright(c)...
   = 872.209s

    text    data     bss     dec     hex filename
   61532     900   10236   72668   11bdc /ssd/hosts/vax/usr/mdec/boot


 gzip'd kernel with Joerg's variant:

  1452591021:055044000: > boot netbsd
  1452591461:785939000: Copyright(c)...
  = 440.731s

    text    data     bss     dec     hex filename
   61596     900   11264   73760   12020 /ssd/hosts/vax/usr/mdec/boot


 gzip'd kernel and crc32() returning 0 always:

   1452593154:151872000: > boot netbsd
   1452593527:861325000: Copyright(c)...
   = 373.709s

    text    data     bss     dec     hex filename
   61360     900   10236   72496   11b30 /ssd/hosts/vax/usr/mdec/boot


 DYNAMIC_CRC_TABLE variant:


   1452595355:235578000: > boot netbsd
   1452595738:124262000: Copyright(c)...
   = 382.889s

 which sounds too quick, repeat DYNAMIC_CRC_TABLE test:

   1452596046:563402000: > boot netbsd
   1452596447:310237000: Copyright(c)...
   = 400.747s

    text    data     bss     dec     hex filename
   62538     900   18428   81866   13fca /ssd/hosts/vax/usr/mdec/boot



 The patch I used is attached. It is not optimized for size, but for ease
 of testing all variants.

 Martin

 --k1lZvvs/B4yU6o8G
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename=patch

 Index: cread.c
 ===================================================================
 RCS file: /cvsroot/src/sys/lib/libsa/cread.c,v
 retrieving revision 1.27
 diff -u -r1.27 cread.c
 --- cread.c	25 Jul 2015 07:06:11 -0000	1.27
 +++ cread.c	12 Jan 2016 11:34:29 -0000
 @@ -93,8 +93,45 @@
   * a 4-bit table and at least 8K smaller than the libkern version.
   */
  #ifndef ETHER_CRC_POLY_LE
 -#define ETHER_CRC_POLY_LE	0xedb88320
 +#define ETHER_CRC_POLY_LE	0xedb88320U
  #endif
 +
 +#define	CRC_VARIANT	2	// 0 = none, 1 = joerg's, 2 = -current, 3 = DYNAMIC_CRC_TABLE
 +
 +#if CRC_VARIANT == 0
 +uint32_t
 +crc32(uint32_t crc, const uint8_t *buf, size_t len)
 +{
 +	return 0;
 +}
 +#elif CRC_VARIANT == 1
 +uint32_t
 +crc32(uint32_t crc, const uint8_t *buf, size_t len)
 +{
 +	static volatile int crc_tbl_inited = 0;
 +	static uint32_t crc_tbl[256];
 +
 +	if (!crc_tbl_inited) {
 +		uint32_t crc2, b, i;
 +		for (b = 0; b < 256; ++b) {
 +			crc2 = b;
 +			for (i = 8; i > 0; --i) {
 +				if (crc2 & 1)
 +					crc2 = (crc2 >> 1) ^ ETHER_CRC_POLY_LE;
 +				else    
 +					crc2 = (crc2 >> 1);
 +			}
 +			crc_tbl[b] = crc2;
 +		}
 +		crc_tbl_inited = 1;
 +	}
 +
 +	crc = crc ^ 0xffffffffU;
 +	while (len--)
 +		crc = crc_tbl[(crc ^ *buf++) & 0xff] ^ (crc >> 8);
 +	return (crc ^ 0xffffffffU);
 +}
 +#elif CRC_VARIANT == 2
  uint32_t
  crc32(uint32_t crc, const uint8_t *const buf, size_t len)
  {
 @@ -115,6 +152,186 @@
  	}
  	return (crc ^ 0xffffffffU);
  }
 +#elif CRC_VARIANT == 3
 +
 +static volatile bool crc_table_empty = true;
 +static uint32_t crc_table[8][256];
 +static void make_crc_table(void);
 +
 +typedef uint32_t u4;
 +
 +/* Definitions for doing the crc four data bytes at a time. */
 +#define REV(w) (((w)>>24)+(((w)>>8)&0xff00)+ \
 +               (((w)&0xff00)<<8)+(((w)&0xff)<<24))
 +
 +/*
 +  Generate tables for a byte-wise 32-bit CRC calculation on the polynomial:
 +  x^32+x^26+x^23+x^22+x^16+x^12+x^11+x^10+x^8+x^7+x^5+x^4+x^2+x+1.
 +
 +  Polynomials over GF(2) are represented in binary, one bit per coefficient,
 +  with the lowest powers in the most significant bit.  Then adding polynomials
 +  is just exclusive-or, and multiplying a polynomial by x is a right shift by
 +  one.  If we call the above polynomial p, and represent a byte as the
 +  polynomial q, also with the lowest power in the most significant bit (so the
 +  byte 0xb1 is the polynomial x^7+x^3+x+1), then the CRC is (q*x^32) mod p,
 +  where a mod b means the remainder after dividing a by b.
 +
 +  This calculation is done using the shift-register method of multiplying and
 +  taking the remainder.  The register is initialized to zero, and for each
 +  incoming bit, x^32 is added mod p to the register if the bit is a one (where
 +  x^32 mod p is p+x^32 = x^26+...+1), and the register is multiplied mod p by
 +  x (which is shifting right by one and adding x^32 mod p if the bit shifted
 +  out is a one).  We start with the highest power (least significant bit) of
 +  q and repeat for all eight bits of q.
 +
 +  The first table is simply the CRC of all possible eight bit values.  This is
 +  all the information needed to generate CRCs on data a byte at a time for all
 +  combinations of CRC register values and incoming bytes.  The remaining tables
 +  allow for word-at-a-time CRC calculation for both big-endian and little-
 +  endian machines, where a word is four bytes.
 +*/
 +static void make_crc_table(void)
 +{
 +    uint32_t c;
 +    int n, k;
 +    uint32_t poly;                      /* polynomial exclusive-or pattern */
 +    /* terms of polynomial defining this crc (except x^32): */
 +    static volatile bool first = true;  /* flag to limit concurrent making */
 +    static const unsigned char p[] = {0,1,2,4,5,7,8,10,11,12,16,22,23,26};
 +
 +    /* See if another task is already doing this (not thread-safe, but better
 +       than nothing -- significantly reduces duration of vulnerability in
 +       case the advice about DYNAMIC_CRC_TABLE is ignored) */
 +    if (first) {
 +        first = false;
 +
 +        /* make exclusive-or pattern from polynomial (0xedb88320UL) */
 +        poly = 0;
 +        for (n = 0; n < sizeof(p)/sizeof(unsigned char); n++)
 +            poly |= 1 << (31 - p[n]);
 +
 +        /* generate a crc for every 8-bit value */
 +        for (n = 0; n < 256; n++) {
 +            c = (uint32_t)n;
 +            for (k = 0; k < 8; k++)
 +                c = c & 1 ? poly ^ (c >> 1) : c >> 1;
 +            crc_table[0][n] = c;
 +        }
 +
 +        /* generate crc for each value followed by one, two, and three zeros,
 +           and then the byte reversal of those as well as the first table */
 +        for (n = 0; n < 256; n++) {
 +            c = crc_table[0][n];
 +            crc_table[4][n] = REV(c);
 +            for (k = 1; k < 4; k++) {
 +                c = crc_table[0][c & 0xff] ^ (c >> 8);
 +                crc_table[k][n] = c;
 +                crc_table[k + 4][n] = REV(c);
 +            }
 +        }
 +
 +        crc_table_empty = false;
 +    }
 +    else {      /* not first */
 +        /* wait for the other guy to finish (not efficient, but rare) */
 +        while (crc_table_empty)
 +            ;
 +    }
 +}
 +
 +#if BYTE_ORDER == LITTLE_ENDIAN
 +/* ========================================================================= */
 +#define DOLIT4 c ^= *buf4++; \
 +        c = crc_table[3][c & 0xff] ^ crc_table[2][(c >> 8) & 0xff] ^ \
 +            crc_table[1][(c >> 16) & 0xff] ^ crc_table[0][c >> 24]
 +#define DOLIT32 DOLIT4; DOLIT4; DOLIT4; DOLIT4; DOLIT4; DOLIT4; DOLIT4; DOLIT4
 +
 +/* ========================================================================= */
 +uint32_t crc32(uint32_t crc, const uint8_t *buf, size_t len)
 +{
 +    register u4 c;
 +    register const u4 *buf4;
 +
 +    if (buf == NULL) return 0UL;
 +
 +    if (crc_table_empty)
 +        make_crc_table();
 +
 +    c = (u4)crc;
 +    c = ~c;
 +    while (len && ((uintptr_t)buf & 3)) {
 +        c = crc_table[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
 +        len--;
 +    }
 +
 +    buf4 = (const u4 *)(const void *)buf;
 +    while (len >= 32) {
 +        DOLIT32;
 +        len -= 32;
 +    }
 +    while (len >= 4) {
 +        DOLIT4;
 +        len -= 4;
 +    }
 +    buf = (const unsigned char *)buf4;
 +
 +    if (len) do {
 +        c = crc_table[0][(c ^ *buf++) & 0xff] ^ (c >> 8);
 +    } while (--len);
 +    c = ~c;
 +    return (uint32_t)c;
 +}
 +
 +#else /* BIG_ENDIAN */
 +
 +/* ========================================================================= */
 +#define DOBIG4 c ^= *++buf4; \
 +        c = crc_table[4][c & 0xff] ^ crc_table[5][(c >> 8) & 0xff] ^ \
 +            crc_table[6][(c >> 16) & 0xff] ^ crc_table[7][c >> 24]
 +#define DOBIG32 DOBIG4; DOBIG4; DOBIG4; DOBIG4; DOBIG4; DOBIG4; DOBIG4; DOBIG4
 +
 +/* ========================================================================= */
 +uint32_t crc32(uint32_t crc, const uint8_t *buf, size_t len)
 +{
 +    register u4 c;
 +    register const u4 *buf4;
 +
 +    if (buf == NULL) return 0UL;
 +
 +    if (crc_table_empty)
 +        make_crc_table();
 +
 +    c = REV((u4)crc);
 +    c = ~c;
 +    while (len && ((uintptr_t)buf & 3)) {
 +        c = crc_table[4][(c >> 24) ^ *buf++] ^ (c << 8);
 +        len--;
 +    }
 +
 +    buf4 = (const u4 *)(const void *)buf;
 +    buf4--;
 +    while (len >= 32) {
 +        DOBIG32;
 +        len -= 32;
 +    }
 +    while (len >= 4) {
 +        DOBIG4;
 +        len -= 4;
 +    }
 +    buf4++;
 +    buf = (const unsigned char *)buf4;
 +
 +    if (len) do {
 +        c = crc_table[4][(c >> 24) ^ *buf++] ^ (c << 8);
 +    } while (--len);
 +    c = ~c;
 +    return (uint32_t)(REV(c));
 +}
 +#endif
 +
 +#else
 +#error something is wrong
 +#endif

  /*
   * compression utilities

 --k1lZvvs/B4yU6o8G--

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Cc: 
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Tue, 12 Jan 2016 13:31:26 +0100

 And the ideal variant would do it in:

 >  non compressed kernel:
 >  
 >    1452589100:570546000: > boot netbsd
 >    1452589150:429907000: Copyright(c)...
 >    = 49.8594s

 + total time for one full decompression:

 # ls -l netbsd
 -rw-r--r--  1 root  wheel  1827667 Jan 12 09:04 netbsd
 # file netbsd
 netbsd: gzip compressed data, last modified: Tue Jan 12 08:05:45 2016, max compression, from Unix
 # /usr/bin/time gzcat netbsd >/dev/null
        29.96 real        23.04 user         1.78 sys

 ... so like 80s.


 Martin

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
Date: Wed, 13 Jan 2016 16:36:13 +0100

 On Tue, Jan 12, 2016 at 12:35:01PM +0000, Martin Husemann wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
 > Cc: 
 > Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on oldCPUs
 > Date: Tue, 12 Jan 2016 13:31:26 +0100
 > 
 >  And the ideal variant would do it in:
 >  
 >  >  non compressed kernel:
 >  >  
 >  >    1452589100:570546000: > boot netbsd
 >  >    1452589150:429907000: Copyright(c)...
 >  >    = 49.8594s
 >  
 >  + total time for one full decompression:
 >  
 >  # ls -l netbsd
 >  -rw-r--r--  1 root  wheel  1827667 Jan 12 09:04 netbsd
 >  # file netbsd
 >  netbsd: gzip compressed data, last modified: Tue Jan 12 08:05:45 2016, max compression, from Unix
 >  # /usr/bin/time gzcat netbsd >/dev/null
 >         29.96 real        23.04 user         1.78 sys
 >  
 >  ... so like 80s.

 This strongly suggests that a lot of seeking happens and the input is
 processed multiple times. Is there a very good reason for not just
 sucking in the whole kernel in one go or at least use a resize+read loop
 once we know that it is an ELF binary we are interested in?

 Joerg

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 04:35:42 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sun Jan 17 04:35:42 UTC 2016

 Modified Files:
 	src/sys/lib/libsa: cread.c

 Log Message:
 Add an option (LIBSA_CREAD_NOCRC) to disable gunzip CRC32 calculation.

 No obvious sideeffect on booting i386 GENERIC kernels (without the option).
 Closes PR/50638 (Extreme slowness on loading gzipped kernels on old CPUs).


 To generate a diff of this commit:
 cvs rdiff -u -r1.27 -r1.28 src/sys/lib/libsa/cread.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50638 CVS commit: src/sys/arch/luna68k/stand/boot
Date: Sun, 17 Jan 2016 04:40:10 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sun Jan 17 04:40:10 UTC 2016

 Modified Files:
 	src/sys/arch/luna68k/stand/boot: Makefile version

 Log Message:
 Enable LIBSA_CREAD_NOCRC.  PR/50638

 Also bump version to denote user visible change.
 Tested on LUNA-II.


 To generate a diff of this commit:
 cvs rdiff -u -r1.11 -r1.12 src/sys/arch/luna68k/stand/boot/Makefile \
     src/sys/arch/luna68k/stand/boot/version

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50638 CVS commit: src/sys/arch/x68k/stand
Date: Sun, 17 Jan 2016 04:47:59 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sun Jan 17 04:47:59 UTC 2016

 Modified Files:
 	src/sys/arch/x68k/stand/boot: version
 	src/sys/arch/x68k/stand/libsa: Makefile
 	src/sys/arch/x68k/stand/netboot: version

 Log Message:
 Enable LIBSA_CREAD_NOCRC.  PR/50638

 Bump version to denote user visible change.
  XXX: recent visible changes (memsize probe, SRAM switch command)
       were not denoted in versions
 Tested on (real) X68030.


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 src/sys/arch/x68k/stand/boot/version
 cvs rdiff -u -r1.30 -r1.31 src/sys/arch/x68k/stand/libsa/Makefile
 cvs rdiff -u -r1.1 -r1.2 src/sys/arch/x68k/stand/netboot/version

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50638 CVS commit: src/sys/arch/news68k/stand
Date: Sun, 17 Jan 2016 04:50:37 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sun Jan 17 04:50:37 UTC 2016

 Modified Files:
 	src/sys/arch/news68k/stand/boot: version
 	src/sys/arch/news68k/stand/common: Makefile

 Log Message:
 Enable LIBSA_CREAD_NOCRC.  PR/50638

 Bump version to denote user visible change.
 Tested on NWS-1750.


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 src/sys/arch/news68k/stand/boot/version
 cvs rdiff -u -r1.14 -r1.15 src/sys/arch/news68k/stand/common/Makefile

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50638 CVS commit: src/sys/arch/vax/boot/boot
Date: Sun, 17 Jan 2016 04:53:16 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sun Jan 17 04:53:16 UTC 2016

 Modified Files:
 	src/sys/arch/vax/boot/boot: Makefile version

 Log Message:
 Enable LIBSA_CREAD_NOCRC.  PR/50638

 Bump version to denote user visible change.
 Tested on simh 4.0-Beta1 emulationg MicroVAX 3900.


 To generate a diff of this commit:
 cvs rdiff -u -r1.43 -r1.44 src/sys/arch/vax/boot/boot/Makefile
 cvs rdiff -u -r1.7 -r1.8 src/sys/arch/vax/boot/boot/version

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Izumi Tsutsui" <tsutsui@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50638 CVS commit: src/sys/arch/hp300/stand
Date: Sun, 17 Jan 2016 08:05:20 +0000

 Module Name:	src
 Committed By:	tsutsui
 Date:		Sun Jan 17 08:05:20 UTC 2016

 Modified Files:
 	src/sys/arch/hp300/stand: Makefile.buildboot
 	src/sys/arch/hp300/stand/inst: version
 	src/sys/arch/hp300/stand/uboot: version

 Log Message:
 Enable LIBSA_CREAD_NOCRC.  PR/50638

 Bump version to denote user visible change.
 Tested on HP9000/382.


 To generate a diff of this commit:
 cvs rdiff -u -r1.34 -r1.35 src/sys/arch/hp300/stand/Makefile.buildboot
 cvs rdiff -u -r1.12 -r1.13 src/sys/arch/hp300/stand/inst/version
 cvs rdiff -u -r1.19 -r1.20 src/sys/arch/hp300/stand/uboot/version

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Sun, 17 Jan 2016 08:20:18 +0000
State-Changed-Why:
Now load speed is acceptable (at least for me).


From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: bin/50638: Extreme slowness on loading gzipped kernels on old CPUs
Date: Sun, 17 Jan 2016 17:15:06 +0900

 joerg@ wrote:

 > This strongly suggests that a lot of seeking happens and the input is
 > processed multiple times. Is there a very good reason for not just
 > sucking in the whole kernel in one go or at least use a resize+read loop
 > once we know that it is an ELF binary we are interested in?

 It's a known and separate issue, filed as PR/38943.

 ---
 Izumi Tsutsui

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 13:38:50 +0100

 On Sun, Jan 17, 2016 at 04:40:01AM +0000, Izumi Tsutsui wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: "Izumi Tsutsui" <tsutsui@netbsd.org>
 > To: gnats-bugs@gnats.NetBSD.org
 > Cc: 
 > Subject: PR/50638 CVS commit: src/sys/lib/libsa
 > Date: Sun, 17 Jan 2016 04:35:42 +0000
 > 
 >  Module Name:	src
 >  Committed By:	tsutsui
 >  Date:		Sun Jan 17 04:35:42 UTC 2016
 >  
 >  Modified Files:
 >  	src/sys/lib/libsa: cread.c
 >  
 >  Log Message:
 >  Add an option (LIBSA_CREAD_NOCRC) to disable gunzip CRC32 calculation.
 >  
 >  No obvious sideeffect on booting i386 GENERIC kernels (without the option).
 >  Closes PR/50638 (Extreme slowness on loading gzipped kernels on old CPUs).

 Please revert this.

 (1) The commit message is wrong, it has been clearly demonstrated that
 the CRC code *does* provide obvious side-effects.

 (2) The offered patches fixes the stated original original problem to
 within a small degree of the cost of uncompression.

 (3) Fixing the real problem of repeated uncompression would make (2)
 even less a problem.

 Stop adding hacks on top of hacks.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 21:59:23 +0900

 > (1) The commit message is wrong, it has been clearly demonstrated that
 > the CRC code *does* provide obvious side-effects.

 I just noted a test result of "booting i386 kernel without the option."
 If you see something that is obvious, can you file a new PR?

 > (2) The offered patches fixes the stated original original problem to
 > within a small degree of the cost of uncompression.

 I see no reason to revert it because it isn't default.
 Most ports require "smallest or fastet" and your patch doesn't
 provide either.

 > (3) Fixing the real problem of repeated uncompression would make (2)
 > even less a problem.

 Once you can provide real fixes, you can apply it independently.
 Even in that case, "no CRC check" (or zlib version) is still
 faster than yours.

 ---
 Izumi Tsutsui 

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 14:50:36 +0100

 On Sun, Jan 17, 2016 at 09:59:23PM +0900, Izumi Tsutsui wrote:
 > > (1) The commit message is wrong, it has been clearly demonstrated that
 > > the CRC code *does* provide obvious side-effects.
 > 
 > I just noted a test result of "booting i386 kernel without the option."
 > If you see something that is obvious, can you file a new PR?

 No, you said it is without side effects. That's wrong -- before bad CRC
 was detected, now it isn't.

 > > (2) The offered patches fixes the stated original original problem to
 > > within a small degree of the cost of uncompression.
 > 
 > I see no reason to revert it because it isn't default.
 > Most ports require "smallest or fastet" and your patch doesn't
 > provide either.

 No, most ports only require small enough and fast enough. Don't invent
 things just because you feel like it. The patch I offered less than
 doubles the space required for crc32 (i.e. ~100 to ~180 Bytes) and
 pushes it within a factor of 2 of the optimal results. The alternative
 might be faster, but is also significant larger (i.e. > 1KB). Even on
 martin's test, that would be less than 3s which to me seems quite
 acceptable. Your hack just removes functionality without any
 cost/benefit analysis. More importantly, it hacks around the symptoms
 and doesn't solve the real problem.

 > > (3) Fixing the real problem of repeated uncompression would make (2)
 > > even less a problem.
 > 
 > Once you can provide real fixes, you can apply it independently.
 > Even in that case, "no CRC check" (or zlib version) is still
 > faster than yours.

 If you care so much about fast, why are you compressing in first place?
 No CRC check is a significant regression and fixing the real problem
 means that the CRC check simply isn't relevant any more. Just like the
 LOAD_NOTE hacks you references in your older problems, you are just
 adding more special case hacks instead of fixing the real problem.
 That's not how NetBSD is supposed to work.

 Joerg

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 14:04:49 +0000 (UTC)

 joerg@britannica.bec.de (Joerg Sonnenberger) writes:

 >cost/benefit analysis. More importantly, it hacks around the symptoms
 >and doesn't solve the real problem.

 I agree with you that removing the crc check isn't necessary. But
 what do you think the real problem is? A non-optimal math routine or
 a feature being impractical for the user?

 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 15:19:50 +0100

 On Sun, Jan 17, 2016 at 02:10:01PM +0000, Michael van Elst wrote:
 > The following reply was made to PR bin/50638; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
 > Date: Sun, 17 Jan 2016 14:04:49 +0000 (UTC)
 > 
 >  joerg@britannica.bec.de (Joerg Sonnenberger) writes:
 >  
 >  >cost/benefit analysis. More importantly, it hacks around the symptoms
 >  >and doesn't solve the real problem.
 >  
 >  I agree with you that removing the crc check isn't necessary. But
 >  what do you think the real problem is? A non-optimal math routine or
 >  a feature being impractical for the user?

 Look back at the earlier number -- loading a compressed kernel needs ten
 times as much time as zcat from userland. The amount of time spend in
 crc32 in either variant, even the bad current libsa version, pales in
 comparision. The reason for that factor is almost guaranteed the seeking
 in loadfile. I asked if there is a good reason for seeking (or
 discarding the data for that matter) before the recent commits, since
 that is the issue where 90% of the wasted times goes. I still haven't
 received an answer to that question.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 23:25:43 +0900

 joerg@ wrote:

 > On Sun, Jan 17, 2016 at 09:59:23PM +0900, Izumi Tsutsui wrote:
 > > > (1) The commit message is wrong, it has been clearly demonstrated that
 > > > the CRC code *does* provide obvious side-effects.
 > > 
 > > I just noted a test result of "booting i386 kernel without the option."
 > > If you see something that is obvious, can you file a new PR?
 > 
 > No, you said it is without side effects. That's wrong -- before bad CRC
 > was detected, now it isn't.

 I don't understand what you mean.

 On i386, the new option LIBSA_CREAD_NOCRC is not enabled
 so no functional change is intended.

 > > > (2) The offered patches fixes the stated original original problem to
 > > > within a small degree of the cost of uncompression.
 > > 
 > > I see no reason to revert it because it isn't default.
 > > Most ports require "smallest or fastet" and your patch doesn't
 > > provide either.
 > 
 > No, most ports only require small enough and fast enough.

 It's your opinion.  On the other hand, martin@ (who have also
 worked on the same issue for vax etc.) aggreed disabling CRC.
 Furthermore, CRC values are not checked at all on several ports.

 If someone really want "small enough and fast enough" method,
 he/she can introduce another option like LIBSA_CREAD_LIBARCHIVECRC
 with proper measurements, implementation, and tests, as a test
 patch martin@ provieded.

 > > Once you can provide real fixes, you can apply it independently.
 > > Even in that case, "no CRC check" (or zlib version) is still
 > > faster than yours.
 > 
 > If you care so much about fast, why are you compressing in first place?

 I (and several Tier II users) have seen the problem for long time
 and there are several PRs for it, but there is no progress or analysis.

 Adding an option is a enough good compromise for me because
 I (and others) have limited spare time and I don't have any
 interest in such "best or nothing" strategy which only worked
 with enough human resources and motivations.

 ---
 Izumi Tsutsui

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 15:39:36 +0100

 On Sun, Jan 17, 2016 at 11:25:43PM +0900, Izumi Tsutsui wrote:
 > > > > (2) The offered patches fixes the stated original original problem to
 > > > > within a small degree of the cost of uncompression.
 > > > 
 > > > I see no reason to revert it because it isn't default.
 > > > Most ports require "smallest or fastet" and your patch doesn't
 > > > provide either.
 > > 
 > > No, most ports only require small enough and fast enough.
 > 
 > It's your opinion.  On the other hand, martin@ (who have also
 > worked on the same issue for vax etc.) aggreed disabling CRC.
 > Furthermore, CRC values are not checked at all on several ports.

 He agreed before the fourth option was available or any hard numbers.
 CRC values are checked if the file is read completely. Whether that
 happens or not depends on a variety of factors, most of them are not
 even MD. That argument has been disproven already.

 > If someone really want "small enough and fast enough" method,
 > he/she can introduce another option like LIBSA_CREAD_LIBARCHIVECRC
 > with proper measurements, implementation, and tests, as a test
 > patch martin@ provieded.

 I find your attitude to be quite annoying. There is an implementation,
 it is tested and it is even measured. You ignored all that.

 > > > Once you can provide real fixes, you can apply it independently.
 > > > Even in that case, "no CRC check" (or zlib version) is still
 > > > faster than yours.
 > > 
 > > If you care so much about fast, why are you compressing in first place?
 > 
 > I (and several Tier II users) have seen the problem for long time
 > and there are several PRs for it, but there is no progress or analysis.

 OK, I guess you didn't bother to read the content of this PR at all
 after making up your mind.

 > Adding an option is a enough good compromise for me because
 > I (and others) have limited spare time and I don't have any
 > interest in such "best or nothing" strategy which only worked
 > with enough human resources and motivations.

 You still haven't answered my question of why seeking is beneficals. You
 pushed your own hack through, completely ignoring the analysis. It has
 been clearly demonstrated that CRC32 is not the majority of the wasted
 time. Removing it (vs just committing the already written patch) does
 not remove fix the problem, it just adds more complications. Your
 behavior is removing any motivation I have for fixing the real problem.

 So I am asking you one last time: please revert your hacks.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Mon, 18 Jan 2016 00:13:04 +0900

 joerg@ wrote:

 > He agreed before the fourth option was available or any hard numbers.
 > CRC values are checked if the file is read completely.

 On several ports, whole file is not read.
 That's enough reason to disable it by MD options.

 > I find your attitude to be quite annoying. There is an implementation,
 > it is tested and it is even measured. You ignored all that.

 No diffs which can be committed as is.
 You always ignore actual work to be done
 on real software development.

 > You still haven't answered my question of why seeking is beneficals. You
 > pushed your own hack through, completely ignoring the analysis.

 You should ask DTRACE (or other) persons who introdueced the changes
 into MI libsa loadfile(), not me. There are PRs for it but no answer.

 I won't block such generic MI improvements just to avoid slowness
 on poor Tier-II ports.

 ---
 Izumi Tsutsui

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 16:30:30 +0100

 On Mon, Jan 18, 2016 at 12:13:04AM +0900, Izumi Tsutsui wrote:
 > joerg@ wrote:
 > 
 > > He agreed before the fourth option was available or any hard numbers.
 > > CRC values are checked if the file is read completely.
 > 
 > On several ports, whole file is not read.
 > That's enough reason to disable it by MD options.

 Again, whether it is read completely or not depends on a variety of
 factors, many of them are more accidental than not.

 > > I find your attitude to be quite annoying. There is an implementation,
 > > it is tested and it is even measured. You ignored all that.
 > 
 > No diffs which can be committed as is.
 > You always ignore actual work to be done
 > on real software development.

 Oh right, I missed that I posted a working diff to cread.c. I guess
 Martin missed it too, when he used it in his benchmarks.

 > > You still haven't answered my question of why seeking is beneficals. You
 > > pushed your own hack through, completely ignoring the analysis.
 > 
 > You should ask DTRACE (or other) persons who introdueced the changes
 > into MI libsa loadfile(), not me. There are PRs for it but no answer.
 > 
 > I won't block such generic MI improvements just to avoid slowness
 > on poor Tier-II ports.

 Stop with your silly tier II conspiricy. The simply problem is that
 noone on x86 will care about reading a few extra KB from hard disk and
 it is potentially even harmful when booting from a real CD. So whether
 seeking ever provides an improvement matters most on those slow and
 somewhat memory constrained systemed systems where the current
 decompression (mis)handling also hurts the most. As such the question is
 perfectly well aimed at tier II port users.

 Joerg

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: joerg@britannica.bec.de
Cc: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Mon, 18 Jan 2016 00:58:14 +0900

 joerg@ wrote:

 > > On several ports, whole file is not read.
 > > That's enough reason to disable it by MD options.
 > 
 > Again, whether it is read completely or not depends on a variety of
 > factors, many of them are more accidental than not.

 loadfile() takes "flags" args. They are specified in MD sources.

 > > > I find your attitude to be quite annoying. There is an implementation,
 > > > it is tested and it is even measured. You ignored all that.
 > > 
 > > No diffs which can be committed as is.
 > > You always ignore actual work to be done
 > > on real software development.
 > 
 > Oh right, I missed that I posted a working diff to cread.c. I guess
 > Martin missed it too, when he used it in his benchmarks.

 Your patch completely replace current implementation and
 in such case it affects all ports.  Who have tested them?

 > > > You still haven't answered my question of why seeking is beneficals. You
 > > > pushed your own hack through, completely ignoring the analysis.
 > > 
 > > You should ask DTRACE (or other) persons who introdueced the changes
 > > into MI libsa loadfile(), not me. There are PRs for it but no answer.
 > > 
 > > I won't block such generic MI improvements just to avoid slowness
 > > on poor Tier-II ports.
 > 
 > Stop with your silly tier II conspiricy. The simply problem is that
 > noone on x86 will care about reading a few extra KB from hard disk and
 > it is potentially even harmful when booting from a real CD. So whether
 > seeking ever provides an improvement matters most on those slow and
 > somewhat memory constrained systemed systems where the current
 > decompression (mis)handling also hurts the most. As such the question is
 > perfectly well aimed at tier II port users.

 Again, ask persons who committed the changes, or consult Core.
 IIRC there was no seek at least on 4.x days.

 ---
 Izumi Tsutsui

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 17:13:29 +0100

 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA256

 On 17.01.2016 17:00, Izumi Tsutsui wrote:
 > Your patch completely replace current implementation and in such
 > case it affects all ports.  Who have tested them?
 > 

 Do you see any possible side effects in the crc32 patch? Undefined
 behavior?

 If not, there are two dozens of lines of code that will work equally
 well on arm64 and vax.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2

 iQIcBAEBCAAGBQJWm72oAAoJEEuzCOmwLnZsKOYP/RcJn4R0uH/ILi1MDEnHr77j
 622D7PQGUtbnU8YVy6QucfWDVP67Sn8nOuv/zpbRWkdVMMbytGzjXsiSFzEwK7Db
 /89L+K1MrBfCM3CLedwjIoZMa0vvgT/y0XdLxDUfQMQrZaDgGKERdmAUZepVspAy
 gWbpSi54qdUG3EWwvMWLRL9NH6a0G9BqAg19SavQqiXTHkNx+Kn1PiKozaAYiRXJ
 uapKcSzXPj5vaH5jIwHXy87GKyLxMas/rZMarcBFGpcAbnc1V40NQe0qw3oeL7S6
 xZLerMTHxhf0WxVFdHwREC/cZW4TsnCur9S8MAJoDkHIhYAcwZos2ddBTCjCD2mc
 SSB+QayMRuWeA6RIE5WehWvc7BbsscKqE0CGAiIJ+7/tCwYiEgoXFCzFOqG3GT1K
 W8k7BTrimOUgxhJSzex+aULVGYzBBYWaJohwjrTUY2yEoc2uWWsPFt1a3Ucb+Rfd
 Pr2EpIPx9xsVmU43298UOOnF230XGm3bYAgPYeCXxMokXtaJPnMeYBhwfFpRsTFF
 f6uOk6R0w8YGmxwg7FgKI5GCrNKa2TttToKkWUbXvNwKn3KvNQcmu6XpXBBByOH/
 cJBDxb1BN2Hjv+S7xlDp28C6Y3GhI4mbtz5aU6CqdmJq4aKxR94NogH4KH0G6kb+
 3RLSLfSODB2eV0LuSjYu
 =JAKA
 -----END PGP SIGNATURE-----

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: n54@gmx.com
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Mon, 18 Jan 2016 01:48:10 +0900

 >  > Your patch completely replace current implementation and in such
 >  > case it affects all ports.  Who have tested them?
 >  > 
 >  
 >  Do you see any possible side effects in the crc32 patch? Undefined
 >  behavior?

 It might trigger ramdisk overflow, compiler bugs etc.
 I would like to avoid extra risks without particular benefits.

 ---
 Izumi Tsutsui

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	tsutsui@ceres.dti.ne.jp
Cc: 
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 11:52:00 -0500

 On Jan 17,  4:50pm, tsutsui@ceres.dti.ne.jp (Izumi Tsutsui) wrote:
 -- Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa

 |  It might trigger ramdisk overflow, compiler bugs etc.
 |  I would like to avoid extra risks without particular benefits.

 Well, if we follow this reasoning we might as well never make any changes
 to anything.

 christos

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: christos@zoulas.com
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Mon, 18 Jan 2016 02:14:28 +0900

 christos@ wrote:

 > |  It might trigger ramdisk overflow, compiler bugs etc.
 > |  I would like to avoid extra risks without particular benefits.
 > 
 > Well, if we follow this reasoning we might as well never make any changes
 > to anything.

 Even with the LIBSA_CREAD_NOCRC option, anyone can add yet another
 crc32() implementation.  You don't have to follow me because there is
 no conflict, except personal likes.

 Note disabling CRC is still smaller than suggested one and
 I believe it's a valid MD option.

 If you also object my changes as a Core member, please revert them.

 ---
 Izumi Tsutsui

From: christos@zoulas.com (Christos Zoulas)
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: gnats-bugs@NetBSD.org
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 12:17:40 -0500

 On Jan 18,  2:14am, tsutsui@ceres.dti.ne.jp (Izumi Tsutsui) wrote:
 -- Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa

 | Even with the LIBSA_CREAD_NOCRC option, anyone can add yet another
 | crc32() implementation.  You don't have to follow me because there is
 | no conflict, except personal likes.
 | 
 | Note disabling CRC is still smaller than suggested one and
 | I believe it's a valid MD option.
 | 
 | If you also object my changes as a Core member, please revert them.

 I have not looked closely enough to say either way what needs to be
 done... BTW atari has been out of space for a while... What should
 we do?

 christos

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: christos@zoulas.com
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Mon, 18 Jan 2016 03:00:28 +0900

 christos@ wrote:

 > BTW atari has been out of space for a while... What should
 > we do?

 File a PR as a known issue?
  http://mail-index.netbsd.org/port-atari/2015/07/19/msg000540.html

 Note (un)fortunately sys/arch/atari/stand/bootxxx loader doesn't
 seem to define SA_USE_CREAD so this PR won't affect it.

 ---
 Izumi Tsutsui

From: christos@zoulas.com (Christos Zoulas)
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: gnats-bugs@NetBSD.org
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 13:15:18 -0500

 On Jan 18,  3:00am, tsutsui@ceres.dti.ne.jp (Izumi Tsutsui) wrote:
 -- Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa

 | christos@ wrote:
 | 
 | > BTW atari has been out of space for a while... What should
 | > we do?
 | 
 | File a PR as a known issue?
 |  http://mail-index.netbsd.org/port-atari/2015/07/19/msg000540.html

 Yes, that is great information and we should put it in a README file
 in stand.

 | Note (un)fortunately sys/arch/atari/stand/bootxxx loader doesn't
 | seem to define SA_USE_CREAD so this PR won't affect it.

 Right.

 christos

From: Kamil Rytarowski <n54@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Sun, 17 Jan 2016 23:20:18 +0100

 -----BEGIN PGP SIGNED MESSAGE-----
 Hash: SHA256

 On 17.01.2016 17:50, Izumi Tsutsui wrote:
 > The following reply was made to PR bin/50638; it has been noted by
 > GNATS.
 > 
 > From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp> To: n54@gmx.com Cc:
 > gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp Subject: Re:
 > PR/50638 CVS commit: src/sys/lib/libsa Date: Mon, 18 Jan 2016
 > 01:48:10 +0900
 > 
 >>> Your patch completely replace current implementation and in
 >>> such case it affects all ports.  Who have tested them?
 >>> 
 >> 
 >> Do you see any possible side effects in the crc32 patch?
 >> Undefined behavior?
 > 
 > It might trigger ramdisk overflow, compiler bugs etc. I would like
 > to avoid extra risks without particular benefits.
 > 

 RAMDISK overflow danger is a valid reason to remove crc32.

 If it happens it would be beneficial to remove it from all ports. For
 the sake of simplicity. It's not worth to make it MD.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2

 iQIcBAEBCAAGBQJWnBOgAAoJEEuzCOmwLnZsGTEP/Rwa1fRPfZjQPaXxgG1Hyb18
 sqE6yTI6bltokS9LdP4VwcyBW7v95Tth6ufYR9WfTHXa5OMC4PPuVszUFo50Ieyu
 NWfHLLU0T3BtyNKsIB7jGX87PaJIENqXF4pXT9WlosQckM1G9m+hxJDMAz6mi3kB
 aYyR1yT75n//vVZoLMn/vewZja5tWY3MG1y0Reyi79lVurKRunWidnAuI92jk/pk
 WihPPVgL9813xIjEud4wBUVJ+ZZLl7d6i4TxdP41Vdg7nst/cp6/6rist+c/qhHM
 B5iE8TYiJUT3qHsRFUQ7xrrYN1BVQy7sGUCgdjRsBnUQHyM7AFyKzSlcNvAvdKQR
 m5ZKF3Iy8f/jzZavGahHyccdFCuUpJ1pv0QdKgk538E8EuV29F1p9LNo4nhFCmFM
 qVYiCBdAnQDdp22yCozCtBKW9mRvAIlHwM2JirptcyeEi8+TUxs+h/H4WG17t94F
 Ff8fxMU7GG1P+H/n0EsR1b2r7aBmy2BjgiCvT1Va1mF7Eh4efiV6nVpa5OfrosnB
 3heTVjSvyAd7A5Kc3DJvzxCyZknKK/Gxv1u5GFXhsF3+9pg5k23ES4EiO/j8VzMJ
 pAXS2s3BOQACHVHMqJQwy+wWjNKyIjTZXLzCfDs8WnVQK710EcWaBrXs/28GIWHF
 FZl8UPtaNeu4MvszJAY0
 =UGaT
 -----END PGP SIGNATURE-----

From: Joerg Sonnenberger <joerg@britannica.bec.de>
To: gnats-bugs@NetBSD.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: PR/50638 CVS commit: src/sys/lib/libsa
Date: Mon, 18 Jan 2016 00:20:46 +0100

 On Sun, Jan 17, 2016 at 10:25:01PM +0000, Kamil Rytarowski wrote:
 >  RAMDISK overflow danger is a valid reason to remove crc32.

 No, it is not. A size constrained first stage boot loader might be, but
 there isn't any evidence of that so far. Seriously, if you want to shave
 off a few Bytes from libsa using code, start with the zlib copyright.
 That doesn't even functionality. That alone is by my count 50 Bytes...

 Joerg

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.