NetBSD Problem Report #59329

From www@netbsd.org  Sat Apr 19 13:13:23 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 00E2F1A923D
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 19 Apr 2025 13:13:23 +0000 (UTC)
Message-Id: <20250419131321.AF6001A923E@mollari.NetBSD.org>
Date: Sat, 19 Apr 2025 13:13:21 +0000 (UTC)
From: nia@pkgsrc.org
Reply-To: nia@pkgsrc.org
To: gnats-bugs@NetBSD.org
Subject: NetBSD's OpenSSL 7x slower at AES than same upstream version on NetBSD
X-Send-Pr-Version: www-1.0

>Number:         59329
>Category:       lib
>Synopsis:       NetBSD's OpenSSL 7x slower at AES than same upstream version on NetBSD
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 19 13:15:00 +0000 2025
>Last-Modified:  Wed Apr 23 20:20:01 +0000 2025
>Originator:     nia
>Release:        10.1
>Organization:
The NetBSD Foundation
>Environment:
NetBSD siphon 10.1_STABLE NetBSD 10.1_STABLE (SIPHON) #2: Thu Mar  6 18:29:22 CET 2025  nia@siphon:/encrypt/src/obj/sys/arch/amd64/compile/SIPHON amd64
>Description:
A locally compiled OpenSSL, built with OpenSSL's own build system, is
around 7-8x faster (!) at AES on amd64 than the OpenSSL in NetBSD 10.1.
Attempts to reproduce this on other ports showed a much smaller
difference, which suggests the problem lies in OpenSSL's
architecture-dependent code paths.

Identical versions of the OpenSSL code base were tested (3.0.12).

Upstream OpenSSL claims the following CPUINFO:
$ LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/openssl version -a
CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x29c67af

The NetBSD-supplied OpenSSL reports the same:

$ /usr/bin/openssl version -a
CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x29c67af
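Both builds print the same capability vector, so runtime CPU detection is unlikely to be the difference. As a quick sanity check, the sketch below decodes a few bits of the first 64-bit word using the bit numbers documented in the OPENSSL_ia32cap(3) manual page (for example, bit 57 is AES-NI). The helper `decode_ia32cap` and the selection of bits are illustrative, not part of OpenSSL.

```python
# Bit positions within the first 64-bit word of OPENSSL_ia32cap,
# as listed in the OPENSSL_ia32cap(3) manual page ("bit #0+N").
FEATURE_BITS = {
    "PCLMULQDQ": 33,
    "SSSE3": 41,
    "AES-NI": 57,
    "AVX": 60,
}


def decode_ia32cap(cpuinfo: str) -> dict[str, bool]:
    """Parse a CPUINFO line from `openssl version -a` and report selected feature bits."""
    value = cpuinfo.split("=", 1)[1]           # "0x7ffa...:0x29c67af"
    first_word = int(value.split(":")[0], 16)  # only the first 64-bit word is decoded
    return {name: bool((first_word >> bit) & 1) for name, bit in FEATURE_BITS.items()}


print(decode_ia32cap("CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x29c67af"))
```

For the vector above, all four bits are set, including AES-NI, so both binaries should be allowed to pick the AES-NI code path if it was compiled in.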

When using OpenSSL's own build system, the following arch-specific
CFLAGS are applied to every file:

-DAES_ASM -DBSAES_ASM -DCMLL_ASM -DECP_NISTZ256_ASM -DGHASH_ASM -DKECCAK1600_ASM -DMD5_ASM -DOPENSSL_BN_ASM_GF2m -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DPOLY1305_ASM -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DX25519_ASM -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DL_ENDIAN -DOPENSSL_PIC

On NetBSD this is harder to tell, since such flags are applied on
a per-file basis (possibly error-prone when OpenSSL is updated).
However, grepping the full build log shows that:

- CMLL_ASM is missing (should probably be added to camellia.inc)
- OPENSSL_IA32_SSE2 is missing
- POLY1305_ASM is missing
- RC4_ASM is missing (should probably be added to rc4.inc)
- L_ENDIAN is missing
- OPENSSL_PIC is missing
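The grep can be scripted. A minimal sketch, assuming the build log is plain text containing the compiler command lines: collect every `-DNAME` token and report which of the upstream macros never appear. The function `missing_macros` and the sample log lines are hypothetical.

```python
import re

# Arch-specific macros from the upstream CFLAGS line quoted above.
UPSTREAM_MACROS = set("""
AES_ASM BSAES_ASM CMLL_ASM ECP_NISTZ256_ASM GHASH_ASM KECCAK1600_ASM MD5_ASM
OPENSSL_BN_ASM_GF2m OPENSSL_BN_ASM_MONT OPENSSL_BN_ASM_MONT5 OPENSSL_CPUID_OBJ
OPENSSL_IA32_SSE2 POLY1305_ASM RC4_ASM SHA1_ASM SHA256_ASM SHA512_ASM
VPAES_ASM WHIRLPOOL_ASM X25519_ASM L_ENDIAN OPENSSL_PIC
""".split())

# Matches -DNAME and -DNAME=value, capturing only NAME.
DEFINE_RE = re.compile(r"-D([A-Za-z0-9_]+)")


def missing_macros(build_log: str) -> set[str]:
    """Return upstream macros that never appear as a -D flag anywhere in the log."""
    seen = set(DEFINE_RE.findall(build_log))
    return UPSTREAM_MACROS - seen


# Hypothetical two-line log, for illustration only.
sample_log = (
    "cc -O2 -DAES_ASM -DVPAES_ASM -DSHA1_ASM -c aes_core.c\n"
    "cc -O2 -DOPENSSL_CPUID_OBJ -c cpuid.c\n"
)
print(sorted(missing_macros(sample_log)))
```

This only shows that a macro appears somewhere in the log. A macro passed to some files can still be absent from other files that test it, so checking each compiler command individually is the stricter test.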

Unfortunately, adding these did not seem to help, and neither did -O3.

What are we doing differently?
>How-To-Repeat:
You can see the huge speed difference:

$ LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/openssl speed -evp aes-128-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CBC     549358.63k   800841.78k   817445.24k   820471.34k   826307.24k   825191.49k
$ LD_LIBRARY_PATH=/usr/lib /usr/bin/openssl speed -evp aes-128-cbc
type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CBC      79949.63k    94754.28k    98679.08k   100314.28k   100723.39k   100676.99k
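Taking the ratio of the two tables column by column gives the speedup at each block size. A small sketch with the throughput figures copied from the two runs above (the variable and function names are arbitrary):

```python
# Throughput in 1000s of bytes per second, copied from the two runs above.
BLOCK_SIZES = [16, 64, 256, 1024, 8192, 16384]
upstream = [549358.63, 800841.78, 817445.24, 820471.34, 826307.24, 825191.49]
netbsd = [79949.63, 94754.28, 98679.08, 100314.28, 100723.39, 100676.99]


def speedups(fast: list[float], slow: list[float]) -> list[float]:
    """Per-block-size ratio of the faster build's throughput to the slower build's."""
    return [f / s for f, s in zip(fast, slow)]


for size, ratio in zip(BLOCK_SIZES, speedups(upstream, netbsd)):
    print(f"{size:>6} bytes: {ratio:.1f}x")
```

On these figures the ratio ranges from about 6.9x at 16-byte blocks to about 8.5x at 64-byte blocks, so "7x" is the low end of the gap.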
>Fix:
Yes, please.

>Audit-Trail:
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/59329: NetBSD's OpenSSL 7x slower at AES than same upstream
 version on NetBSD
Date: Mon, 21 Apr 2025 07:02:15 +0000 (UTC)

 On Sat, 19 Apr 2025, nia@pkgsrc.org wrote:

 > When using OpenSSL's own build system, the following arch-specific
 > CFLAGS are applied to every file:
 >
 > -DAES_ASM -DBSAES_ASM -DCMLL_ASM -DECP_NISTZ256_ASM -DGHASH_ASM -DKECCAK1600_ASM -DMD5_ASM -DOPENSSL_BN_ASM_GF2m -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DPOLY1305_ASM -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DX25519_ASM -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DL_ENDIAN -DOPENSSL_PIC
 >
 > On NetBSD it seems slightly harder to tell, since such flags
 > are applied on a per-file basis (error-prone when OpenSSL is
 > updated?).
 >

 Yeah, that's pretty much what this is. I managed to get a 10x speedup:

 ```
 $ openssl speed -evp aes-128-cbc
 Doing AES-128-CBC for 3s on 16 size blocks: 22702707 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 64 size blocks: 6494628 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 256 size blocks: 1684133 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 1024 size blocks: 429110 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 8192 size blocks: 53846 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 16384 size blocks: 26979 AES-128-CBC's in 3.00s
 version: 3.0.15
 NetBSD 10.99.14
 options: bn(64,64)
 gcc version 12.4.0 (NetBSD nb1 20240630)
 CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x40405f4ef2bf27ef
 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
 AES-128-CBC     121121.48k   138552.06k   143664.79k   146518.39k   147133.57k   147390.44k

 $ /tmp/obj/usr/src/crypto/external/bsd/openssl/bin/openssl speed -evp aes-128-cbc
 Doing AES-128-CBC for 3s on 16 size blocks: 228551884 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 64 size blocks: 73155615 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 256 size blocks: 18991561 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 1024 size blocks: 4788704 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 8192 size blocks: 599986 AES-128-CBC's in 3.00s
 Doing AES-128-CBC for 3s on 16384 size blocks: 300192 AES-128-CBC's in 3.00s
 version: 3.0.15
 NetBSD 10.99.14
 options: bn(64,64)
 gcc version 12.4.0 (NetBSD nb1 20240630)
 CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x40405f4ef2bf27ef
 The 'numbers' are in 1000s of bytes per second processed.
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
 AES-128-CBC    1220163.54k  1560653.12k  1622235.44k  1634544.30k  1639454.74k  1639448.58k

 $
 ```
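The summary table in each run is derived from the "Doing ..." lines: operations times block size divided by elapsed time, reported in 1000s of bytes per second. Because the elapsed time is printed rounded to "3.00s", recomputing from the counts reproduces the table only to within roughly 0.1%. A sketch that does this for both runs above; the dictionary names and `kbytes_per_sec` are hypothetical.

```python
# Operation counts from the "Doing AES-128-CBC ..." lines above (block size -> count).
SECONDS = 3.00  # as printed; the real elapsed time is slightly different
before = {16: 22702707, 64: 6494628, 256: 1684133, 1024: 429110, 8192: 53846, 16384: 26979}
after = {16: 228551884, 64: 73155615, 256: 18991561, 1024: 4788704, 8192: 599986, 16384: 300192}


def kbytes_per_sec(ops: int, block: int, seconds: float = SECONDS) -> float:
    """Throughput in the units `openssl speed` prints: 1000s of bytes per second."""
    return ops * block / seconds / 1000


for block in before:
    b = kbytes_per_sec(before[block], block)
    a = kbytes_per_sec(after[block], block)
    print(f"{block:>6} bytes: {b:12.2f}k -> {a:12.2f}k ({a / b:.1f}x)")
```

Computed either way, the patched build is about 10x faster at 16-byte blocks and about 11x faster at the larger sizes.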

 using the following patch (only for AES):

 ```
 diff -urN a/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/aes.inc b/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/aes.inc
 --- a/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/aes.inc	2018-02-08 21:57:24.000000000 +0000
 +++ b/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/aes.inc	2025-04-21 06:26:06.653418871 +0000
 @@ -9,5 +9,6 @@
   vpaes-x86_64.S

   AESCPPFLAGS = -DAES_ASM -DVPAES_ASM -DBSAES_ASM
 +CPPFLAGS += ${AESCPPFLAGS}
   AESNI = yes
   .include "../../aes.inc"
 diff -urN a/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc b/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc
 --- a/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc	2024-07-16 03:11:54.778622078 +0000
 +++ b/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc	2025-04-21 06:42:36.777374180 +0000
 @@ -2,12 +2,10 @@
   SHA_SRCS = sha1-x86_64.S sha1-mb-x86_64.S keccak1600-x86_64.S
   SHACPPFLAGS = -DSHA1_ASM -DKECCAK1600_ASM
   KECCAKNI = yes
 -.if 0
   # This cannot be enabled until the SHA-2 symbol mess is resolved:
   # https://mail-index.netbsd.org/tech-userlevel/2024/03/17/msg014265.html
   # DO NOT TRY TO ENABLE IT, OR YOU MAY CAUSE NETBSD'S OPENSSL TO BE
   # VULNERABLE TO REMOTE CODE EXECUTION BY STACK BUFFER OVERRUNS.
   SHA_SRCS += sha512-x86_64.S sha256-mb-x86_64.S
 -SHACPPFLAGS+= -DSHA256_ASM -DSHA512_ASM
 -.endif
 +SHACPPFLAGS+= -DSHA256_ASM
   .include "../../sha.inc"
 ```

 but note that I had to fiddle with the scary commented-out section.

 -RVP

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/59329 CVS commit: src/crypto/external/bsd/openssl/lib/libcrypto
Date: Wed, 23 Apr 2025 16:14:59 -0400

 Module Name:	src
 Committed By:	christos
 Date:		Wed Apr 23 20:14:59 UTC 2025

 Modified Files:
 	src/crypto/external/bsd/openssl/lib/libcrypto: aes.inc
 	src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64: Makefile
 	    sha.inc sha512-x86_64.S
 Added Files:
 	src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64:
 	    sha256-x86_64.S

 Log Message:
 PR/59329: nia: OpenSSL 7x slower at AES. Thanks rvp.
 1. Add AESCPPFLAGS to CPPFLAGS for all files, not just the AES ones
 2. Enable and correct sha assembly generation
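Item 1 is the core of the AES problem: with per-file flags, a macro such as AES_ASM reaches only the files it is attached to, and any other file that tests it with `#ifdef` is compiled as if the macro were undefined. The toy model below assumes two hypothetical source files that both test AES_ASM; the file names and helper functions are illustrative only, not the real NetBSD or OpenSSL file lists. Moving the flags into the global set mirrors the effect of RVP's `CPPFLAGS += ${AESCPPFLAGS}` line.

```python
AESCPPFLAGS = ["-DAES_ASM", "-DVPAES_ASM", "-DBSAES_ASM"]

# Hypothetical sources; assume both contain an "#ifdef AES_ASM" block.
SOURCES = ["aes/aes_impl.c", "evp/cipher_glue.c"]
NEEDS_AES_ASM = ["aes/aes_impl.c", "evp/cipher_glue.c"]


def effective_flags(sources, global_flags, per_file_flags):
    """Flags each file is compiled with: the global flags plus any per-file extras."""
    return {src: set(global_flags) | set(per_file_flags.get(src, [])) for src in sources}


def units_missing(flag, flags_by_unit, needs_flag):
    """Files that test `flag` in the preprocessor but are compiled without it."""
    return sorted(u for u in needs_flag if flag not in flags_by_unit[u])


# Before: AES flags attached only to the AES file. After: applied to every file.
per_file_build = effective_flags(SOURCES, ["-O2"], {"aes/aes_impl.c": AESCPPFLAGS})
global_build = effective_flags(SOURCES, ["-O2", *AESCPPFLAGS], {})

print(units_missing("-DAES_ASM", per_file_build, NEEDS_AES_ASM))  # ['evp/cipher_glue.c']
print(units_missing("-DAES_ASM", global_build, NEEDS_AES_ASM))    # []
```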


 To generate a diff of this commit:
 cvs rdiff -u -r1.5 -r1.6 \
     src/crypto/external/bsd/openssl/lib/libcrypto/aes.inc
 cvs rdiff -u -r1.13 -r1.14 \
     src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/Makefile
 cvs rdiff -u -r1.6 -r1.7 \
     src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc
 cvs rdiff -u -r0 -r1.1 \
     src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha256-x86_64.S
 cvs rdiff -u -r1.11 -r1.12 \
     src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha512-x86_64.S

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

Copyright © 1994-2025 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.