NetBSD Problem Report #59329
From www@netbsd.org Sat Apr 19 13:13:23 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 00E2F1A923D
for <gnats-bugs@gnats.NetBSD.org>; Sat, 19 Apr 2025 13:13:23 +0000 (UTC)
Message-Id: <20250419131321.AF6001A923E@mollari.NetBSD.org>
Date: Sat, 19 Apr 2025 13:13:21 +0000 (UTC)
From: nia@pkgsrc.org
Reply-To: nia@pkgsrc.org
To: gnats-bugs@NetBSD.org
Subject: NetBSD's OpenSSL 7x slower at AES than same upstream version on NetBSD
X-Send-Pr-Version: www-1.0
>Number: 59329
>Category: lib
>Synopsis: NetBSD's OpenSSL 7x slower at AES than same upstream version on NetBSD
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: lib-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Apr 19 13:15:00 +0000 2025
>Last-Modified: Wed Apr 23 20:20:01 +0000 2025
>Originator: nia
>Release: 10.1
>Organization:
The NetBSD Foundation
>Environment:
NetBSD siphon 10.1_STABLE NetBSD 10.1_STABLE (SIPHON) #2: Thu Mar 6 18:29:22 CET 2025 nia@siphon:/encrypt/src/obj/sys/arch/amd64/compile/SIPHON amd64
>Description:
A locally compiled OpenSSL, built with upstream's own build system, is
around 7x faster (!) at AES on amd64 than the OpenSSL shipped in NetBSD 10.1.
Attempts to reproduce this on other ports showed a much smaller
difference, suggesting the problem lies in OpenSSL's
architecture-dependent features.
Both builds use the same version of the OpenSSL code base (3.0.12).
Upstream OpenSSL claims the following CPUINFO:
$ LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/openssl version -a
CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x29c67af
The NetBSD supplied OpenSSL claims the same:
$ /usr/bin/openssl version -a
CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x29c67af
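As a sanity check on the two identical CPUINFO lines, the capability word can be decoded to confirm that AES-NI is actually detected at run time. The following is a minimal sketch, assuming the bit positions given in OpenSSL's OPENSSL_ia32cap(3) manual page (bit 57 for AES-NI, bit 41 for SSSE3, bit 33 for PCLMULQDQ, and so on); the helper name `decode_ia32cap` is made up for illustration and is not part of OpenSSL.

```python
# Bit positions in the first 64-bit word, as documented in OPENSSL_ia32cap(3).
FEATURE_BITS = {
    "SSE2": 26,
    "PCLMULQDQ": 33,
    "SSSE3": 41,
    "AES-NI": 57,
    "AVX": 60,
}


def decode_ia32cap(value: str) -> dict[str, bool]:
    """Decode the first 64-bit word of an OPENSSL_ia32cap string (hypothetical helper)."""
    first_word = int(value.split(":")[0], 16)
    return {name: bool((first_word >> bit) & 1) for name, bit in FEATURE_BITS.items()}


# Value reported by both the upstream and the NetBSD-supplied binary.
caps = decode_ia32cap("0x7ffaf3bfffebffff:0x29c67af")
for name, present in caps.items():
    print(f"{name}: {'yes' if present else 'no'}")
```

If AES-NI shows as present for both binaries, run-time CPU detection is unlikely to be the culprit, which points at how the library was compiled instead.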
When using OpenSSL's own build system, the following arch-specific
CFLAGS are applied to every file:
-DAES_ASM -DBSAES_ASM -DCMLL_ASM -DECP_NISTZ256_ASM -DGHASH_ASM -DKECCAK1600_ASM -DMD5_ASM -DOPENSSL_BN_ASM_GF2m -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DPOLY1305_ASM -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DX25519_ASM -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DL_ENDIAN -DOPENSSL_PIC
On NetBSD it is harder to tell, since such flags are applied on a
per-file basis (error-prone when OpenSSL is updated?). However,
grepping the full build log shows:
- CMLL_ASM is missing (should probably be added to camellia.inc)
- OPENSSL_IA32_SSE2 is missing
- POLY1305_ASM is missing
- RC4_ASM is missing (should probably be added to rc4.inc)
- L_ENDIAN is missing
- OPENSSL_PIC is missing
Adding these unfortunately did not seem to help matters.
Even -O3 does not help.
What are we doing differently?
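The flag comparison above can be sketched as a set difference over the `-D` defines. The reference list is the upstream CFLAGS line quoted in this report; the NetBSD side is only a stand-in, built by removing the six flags reported missing, because the real build log is not reproduced here. The function names are hypothetical.

```python
UPSTREAM_CFLAGS = (
    "-DAES_ASM -DBSAES_ASM -DCMLL_ASM -DECP_NISTZ256_ASM -DGHASH_ASM "
    "-DKECCAK1600_ASM -DMD5_ASM -DOPENSSL_BN_ASM_GF2m -DOPENSSL_BN_ASM_MONT "
    "-DOPENSSL_BN_ASM_MONT5 -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 "
    "-DPOLY1305_ASM -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM "
    "-DVPAES_ASM -DWHIRLPOOL_ASM -DX25519_ASM -fPIC -pthread "
    "-Wa,--noexecstack -Wall -O3 -DL_ENDIAN -DOPENSSL_PIC"
)

# The six flags this report lists as absent from the NetBSD build log.
REPORTED_MISSING = {
    "CMLL_ASM", "OPENSSL_IA32_SSE2", "POLY1305_ASM",
    "RC4_ASM", "L_ENDIAN", "OPENSSL_PIC",
}


def defines(cflags: str) -> set[str]:
    """Return the macro names passed via -D in a flag string."""
    return {tok[2:] for tok in cflags.split() if tok.startswith("-D")}


def missing_defines(reference: str, candidate: str) -> list[str]:
    """Macros defined in `reference` but not in `candidate`, sorted."""
    return sorted(defines(reference) - defines(candidate))


# ASSUMPTION: stand-in for flags collected from the NetBSD build log.
netbsd_cflags = " ".join(
    tok for tok in UPSTREAM_CFLAGS.split()
    if not (tok.startswith("-D") and tok[2:] in REPORTED_MISSING)
)

missing = missing_defines(UPSTREAM_CFLAGS, netbsd_cflags)
print(missing)
```

Note that a check over the whole log only shows whether a flag appears somewhere. With per-file flags, a define can be present for some source files and absent for others, so a per-file comparison would be needed to catch that case.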
>How-To-Repeat:
You can see the huge speed difference:
$ LD_LIBRARY_PATH=/usr/local/lib /usr/local/bin/openssl speed -evp aes-128-cbc
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CBC        549358.63k   800841.78k   817445.24k   820471.34k   826307.24k   825191.49k
$ LD_LIBRARY_PATH=/usr/lib /usr/bin/openssl speed -evp aes-128-cbc
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CBC         79949.63k    94754.28k    98679.08k   100314.28k   100723.39k   100676.99k
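To put a number on the difference, the two result rows can be divided column by column. This is a small sketch using only the figures above; `parse_speed_row` is a hypothetical helper, not an OpenSSL tool.

```python
SIZES = [16, 64, 256, 1024, 8192, 16384]


def parse_speed_row(row: str) -> list[float]:
    """Parse an 'openssl speed' result row into floats (1000s of bytes/s)."""
    return [float(field.rstrip("k")) for field in row.split()[1:]]


upstream = parse_speed_row(
    "AES-128-CBC 549358.63k 800841.78k 817445.24k 820471.34k 826307.24k 825191.49k"
)
netbsd = parse_speed_row(
    "AES-128-CBC 79949.63k 94754.28k 98679.08k 100314.28k 100723.39k 100676.99k"
)

# Speedup of the upstream build over the NetBSD build, per block size.
ratios = [u / n for u, n in zip(upstream, netbsd)]
for size, ratio in zip(SIZES, ratios):
    print(f"{size:>6} bytes: {ratio:.1f}x")
```

The ratio is a little under 7x for 16-byte blocks and above 8x for every larger block size.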
>Fix:
Yes, please.
>Audit-Trail:
From: RVP <rvp@SDF.ORG>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: lib/59329: NetBSD's OpenSSL 7x slower at AES than same upstream version on NetBSD
Date: Mon, 21 Apr 2025 07:02:15 +0000 (UTC)
On Sat, 19 Apr 2025, nia@pkgsrc.org wrote:
> When using OpenSSL's own build system, the following arch-specific
> CFLAGS are applied to every file:
>
> -DAES_ASM -DBSAES_ASM -DCMLL_ASM -DECP_NISTZ256_ASM -DGHASH_ASM -DKECCAK1600_ASM -DMD5_ASM -DOPENSSL_BN_ASM_GF2m -DOPENSSL_BN_ASM_MONT -DOPENSSL_BN_ASM_MONT5 -DOPENSSL_CPUID_OBJ -DOPENSSL_IA32_SSE2 -DPOLY1305_ASM -DRC4_ASM -DSHA1_ASM -DSHA256_ASM -DSHA512_ASM -DVPAES_ASM -DWHIRLPOOL_ASM -DX25519_ASM -fPIC -pthread -Wa,--noexecstack -Wall -O3 -DL_ENDIAN -DOPENSSL_PIC
>
> On NetBSD it seems slightly harder to tell, since such flags
> are applied on a per-file basis (error-prone when OpenSSL is
> updated?).
>
Yeah, that's pretty much what this is. I managed to get a 10x speedup:
```
$ openssl speed -evp aes-128-cbc
Doing AES-128-CBC for 3s on 16 size blocks: 22702707 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 64 size blocks: 6494628 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 256 size blocks: 1684133 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 1024 size blocks: 429110 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 8192 size blocks: 53846 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 16384 size blocks: 26979 AES-128-CBC's in 3.00s
version: 3.0.15
NetBSD 10.99.14
options: bn(64,64)
gcc version 12.4.0 (NetBSD nb1 20240630)
CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x40405f4ef2bf27ef
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CBC        121121.48k   138552.06k   143664.79k   146518.39k   147133.57k   147390.44k
$ /tmp/obj/usr/src/crypto/external/bsd/openssl/bin/openssl speed -evp aes-128-cbc
Doing AES-128-CBC for 3s on 16 size blocks: 228551884 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 64 size blocks: 73155615 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 256 size blocks: 18991561 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 1024 size blocks: 4788704 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 8192 size blocks: 599986 AES-128-CBC's in 3.00s
Doing AES-128-CBC for 3s on 16384 size blocks: 300192 AES-128-CBC's in 3.00s
version: 3.0.15
NetBSD 10.99.14
options: bn(64,64)
gcc version 12.4.0 (NetBSD nb1 20240630)
CPUINFO: OPENSSL_ia32cap=0x7ffaf3bfffebffff:0x40405f4ef2bf27ef
The 'numbers' are in 1000s of bytes per second processed.
type                 16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes  16384 bytes
AES-128-CBC       1220163.54k  1560653.12k  1622235.44k  1634544.30k  1639454.74k  1639448.58k
$
```
using the following patch (only for AES):
```
diff -urN a/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/aes.inc b/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/aes.inc
--- a/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/aes.inc 2018-02-08 21:57:24.000000000 +0000
+++ b/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/aes.inc 2025-04-21 06:26:06.653418871 +0000
@@ -9,5 +9,6 @@
vpaes-x86_64.S
AESCPPFLAGS = -DAES_ASM -DVPAES_ASM -DBSAES_ASM
+CPPFLAGS += ${AESCPPFLAGS}
AESNI = yes
.include "../../aes.inc"
diff -urN a/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc b/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc
--- a/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc 2024-07-16 03:11:54.778622078 +0000
+++ b/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc 2025-04-21 06:42:36.777374180 +0000
@@ -2,12 +2,10 @@
SHA_SRCS = sha1-x86_64.S sha1-mb-x86_64.S keccak1600-x86_64.S
SHACPPFLAGS = -DSHA1_ASM -DKECCAK1600_ASM
KECCAKNI = yes
-.if 0
# This cannot be enabled until the SHA-2 symbol mess is resolved:
# https://mail-index.netbsd.org/tech-userlevel/2024/03/17/msg014265.html
# DO NOT TRY TO ENABLE IT, OR YOU MAY CAUSE NETBSD'S OPENSSL TO BE
# VULNERABLE TO REMOTE CODE EXECUTION BY STACK BUFFER OVERRUNS.
SHA_SRCS += sha512-x86_64.S sha256-mb-x86_64.S
-SHACPPFLAGS+= -DSHA256_ASM -DSHA512_ASM
-.endif
+SHACPPFLAGS+= -DSHA256_ASM
.include "../../sha.inc"
```
but note that I had to fiddle with the scary commented-out section.
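For reference, the "10x" figure can be cross-checked from the raw operation counts in the output above, since the summary table is in 1000s of bytes per second. This is a rough sketch: it takes the printed 3.00s at face value, which is presumably rounded, so the results only approximate the table values.

```python
def throughput_k(ops: int, block_size: int, seconds: float) -> float:
    """Throughput in 1000s of bytes per second, the unit used by 'openssl speed'."""
    return ops * block_size / seconds / 1000.0


# 16-byte block counts from the two runs above.
stock = throughput_k(22702707, 16, 3.00)     # installed openssl
patched = throughput_k(228551884, 16, 3.00)  # openssl built with the patch

print(f"stock:   {stock:.2f}k")
print(f"patched: {patched:.2f}k")
print(f"speedup: {patched / stock:.1f}x")
```

Both values land within a fraction of a percent of the table entries (121121.48k and 1220163.54k), and the ratio comes out at roughly 10x.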
-RVP
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59329 CVS commit: src/crypto/external/bsd/openssl/lib/libcrypto
Date: Wed, 23 Apr 2025 16:14:59 -0400
Module Name:    src
Committed By:   christos
Date:           Wed Apr 23 20:14:59 UTC 2025

Modified Files:
        src/crypto/external/bsd/openssl/lib/libcrypto: aes.inc
        src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64: Makefile
            sha.inc sha512-x86_64.S
Added Files:
        src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64:
            sha256-x86_64.S

Log Message:
PR/59329: nia: OpenSSL 7x slower at AES. Thanks rvp.
1. Add AESCPPFLAGS to CPPFLAGS for all files, not just the AES ones
2. Enable and correct sha assembly generation
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 \
src/crypto/external/bsd/openssl/lib/libcrypto/aes.inc
cvs rdiff -u -r1.13 -r1.14 \
src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/Makefile
cvs rdiff -u -r1.6 -r1.7 \
src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha.inc
cvs rdiff -u -r0 -r1.1 \
src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha256-x86_64.S
cvs rdiff -u -r1.11 -r1.12 \
src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/sha512-x86_64.S
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.