NetBSD Problem Report #58539
From wiz@exadelic.gatalith.at Fri Aug 2 10:25:24 2024
Return-Path: <wiz@exadelic.gatalith.at>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 1FC321A923C
for <gnats-bugs@gnats.NetBSD.org>; Fri, 2 Aug 2024 10:25:24 +0000 (UTC)
Message-Id: <20240802102520.1316C2EBBAD7@exadelic.gatalith.at>
Date: Fri, 2 Aug 2024 12:25:20 +0200 (CEST)
From: Thomas Klausner <wiz@NetBSD.org>
Reply-To: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Subject: AVX-512 support incomplete/broken
X-Send-Pr-Version: 3.95
>Number: 58539
>Category: toolchain
>Synopsis: AVX-512 support incomplete/broken
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: toolchain-manager
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Aug 02 10:30:00 +0000 2024
>Last-Modified: Sun Aug 11 13:15:01 +0000 2024
>Originator: Thomas Klausner
>Release: NetBSD 10.99.11
>Organization:
>Environment:
Architecture: x86_64
Machine: amd64
# cpuctl identify 0 | grep AVX
cpu0: features1 0x7ed8320b<MOVBE,POPCNT,AES,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
cpu0: features5 0xf1bf97a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,QM,PQE>
cpu0: features5 0xf1bf97a9<AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512_IFMA>
cpu0: features5 0xf1bf97a9<CLFLUSHOPT,CLWB,AVX512CD,SHA,AVX512BW,AVX512VL>
cpu0: features6 0x405fce<AVX512_VBMI,UMIP,PKU,AVX512_VBMI2,CET_SS,GFNI,VAES>
cpu0: features6 0x405fce<VPCLMULQDQ,AVX512_VNNI,AVX512_BITALG,AVX512_VPOPCNTDQ>
cpu0: xsave features 0x2e7<x87,SSE,AVX,Opmask,ZMM_Hi256,Hi16_ZMM,PKRU>
cpu0: enabled xsave 0xe7<x87,SSE,AVX,Opmask,ZMM_Hi256,Hi16_ZMM>
cpu0: SEF-subleaf1-eax 0x20<AVX512_BF16>
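
The two xsave masks above can be decoded by hand to check which register state the kernel has enabled. The following is a minimal sketch, not part of the original report: the names are the ones cpuctl prints, the bit positions are assumed to follow the XCR0 layout in the Intel SDM, and `decode_xsave` and `avx512_state_enabled` are hypothetical helper names.

```python
# Bit positions are assumed to follow the XCR0 layout in the Intel SDM;
# the names are the ones cpuctl prints in the output above.
# Bits not listed here are reported as "bitN".
XSAVE_BITS = {
    0: "x87",
    1: "SSE",
    2: "AVX",
    5: "Opmask",
    6: "ZMM_Hi256",
    7: "Hi16_ZMM",
    9: "PKRU",
}

# The three state components that together hold the AVX-512 registers.
AVX512_STATE = {"Opmask", "ZMM_Hi256", "Hi16_ZMM"}


def decode_xsave(mask):
    """Return the names of the state components set in an XSAVE bitmask."""
    names = []
    bit = 0
    while mask >> bit:
        if (mask >> bit) & 1:
            names.append(XSAVE_BITS.get(bit, f"bit{bit}"))
        bit += 1
    return names


def avx512_state_enabled(mask):
    """True if all three AVX-512 state components are set in the mask."""
    return AVX512_STATE.issubset(decode_xsave(mask))


# Values taken from the cpuctl output above.
print("supported:", ",".join(decode_xsave(0x2e7)))
print("enabled:  ", ",".join(decode_xsave(0xe7)))
print("AVX-512 state enabled:", avx512_state_enabled(0xe7))
```

On the values from this machine, all three AVX-512 components are set in the enabled mask, so userland is told that AVX-512 is usable. The check says nothing about whether that state is handled correctly afterwards.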
>Description:
While debugging a problem with lang/ghc98 that only I could reproduce
but pho@ not (see http://gnats.netbsd.org/58379), we contacted
upstream and they pointed out that they're using AVX-512 support, and
that this had led to similar weird problems in the past (on OpenBSD,
it caused SIGILLs; on FreeBSD, other unspecified issues).
Their proposed workaround of disabling AVX-512 support fixed the
problem for me, so there must be some problem in the NetBSD AVX-512
support.
>How-To-Repeat:
Build lang/ghc98 (currently 9.8.2) on a system that supports AVX-512, with
the following line removed from lang/ghc98/Makefile:
CFLAGS+= -D__STDC_NO_ATOMICS__
>Fix:
Yes, please.
OpenBSD fixed their AVX-512 support with, I think:
https://codeberg.org/openbsd/src/commit/c0f33c9875c4ab47e986b698610630b6cbf21c6c
>Audit-Trail:
From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: toolchain/58539
Date: Sun, 11 Aug 2024 09:24:09 +0200
----- Forwarded message from Thomas Klausner <wiz@NetBSD.org> -----
Date: Sun, 11 Aug 2024 09:23:39 +0200
From: Thomas Klausner <wiz@NetBSD.org>
To: tech-toolchain@NetBSD.org
Subject: Re: State of AVX512 support in NetBSD?
On Mon, Jul 29, 2024 at 11:59:12PM +0200, Thomas Klausner wrote:
> For my recent build trouble of lang/ghc98 that only I can see but pho@
> can't[1], upstream has suggested it might be a problem with AVX512
> support[2] in NetBSD. (My CPU supports AVX512, pho's doesn't.)
>
> I'll give their suggestion (using -D__STDC_NO_ATOMICS__) a try, but
> does anyone know the actual state of AVX512 support on NetBSD?
This also affects py-numpy.
On a system with:
NumPy CPU features: SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F* AVX512CD* AVX512_KNL? AVX512_KNM? AVX512_SKX* AVX512_CLX* AVX512_CNL* AVX512_ICL* AVX512_SPR?
the self tests fail because of a Python coredump:
.......................Fatal Python error: Segmentation fault
Current thread 0x000070758c49e800 (most recent call first):
File "/scratch/math/py-numpy/work/.destdir/usr/pkg/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 1077 in sort
File "/scratch/math/py-numpy/work/.destdir/usr/pkg/lib/python3.12/site-packages/numpy/_core/tests/test_multiarray.py", line 2307 in test_sort_degraded
File "/usr/pkg/lib/python3.12/site-packages/_pytest/python.py", line 162 in pytest_pyfunc_call
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/pkg/lib/python3.12/site-packages/_pytest/python.py", line 1632 in runtest
File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 173 in pytest_runtest_call
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 241 in <lambda>
File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 240 in call_and_report
File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 135 in runtestprotocol
File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 116 in pytest_runtest_protocol
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/pkg/lib/python3.12/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/pkg/lib/python3.12/site-packages/_pytest/main.py", line 339 in _main
File "/usr/pkg/lib/python3.12/site-packages/_pytest/main.py", line 285 in wrap_session
File "/usr/pkg/lib/python3.12/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
File "/usr/pkg/lib/python3.12/site-packages/_pytest/config/__init__.py", line 178 in main
File "/scratch/math/py-numpy/work/.destdir/usr/pkg/lib/python3.12/site-packages/numpy/_pytesttester.py", line 195 in __call__
File "<string>", line 1 in <module>
Extension modules: numpy._core._multiarray_umath, numpy._core._multiarray_tests, numpy.linalg._umath_linalg, numpy._core._rational_tests, numpy._core._umath_tests, cython.cimports.libc.math, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, numpy._core._simd, numpy._core._operand_flag_tests, numpy.linalg.lapack_lite, checks, limited_api2 (total: 20)
*** Signal 11
and the backtrace starts with:
#0 0x000070758c66d2fa in _lwp_kill () from /usr/lib/libc.so.12
#1 <signal handler called>
#2 0x000070758b8f9a95 in _mm512_mask_compressstoreu_epi64(void*, unsigned char, long long __vector(8)) (__A=...,
__U=<error reading variable: Unable to access DWARF register number 118>, __P=0x707569bfcff0) at /usr/include/gcc-12/avx512fintrin.h:10911
#3 zmm_vector<long>::mask_compressstoreu(void*, unsigned char, long long __vector(8)) (x=...,
mask=<error reading variable: Unable to access DWARF register number 118>, mem=0x707569bfcff0)
at ../numpy/_core/src/npysort/x86-simd-sort/src/avx512-64bit-common.h:703
#4 avx512_double_compressstore<zmm_vector<long>, long, long long __vector(8)>(long*, long*, zmm_vector<long>::opmask_t, long long __vector(8)) (reg=...,
k=<error reading variable: Unable to access DWARF register number 118>, right_addr=0x707569bfd008, left_addr=0x707569ba57c8)
at ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-qsort.h:211
#5 zmm_vector<long>::double_compressstore(long*, long*, unsigned char, long long __vector(8)) (reg=...,
k=<error reading variable: Unable to access DWARF register number 118>, right_addr=0x707569bfd008, left_addr=0x707569ba57c8)
at ../numpy/_core/src/npysort/x86-simd-sort/src/avx512-64bit-common.h:778
#6 partition_vec<zmm_vector<long int>, Comparator<zmm_vector<long int>, false>, long int, __vector(8) long long int> (biggest_vec=<synthetic pointer>...,
smallest_vec=<synthetic pointer>..., pivot_vec=..., curr_vec=..., r_store=0x707569bfd008, l_store=0x707569ba57c8)
at ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-qsort.h:232
#7 partition_unrolled<zmm_vector<long>, Comparator<zmm_vector<long>, false>, 8, long> (arr=arr@entry=0x707569bfda40, left=955200, left@entry=0,
right=999872, right@entry=1000000, pivot=pivot@entry=375000, smallest=smallest@entry=0x7f7fffbcd610, biggest=biggest@entry=0x7f7fffbcd618)
at ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-qsort.h:462
#8 0x000070758b95c93c in qsort_<zmm_vector<long>, Comparator<zmm_vector<long>, false>, long> (arr=0x707569bfda40, left=<optimized out>, right=999999,
max_iters=<optimized out>) at ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-qsort.h:553
Can someone please fix or disable AVX-512 support on NetBSD?
Thomas
----- End forwarded message -----
From: "Thomas Klausner" <wiz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58539 CVS commit: pkgsrc/math/py-numpy
Date: Sun, 11 Aug 2024 13:12:08 +0000
Module Name: pkgsrc
Committed By: wiz
Date: Sun Aug 11 13:12:08 UTC 2024
Modified Files:
pkgsrc/math/py-numpy: Makefile
Log Message:
py-numpy: disable AVX512 support
This does not work on NetBSD, see PR 58539.
When reporting a similar problem for ghc, I was told it's troubled on
other operating systems too, so disable it for all.
If you know it's safe on your operating system, please add an
appropriate ifdef.
Update test status.
Bump PKGREVISION.
To generate a diff of this commit:
cvs rdiff -u -r1.123 -r1.124 pkgsrc/math/py-numpy/Makefile
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
>Unformatted:
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.