NetBSD Problem Report #58539

From wiz@exadelic.gatalith.at  Fri Aug  2 10:25:24 2024
Return-Path: <wiz@exadelic.gatalith.at>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 1FC321A923C
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  2 Aug 2024 10:25:24 +0000 (UTC)
Message-Id: <20240802102520.1316C2EBBAD7@exadelic.gatalith.at>
Date: Fri,  2 Aug 2024 12:25:20 +0200 (CEST)
From: Thomas Klausner <wiz@NetBSD.org>
Reply-To: Thomas Klausner <wiz@NetBSD.org>
To: gnats-bugs@NetBSD.org
Subject: AVX-512 support incomplete/broken
X-Send-Pr-Version: 3.95

>Number:         58539
>Category:       toolchain
>Synopsis:       AVX-512 support incomplete/broken
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    toolchain-manager
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 02 10:30:00 +0000 2024
>Last-Modified:  Sun Aug 11 13:15:01 +0000 2024
>Originator:     Thomas Klausner
>Release:        NetBSD 10.99.11
>Organization:

>Environment:


Architecture: x86_64
Machine: amd64

# cpuctl identify 0 | grep AVX
cpu0: features1 0x7ed8320b<MOVBE,POPCNT,AES,XSAVE,OSXSAVE,AVX,F16C,RDRAND>
cpu0: features5 0xf1bf97a9<FSGSBASE,BMI1,AVX2,SMEP,BMI2,ERMS,INVPCID,QM,PQE>
cpu0: features5 0xf1bf97a9<AVX512F,AVX512DQ,RDSEED,ADX,SMAP,AVX512_IFMA>
cpu0: features5 0xf1bf97a9<CLFLUSHOPT,CLWB,AVX512CD,SHA,AVX512BW,AVX512VL>
cpu0: features6 0x405fce<AVX512_VBMI,UMIP,PKU,AVX512_VBMI2,CET_SS,GFNI,VAES>
cpu0: features6 0x405fce<VPCLMULQDQ,AVX512_VNNI,AVX512_BITALG,AVX512_VPOPCNTDQ>
cpu0: xsave features 0x2e7<x87,SSE,AVX,Opmask,ZMM_Hi256,Hi16_ZMM,PKRU>
cpu0: enabled xsave 0xe7<x87,SSE,AVX,Opmask,ZMM_Hi256,Hi16_ZMM>
cpu0: SEF-subleaf1-eax 0x20<AVX512_BF16>
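
For readers comparing against their own machine, here is a minimal
sketch of how the "enabled xsave" value above can be interpreted. It
assumes that value is the XCR0 mask and uses the XCR0 bit positions
from the Intel SDM (SSE=1, AVX=2, Opmask=5, ZMM_Hi256=6, Hi16_ZMM=7).
The function name avx512_state_enabled is hypothetical and is not part
of any NetBSD interface. It only checks that the OS has enabled the
register state; it says nothing about whether that state is saved and
restored correctly, which is what this report is about.

```c
#include <stdint.h>

/* XCR0 state-component bits (Intel SDM numbering). */
#define XCR0_SSE        (UINT64_C(1) << 1)
#define XCR0_AVX        (UINT64_C(1) << 2)
#define XCR0_OPMASK     (UINT64_C(1) << 5)
#define XCR0_ZMM_HI256  (UINT64_C(1) << 6)
#define XCR0_HI16_ZMM   (UINT64_C(1) << 7)

/*
 * Hypothetical helper: returns 1 if every state component that AVX-512
 * code relies on is enabled in the given XCR0 value, 0 otherwise.
 */
static int
avx512_state_enabled(uint64_t xcr0)
{
	const uint64_t need = XCR0_SSE | XCR0_AVX | XCR0_OPMASK |
	    XCR0_ZMM_HI256 | XCR0_HI16_ZMM;

	return (xcr0 & need) == need;
}
```

With the value reported above, avx512_state_enabled(0xe7) returns 1, so
on this machine the kernel does advertise the AVX-512 register state to
userland.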

>Description:

While debugging a problem with lang/ghc98 that only I could reproduce
but pho@ could not (see http://gnats.netbsd.org/58379), we contacted
upstream. They pointed out that they use AVX-512, and that this had
led to similar weird problems in the past (on OpenBSD, it causes
SIGILLs; on FreeBSD, other unspecified issues).

Their proposed workaround of disabling AVX-512 support fixed the
problem for me, so there must be some problem in the NetBSD AVX-512
support.

>How-To-Repeat:

Build lang/ghc98 (currently 9.8.2) on a system supporting AVX-512, with
the following line removed from lang/ghc98/Makefile:

CFLAGS+=        -D__STDC_NO_ATOMICS__

>Fix:

Yes, please.

OpenBSD fixed their AVX-512 support with, I think:

https://codeberg.org/openbsd/src/commit/c0f33c9875c4ab47e986b698610630b6cbf21c6c

>Audit-Trail:
From: Thomas Klausner <wiz@NetBSD.org>
To: NetBSD bugtracking <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: toolchain/58539
Date: Sun, 11 Aug 2024 09:24:09 +0200

 ----- Forwarded message from Thomas Klausner <wiz@NetBSD.org> -----

 Date: Sun, 11 Aug 2024 09:23:39 +0200
 From: Thomas Klausner <wiz@NetBSD.org>
 To: tech-toolchain@NetBSD.org
 Subject: Re: State of AVX512 support in NetBSD?

 On Mon, Jul 29, 2024 at 11:59:12PM +0200, Thomas Klausner wrote:
 > For my recent build trouble of lang/ghc98 that only I can see but pho@
 > can't[1], upstream has suggested it might be a problem with AVX512
 > support[2] in NetBSD. (My CPU supports AVX512, pho's doesn't.)
 > 
 > I'll give their suggestion (using -D__STDC_NO_ATOMICS__) a try, but
 > does anyone know the actual state of AVX512 support on NetBSD?

 This also affects py-numpy.

 On a system with:

 NumPy CPU features:  SSE SSE2 SSE3 SSSE3* SSE41* POPCNT* SSE42* AVX* F16C* FMA3* AVX2* AVX512F* AVX512CD* AVX512_KNL? AVX512_KNM? AVX512_SKX* AVX512_CLX* AVX512_CNL* AVX512_ICL* AVX512_SPR?

 the self tests fail because of a Python coredump:


 .......................Fatal Python error: Segmentation fault

 Current thread 0x000070758c49e800 (most recent call first):
   File "/scratch/math/py-numpy/work/.destdir/usr/pkg/lib/python3.12/site-packages/numpy/_core/fromnumeric.py", line 1077 in sort
   File "/scratch/math/py-numpy/work/.destdir/usr/pkg/lib/python3.12/site-packages/numpy/_core/tests/test_multiarray.py", line 2307 in test_sort_degraded
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/python.py", line 162 in pytest_pyfunc_call
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/python.py", line 1632 in runtest
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 173 in pytest_runtest_call
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 241 in <lambda>
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 341 in from_call
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 240 in call_and_report
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 135 in runtestprotocol
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/runner.py", line 116 in pytest_runtest_protocol
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/main.py", line 364 in pytest_runtestloop
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/main.py", line 339 in _main
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/main.py", line 285 in wrap_session
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/main.py", line 332 in pytest_cmdline_main
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_callers.py", line 103 in _multicall
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_manager.py", line 120 in _hookexec
   File "/usr/pkg/lib/python3.12/site-packages/pluggy/_hooks.py", line 513 in __call__
   File "/usr/pkg/lib/python3.12/site-packages/_pytest/config/__init__.py", line 178 in main
   File "/scratch/math/py-numpy/work/.destdir/usr/pkg/lib/python3.12/site-packages/numpy/_pytesttester.py", line 195 in __call__
   File "<string>", line 1 in <module>

 Extension modules: numpy._core._multiarray_umath, numpy._core._multiarray_tests, numpy.linalg._umath_linalg, numpy._core._rational_tests, numpy._core._umath_tests, cython.cimports.libc.math, numpy.random._common, numpy.random.bit_generator, numpy.random._bounded_integers, numpy.random._mt19937, numpy.random.mtrand, numpy.random._philox, numpy.random._pcg64, numpy.random._sfc64, numpy.random._generator, numpy._core._simd, numpy._core._operand_flag_tests, numpy.linalg.lapack_lite, checks, limited_api2 (total: 20)
 *** Signal 11

 and the backtrace starts with:

 #0  0x000070758c66d2fa in _lwp_kill () from /usr/lib/libc.so.12
 #1  <signal handler called>
 #2  0x000070758b8f9a95 in _mm512_mask_compressstoreu_epi64(void*, unsigned char, long long __vector(8)) (__A=...,
     __U=<error reading variable: Unable to access DWARF register number 118>, __P=0x707569bfcff0) at /usr/include/gcc-12/avx512fintrin.h:10911
 #3  zmm_vector<long>::mask_compressstoreu(void*, unsigned char, long long __vector(8)) (x=...,
     mask=<error reading variable: Unable to access DWARF register number 118>, mem=0x707569bfcff0)
     at ../numpy/_core/src/npysort/x86-simd-sort/src/avx512-64bit-common.h:703
 #4  avx512_double_compressstore<zmm_vector<long>, long, long long __vector(8)>(long*, long*, zmm_vector<long>::opmask_t, long long __vector(8)) (reg=...,
     k=<error reading variable: Unable to access DWARF register number 118>, right_addr=0x707569bfd008, left_addr=0x707569ba57c8)
     at ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-qsort.h:211
 #5  zmm_vector<long>::double_compressstore(long*, long*, unsigned char, long long __vector(8)) (reg=...,
     k=<error reading variable: Unable to access DWARF register number 118>, right_addr=0x707569bfd008, left_addr=0x707569ba57c8)
     at ../numpy/_core/src/npysort/x86-simd-sort/src/avx512-64bit-common.h:778
 #6  partition_vec<zmm_vector<long int>, Comparator<zmm_vector<long int>, false>, long int, __vector(8) long long int> (biggest_vec=<synthetic pointer>...,
     smallest_vec=<synthetic pointer>..., pivot_vec=..., curr_vec=..., r_store=0x707569bfd008, l_store=0x707569ba57c8)
     at ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-qsort.h:232
 #7  partition_unrolled<zmm_vector<long>, Comparator<zmm_vector<long>, false>, 8, long> (arr=arr@entry=0x707569bfda40, left=955200, left@entry=0,
     right=999872, right@entry=1000000, pivot=pivot@entry=375000, smallest=smallest@entry=0x7f7fffbcd610, biggest=biggest@entry=0x7f7fffbcd618)
     at ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-qsort.h:462
 #8  0x000070758b95c93c in qsort_<zmm_vector<long>, Comparator<zmm_vector<long>, false>, long> (arr=0x707569bfda40, left=<optimized out>, right=999999,
     max_iters=<optimized out>) at ../numpy/_core/src/npysort/x86-simd-sort/src/xss-common-qsort.h:553



 Can someone please fix or disable AVX-512 support on NetBSD?
  Thomas

 ----- End forwarded message -----
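
For reference when reading frame #2 of the backtrace above: the sketch
below is a scalar model of what _mm512_mask_compressstoreu_epi64 is
documented to do, namely write only the lanes selected by the mask,
packed contiguously, starting at the destination pointer. The function
name compress_store_i64 is hypothetical and is not taken from numpy or
x86-simd-sort. One observation, offered without any claim about the
cause: the destination in that frame, 0x707569bfcff0, lies 16 bytes
below a 4 KiB-aligned address.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical scalar model of the AVX-512 compress-store on eight
 * 64-bit lanes. Lanes whose mask bit is set are written to dst in
 * lane order with no gaps; memory beyond the written elements is
 * left untouched. Returns the number of elements written.
 */
static size_t
compress_store_i64(int64_t *dst, uint8_t mask, const int64_t src[8])
{
	size_t n = 0;

	for (int lane = 0; lane < 8; lane++) {
		if (mask & (1u << lane))
			dst[n++] = src[lane];
	}
	return n;
}
```

A correct implementation therefore touches at most 8 * popcount(mask)
bytes at the destination, which gives a baseline to compare the
faulting store against.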

From: "Thomas Klausner" <wiz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/58539 CVS commit: pkgsrc/math/py-numpy
Date: Sun, 11 Aug 2024 13:12:08 +0000

 Module Name:	pkgsrc
 Committed By:	wiz
 Date:		Sun Aug 11 13:12:08 UTC 2024

 Modified Files:
 	pkgsrc/math/py-numpy: Makefile

 Log Message:
 py-numpy: disable AVX512 support

 This does not work on NetBSD, see PR 58539.
 When reporting a similar problem for ghc, I was told it's troubled on
 other operating systems too, so disable it for all.

 If you know it's safe on your operating system, please add an
 appropriate ifdef.

 Update test status.

 Bump PKGREVISION.


 To generate a diff of this commit:
 cvs rdiff -u -r1.123 -r1.124 pkgsrc/math/py-numpy/Makefile

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.