NetBSD Problem Report #59808

From www@netbsd.org  Sun Nov 30 11:32:15 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 447081A9239
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 30 Nov 2025 11:32:15 +0000 (UTC)
Message-Id: <20251130113214.482431A923A@mollari.NetBSD.org>
Date: Sun, 30 Nov 2025 11:32:14 +0000 (UTC)
From: twaldmann@thinkmo.de
Reply-To: twaldmann@thinkmo.de
To: gnats-bugs@NetBSD.org
Subject: random python3 process crashes in NetBSD VMs
X-Send-Pr-Version: www-1.0

>Number:         59808
>Notify-List:    riastradh@NetBSD.org
>Category:       pkg
>Synopsis:       random python3 process crashes in NetBSD VMs
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-amd64-maintainer
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Nov 30 11:35:00 +0000 2025
>Closed-Date:    
>Last-Modified:  Tue Jan 20 11:28:08 +0000 2026
>Originator:     Thomas Waldmann
>Release:        10.1
>Organization:
borgbackup.org
>Environment:
>Description:
We are seeing frequent Python3 process crashes, but only in NetBSD VMs. The same testsuite never shows such gw process crashes on Linux, macOS (not in a VM), or on OpenBSD, FreeBSD, or Haiku (running in the same kind of VM).

Currently, this happens on GitHub Actions runner machines (VMs based on Linux + KVM), but I have also seen the same thing on local Linux + VirtualBox VMs running NetBSD.

It happens now on 10.1, but I have also seen this on 9.x for a long time.

It crashes randomly in some unit tests, for example:

________________________ src/borg/testsuite/archiver.py ________________________
[gw0] netbsd10 -- Python 3.11.13 /home/runner/work/borg/borg/.tox/py311-none/bin/python
worker 'gw0' crashed while running 'src/borg/testsuite/archiver.py::ArchiverTestCase::test_extract_with_pattern'

>How-To-Repeat:
Run the borgbackup tox tests in a NetBSD VM (crashes are quite frequent, but might need multiple runs to see).

>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Sun, 30 Nov 2025 14:00:57 +0000
State-Changed-Why:
What is the crash?  Is gw0 a process running that terminates on a
signal or something, or is gw0 a VM and the kernel crashes with some
console output, or what?  Is there any output you can share?  Is there
a core dump?


From: twaldmann@thinkmo.de
To: gnats-bugs@netbsd.org, port-amd64-maintainer@netbsd.org,
 pkgsrc-bugs@netbsd.org, gnats-admin@netbsd.org, riastradh@NetBSD.org,
 twaldmann@thinkmo.de
Cc: 
Subject: Re: pkg/59808 (random python3 process crashes in NetBSD VMs)
Date: Sun, 30 Nov 2025 16:59:09 +0100

 > What is the crash?  Is gw0 a process running that terminates on a
 > signal or something, or is gw0 a VM and the kernel crashes with some
 > console output, or what?

 That is an error message from a tox / pytest-xdist run. It's a python 
 based test runner that parallelizes by creating some python worker 
 processes (it calls them gwN) and then dispatches python tests to these 
 workers.

 So I suspect the gw process has crashed.

 > Is there any output you can share?  Is there a core dump?

 Currently I don't have anything more. It's hard to debug on the github 
 runners, but I can also run a netbsd vagrant virtualbox VM locally and 
 reproduce if you like.

 I am not really familiar with NetBSD. I just want to make sure that 
 borgbackup runs well on this platform, but I am no (real) user myself.

 So, can you tell me what to look for, how and where?

 Cheers, Thomas

From: Taylor R Campbell <riastradh@NetBSD.org>
To: twaldmann@thinkmo.de
Cc: gnats-bugs@netbsd.org, port-amd64-maintainer@netbsd.org,
	pkgsrc-bugs@netbsd.org, gnats-admin@netbsd.org, twaldmann@thinkmo.de
Subject: Re: pkg/59808 (random python3 process crashes in NetBSD VMs)
Date: Sun, 30 Nov 2025 20:45:48 +0000

 > Date: Sun, 30 Nov 2025 16:59:09 +0100
 > From: twaldmann@thinkmo.de
 >
 > > Is there any output you can share?  Is there a core dump?
 >
 > Currently I don't have anything more. It's hard to debug on the github
 > runners, but I can also run a netbsd vagrant virtualbox VM locally and
 > reproduce if you like.
 >
 > I am not really familiar with NetBSD. I just want to make sure that
 > borgbackup runs well on this platform, but I am no (real) user myself.
 >
 > So, can you tell me what to look for, how and where?

 I think the next diagnostics are likely to be more about the
 borgbackup or tox test harness than about NetBSD, and how they handle
 a process crashing generally.

 - Is there stdout/stderr stored anywhere?

 - Did the parent process determine it terminated on a signal, and if
   so, what signal, and did it dump core?

   (These are standard Unix questions answered by the wait family of
   system calls and associated predicates WIFSIGNALED, WTERMSIG,
   WCOREDUMP -- nothing different about NetBSD.)

 - Can you find a core dump?

   By default, it will be called %n.core where %n is the program name,
   and it will appear in the process's working directory -- unless it's
   run from a set-user-id or set-group-id executable, in which case a
   core dump will go in /var/crash but only if you set the sysctl knob
   kern.coredump.setid.dump=1.

   (See `sysctl kern.defcorename' and `sysctl kern.coredump' for all
   the parameters.  Enable setid core dumps with `sysctl -w
   kern.coredump.setid.dump=1', though I doubt you're using any setid
   processes.  More information:
   https://man.NetBSD.org/NetBSD-10.1/core.5,
   https://man.NetBSD.org/NetBSD-10.1/sysctl.7)

   If you can find a core dump, and it's (say) from a program called
   /usr/pkg/bin/foo, can you get a stack trace out of gdb?

   # gdb /usr/pkg/bin/foo /path/to/foo.core
   (gdb) bt
   (gdb) info registers
   (gdb) frame apply all info locals

 - If it terminated on a signal but there's no core dump: Are you
   running with any resource limits (getrlimit/setrlimit, or the shell
   ulimit builtin -- all standard POSIX) that prevent core dumps, or
   are you running a set-user-id or set-group-id executable?

 - Is there any relevant output in `dmesg'?  It might say whether a
   core was dumped, in case the parent process ignores that
   information.
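
The wait-status and resource-limit questions above can be checked from a small Python parent process. This is a minimal sketch rather than anything pytest-xdist does itself: the helper names `describe_wait_status`, `run_and_describe` and `core_dumps_allowed` are hypothetical, and only standard-library calls from `os`, `signal` and `resource` are used.

```python
import os
import resource
import signal
import sys


def describe_wait_status(status):
    """Turn a raw status from os.waitpid() into a readable description."""
    if os.WIFSIGNALED(status):
        signum = os.WTERMSIG(status)
        try:
            name = signal.Signals(signum).name
        except ValueError:
            name = f"signal {signum}"
        suffix = " (core dumped)" if os.WCOREDUMP(status) else ""
        return f"killed by {name}{suffix}"
    if os.WIFEXITED(status):
        return f"exited with status {os.WEXITSTATUS(status)}"
    return f"unrecognized wait status {status}"


def run_and_describe(argv):
    """Fork, exec argv in the child, and report how the child ended."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if exec itself failed
    _, status = os.waitpid(pid, 0)
    return describe_wait_status(status)


def core_dumps_allowed():
    """Return False if the soft RLIMIT_CORE is 0, which suppresses core files."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    return soft != 0
```

Calling `run_and_describe` with the command line of a crashing test would show whether the child died on a signal and whether the kernel reported a core dump. If `core_dumps_allowed()` returns False in the same environment, no core file should be expected.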

 I ran the test suite three times in a VM by loosely following the
 instructions at
 https://github.com/borgbackup/borg/blob/9a0122995c32aa657a2b1cac7a015cec6d1a89ab/.github/workflows/ci.yml#L432-L468
 but so far I haven't seen any crashes.  Takes about an hour to run;
 how often do the crashes occur?

From: Thomas Waldmann <tw@waldmann-edv.de>
To: gnats-bugs@netbsd.org, port-amd64-maintainer@netbsd.org,
 gnats-admin@netbsd.org, pkgsrc-bugs@netbsd.org, twaldmann@thinkmo.de
Cc: riastradh@NetBSD.org
Subject: Re: pkg/59808 (random python3 process crashes in NetBSD VMs)
Date: Tue, 2 Dec 2025 14:36:35 +0100

 >   - Is there stdout/stderr stored anywhere?

 That's what I have (from a new run in a virtualbox VM):

      netbsd9: =================================== FAILURES ===================================
      netbsd9: ________________________ src/borg/testsuite/archiver.py ________________________
      netbsd9: [gw9] netbsd9 -- Python 3.11.13 /vagrant/borg/borg/.tox/py/bin/python
      netbsd9: worker 'gw9' crashed while running 'src/borg/testsuite/archiver.py::ArchiverTestCase::test_unknown_feature_on_rename'
      netbsd9: ================================ tests coverage ================================
      netbsd9: ______________ coverage: platform netbsd9, python 3.11.13-final-0 ______________
      netbsd9:
      netbsd9: ___________________________ coverage: failed workers ___________________________
      netbsd9:
      netbsd9: The following workers failed to return coverage data, ensure that pytest-cov is installed on these workers.
      netbsd9: gw9

 >   - Did the parent process determine it terminated on a signal, and if
 >     so, what signal,

 I don't know, I am not a developer of pytest(-xdist).

 > and did it dump core?

 I found exactly 1 core dump:

 -rw-------  1 vagrant  wheel  55814056 Dec  2 08:40 /tmp/tmp6pedxq5l/python.core

 >     If you can find a core dump, and it's (say) from a program called
 >     /usr/pkg/bin/foo, can you get a stack trace out of gdb?
 >   
 >     # gdb /usr/pkg/bin/foo /path/to/foo.core
 >     (gdb) bt
 >     (gdb) info registers
 >     (gdb) frame apply all info locals

 $ pwd
 /vagrant/borg/borg-env/bin

 $ ls -l python
 lrwxrwxr-x  1 vagrant  wheel  23 Dec  2 08:32 python -> /usr/pkg/bin/python3.11

 $ gdb python /tmp/tmp6pedxq5l/python.core

 GNU gdb (GDB) 8.3
 ...
 Reading symbols from python...
 (No debugging symbols found in python)
 [New process 1]
 [New process 2]
 Core was generated by `python'.
 Program terminated with signal SIGSEGV, Segmentation fault.
 #0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
 [Current thread is 1 (process 1)]

 (gdb) bt
 #0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
 #1  0x00007355f80525f1 in faulthandler_fatal_error () from /usr/pkg/lib/libpython3.11.so.1.0
 #2  0x00007355f68a21a0 in opendir () from /usr/lib/libc.so.12
 #3  0x000000010000000b in ?? ()
 #4  0x0000000000000000 in ?? ()

 (gdb) info registers
 rax            0x0                 0
 rbx            0x7355f81226f6      126813071353590
 rcx            0x7355f6967dea      126813046472170
 rdx            0x0                 0
 rsi            0xb                 11
 rdi            0x1                 1
 rbp            0xb                 0xb
 rsp            0x7355f3e4a338      0x7355f3e4a338
 r8             0x0                 0
 r9             0x0                 0
 r10            0x7355f6967dca      126813046472138
 r11            0x206               518
 r12            0x7355f6bd0680      126813048997504
 r13            0xc                 12
 r14            0x0                 0
 r15            0x7355f84162a0      126813074449056
 rip            0x7355f6967dea      0x7355f6967dea <_lwp_kill+10>
 eflags         0x206               [ PF IF ]
 cs             0x47                71
 ss             0x3f                63
 ds             0x23                35
 es             0x23                35
 fs             0x0                 0
 gs             0x0                 0
 fs_base        <unavailable>
 gs_base        <unavailable>

 (gdb) frame apply all info locals
 #0  0x00007355f6967dea in _lwp_kill () from /usr/lib/libc.so.12
 No symbol table info available.
 #1  0x00007355f80525f1 in faulthandler_fatal_error () from /usr/pkg/lib/libpython3.11.so.1.0
 No symbol table info available.
 #2  0x00007355f68a21a0 in opendir () from /usr/lib/libc.so.12
 No symbol table info available.
 #3  0x000000010000000b in ?? ()
 No symbol table info available.
 #4  0x0000000000000000 in ?? ()
 No symbol table info available.
 (gdb)
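
The `faulthandler_fatal_error` frame in the backtrace suggests that Python's `faulthandler` module was enabled in the crashing worker. If so, it should have written a Python-level traceback to its output stream (stderr by default) before re-raising the signal, and under a parallel test runner that output may be hard to locate. A minimal sketch, assuming the worker can run some setup code early (for example from a pytest `conftest.py`), that sends the traceback to a per-process file instead; the function name `enable_fault_log` and the file naming scheme are hypothetical.

```python
import faulthandler
import os


def enable_fault_log(directory):
    """Send faulthandler tracebacks for this process to a per-PID log file.

    Returns the log path and the open file object. The file must stay open
    for as long as faulthandler is enabled, so the caller keeps a reference.
    """
    path = os.path.join(directory, f"faulthandler-{os.getpid()}.log")
    log = open(path, "w")
    # all_threads=True dumps every Python thread, not only the faulting one.
    faulthandler.enable(file=log, all_threads=True)
    return path, log
```

After a crash, a log file named after the worker's PID would then hold the Python stack of the faulting process, which may narrow down which extension call was running when the SIGSEGV arrived.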

 >   - Is there any relevant output in `dmesg'?

 Nothing related, just the boot messages and a few unrelated msgs.

 >   I ran the test suite three times in a VM by loosely following the
 >   instructions at
 >   https://github.com/borgbackup/borg/blob/9a0122995c32aa657a2b1cac7a015cec6d1a89ab/.github/workflows/ci.yml#L432-L468
 >   but so far I haven't seen any crashes.  Takes about an hour to run;
 >   how often do the crashes occur?

 I think I currently see them in most testsuite runs on github CI on 
 netbsd 10.

 I also needed only 1 try now to get one process crashing in the 
 virtualbox VM with netbsd 9.

 Sometimes multiple processes crash in one testsuite run.

 In the past, I have also seen them frequently on netbsd 9.

 Thanks for your detailed help!


From: Thomas Waldmann <tw@waldmann-edv.de>
To: gnats-bugs@gnats.netbsd.org
Cc: 
Subject: Re: pkg/59808
Date: Tue, 20 Jan 2026 11:36:41 +0100

 Guess I gave a lot of information above, so please remove the "feedback" 
 state so that the issue tracker does not spam me regularly for updates.

State-Changed-From-To: feedback->open
State-Changed-By: leot@NetBSD.org
State-Changed-When: Tue, 20 Jan 2026 11:28:08 +0000
State-Changed-Why:
Feedback provided


>Unformatted:
