NetBSD Problem Report #55652
From www@netbsd.org Thu Sep 10 09:52:20 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id DEF381A9239
for <gnats-bugs@gnats.NetBSD.org>; Thu, 10 Sep 2020 09:52:20 +0000 (UTC)
Message-Id: <20200910095219.D57211A923A@mollari.NetBSD.org>
Date: Thu, 10 Sep 2020 09:52:19 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: bytes_transfer_eof_piod_write_i test causes kernel freeze
X-Send-Pr-Version: www-1.0
>Number: 55652
>Category: port-alpha
>Synopsis: bytes_transfer_eof_piod_write_i test causes kernel freeze
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: thorpej
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Sep 10 09:55:00 +0000 2020
>Closed-Date: Wed Jul 07 05:42:43 +0000 2021
>Last-Modified: Wed Jul 07 05:42:43 +0000 2021
>Originator: Rin Okuyama
>Release: 9.99.72
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD ds10 9.99.72 NetBSD 9.99.72 (GENERIC-$Revision: 1.407 $) #3: Thu Sep 10 18:11:06 JST 2020 rin@latipes:/sys/arch/alpha/compile/DS10 alpha
>Description:
The bytes_transfer_eof_piod_write_i test in tests/lib/libc/sys/t_ptrace_wait
causes a kernel freeze on my DS10 (single-core 21264A processor). After the
freeze, nothing is possible except entering DDB from the console:
----
ds10# cd /usr/tests/lib/libc/sys
ds10# atf-run t_ptrace_wait
...
tc-end: 1599729844.489525, bytes_transfer_eof_piod_write_d, passed
tc-start: 1599729844.490780, bytes_transfer_eof_piod_write_i
----
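For long atf-run logs, the test case that hung can be picked out mechanically: it is the one with a tc-start line but no matching tc-end. The following is a minimal Python sketch, assuming the tc-start/tc-end line shapes shown in the excerpt above; the function name unfinished_test_cases is hypothetical and not part of ATF.

```python
import re

# Assumed line shapes, taken from the log excerpt above:
#   tc-start: <timestamp>, <test case name>
#   tc-end: <timestamp>, <test case name>, <result>
TC_START = re.compile(r"^tc-start: [\d.]+, (\S+)$")
TC_END = re.compile(r"^tc-end: [\d.]+, ([^,\s]+), ")


def unfinished_test_cases(log_text):
    """Return names of test cases that started but never ended, in order."""
    open_cases = []
    for line in log_text.splitlines():
        line = line.strip()
        m = TC_START.match(line)
        if m:
            open_cases.append(m.group(1))
            continue
        m = TC_END.match(line)
        if m and m.group(1) in open_cases:
            open_cases.remove(m.group(1))
    return open_cases


sample = """\
tc-end: 1599729844.489525, bytes_transfer_eof_piod_write_d, passed
tc-start: 1599729844.490780, bytes_transfer_eof_piod_write_i
"""
print(unfinished_test_cases(sample))
```

Run against the excerpt above, this reports bytes_transfer_eof_piod_write_i as the only test case left open.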
Here is the backtrace and other output obtained from DDB:
----
tc-start: 1599729844.490780, bytes_transfer_eof_piod_write_i
~Stopped in pid 637.637 (t_ptrace_wait) at netbsd:cpu_Debugger+0x4:
ret zero,(ra)
db> bt
cpu_Debugger() at netbsd:cpu_Debugger+0x4
comintr() at netbsd:comintr+0xb84
alpha_shared_intr_dispatch() at netbsd:alpha_shared_intr_dispatch+0x5c
sio_iointr() at netbsd:sio_iointr+0x44
interrupt() at netbsd:interrupt+0x214
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
rw_enter() at netbsd:rw_enter+0x1a0
vm_map_lock_read() at netbsd:vm_map_lock_read+0x20
uvm_fault_internal() at netbsd:uvm_fault_internal+0x10c
trap() at netbsd:trap+0x604
XentMM() at netbsd:XentMM+0x20
--- memory management fault ---
kcopyerr() at netbsd:kcopyerr+0xc
kcopy() at netbsd:kcopy+0x44
copyout_vmspace() at netbsd:copyout_vmspace+0x94
uiomove() at netbsd:uiomove+0xb8
uvm_io() at netbsd:uvm_io+0x140
process_domem() at netbsd:process_domem+0xa8
ptrace_doio() at netbsd:ptrace_doio+0x144
do_ptrace() at netbsd:do_ptrace+0xce4
sys_ptrace() at netbsd:sys_ptrace+0x3c
syscall() at netbsd:syscall+0x300
XentSys() at netbsd:XentSys+0x60
--- syscall (26) ---
--- user mode ---
db> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
2108 2108 4 0 1000000 fffffc003f51ae80 t_ptrace_wait
637 > 637 7 0 0 fffffc003f51a600 t_ptrace_wait
184 184 3 0 80 fffffc003f51b2c0 atf-run poll
...
db> trace/a fffffc003f51a600
trace: pid 637 lid 637 at 0xfffffc000046dde8
mi_switch() at netbsd:mi_switch+0x228
db>
----
Full console log including dmesg is provided here:
http://www.netbsd.org/~rin/alpha_log_20200910.txt
This appears to be a regression introduced after 2020-05-20, when I last
ran the ATF tests and this test passed.
>How-To-Repeat:
cd /usr/tests/lib/libc/sys && atf-run t_ptrace_wait
>Fix:
N/A
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: port-alpha-maintainer->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Sat, 19 Sep 2020 15:41:27 +0000
Responsible-Changed-Why:
Take.
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@NetBSD.org>
Cc:
Subject: Re: port-alpha/55652 (bytes_transfer_eof_piod_write_i test causes
kernel freeze)
Date: Tue, 6 Oct 2020 08:31:20 +0900
I'm not sure whether this is related to the original problem, but
with -current as of yesterday on the DS10,
fs/vfs/t_renamerace:*
also hangs (the test case matched by ``*'' varies from run to run:
sometimes ffslog_renamerace_dirs, other times lfs_renamerace_cycle):
----
# cd /usr/tests/fs/vfs && atf-run t_renamerace
Content-Type: application/X-atf-tps; version="3"
...
tc-start: 1601937221.477572, ffslog_renamerace_cycle
tc-so:[ 1.0000000] entropy: no seed from bootloader
tc-so:[ 1.0000000] entropy: ready
tc-end: 1601937232.173994, ffslog_renamerace_cycle, passed
tc-start: 1601937232.178267, ffslog_renamerace_dirs
^C^C^C
~Stopped in pid 705.1974 (t_renamerace) at netbsd:cpu_Debugger+0x4:
ret zero,(ra)
----
The output of ``bt'' and ``trace/a'' for curlwp is not informative:
----
db{0}> bt
cpu_Debugger() at netbsd:cpu_Debugger+0x4
comintr() at netbsd:comintr+0xb84
alpha_shared_intr_wrapper() at netbsd:alpha_shared_intr_wrapper+0x38
alpha_shared_intr_dispatch() at netbsd:alpha_shared_intr_dispatch+0x5c
sio_iointr() at netbsd:sio_iointr+0x44
interrupt() at netbsd:interrupt+0x268
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
--- user mode ---
db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
705 >1974 7 0 0 fffffc003edbb300 t_renamerace
705 1399 2 0 40000 fffffc00375bd280 t_renamerace
705 691 3 0 80 fffffc003bb00ac0 t_renamerace parked
705 2591 3 0 80 fffffc00375f0140 vmem_rehash parked
705 720 5 0 0 fffffc00375f1ac0 (zombie)
705 707 5 0 0 fffffc0037aa5580 (zombie)
705 726 3 0 80 fffffc003bb01780 pmfsuspend parked
...
db{0}> trace/a fffffc003edbb300
trace: pid 705 lid 1974 at 0xfffffc0000495ea8
alpha_softint_return() at netbsd:alpha_softint_return
--- root of call graph ---
db{0}> reboot 0x4
----
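In both DDB sessions, the address passed to ``trace/a'' is the STRUCT LWP value of the row that ``ps'' marks with ``>''. As a rough sketch, assuming the column layout quoted above and that ``>'' marks the LWP that was running when DDB was entered, that row can be extracted from a saved console log as follows. The helper name current_lwp is hypothetical.

```python
import re

# Assumed row shape, based on the ``ps'' output quoted above:
#   PID [>]LID S CPU FLAGS STRUCT-LWP-ADDRESS NAME [WAIT]
# The ">" marker may be separated from the LID by a space or attached to it.
CURRENT_LWP = re.compile(
    r"^\s*(\d+)\s*>\s*(\d+)\s+\S+\s+\S+\s+\S+\s+([0-9a-f]+)\s+(\S+)"
)


def current_lwp(ps_output):
    """Return (pid, lid, lwp_address, name) for the row marked with '>', or None."""
    for line in ps_output.splitlines():
        m = CURRENT_LWP.match(line)
        if m:
            return int(m.group(1)), int(m.group(2)), m.group(3), m.group(4)
    return None


sample = """\
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
705 >1974 7 0 0 fffffc003edbb300 t_renamerace
705 1399 2 0 40000 fffffc00375bd280 t_renamerace
705 691 3 0 80 fffffc003bb00ac0 t_renamerace parked
"""
pid, lid, addr, name = current_lwp(sample)
print(f"trace/a {addr}")
```

The same pattern also accepts the spaced form from the first session (``637 > 637 ...'').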
Full console log is provided here:
http://www.netbsd.org/~rin/alpha_log_20201006.txt
This occurs with kernels compiled by both GCC9 and GCC8.
Thanks,
rin
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-alpha/55652 (bytes_transfer_eof_piod_write_i test causes
kernel freeze)
Date: Wed, 7 Oct 2020 16:14:10 +0900
I've confirmed that at least the following tests hang the kernel in the
same manner as I reported in the previous message:
- lib/libpthread/t_kill
- fs/tmpfs/t_renamerace
- fs/vfs/t_renamerace
- fs/vfs/t_rmdirrace
The kernel hang is not 100% reproducible, but it occurs on almost every run.
It occurs on systems compiled with either GCC8 or GCC9. I had never seen it
before May 2020.
I will file a separate PR for this, if appropriate.
Thanks,
rin
State-Changed-From-To: open->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Wed, 07 Jul 2021 05:42:43 +0000
State-Changed-Why:
Fixed. (Duplicate of port-alpha/56197)
>Unformatted: