NetBSD Problem Report #55652

From www@netbsd.org  Thu Sep 10 09:52:20 2020
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id DEF381A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 10 Sep 2020 09:52:20 +0000 (UTC)
Message-Id: <20200910095219.D57211A923A@mollari.NetBSD.org>
Date: Thu, 10 Sep 2020 09:52:19 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: bytes_transfer_eof_piod_write_i test causes kernel freeze
X-Send-Pr-Version: www-1.0

>Number:         55652
>Category:       port-alpha
>Synopsis:       bytes_transfer_eof_piod_write_i test causes kernel freeze
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    thorpej
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Sep 10 09:55:00 +0000 2020
>Closed-Date:    Wed Jul 07 05:42:43 +0000 2021
>Last-Modified:  Wed Jul 07 05:42:43 +0000 2021
>Originator:     Rin Okuyama
>Release:        9.99.72
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD ds10 9.99.72 NetBSD 9.99.72 (GENERIC-$Revision: 1.407 $) #3: Thu Sep 10 18:11:06 JST 2020  rin@latipes:/sys/arch/alpha/compile/DS10 alpha
>Description:
The bytes_transfer_eof_piod_write_i test in tests/lib/libc/sys/t_ptrace_wait
freezes the kernel on my DS10 (single-core 21264A processor). Once frozen,
nothing works except entering DDB from the console:

----
ds10# cd /usr/tests/lib/libc/sys
ds10# atf-run t_ptrace_wait
...
tc-end: 1599729844.489525, bytes_transfer_eof_piod_write_d, passed
tc-start: 1599729844.490780, bytes_transfer_eof_piod_write_i
----
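
When a run freezes like this, the hanging test case is the one with a
tc-start line but no matching tc-end line. As a minimal sketch, assuming
the log format shown above (the helper name unfinished_test_cases is
hypothetical and not part of ATF), a saved console log can be scanned
like this:

```python
import re

# Matches lines such as "tc-start: 1599729844.490780, name" and
# "tc-end: 1599729844.489525, name, passed"; leading whitespace is tolerated.
TC_LINE = re.compile(r"^\s*tc-(start|end): [0-9.]+, ([A-Za-z0-9_]+)")


def unfinished_test_cases(log_text):
    """Return test cases that have a tc-start line but no tc-end line.

    Hypothetical helper for illustration only; it assumes the atf-run
    output format seen in this report.
    """
    pending = []
    for line in log_text.splitlines():
        m = TC_LINE.match(line)
        if m is None:
            continue  # not a tc-start/tc-end line (e.g. tc-so output)
        kind, name = m.groups()
        if kind == "start":
            pending.append(name)
        elif name in pending:
            pending.remove(name)
    return pending


# The tail of the atf-run output quoted in this report.
sample = """\
tc-end: 1599729844.489525, bytes_transfer_eof_piod_write_d, passed
tc-start: 1599729844.490780, bytes_transfer_eof_piod_write_i
"""
print(unfinished_test_cases(sample))
```

For the excerpt above this reports bytes_transfer_eof_piod_write_i as the
only test case that never finished.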

Here are the backtrace and related output obtained from DDB:

----
tc-start: 1599729844.490780, bytes_transfer_eof_piod_write_i
~Stopped in pid 637.637 (t_ptrace_wait) at       netbsd:cpu_Debugger+0x4:
ret     zero,(ra)
db> bt
cpu_Debugger() at netbsd:cpu_Debugger+0x4
comintr() at netbsd:comintr+0xb84
alpha_shared_intr_dispatch() at netbsd:alpha_shared_intr_dispatch+0x5c
sio_iointr() at netbsd:sio_iointr+0x44
interrupt() at netbsd:interrupt+0x214
XentInt() at netbsd:XentInt+0x1c
--- interrupt (from ipl 0) ---
rw_enter() at netbsd:rw_enter+0x1a0
vm_map_lock_read() at netbsd:vm_map_lock_read+0x20
uvm_fault_internal() at netbsd:uvm_fault_internal+0x10c
trap() at netbsd:trap+0x604
XentMM() at netbsd:XentMM+0x20
--- memory management fault ---
kcopyerr() at netbsd:kcopyerr+0xc
kcopy() at netbsd:kcopy+0x44
copyout_vmspace() at netbsd:copyout_vmspace+0x94
uiomove() at netbsd:uiomove+0xb8
uvm_io() at netbsd:uvm_io+0x140
process_domem() at netbsd:process_domem+0xa8
ptrace_doio() at netbsd:ptrace_doio+0x144
do_ptrace() at netbsd:do_ptrace+0xce4
sys_ptrace() at netbsd:sys_ptrace+0x3c
syscall() at netbsd:syscall+0x300
XentSys() at netbsd:XentSys+0x60
--- syscall (26) ---
--- user mode ---
db> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
2108  2108 4   0   1000000   fffffc003f51ae80      t_ptrace_wait
637  > 637 7   0         0   fffffc003f51a600      t_ptrace_wait
184    184 3   0        80   fffffc003f51b2c0            atf-run poll
...
db> trace/a fffffc003f51a600
trace: pid 637 lid 637 at 0xfffffc000046dde8
mi_switch() at netbsd:mi_switch+0x228
db>
----

Full console log including dmesg is provided here:
	http://www.netbsd.org/~rin/alpha_log_20200910.txt

This appears to be a regression introduced after 2020-05-20, when I ran
the ATF tests and this test passed.
>How-To-Repeat:
cd /usr/tests/lib/libc/sys && atf-run t_ptrace_wait
>Fix:
N/A

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: port-alpha-maintainer->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Sat, 19 Sep 2020 15:41:27 +0000
Responsible-Changed-Why:
Take.


From: Rin Okuyama <rokuyama.rk@gmail.com>
To: "gnats-bugs@NetBSD.org" <gnats-bugs@NetBSD.org>
Cc: 
Subject: Re: port-alpha/55652 (bytes_transfer_eof_piod_write_i test causes
 kernel freeze)
Date: Tue, 6 Oct 2020 08:31:20 +0900

 I'm not sure whether this is related to the original problem, but with
 -current as of yesterday on the DS10,

 	fs/vfs/t_renamerace:*

 also hangs (the test case matched by ``*'' seems to vary from run to
 run: sometimes ffslog_renamerace_dirs, other times lfs_renamerace_cycle):

 ----
 # cd /usr/tests/fs/vfs && atf-run t_renamerace
 Content-Type: application/X-atf-tps; version="3"
 ...
 tc-start: 1601937221.477572, ffslog_renamerace_cycle
 tc-so:[   1.0000000] entropy: no seed from bootloader
 tc-so:[   1.0000000] entropy: ready
 tc-end: 1601937232.173994, ffslog_renamerace_cycle, passed
 tc-start: 1601937232.178267, ffslog_renamerace_dirs
 ^C^C^C
 ~Stopped in pid 705.1974 (t_renamerace) at       netbsd:cpu_Debugger+0x4:
 ret     zero,(ra)
 ----

 The ``bt'' and ``trace/a'' output for curlwp is not informative:

 ----
 db{0}> bt
 cpu_Debugger() at netbsd:cpu_Debugger+0x4
 comintr() at netbsd:comintr+0xb84
 alpha_shared_intr_wrapper() at netbsd:alpha_shared_intr_wrapper+0x38
 alpha_shared_intr_dispatch() at netbsd:alpha_shared_intr_dispatch+0x5c
 sio_iointr() at netbsd:sio_iointr+0x44
 interrupt() at netbsd:interrupt+0x268
 XentInt() at netbsd:XentInt+0x1c
 --- interrupt (from ipl 0) ---
 --- user mode ---
 db{0}> ps
 PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 705  >1974 7   0         0   fffffc003edbb300       t_renamerace
 705   1399 2   0     40000   fffffc00375bd280       t_renamerace
 705    691 3   0        80   fffffc003bb00ac0       t_renamerace parked
 705   2591 3   0        80   fffffc00375f0140        vmem_rehash parked
 705    720 5   0         0   fffffc00375f1ac0           (zombie)
 705    707 5   0         0   fffffc0037aa5580           (zombie)
 705    726 3   0        80   fffffc003bb01780         pmfsuspend parked
 ...
 db{0}> trace/a fffffc003edbb300
 trace: pid 705 lid 1974 at 0xfffffc0000495ea8
 alpha_softint_return() at netbsd:alpha_softint_return
 --- root of call graph ---
 db{0}> reboot 0x4
 ----

 Full console log is provided here:

 	http://www.netbsd.org/~rin/alpha_log_20201006.txt

 This occurs with kernels compiled by both GCC9 and GCC8.

 Thanks,
 rin

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-alpha/55652 (bytes_transfer_eof_piod_write_i test causes
 kernel freeze)
Date: Wed, 7 Oct 2020 16:14:10 +0900

 I've confirmed that at least the following tests hang the kernel in a
 manner similar to what I reported in the previous message:

 - lib/libpthread/t_kill
 - fs/tmpfs/t_renamerace
 - fs/vfs/t_renamerace
 - fs/vfs/t_rmdirrace

 The kernel hang is not 100% reproducible, but it occurs almost every
 time. It happens on systems compiled with either GCC8 or GCC9. I had
 never seen it before May 2020.

 I will send another PR for this, if appropriate.

 Thanks,
 rin

State-Changed-From-To: open->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Wed, 07 Jul 2021 05:42:43 +0000
State-Changed-Why:
Fixed. (Duplicate of port-alpha/56197)


>Unformatted:
