NetBSD Problem Report #56382

From www@netbsd.org  Mon Aug 30 02:16:36 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 33E3E1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 30 Aug 2021 02:16:36 +0000 (UTC)
Message-Id: <20210830021634.54DE01A923A@mollari.NetBSD.org>
Date: Mon, 30 Aug 2021 02:16:34 +0000 (UTC)
From: rokuyama.rk@gmail.com
Reply-To: rokuyama.rk@gmail.com
To: gnats-bugs@NetBSD.org
Subject: Kernel crash by calling through null pointer
X-Send-Pr-Version: www-1.0

>Number:         56382
>Category:       port-sh3
>Synopsis:       Kernel crash by calling through null pointer
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-sh3-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Aug 30 02:20:00 +0000 2021
>Closed-Date:    Fri Jul 21 09:00:46 +0000 2023
>Last-Modified:  Fri Jul 21 09:00:46 +0000 2023
>Originator:     Rin Okuyama
>Release:        9.99.88
>Organization:
Department of Physics, Meiji University
>Environment:
NetBSD hdlu 9.99.88 NetBSD 9.99.88 (GENERIC) #2: Thu Aug 26 20:34:10 JST 2021  rin@latipes:/sys/arch/landisk/compile/GENERIC landisk
>Description:
Sometimes the kernel crashes at 0x0 (on both HDL-U and USL-5P):

----
kernel mode trap: address error (load) code = 0x0
Stopped in pid 22998.22998 (sh) at      0:      ????
db> bt
calling through null pointer?
pmap_page_protect() at 8fe55e00
symbol not found
db> show reg
r0          e
r1          deaddead
r2          ffffffe9
r3          0
r4          75801000
r5          75801000
r6          8eea3000
r7          1
r8          8ebfe960
r9          8c687748
r10         75801000
r11         0
r12         8c00ad2c    __pmap_pte_lookup
r13         8c00ad70    pmap_remove
r14         c2f2de38
r15         c2f2de38
pr          8c00b078    pmap_page_protect+0x94
pc          0
sr          400001c0
mach        515
macl        54359fbd
0:      ????
----

It seems that the jump to NULL occurs when __pmap_pte_lookup() is called
from pmap_remove():

----
8c00ad70 <pmap_remove>:
8c00ad70:       86 2f           mov.l   r8,@-r15
8c00ad72:       96 2f           mov.l   r9,@-r15
8c00ad74:       a6 2f           mov.l   r10,@-r15
8c00ad76:       b6 2f           mov.l   r11,@-r15
8c00ad78:       c6 2f           mov.l   r12,@-r15
8c00ad7a:       d6 2f           mov.l   r13,@-r15
8c00ad7c:       e6 2f           mov.l   r14,@-r15
8c00ad7e:       22 4f           sts.l   pr,@-r15
8c00ad80:       f8 7f           add     #-8,r15
8c00ad82:       f3 6e           mov     r15,r14
8c00ad84:       61 1e           mov.l   r6,@(4,r14)
8c00ad86:       43 69           mov     r4,r9
8c00ad88:       53 68           mov     r5,r8
8c00ad8a:       e1 51           mov.l   @(4,r14),r1
8c00ad8c:       12 38           cmp/hs  r1,r8
8c00ad8e:       0b 8f           bf.s    8c00ada8 <pmap_remove+0x38>
8c00ad90:       83 65           mov     r8,r5
8c00ad92:       08 7e           add     #8,r14
8c00ad94:       e3 6f           mov     r14,r15
8c00ad96:       26 4f           lds.l   @r15+,pr
8c00ad98:       f6 6e           mov.l   @r15+,r14
8c00ad9a:       f6 6d           mov.l   @r15+,r13
8c00ad9c:       f6 6c           mov.l   @r15+,r12
8c00ad9e:       f6 6b           mov.l   @r15+,r11
8c00ada0:       f6 6a           mov.l   @r15+,r10
8c00ada2:       f6 69           mov.l   @r15+,r9
8c00ada4:       0b 00           rts
8c00ada6:       f6 68           mov.l   @r15+,r8
8c00ada8:       28 d1           mov.l   8c00ae4c <pmap_remove+0xdc>,r1  ! 8c00ad2c <__pmap_pte_lookup>
8c00adaa:       0b 41           jsr     @r1
...
----

But I'm not sure whether it is possible or not...
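
As a sanity check on the disassembly above, the literal-pool address used
by the mov.l at 8c00ada8 can be recomputed from the raw opcode bytes. The
following is a minimal sketch, assuming the standard SH encoding of
"mov.l @(disp,PC),Rn" (opcode 0xDndd, target = ((PC + 4) & ~3) + disp * 4).
The helper name sh_movl_pcrel_target is hypothetical and is not part of
any NetBSD source.

```c
#include <stdint.h>

/*
 * Hypothetical helper: compute the literal-pool address referenced by an
 * SH "mov.l @(disp,PC),Rn" instruction (encoding 0xDndd).
 * Returns 0 if the opcode is not of that form.
 */
static uint32_t
sh_movl_pcrel_target(uint32_t insn_addr, uint16_t opcode)
{
	uint32_t disp;

	if ((opcode & 0xf000) != 0xd000)
		return 0;

	/* Low 8 bits are an unsigned displacement in units of 4 bytes. */
	disp = opcode & 0xff;

	/* The base is the instruction address + 4, rounded down to 4 bytes. */
	return ((insn_addr + 4) & ~(uint32_t)3) + disp * 4;
}
```

The bytes "28 d1" are stored little-endian, so the opcode is 0xd128.
sh_movl_pcrel_target(0x8c00ada8, 0xd128) evaluates to 0x8c00ae4c, which
matches the literal address printed by the disassembler. This only
confirms where r1 is loaded from; it says nothing about whether the
literal itself was corrupted at run time.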

Also, there's a magic number in r1: 0xdeaddead. This may be PI_MAGIC in
subr_pool.c (DIAGNOSTIC is enabled for this kernel), or it may come from
somewhere else...
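
If r1 does hold a pool magic value, one way it could get there is a load
through a stale pointer into an object that has already been returned to
a pool. The following is a simplified sketch of that idea only. It is not
the subr_pool.c code: the struct layout, the ex_ names, and the value of
EX_PI_MAGIC (copied from the r1 value in the dump) are assumptions made
for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_PI_MAGIC 0xdeaddeadU	/* assumed; equals r1 in the dump above */

/* Hypothetical free-list item, overlaid on the storage of a freed object. */
struct ex_pool_item {
	uint32_t		 pi_magic;
	struct ex_pool_item	*pi_next;
};

/* Stamp a freed object with the magic word and push it on the free list. */
static void
ex_pool_put(struct ex_pool_item **head, void *obj)
{
	struct ex_pool_item *pi = obj;

	pi->pi_magic = EX_PI_MAGIC;
	pi->pi_next = *head;
	*head = pi;
}

/* Return nonzero if the first word of obj carries the magic stamp. */
static int
ex_looks_freed(const void *obj)
{
	uint32_t first;

	memcpy(&first, obj, sizeof(first));
	return first == EX_PI_MAGIC;
}
```

Under these assumptions, any code that reads the first word of an object
after ex_pool_put() has been called on it would load 0xdeaddead, which is
one way such a value could appear in a register.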
>How-To-Repeat:
Build pkgsrc packages for a few hours up to one day on landisk.
>Fix:
No idea...

>Release-Note:

>Audit-Trail:
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/56382 CVS commit: src/sys/arch/sh3/sh3
Date: Wed, 15 Sep 2021 11:03:25 +0000

 Module Name:	src
 Committed By:	rin
 Date:		Wed Sep 15 11:03:25 UTC 2021

 Modified Files:
 	src/sys/arch/sh3/sh3: exception.c

 Log Message:
 For kernel mode address error, do not overwrite tf->tf_spc and tf->tf_r0
 *before* checking pcb->pcb_onfault != NULL.

 Should fix part of

 PR port-sh3/56382
 PR port-sh3/56401

 i.e., DDB will no longer wrongly indicate NULL as fault PC for kernel mode
 address error (and 0xe == EFAULT as r0).

 Yes, we have other bugs that cause the panics described in the two PRs, but
 now we can examine them more easily :).


 To generate a diff of this commit:
 cvs rdiff -u -r1.73 -r1.74 src/sys/arch/sh3/sh3/exception.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
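
The ordering problem described in the log message above can be sketched
as follows. This is a simplified illustration, not the actual exception.c
change: the ex_ structures and functions are hypothetical stand-ins, and
only the field names tf_spc, tf_r0 and pcb_onfault mirror the log message.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced stand-ins for the trap frame and the PCB. */
struct ex_trapframe {
	uintptr_t tf_spc;	/* saved PC at the time of the fault */
	uintptr_t tf_r0;	/* saved r0 */
};

struct ex_pcb {
	void *pcb_onfault;	/* recovery address, or NULL if none */
};

/*
 * Old order: the frame is overwritten first, so when no onfault handler
 * is set, the panic path sees spc == 0 and r0 == EFAULT.
 * Returns 0 when the caller should panic.
 */
static int
ex_addr_err_before(struct ex_trapframe *tf, const struct ex_pcb *pcb)
{
	tf->tf_spc = (uintptr_t)pcb->pcb_onfault;
	tf->tf_r0 = EFAULT;
	return tf->tf_spc != 0;
}

/*
 * New order: check for a handler first and leave the frame untouched on
 * the panic path, so the debugger still sees the real fault PC and r0.
 */
static int
ex_addr_err_after(struct ex_trapframe *tf, const struct ex_pcb *pcb)
{
	if (pcb->pcb_onfault == NULL)
		return 0;
	tf->tf_spc = (uintptr_t)pcb->pcb_onfault;
	tf->tf_r0 = EFAULT;
	return 1;
}
```

With pcb_onfault == NULL, ex_addr_err_before() leaves tf_spc == 0 and
tf_r0 == EFAULT in the frame, which corresponds to the "pc 0" and "r0 e"
values in the register dump of the original report, while
ex_addr_err_after() preserves both fields for inspection.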

State-Changed-From-To: open->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Fri, 21 Jul 2023 09:00:46 +0000
State-Changed-Why:
The jump to NULL itself has been fixed, and the fix will be part of NetBSD 10.0.
Related symptoms have not been observed recently.


>Unformatted: