NetBSD Problem Report #51219

From tls@panix.com  Sun Jun  5 22:46:21 2016
Return-Path: <tls@panix.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E27F87ABE0
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  5 Jun 2016 22:46:21 +0000 (UTC)
Message-Id: <20160605224618.DAD12242AA@panix5.panix.com>
Date: Sun,  5 Jun 2016 18:46:18 -0400 (EDT)
From: tls@NetBSD.ORG
Reply-To: tls@NetBSD.ORG
To: gnats-bugs@NetBSD.org
Subject: TLB miss panic in in_cksum
X-Send-Pr-Version: 3.95

>Number:         51219
>Category:       port-evbmips
>Synopsis:       TLB miss panic in in_cksum triggered by piping gzip through ssh.
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    port-evbmips-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jun 05 22:50:01 +0000 2016
>Closed-Date:    Thu May 25 06:14:15 +0000 2017
>Last-Modified:  Thu May 25 06:14:15 +0000 2017
>Originator:     tls@NetBSD.ORG
>Release:        NetBSD 7.99.30
>Organization:
	The NetBSD Foundation, Inc.
>Environment:


System: NetBSD 7.99.30 NetBSD 7.99.30 (ERLITE.201606051010Z) evbmips
Architecture: mips64eb
Machine: evbmips
>Description:
# dd if=/dev/rsd0c bs=64k | gzip -1 | ssh root@192.168.100.1 dd bs=128k of=/diskless-mips64eb/erlite-factory.dd.gz
The authenticity of host '192.168.100.1 (192.168.100.1)' can't be established.
ECDSA key fingerprint is SHA256:u54QRuCPUUI+WZ6P1cQIz/bQawAjq6IVK4FAugd/nXo.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.100.1' (ECDSA) to the list of known hosts.
root@192.168.100.1's password: 
pid 0(system): trap: cpu0, TLB miss (load or instr. fetch) in kernel mode
status=0x80fc83, cause=0x208, epc=0xffffffff8045cb98, vaddr=0xc00000000e5e4844
tf=0x9800000410003800 ksp=0x9800000410003940 ra=0xffffffff8025cf38 ppl=0
kernel: TLB miss (load or instr. fetch) trap
Stopped in pid 0.3 (system) at  netbsd:cpu_in_cksum+0x128:      lwu     a6,-60(v0)
db> trace
0x9800000410003940: cpu_in_cksum+128 (568,2c7e3a000,2785ba000,c8e4a832) ra ffffffff8025cf38 sz 48
0x9800000410003970: in_delayed_cksum+40 (568,2c7e3a000,2785ba000,c8e4a832) ra ffffffff8025e760 sz 48
0x98000004100039a0: ip_output+f68 (980000000fee6400,2c7e3a000,2785ba000,c8e4a832) ra ffffffff802697d4 sz 192
0x9800000410003a60: tcp_output+14e4 (980000000fee6400,2c7e3a000,2785ba000,c8e4a832) ra ffffffff80265b90 sz 320
0x9800000410003ba0: tcp_input+dc0 (980000000fee6400,14,6,20) ra ffffffff8025b8a0 sz 512
0x9800000410003da0: ipintr+9f0 (980000000fee6400,14,6,20) ra ffffffff803a0fc4 sz 160
0x9800000410003e40: softint_dispatch+114 (980000000fee6400,14,6,20) ra ffffffff80200244 sz 96
0x9800000410003ea0: softint_fast_dispatch+7c (0,14,6,20) ra 0 sz 32
User-level: pid 0.3
db>

Note the traceback through ipintr->tcp_input->tcp_output: is this an ack?
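
The register values in the trap message can be decoded by hand. The
following is a minimal sketch in C, not kernel code: the function names
are made up for illustration, and the segment boundaries are the
architectural MIPS64 ones (the unmapped holes between segments are
ignored). It suggests that the fault is a TLBL exception on an address
in xkseg, the mapped kernel segment, while epc sits in ckseg0 and the
stack in xkphys, both of which are unmapped.

```c
#include <stdint.h>

/* ExcCode occupies bits 6..2 of the MIPS CP0 Cause register. */
unsigned
cause_exccode(uint32_t cause)
{
	return (cause >> 2) & 0x1f;
}

/* Names for the first few exception codes; anything else is "other". */
const char *
exccode_name(unsigned code)
{
	switch (code) {
	case 0: return "Int";	/* interrupt */
	case 1: return "Mod";	/* TLB modification */
	case 2: return "TLBL";	/* TLB miss, load or instruction fetch */
	case 3: return "TLBS";	/* TLB miss, store */
	case 4: return "AdEL";	/* address error, load or fetch */
	case 5: return "AdES";	/* address error, store */
	default: return "other";
	}
}

/*
 * Classify a 64-bit virtual address by segment.  The top of the address
 * space holds the 32-bit compatibility segments; below that the two
 * most significant bits select the segment.
 */
const char *
mips64_segment(uint64_t va)
{
	if (va >= 0xffffffff80000000ULL) {
		if (va < 0xffffffffa0000000ULL)
			return "ckseg0";	/* unmapped, cached */
		if (va < 0xffffffffc0000000ULL)
			return "ckseg1";	/* unmapped, uncached */
		return "cksseg/ckseg3";		/* mapped */
	}
	switch (va >> 62) {
	case 0: return "xuseg";		/* user, mapped */
	case 1: return "xsseg";		/* supervisor, mapped */
	case 2: return "xkphys";	/* kernel, unmapped physical windows */
	default: return "xkseg";	/* kernel, mapped */
	}
}
```

With the values above, cause_exccode(0x208) selects the TLBL entry,
mips64_segment(0xc00000000e5e4844) is xkseg, and
mips64_segment(0xffffffff8045cb98) is ckseg0. A TLB miss on an xkseg
address means the kernel dereferenced a mapped kernel address for which
no valid translation was found.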

>How-To-Repeat:
	Boot the ERLITE kernel.  Try to back up the factory image using
	dd if=/dev/rsd0c bs=64k | gzip -1 | ssh somehost dd of=img.dd.gz

	Wham.
>Fix:
	Building the kernel with options SOSEND_NO_LOAN makes the problem
	disappear.  A pmap issue?  Whatever it is, it's been there for at
	least a year -- a kernel from last May has the problem.
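
	For context on why the checksum routine is where this shows up:
	as far as I understand it, SOSEND_NO_LOAN makes sosend copy user
	data into mbufs instead of loaning the user's pages to the
	kernel, and a software checksum is the first code on the output
	path that loads every payload byte.  The sketch below is a
	portable RFC 1071 style checksum over a flat buffer, written for
	illustration only.  It is not the evbmips cpu_in_cksum, which
	walks an mbuf chain and uses wider, unrolled loads; the name
	cksum_sketch is made up.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Internet checksum (RFC 1071) over a flat buffer: one's complement
 * of the one's complement sum of the data taken as 16-bit big-endian
 * words.  Illustrative only; not the NetBSD implementation.
 */
uint16_t
cksum_sketch(const uint8_t *buf, size_t len)
{
	uint64_t sum = 0;

	/* Every byte of the buffer is loaded in this loop. */
	while (len > 1) {
		sum += ((uint32_t)buf[0] << 8) | buf[1];
		buf += 2;
		len -= 2;
	}

	/* An odd trailing byte is padded with a zero low byte. */
	if (len == 1)
		sum += (uint32_t)buf[0] << 8;

	/* Fold the carries back in until the sum fits in 16 bits. */
	while (sum >> 16)
		sum = (sum & 0xffff) + (sum >> 16);

	return (uint16_t)~sum;
}
```

	If the buffer pointer refers to a mapping that has no valid
	translation, the loads in the summing loop are the first to
	fault, which would fit a panic inside the checksum code rather
	than earlier in tcp_output.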

>Release-Note:

>Audit-Trail:

From: Thor Lancelot Simon <tls@panix.com>
To: gnats-bugs@NetBSD.org
Cc: tls@NetBSD.ORG
Subject: Re: port-evbmips/51219: TLB miss panic in in_cksum
Date: Sun, 5 Jun 2016 19:24:59 -0400

 It can be triggered even more simply:
 	cat /dev/zero | ssh somehost dd of=/dev/null

 The trace through ipintr->tcp_input->tcp_output suggests this is triggered
 by sending an ack, I think.

 pid 0(system): trap: cpu0, TLB miss (load or instr. fetch) in kernel mode
 status=0x80fc83, cause=0x208, epc=0xffffffff8045cb98, vaddr=0xc00000000e5c86a4
 tf=0x9800000410003800 ksp=0x9800000410003940 ra=0xffffffff8025cf38 ppl=0
 kernel: TLB miss (load or instr. fetch) trap
 Stopped in pid 0.3 (system) at  netbsd:cpu_in_cksum+0x128:      lwu     a6,-60(v0)
 db> trace
 0x9800000410003940: cpu_in_cksum+128 (568,30a6616a2,2bad416a2,ee330d06) ra ffffffff8025cf38 sz 48
 0x9800000410003970: in_delayed_cksum+40 (568,30a6616a2,2bad416a2,ee330d06) ra ffffffff8025e760 sz 48
 0x98000004100039a0: ip_output+f68 (980000000fef3a00,30a6616a2,2bad416a2,ee330d06) ra ffffffff802697d4 sz 192
 0x9800000410003a60: tcp_output+14e4 (980000000fef3a00,30a6616a2,2bad416a2,ee330d06) ra ffffffff80265b90 sz 320
 0x9800000410003ba0: tcp_input+dc0 (980000000fef3a00,14,6,20) ra ffffffff8025b8a0 sz 512
 0x9800000410003da0: ipintr+9f0 (980000000fef3a00,14,6,20) ra ffffffff803a0fc4 sz 160
 0x9800000410003e40: softint_dispatch+114 (980000000fef3a00,14,6,20) ra ffffffff80200244 sz 96
 0x9800000410003ea0: softint_fast_dispatch+7c (0,14,6,20) ra 0 sz 32
 User-level: pid 0.3

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Mon, 31 Oct 2016 02:50:41 +0000
State-Changed-Why:
This seems to have been fixed along with the rest of the problems; I can't reproduce it now on .41.


State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Thu, 25 May 2017 06:14:15 +0000
State-Changed-Why:
feedback timeout


>Unformatted:
