NetBSD Problem Report #50338

From www@NetBSD.org  Fri Oct 16 20:15:43 2015
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id F0178A65E6
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 16 Oct 2015 20:15:43 +0000 (UTC)
Message-Id: <20151016201542.B7B70A65E6@mollari.NetBSD.org>
Date: Fri, 16 Oct 2015 20:15:42 +0000 (UTC)
From: scole_mail@gmx.com
Reply-To: scole_mail@gmx.com
To: gnats-bugs@NetBSD.org
Subject: panic with ping -f and rtk0
X-Send-Pr-Version: www-1.0

>Number:         50338
>Category:       port-macppc
>Synopsis:       panic with ping -f and rtk0
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    port-macppc-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Oct 16 20:20:00 +0000 2015
>Closed-Date:    Tue Dec 18 19:44:57 +0000 2018
>Last-Modified:  Tue Dec 18 19:44:57 +0000 2018
>Originator:     scole_mail
>Release:        7.0
>Organization:
none
>Environment:
NetBSD pm7200 7.99.21 NetBSD 7.99.21 (GENERIC-$Revision: 1.7 $) #1: Sat Oct 10 11:41:23 EDT 2015 scole@dstar:/home/scole/nbsd/src/sys/arch/macppc/compile/obj/GENERIC_601 macppc
>Description:
I'm getting a panic which I think is caused by high network activity:

trap: pid 590.1 (syslogd): kernel PGM trap @ 0x445c40 (SRR1=0x81030)
panic: trap
cpu0: Begin traceback...
0x00469f40: at vpanic+0x138
0x00469f70: at panic+0x4c
0x00469fb0: at trap+0x444
0x0046a030: kernel PGM trap by cpu_info+0: srr1=0x81030
           r1=0 cr=0x6f128 xer=0 ctr=0x1c6ca4 mq=0x6d40
cpu0: End traceback...
dumpsys: TBD
rebooting

This happens on both current and 7.0.  I added a PCI rtk0 card to my ppc601 machine to use instead of the slower built-in mc0.

I put both interfaces on the same network:
ifconfig_mc0="inet 10.0.0.19 netmask 255.255.255.0"
ifconfig_rtk0="inet 10.0.0.20 netmask 255.255.255.0"

"ping -f 10.0.0.20" (the rtk0 address) loads the machine and eventually crashes it.  Pinging the mc0 interface doesn't seem to cause a crash.

When both interfaces were up, "netstat -i" showed pings coming in on mc0 but going out on rtk0.  Pinging mc0, with or without rtk0, didn't seem to cause issues.  Pinging rtk0 caused the panic whether mc0 was up or not.

I think it also crashed once when doing a cvs update.
>How-To-Repeat:
Add an rtk interface and flood ping the machine.
>Fix:

>Release-Note:

>Audit-Trail:
From: scole_mail <scole_mail@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-macppc/50338: panic with ping -f and rtk0
Date: Mon, 19 Oct 2015 11:28:44 -0400

 I don't know if this is helpful, but I reproduced the panic on a
 kernel with ddb and symbols built in and got the trace below.  cron
 and ntpd (and I would guess other processes) are panicking under
 ping load for me.

 If there is a way I could collect more useful info let me know.  I
 couldn't get ddb to print out anything after the panic.

 Thanks

 trap: pid 2167.1 (cron): kernel PGM trap @ 0x1504f70 (SRR1=0x89030) 
 panic: trap 
 Stopped in pid 2167.1 (cron) at netbsd:vpanic+0x13c:    or      r3, r29, r29 
 0x004687f0: at panic+0x4c 
 0x00468830: at trap+0x444 
 0x004688b0: kernel PGM trap by 0x1504f70: srr1=0x89030 
             r1=0x1505000 cr=0x22009022 xer=0 ctr=0x101980 mq=0xb0d16a1f 
 0x01505000: at 0x1504f6c 
 0x01505040: at splx+0x68 
 0x01505050: at pic_handle_intr+0x224 
 0x01505090: at trapstart+0x6b0 
 0x01505160: at pic_do_pending_int+0x220 
 0x015051a0: at splx+0x68 
 0x015051b0: at pic_handle_intr+0x224 
 0x015051f0: at trapstart+0x6b0 
 0x015052c0: at splx+0x68 
 0x015052d0: at pic_handle_intr+0x224 
 0x01505310: at trapstart+0x6b0 
 0x015053e0: at pic_do_pending_int+0x220 
 0x01505420: at splx+0x68 
 0x01505430: at pic_handle_intr+0x224 
 0x01505470: at trapstart+0x6b0 
 0x01505540: at pic_do_pending_int+0x220 
 0x01505580: at splx+0x68 
 0x01505590: at pic_handle_intr+0x224 
 0x015055d0: at trapstart+0x6b0 
 0x015056a0: at pic_do_pending_int+0x220 
 0x015056e0: at splx+0x68 
 0x015056f0: at pic_handle_intr+0x224 
 0x01505730: at trapstart+0x6b0 
 saved LR(0x14) is invalid. 
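
 The trace above repeats the same splx / pic_handle_intr / trapstart
 frames many times.  As a rough aid for comparing such traces, here is
 a minimal Python sketch that counts how often each symbol appears in
 a pasted ddb traceback and how many bytes of stack the symbolized
 frames span.  The helper name summarize_traceback and the regular
 expression are assumptions made for illustration; they are not part
 of ddb or any NetBSD tool, and the sample is only the first few
 frames of the trace above.

```python
import re
from collections import Counter

# Matches ddb traceback frames such as "0x01505040: at splx+0x68".
# Frames without a symbol name (e.g. "at 0x1504f6c") are ignored on purpose.
FRAME_RE = re.compile(
    r"^\s*(0x[0-9a-fA-F]+): at ([A-Za-z_]\w*)\+0x[0-9a-fA-F]+\s*$"
)


def summarize_traceback(text):
    """Return (per-symbol frame counts, bytes spanned by symbolized frames)."""
    counts = Counter()
    addrs = []
    for line in text.splitlines():
        m = FRAME_RE.match(line)
        if m is None:
            continue  # trap headers, register dumps, unsymbolized frames
        addrs.append(int(m.group(1), 16))
        counts[m.group(2)] += 1
    span = max(addrs) - min(addrs) if addrs else 0
    return counts, span


# First frames of the cron traceback, copied from the report.
SAMPLE = """\
 0x01505000: at 0x1504f6c
 0x01505040: at splx+0x68
 0x01505050: at pic_handle_intr+0x224
 0x01505090: at trapstart+0x6b0
 0x01505160: at pic_do_pending_int+0x220
 0x015051a0: at splx+0x68
 0x015051b0: at pic_handle_intr+0x224
 0x015051f0: at trapstart+0x6b0
"""

if __name__ == "__main__":
    counts, span = summarize_traceback(SAMPLE)
    for symbol, n in counts.most_common():
        print(f"{symbol}: {n}")
    print(f"stack span: {span:#x} bytes")
```

 Feeding it a complete trace instead of SAMPLE gives a quick count of
 how many times interrupt handling appears on the stack.  That only
 summarizes the trace; it does not by itself identify the cause of the
 panic.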

From: scole_mail <scole_mail@gmx.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-macppc/50338: panic with ping -f and rtk0
Date: Wed, 21 Oct 2015 09:29:02 -0400

 I tried to get a full gdb kernel crash dump or backtrace but wasn't
 able to.  It seems like context switching under heavy ping load is
 the problem.  I've gotten the panic with cron, sh, and ntpd so far.
 Here are some more samples:

 ###########################################
 [snipped]
 0x0150bad0: at trapstart+0x6b0
 0x0150bba0: at splx+0x68
 0x0150bbb0: at pic_handle_intr+0x224
 0x0150bbf0: at trapstart+0x6b0
 0x0150bcc0: at vn_stat+0x3c
 0x0150bd80: at do_sys_statat+0x90
 0x0150bdf0: at sys___stat50+0x24
 0x0150beb0: at syscall+0x27c
 0x0150bf20: user SC trap #439 by 0xfdcb19d4: srr1=0x2d030
             r1=0xffffd810 cr=0x24888042 xer=0x20000000 ctr=0xfdcb19cc mq=0

 ############################
 [snipped]
 0x01508a30: at trapstart+0x6b0
 0x01508b00: at pic_do_pending_int+0x1f8
 0x01508b40: at splx+0x68
 0x01508b50: at pic_handle_intr+0x224
 0x01508b90: at trapstart+0x6b0
 0x01508c60: at pic_do_pending_int+0x1f8
 0x01508ca0: at splx+0x68
 0x01508cb0: at pic_handle_intr+0x224
 0x01508cf0: at trapstart+0x6b0
 0x01508dc0: at pic_do_pending_int+0x1f8
 0x01508e00: at splx+0x68
 0x01508e10: at pic_handle_intr+0x224
 0x01508e50: at trapstart+0x6b0
 0x01508f20: at trapstart+0x6b0
 0xffffdce0: at 0x4bf6f998
 trap: pid 1807.1 (cron): kernel PGM trap @ 0 (SRR1=0x80030)
 trap: pid 1807.1 (cron): kernel PGM trap @ 0x468740 (SRR1=0x80030)
 Skipping crash dump on recursive panic
 panic: trap
 Faulted in DDB; continuing...
 trap: pid 1807.1 (cron): kernel PGM trap @ 0x468740 (SRR1=0x80030)
 Skipping crash dump on recursive panic
 panic: trap
 Faulted in DDB; continuing...
 trap: pid 1807.1 (cron): kernel PGM trap @ 0x468740 (SRR1=0x80030)
 Skipping crash dump on recursive panic
 panic: trap
 Faulted in DDB; continuing...
 trap: pid 1807.1 (cron): kernel PGM trap @ 0x468740 (SRR1=0x80030)
 Skipping crash dump on recursive panic
 panic: trap
 Faulted in DDB; continuing...

 ############################
 Stopped in pid 136.1 (sh) at    netbsd:cpu_switchto+0x138:      lwz     r31, 0xc(r1)
 Faulted in DDB; continuing...
 trap: kernel ISI by 0x7f8802a4 (SRR1 0x40000030), lr: 0x7f8802a6
 panic: trap
 Faulted in DDB; continuing...
 trap: pid 136.1 (sh): kernel PGM trap @ 0 (SRR1=0x80030)
 Skipping crash dump on recursive panic
 panic: trap
 Faulted in DDB; continuing...
 trap: pid 136.1 (sh): kernel PGM trap @ 0x480000 (SRR1=0x80030)
 Skipping crash dump on recursive panic
 panic: trap
 Faulted in DDB; continuing...
 [snipped]

State-Changed-From-To: open->closed
State-Changed-By: scole@NetBSD.org
State-Changed-When: Tue, 18 Dec 2018 11:44:57 -0800
State-Changed-Why:
Cannot reproduce the bug anymore on current, so assuming it is fixed.


>Unformatted:
