NetBSD Problem Report #55121

From paul@whooppee.com  Sat Mar 28 13:59:24 2020
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3471A1A9217
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 28 Mar 2020 13:59:24 +0000 (UTC)
Message-Id: <20200328135850.D1CB530F2C3@speedy.whooppee.com>
Date: Sat, 28 Mar 2020 06:58:50 -0700 (PDT)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: wm0/ihphy0 crash during reboot
X-Send-Pr-Version: 3.95

>Number:         55121
>Category:       kern
>Synopsis:       wm0/ihphy0 crash during reboot
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    thorpej
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Mar 28 14:00:00 +0000 2020
>Closed-Date:    Sat Mar 28 21:14:25 +0000 2020
>Last-Modified:  Sat Mar 28 21:14:25 +0000 2020
>Originator:     Paul Goyette
>Release:        NetBSD 9.99.46
>Organization:
+--------------------+--------------------------+-----------------------+
| Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:     |
| (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com     |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org   |
+--------------------+--------------------------+-----------------------+
>Environment:


System: NetBSD speedy.whooppee.com 9.99.46 NetBSD 9.99.46 (SPEEDY 2020-02-07 16:26:35 UTC) #1: Fri Feb 7 19:37:58 UTC 2020 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
When shutting down my system, it panics with

panic: kernel diagnostic assertion "(cf == NULL || cf->cf_fstate == FSTATE_FOUND || cf->cf_fstate == FSTATE_STAR)" failed: file "/build/netbsd-local/src_ro/sys/kern/subr_autoconf.c", line 1745 config_detach: ihphy0: bad device fstate: 0

I forced a crash-dump with ``reboot 0x100''

Using gdb to examine the dump, I get the following backtrace:

#0  0xffffffff80224095 in cpu_reboot (howto=howto@entry=256, 
    bootstr=bootstr@entry=0x0)
    at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/machdep.c:720
#1  0xffffffff803fbe7f in kern_reboot (howto=256, bootstr=bootstr@entry=0x0)
    at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:73
#2  0xffffffff8033bb83 in db_reboot_cmd (addr=<optimized out>, 
    have_addr=<optimized out>, count=<optimized out>, modif=<optimized out>)
    at /build/netbsd-local/src_ro/sys/ddb/db_command.c:1432
#3  0xffffffff8033c39b in db_command (
    last_cmdp=last_cmdp@entry=0xffffffff80a07280 <db_last_command>)
    at /build/netbsd-local/src_ro/sys/ddb/db_command.c:940
#4  0xffffffff8033c706 in db_command_loop ()
    at /build/netbsd-local/src_ro/sys/ddb/db_command.c:599
#5  0xffffffff8034020a in db_trap (type=type@entry=1, code=code@entry=0)
    at /build/netbsd-local/src_ro/sys/ddb/db_trap.c:91
#6  0xffffffff80220b45 in kdb_trap (type=type@entry=1, code=code@entry=0, 
    regs=regs@entry=0xffffab8927d67b30)
    at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/db_interface.c:247
#7  0xffffffff80225a2e in trap (frame=0xffffab8927d67b30)
    at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/trap.c:315
#8  0xffffffff8021ec83 in alltraps ()
#9  0xffffffff8021f49d in breakpoint ()
#10 0xffffffff8043ee92 in vpanic (
    fmt=0xffffffff808e5538 "kernel %sassertion \"%s\" failed: file \"%s\", line %d config_detach: %s: bad device fstate: %d", ap=ap@entry=0xffffab8927d67c68)
    at /build/netbsd-local/src_ro/sys/kern/subr_prf.c:334
#11 0xffffffff8071cdc3 in kern_assert (
    fmt=fmt@entry=0xffffffff808e5538 "kernel %sassertion \"%s\" failed: file \"%s\", line %d config_detach: %s: bad device fstate: %d")
    at /build/netbsd-local/src_ro/sys/lib/libkern/kern_assert.c:51
#12 0xffffffff80422df7 in config_detach (dev=0xffff8115755f0a00, 
    flags=flags@entry=1)
    at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:1742
#13 0xffffffff8034547e in mii_detach (mii=0xffff811575600848, phyloc=-1, 
    offloc=-1) at /build/netbsd-local/src_ro/sys/dev/mii/mii.c:259
#14 0xffffffff8028510c in wm_detach (self=<optimized out>, 
    flags=<optimized out>)
    at /build/netbsd-local/src_ro/sys/dev/pci/if_wm.c:3104
#15 0xffffffff80422e77 in config_detach (dev=dev@entry=0xffff8115755f06c0, 
    flags=flags@entry=4)
    at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:1770
#16 0xffffffff80424ee8 in config_detach_all (how=<optimized out>)
    at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:1916
#17 0xffffffff8022406f in cpu_reboot (howto=howto@entry=0, 
    bootstr=bootstr@entry=0x0) at ./machine/cpu.h:72
#18 0xffffffff803fbe7f in kern_reboot (howto=0, bootstr=0x0)
    at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:73
#19 0xffffffff803fbee2 in sys_reboot (l=<optimized out>, 
    uap=0xffffab8927d68000, retval=<optimized out>)
    at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:102
#20 0xffffffff80246919 in sy_call (rval=0xffffab8927d67fb0, 
    uap=0xffffab8927d68000, l=0xffff811576033800, 
    sy=0xffffffff80a100c0 <sysent+4992>)
    at /build/netbsd-local/src_ro/sys/sys/syscallvar.h:65
#21 sy_invoke (code=208, rval=0xffffab8927d67fb0, uap=0xffffab8927d68000, 
    l=0xffff811576033800, sy=0xffffffff80a100c0 <sysent+4992>)
    at /build/netbsd-local/src_ro/sys/sys/syscallvar.h:94
#22 syscall (frame=0xffffab8927d68000)
    at /build/netbsd-local/src_ro/sys/arch/x86/x86/syscall.c:138
#23 0xffffffff802096ad in handle_syscall ()


I rebooted the new kernel again and set breakpoints at three critical
locations

	ifmedia_fini
	mii_detach
	wm_detach

and again tried to reboot.  It printed (among other messages) the line

	ihphy0 detached

before hitting any breakpoints, and then hit the breakpoint at wm_detach.
This seems to indicate that the shutdown process has already (at least
partially) detached the ihphy0 before calling wm_detach() which will
eventually call ifmedia_fini().


>How-To-Repeat:

>Fix:


>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Sat, 28 Mar 2020 18:15:10 +0000
Responsible-Changed-Why:
Take.


State-Changed-From-To: open->analyzed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sat, 28 Mar 2020 18:15:10 +0000
State-Changed-Why:
Analyzed.  This is because PHYs have DVF_DETACH_SHUTDOWN, so the rug is
pulled out from under the MII layer.  The fix is to remove DVF_DETACH_SHUTDOWN
from the PHY drivers and let the MII driver manage it itself.  The
DVF_DETACH_SHUTDOWN flag is seriously misguided anyway.
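
The failure mode described above can be illustrated with a small user-space
model.  This is only a sketch, not kernel code: every toy_* name is
hypothetical, the two-state enum is a simplification of the real cf_fstate
values, and the leaves-first ordering of the shutdown pass is an assumption
based on the "ihphy0 detached" message appearing before wm_detach was
reached.  toy_config_detach() returns -1 at the point where the real
config_detach() assertion would fire.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the autoconf states; not the kernel's definitions. */
enum toy_fstate { TOY_NOTFOUND = 0, TOY_FOUND = 1 };

struct toy_dev {
    const char *name;
    enum toy_fstate fstate;
    bool detach_at_shutdown;    /* models DVF_DETACH_SHUTDOWN */
    struct toy_dev *child;      /* PHY owned by the NIC, or NULL */
};

/*
 * Detach a device.  Returns 0 on success, or -1 where the kernel
 * would panic with "bad device fstate".
 */
static int
toy_config_detach(struct toy_dev *dev)
{
    if (dev->fstate != TOY_FOUND)
        return -1;              /* already detached by someone else */

    /* Like wm_detach -> mii_detach: the parent detaches its own PHY. */
    if (dev->child != NULL && toy_config_detach(dev->child) != 0)
        return -1;

    dev->fstate = TOY_NOTFOUND;
    return 0;
}

/* Shutdown pass: detach every flagged device; count the failures. */
static int
toy_shutdown(struct toy_dev **devs, size_t ndevs)
{
    int failures = 0;

    for (size_t i = 0; i < ndevs; i++) {
        if (!devs[i]->detach_at_shutdown)
            continue;           /* left for its parent to manage */
        if (toy_config_detach(devs[i]) != 0)
            failures++;
    }
    return failures;
}

/* Run one shutdown with the PHY flag set (before fix) or clear (after). */
static int
toy_run(bool phy_has_flag)
{
    struct toy_dev phy = { "ihphy0", TOY_FOUND, phy_has_flag, NULL };
    struct toy_dev nic = { "wm0", TOY_FOUND, true, &phy };
    struct toy_dev *leaves_first[] = { &phy, &nic };

    return toy_shutdown(leaves_first, 2);
}
```

In this model toy_run(true) reports one failed detach, because the PHY is
detached by the shutdown pass and then again through its parent, while
toy_run(false) reports none, because only the parent ever detaches the PHY.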


From: "Jason R Thorpe" <thorpej@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/55121 CVS commit: src/sys
Date: Sat, 28 Mar 2020 18:37:19 +0000

 Module Name:	src
 Committed By:	thorpej
 Date:		Sat Mar 28 18:37:18 UTC 2020

 Modified Files:
 	src/sys/arch/arm/amlogic: gxlphy.c
 	src/sys/dev/mii: brgphy.c ihphy.c micphy.c nsphyter.c ukphy.c

 Log Message:
 Don't set DVF_DETACH_SHUTDOWN.  The MII layer wants to manage the lifecycle
 of the PHY devices, and if a NIC driver chooses not to detach its PHYs
 at shutdown, that's the driver's business.

 PR kern/55121.


 To generate a diff of this commit:
 cvs rdiff -u -r1.3 -r1.4 src/sys/arch/arm/amlogic/gxlphy.c
 cvs rdiff -u -r1.88 -r1.89 src/sys/dev/mii/brgphy.c
 cvs rdiff -u -r1.17 -r1.18 src/sys/dev/mii/ihphy.c
 cvs rdiff -u -r1.13 -r1.14 src/sys/dev/mii/micphy.c
 cvs rdiff -u -r1.45 -r1.46 src/sys/dev/mii/nsphyter.c
 cvs rdiff -u -r1.53 -r1.54 src/sys/dev/mii/ukphy.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: analyzed->feedback
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sat, 28 Mar 2020 18:40:52 +0000
State-Changed-Why:
Please confirm ihphy.c 1.18 fixes the problem.


State-Changed-From-To: feedback->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Sat, 28 Mar 2020 21:14:25 +0000
State-Changed-Why:
Fix was committed - problem gone.  Thanks for quick turn-around.


>Unformatted:
