NetBSD Problem Report #55121
From paul@whooppee.com Sat Mar 28 13:59:24 2020
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 3471A1A9217
for <gnats-bugs@gnats.NetBSD.org>; Sat, 28 Mar 2020 13:59:24 +0000 (UTC)
Message-Id: <20200328135850.D1CB530F2C3@speedy.whooppee.com>
Date: Sat, 28 Mar 2020 06:58:50 -0700 (PDT)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: wm0/ihphy0 crash during reboot
X-Send-Pr-Version: 3.95
>Number: 55121
>Category: kern
>Synopsis: wm0/ihphy0 crash during reboot
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: thorpej
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Mar 28 14:00:00 +0000 2020
>Closed-Date: Sat Mar 28 21:14:25 +0000 2020
>Last-Modified: Sat Mar 28 21:14:25 +0000 2020
>Originator: Paul Goyette
>Release: NetBSD 9.99.46
>Organization:
+--------------------+--------------------------+-----------------------+
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| (Retired) | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com |
| Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org |
+--------------------+--------------------------+-----------------------+
>Environment:
System: NetBSD speedy.whooppee.com 9.99.46 NetBSD 9.99.46 (SPEEDY 2020-02-07 16:26:35 UTC) #1: Fri Feb 7 19:37:58 UTC 2020 paul@speedy.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/SPEEDY amd64
Architecture: x86_64
Machine: amd64
>Description:
When I shut down my system, it panics with:
panic: kernel diagnostic assertion "(cf == NULL || cf->cf_fstate == FSTATE_FOUND || cf->cf_fstate == FSTATE_STAR)" failed: file "/build/netbsd-local/src_ro/sys/kern/subr_autoconf.c", line 1745 config_detach: ihphy0: bad device fstate: 0
I forced a crash dump with ``reboot 0x100''.
Using gdb to examine the dump, I get the following backtrace:
#0 0xffffffff80224095 in cpu_reboot (howto=howto@entry=256,
bootstr=bootstr@entry=0x0)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/machdep.c:720
#1 0xffffffff803fbe7f in kern_reboot (howto=256, bootstr=bootstr@entry=0x0)
at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:73
#2 0xffffffff8033bb83 in db_reboot_cmd (addr=<optimized out>,
have_addr=<optimized out>, count=<optimized out>, modif=<optimized out>)
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:1432
#3 0xffffffff8033c39b in db_command (
last_cmdp=last_cmdp@entry=0xffffffff80a07280 <db_last_command>)
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:940
#4 0xffffffff8033c706 in db_command_loop ()
at /build/netbsd-local/src_ro/sys/ddb/db_command.c:599
#5 0xffffffff8034020a in db_trap (type=type@entry=1, code=code@entry=0)
at /build/netbsd-local/src_ro/sys/ddb/db_trap.c:91
#6 0xffffffff80220b45 in kdb_trap (type=type@entry=1, code=code@entry=0,
regs=regs@entry=0xffffab8927d67b30)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/db_interface.c:247
#7 0xffffffff80225a2e in trap (frame=0xffffab8927d67b30)
at /build/netbsd-local/src_ro/sys/arch/amd64/amd64/trap.c:315
#8 0xffffffff8021ec83 in alltraps ()
#9 0xffffffff8021f49d in breakpoint ()
#10 0xffffffff8043ee92 in vpanic (
fmt=0xffffffff808e5538 "kernel %sassertion \"%s\" failed: file \"%s\", line %d config_detach: %s: bad device fstate: %d", ap=ap@entry=0xffffab8927d67c68)
at /build/netbsd-local/src_ro/sys/kern/subr_prf.c:334
#11 0xffffffff8071cdc3 in kern_assert (
fmt=fmt@entry=0xffffffff808e5538 "kernel %sassertion \"%s\" failed: file \"%s\", line %d config_detach: %s: bad device fstate: %d")
at /build/netbsd-local/src_ro/sys/lib/libkern/kern_assert.c:51
#12 0xffffffff80422df7 in config_detach (dev=0xffff8115755f0a00,
flags=flags@entry=1)
at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:1742
#13 0xffffffff8034547e in mii_detach (mii=0xffff811575600848, phyloc=-1,
offloc=-1) at /build/netbsd-local/src_ro/sys/dev/mii/mii.c:259
#14 0xffffffff8028510c in wm_detach (self=<optimized out>,
flags=<optimized out>)
at /build/netbsd-local/src_ro/sys/dev/pci/if_wm.c:3104
#15 0xffffffff80422e77 in config_detach (dev=dev@entry=0xffff8115755f06c0,
flags=flags@entry=4)
at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:1770
#16 0xffffffff80424ee8 in config_detach_all (how=<optimized out>)
at /build/netbsd-local/src_ro/sys/kern/subr_autoconf.c:1916
#17 0xffffffff8022406f in cpu_reboot (howto=howto@entry=0,
bootstr=bootstr@entry=0x0) at ./machine/cpu.h:72
#18 0xffffffff803fbe7f in kern_reboot (howto=0, bootstr=0x0)
at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:73
#19 0xffffffff803fbee2 in sys_reboot (l=<optimized out>,
uap=0xffffab8927d68000, retval=<optimized out>)
at /build/netbsd-local/src_ro/sys/kern/kern_reboot.c:102
#20 0xffffffff80246919 in sy_call (rval=0xffffab8927d67fb0,
uap=0xffffab8927d68000, l=0xffff811576033800,
sy=0xffffffff80a100c0 <sysent+4992>)
at /build/netbsd-local/src_ro/sys/sys/syscallvar.h:65
#21 sy_invoke (code=208, rval=0xffffab8927d67fb0, uap=0xffffab8927d68000,
l=0xffff811576033800, sy=0xffffffff80a100c0 <sysent+4992>)
at /build/netbsd-local/src_ro/sys/sys/syscallvar.h:94
#22 syscall (frame=0xffffab8927d68000)
at /build/netbsd-local/src_ro/sys/arch/x86/x86/syscall.c:138
#23 0xffffffff802096ad in handle_syscall ()
I rebooted the new kernel again and set breakpoints at three critical
locations
ifmedia_fini
mii_detach
wm_detach
and again tried to reboot. It printed (among other messages) the line
ihphy0 detached
before hitting any breakpoints, and then hit the breakpoint at wm_detach.
This seems to indicate that the shutdown process has already (at least
partially) detached the ihphy0 before calling wm_detach() which will
eventually call ifmedia_fini().
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->thorpej
Responsible-Changed-By: thorpej@NetBSD.org
Responsible-Changed-When: Sat, 28 Mar 2020 18:15:10 +0000
Responsible-Changed-Why:
Take.
State-Changed-From-To: open->analyzed
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sat, 28 Mar 2020 18:15:10 +0000
State-Changed-Why:
Analyzed. This is because PHYs have DVF_DETACH_SHUTDOWN, so the rug is
pulled out from under the MII layer. The fix is to remove DVF_DETACH_SHUTDOWN
from the PHY drivers and let the MII driver manage it itself. The
DVF_DETACH_SHUTDOWN flag is seriously misguided anyway.
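
The ordering problem described in this analysis can be illustrated with a small
user-space model. This is only a sketch under stated assumptions: every toy_*
name below is hypothetical and is not part of the NetBSD kernel,
TOY_DETACH_SHUTDOWN merely stands in for DVF_DETACH_SHUTDOWN, and the shutdown
pass is assumed to visit leaf devices first. Where the real kernel trips the
KASSERT in config_detach(), the model just counts the second detach attempt.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for DVF_DETACH_SHUTDOWN; not the kernel definition. */
#define TOY_DETACH_SHUTDOWN 0x1

struct toy_dev {
    const char *name;
    int flags;              /* TOY_DETACH_SHUTDOWN or 0 */
    bool attached;
    struct toy_dev *child;  /* PHY owned by a NIC, or NULL for a leaf */
    int double_detach;      /* detach attempts made after the device was gone */
};

/*
 * Detach one device. A parent tears down its child first, the way
 * wm_detach() ends up in mii_detach() for its PHYs.
 */
static void toy_detach(struct toy_dev *dev)
{
    if (!dev->attached) {
        /* The real kernel asserts here; the model only records it. */
        dev->double_detach++;
        return;
    }
    if (dev->child != NULL)
        toy_detach(dev->child);
    dev->attached = false;
}

/* Shutdown pass: detach every flagged device, in the order given. */
static void toy_shutdown(struct toy_dev **devs, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (devs[i]->attached && (devs[i]->flags & TOY_DETACH_SHUTDOWN))
            toy_detach(devs[i]);
    }
}

/*
 * Run one shutdown with the PHY carrying phy_flags and return how many
 * times the PHY was detached after it had already been detached.
 */
int toy_run_shutdown(int phy_flags)
{
    struct toy_dev phy = { "ihphy0", phy_flags, true, NULL, 0 };
    struct toy_dev nic = { "wm0", TOY_DETACH_SHUTDOWN, true, &phy, 0 };
    struct toy_dev *leaves_first[] = { &phy, &nic };

    toy_shutdown(leaves_first, 2);
    return phy.double_detach;
}
```

In this model, toy_run_shutdown(TOY_DETACH_SHUTDOWN) reports one stale detach
of the PHY, while toy_run_shutdown(0) reports none, because the PHY is then
detached only by its parent.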
From: "Jason R Thorpe" <thorpej@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/55121 CVS commit: src/sys
Date: Sat, 28 Mar 2020 18:37:19 +0000
Module Name: src
Committed By: thorpej
Date: Sat Mar 28 18:37:18 UTC 2020
Modified Files:
src/sys/arch/arm/amlogic: gxlphy.c
src/sys/dev/mii: brgphy.c ihphy.c micphy.c nsphyter.c ukphy.c
Log Message:
Don't set DVF_DETACH_SHUTDOWN. The MII layer wants to manage the lifecycle
of the PHY devices, and if a NIC driver chooses not to detach its PHYs
at shutdown, that's the driver's business.
PR kern/55121.
To generate a diff of this commit:
cvs rdiff -u -r1.3 -r1.4 src/sys/arch/arm/amlogic/gxlphy.c
cvs rdiff -u -r1.88 -r1.89 src/sys/dev/mii/brgphy.c
cvs rdiff -u -r1.17 -r1.18 src/sys/dev/mii/ihphy.c
cvs rdiff -u -r1.13 -r1.14 src/sys/dev/mii/micphy.c
cvs rdiff -u -r1.45 -r1.46 src/sys/dev/mii/nsphyter.c
cvs rdiff -u -r1.53 -r1.54 src/sys/dev/mii/ukphy.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: analyzed->feedback
State-Changed-By: thorpej@NetBSD.org
State-Changed-When: Sat, 28 Mar 2020 18:40:52 +0000
State-Changed-Why:
Please confirm ihphy.c 1.18 fixes the problem.
State-Changed-From-To: feedback->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Sat, 28 Mar 2020 21:14:25 +0000
State-Changed-Why:
Fix was committed - problem gone. Thanks for quick turn-around.
>Unformatted:
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.