NetBSD Problem Report #11811
Received: (qmail 20199 invoked from network); 25 Dec 2000 05:44:17 -0000
Message-Id: <200012250116.eBP1Gp800296@zorkmid.mit.edu>
Date: Sun, 24 Dec 2000 20:16:51 -0500 (EST)
From: John Hawkinson <jhawk@mit.edu>
Reply-To: jhawk@mit.edu
To: gnats-bugs@gnats.netbsd.org
Subject: wddump kernel dumping failure
X-Send-Pr-Version: 3.95
>Number: 11811
>Category: kern
>Synopsis: wddump kernel dumping failure
>Confidential: no
>Severity: serious
>Priority: low
>Responsible: jdolecek
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Dec 25 05:45:00 +0000 2000
>Closed-Date: Sat Jun 16 10:40:20 +0000 2018
>Last-Modified: Sat Jun 16 10:40:20 +0000 2018
>Originator: John Hawkinson
>Release: netbsd-current of 23 Dec 2000
>Organization:
MIT
>Environment:
System: NetBSD zorkmid.mit.edu 1.5O NetBSD 1.5O (ZORKMID-$Revision: 1.5 $) #67: Sat Dec 23 17:45:30 EST 2000 jhawk@zorkmid.mit.edu:/usr/local/netbsd-current/src/sys/arch/i386/compile/ZORKMID i386
>Description:
I was single-stepping through some UBC code, trying to figure out
why a process seemed to be hung. (It was an executable under
COMPAT_PECOFF, but I don't think that was really related. It was
repeatedly getting stuck in biowait(), and it appeared that uvm_fault()
was repeatedly going through ubc_fault() and calling genfs_getpages();
nevertheless, this is probably not too relevant.) I accidentally
single-stepped through a trap and into 16-bit APM land, and so ddb died.
It then tried to dump core, but seemed to fail with:
dump panic: wddump: polled command has been queued
panic: wdc_exec_command: polled command not done
I'm really not sure I understand. Tracebacks follow.
>How-To-Repeat:
I ran /win98/wavelan/bin/Wsu10604.exe under COMPAT_PECOFF,
not expecting it to work, but just fooling around. My disk
light went solid and it sat there taking up loads of CPU
for no good reason, spinning around in uvm/ubc code.
I single-stepped at the wrong place, and the following
was left over in my message buffer:
uvm_fault(0xc0588e40, 0x5000, 0, 1) -> 1
fatal page fault in supervisor mode
trap type 6 code 0 eip c02f95ae cs 8 eflags 10046 cr2 5d6b cpl e000ffef
panic: trap
Begin traceback...
trap() at trap+0x1e5
--- trap (number 6) ---
db_read_bytes(5d6b,4,c6c42e0c,c0585a00,c6c42e48) at db_read_bytes+0x12
db_get_value(5d6b,4,0,0,c6c42f04) at db_get_value+0x18
db_stop_at_pc(c0585a00,c6c42e48) at db_stop_at_pc+0xee
db_trap(5,0,1,c6c42eb4,c07c5400) at db_trap+0x48
kdb_trap(5,0,c6c42eb4) at kdb_trap+0xc6
trap() at trap+0x168
--- trap (number 5) ---
param.c(b,c6c42f54,c6c42f54,c6c42f40,c03abfaa) at 0x5d6b
apmcall_debug(b,c6c42f54,281,c6c42f70,c03abfe4) at apmcall_debug+0x2d
apm_get_event(c6c42f54) at apm_get_event+0x12
apm_periodic_check(c07c5400,c07c5450,2,0,c07c5400) at apm_periodic_check+0x38
apm_thread(c07c5400) at apm_thread+0x20
End traceback...
syncing disks... 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 giving up
dumping to dev 0,1 offset 396196
dump panic: wddump: polled command has been queued
Begin traceback...
wddump(1,5115b8,c6c42b14,200,8081d) at wddump+0x1de
cpu_dump(100,c045791b,100,3,2) at cpu_dump+0x101
dumpsys(c6c42d80,c6c42d74,c01bc105,100,0) at dumpsys+0xed
cpu_reboot(100,0,c6c42db4,0,6) at cpu_reboot+0x63
panic(c045791b,e000ffef,4,5,bfbf9) at panic+0xcd
trap() at trap+0x1e5
--- trap (number 6) ---
db_read_bytes(5d6b,4,c6c42e0c,c0585a00,c6c42e48) at db_read_bytes+0x12
db_get_value(5d6b,4,0,0,c6c42f04) at db_get_value+0x18
db_stop_at_pc(c0585a00,c6c42e48) at db_stop_at_pc+0xee
db_trap(5,0,1,c6c42eb4,c07c5400) at db_trap+0x48
kdb_trap(5,0,c6c42eb4) at kdb_trap+0xc6
trap() at trap+0x168
--- trap (number 5) ---
param.c(b,c6c42f54,c6c42f54,c6c42f40,c03abfaa) at 0x5d6b
apmcall_debug(b,c6c42f54,281,c6c42f70,c03abfe4) at apmcall_debug+0x2d
apm_get_event(c6c42f54) at apm_get_event+0x12
apm_periodic_check(c07c5400,c07c5450,2,0,c07c5400) at apm_periodic_check+0x38
apm_thread(c07c5400) at apm_thread+0x20
End traceback...
dumping to dev 0,1 offset 396196
dump device not ready
panic: wdc_exec_command: polled command not done
Begin traceback...
wdc_exec_command(c07c5cf8,c6c42958) at wdc_exec_command+0xca
wd_flushcache(c07be000,10,c6c42994,c01b2c9d,c07be000) at wd_flushcache+0x4d
wd_shutdown(c07be000) at wd_shutdown+0xd
doshutdownhooks(c6c429c8,c6c429bc,c01bc105,104,0) at doshutdownhooks+0x25
cpu_reboot(104,0,c03159c4,c07be000,1) at cpu_reboot+0x68
panic(c045d460,2,c6c42b4c,c03157d0,1) at panic+0xcd
wddump(1,5115b8,c6c42b14,200,8081d) at wddump+0x1de
cpu_dump(100,c045791b,100,3,2) at cpu_dump+0x101
dumpsys(c6c42d80,c6c42d74,c01bc105,100,0) at dumpsys+0xed
cpu_reboot(100,0,c6c42db4,0,6) at cpu_reboot+0x63
panic(c045791b,e000ffef,4,5,bfbf9) at panic+0xcd
trap() at trap+0x1e5
--- trap (number 6) ---
db_read_bytes(5d6b,4,c6c42e0c,c0585a00,c6c42e48) at db_read_bytes+0x12
db_get_value(5d6b,4,0,0,c6c42f04) at db_get_value+0x18
db_stop_at_pc(c0585a00,c6c42e48) at db_stop_at_pc+0xee
db_trap(5,0,1,c6c42eb4,c07c5400) at db_trap+0x48
kdb_trap(5,0,c6c42eb4) at kdb_trap+0xc6
trap() at trap+0x168
--- trap (number 5) ---
param.c(b,c6c42f54,c6c42f54,c6c42f40,c03abfaa) at 0x5d6b
apmcall_debug(b,c6c42f54,281,c6c42f70,c03abfe4) at apmcall_debug+0x2d
apm_get_event(c6c42f54) at apm_get_event+0x12
apm_periodic_check(c07c5400,c07c5450,2,0,c07c5400) at apm_periodic_check+0x38
apm_thread(c07c5400) at apm_thread+0x20
End traceback...
dumping to dev 0,1 offset 396196
dump device not ready
rebooting...
>Fix:
Is something wrong with wddump? Is it unreasonable to expect it to
work from a trap in apmcall_debug()?
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->bouyer
Responsible-Changed-By: bouyer
Responsible-Changed-When: Mon Apr 7 09:36:21 PDT 2003
Responsible-Changed-Why:
I'll see how to improve this
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/11811 CVS commit: [jdolecek-ncq] src/sys/dev
Date: Fri, 16 Jun 2017 20:40:49 +0000
Module Name: src
Committed By: jdolecek
Date: Fri Jun 16 20:40:49 UTC 2017
Modified Files:
src/sys/dev/ata [jdolecek-ncq]: ata.c ata_wdc.c atavar.h wd.c
src/sys/dev/ic [jdolecek-ncq]: ahcisata_core.c mvsata.c siisata.c wdc.c
src/sys/dev/scsipi [jdolecek-ncq]: atapi_wdc.c
Log Message:
adjust reset channel and dump paths
- channel reset now always kills active transfer, even on dump path, but
now doesn't touch the queued waiting transfers; also kill_xfer hook is
always called, so that HBA can free any private xfer resources and thus
the dump request has chance to work
- kill_xfer routines now always call ata_deactivate_xfer(); added KASSERT()s
to ata_free_xfer() to expect deactivated xfer
- when called during channel reset before dump, ata_kill_active() drops
any queued waiting transfers without processing
- do not (re)queue any transfers in wddone() when dumping
- kill AT_RST_NOCMD flag
This should also hopefully fix the 'polled command has been queued' panic
as reported in:
PR kern/11811 by John Hawkinson
PR kern/47041 by Taylor R Campbell
PR kern/51979 by Martin Husemann
dump tested working with piixide(4) and ahci(4). mvsata(4) dump times out,
but otherwise tested working, will be fixed separately. siisata(4) mechanically
changed and not tested.
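
For readers of this PR, the dump-path behaviour described in the log
message above can be sketched as a toy model. This is only an
illustration under stated assumptions: every type and function name
below (toy_channel, toy_exec, toy_reset_for_dump, toy_done) is
hypothetical and is not the NetBSD ATA API, and the real code in
ata.c, wd.c and wdc.c is considerably more involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: hypothetical names, not the NetBSD ATA API. */
enum xfer_result { XFER_DONE, XFER_STARTED, XFER_QUEUED };

struct toy_channel {
	bool   active;    /* a transfer currently owns the hardware */
	size_t nwaiting;  /* transfers queued behind the active one */
	bool   dumping;   /* the crash-dump path has taken over */
};

/*
 * Submit a command.  A polled caller (such as a dump routine) cannot
 * wait for an interrupt, so it needs the command to run right away.
 * If the channel is busy the command can only be queued, which is the
 * situation the 'polled command has been queued' panic reports.
 */
static enum xfer_result
toy_exec(struct toy_channel *ch, bool polled)
{
	assert(ch != NULL);
	if (ch->active) {
		ch->nwaiting++;
		return XFER_QUEUED;
	}
	if (polled)
		return XFER_DONE;   /* busy-wait, modelled as instant */
	ch->active = true;          /* completes later via toy_done() */
	return XFER_STARTED;
}

/*
 * Channel reset before a dump: kill the active transfer and drop the
 * waiting ones without processing them, so the channel is idle.
 */
static void
toy_reset_for_dump(struct toy_channel *ch)
{
	assert(ch != NULL);
	ch->dumping = true;
	ch->active = false;
	ch->nwaiting = 0;
}

/*
 * Completion handler: normally starts the next waiting transfer, but
 * does not (re)queue anything while dumping.
 */
static void
toy_done(struct toy_channel *ch)
{
	assert(ch != NULL);
	ch->active = false;
	if (ch->dumping)
		return;
	if (ch->nwaiting > 0) {
		ch->nwaiting--;
		ch->active = true;
	}
}
```

In this model, a polled toy_exec() issued while an interrupt-driven
transfer is still active returns XFER_QUEUED, mirroring the failure in
this report. After toy_reset_for_dump() the same call returns
XFER_DONE, because nothing is left on the channel to queue behind.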
To generate a diff of this commit:
cvs rdiff -u -r1.132.8.8 -r1.132.8.9 src/sys/dev/ata/ata.c
cvs rdiff -u -r1.105.6.3 -r1.105.6.4 src/sys/dev/ata/ata_wdc.c
cvs rdiff -u -r1.92.8.8 -r1.92.8.9 src/sys/dev/ata/atavar.h
cvs rdiff -u -r1.428.2.15 -r1.428.2.16 src/sys/dev/ata/wd.c
cvs rdiff -u -r1.57.6.12 -r1.57.6.13 src/sys/dev/ic/ahcisata_core.c
cvs rdiff -u -r1.35.6.10 -r1.35.6.11 src/sys/dev/ic/mvsata.c
cvs rdiff -u -r1.30.4.15 -r1.30.4.16 src/sys/dev/ic/siisata.c
cvs rdiff -u -r1.283.2.4 -r1.283.2.5 src/sys/dev/ic/wdc.c
cvs rdiff -u -r1.123.4.4 -r1.123.4.5 src/sys/dev/scsipi/atapi_wdc.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 26 Jul 2017 17:22:28 +0000
State-Changed-Why:
If you're still there and have thoughts on testing this issue at this
point, happy to hear them; otherwise I'll close the PR in a while.
Responsible-Changed-From-To: bouyer->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sat, 07 Oct 2017 17:46:47 +0000
Responsible-Changed-Why:
Possibly fixed on -current with NCQ merge. Can you retest?
State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 16 Jun 2018 10:40:20 +0000
State-Changed-Why:
This should have been fixed with NCQ merge (and associated fixes).
Feedback timeout. Thanks for the report.
>Unformatted:
Copyright © 1994-2017
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.