NetBSD Problem Report #11811

Received: (qmail 20199 invoked from network); 25 Dec 2000 05:44:17 -0000
Message-Id: <200012250116.eBP1Gp800296@zorkmid.mit.edu>
Date: Sun, 24 Dec 2000 20:16:51 -0500 (EST)
From: John Hawkinson <jhawk@mit.edu>
Reply-To: jhawk@mit.edu
To: gnats-bugs@gnats.netbsd.org
Subject: wddump kernel dumping failure
X-Send-Pr-Version: 3.95

>Number:         11811
>Category:       kern
>Synopsis:       wddump kernel dumping failure
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Dec 25 05:45:00 +0000 2000
>Closed-Date:    Sat Jun 16 10:40:20 +0000 2018
>Last-Modified:  Sat Jun 16 10:40:20 +0000 2018
>Originator:     John Hawkinson
>Release:        netbsd-current of 23 Dec 2000
>Organization:
MIT
>Environment:

System: NetBSD zorkmid.mit.edu 1.5O NetBSD 1.5O (ZORKMID-$Revision: 1.5 $) #67: Sat Dec 23 17:45:30 EST 2000 jhawk@zorkmid.mit.edu:/usr/local/netbsd-current/src/sys/arch/i386/compile/ZORKMID i386


>Description:

	I was single-stepping through some UBC code trying to figure out
why a process seemed to be hung (it was an executable under COMPAT_PECOFF,
but I don't think that was really related. It was repeatedly getting
stuck in biowait(), and it appeared that uvm_fault was repeatedly
ubc_fault()-ing and calling genfs_getpages(); nevertheless, this is
probably not too relevant). I accidentally single-stepped through a
trap and into apm 16-bit land, and so ddb died.

It then tried to dump core, but seemed to fail with:

dump panic: wddump: polled command has been queued
panic: wdc_exec_command: polled command not done

I'm really not sure I understand. Tracebacks follow.


>How-To-Repeat:

I ran /win98/wavelan/bin/Wsu10604.exe under COMPAT_PECOFF,
not expecting it to work, but just fooling around. My disk
light went solid and it sat there taking up loads of CPU
for no good reason, spinning around in uvm/ubc code.

I single-stepped at the wrong place, and the following
was left over in my message buffer:

uvm_fault(0xc0588e40, 0x5000, 0, 1) -> 1
fatal page fault in supervisor mode
trap type 6 code 0 eip c02f95ae cs 8 eflags 10046 cr2 5d6b cpl e000ffef
panic: trap
Begin traceback...
trap() at trap+0x1e5
--- trap (number 6) ---
db_read_bytes(5d6b,4,c6c42e0c,c0585a00,c6c42e48) at db_read_bytes+0x12
db_get_value(5d6b,4,0,0,c6c42f04) at db_get_value+0x18
db_stop_at_pc(c0585a00,c6c42e48) at db_stop_at_pc+0xee
db_trap(5,0,1,c6c42eb4,c07c5400) at db_trap+0x48
kdb_trap(5,0,c6c42eb4) at kdb_trap+0xc6
trap() at trap+0x168
--- trap (number 5) ---
param.c(b,c6c42f54,c6c42f54,c6c42f40,c03abfaa) at     0x5d6b
apmcall_debug(b,c6c42f54,281,c6c42f70,c03abfe4) at apmcall_debug+0x2d
apm_get_event(c6c42f54) at apm_get_event+0x12
apm_periodic_check(c07c5400,c07c5450,2,0,c07c5400) at apm_periodic_check+0x38
apm_thread(c07c5400) at apm_thread+0x20
End traceback...
syncing disks... 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 6 giving up

dumping to dev 0,1 offset 396196
dump panic: wddump: polled command has been queued
Begin traceback...
wddump(1,5115b8,c6c42b14,200,8081d) at wddump+0x1de
cpu_dump(100,c045791b,100,3,2) at cpu_dump+0x101
dumpsys(c6c42d80,c6c42d74,c01bc105,100,0) at dumpsys+0xed
cpu_reboot(100,0,c6c42db4,0,6) at cpu_reboot+0x63
panic(c045791b,e000ffef,4,5,bfbf9) at panic+0xcd
trap() at trap+0x1e5
--- trap (number 6) ---
db_read_bytes(5d6b,4,c6c42e0c,c0585a00,c6c42e48) at db_read_bytes+0x12
db_get_value(5d6b,4,0,0,c6c42f04) at db_get_value+0x18
db_stop_at_pc(c0585a00,c6c42e48) at db_stop_at_pc+0xee
db_trap(5,0,1,c6c42eb4,c07c5400) at db_trap+0x48
kdb_trap(5,0,c6c42eb4) at kdb_trap+0xc6
trap() at trap+0x168
--- trap (number 5) ---
param.c(b,c6c42f54,c6c42f54,c6c42f40,c03abfaa) at     0x5d6b
apmcall_debug(b,c6c42f54,281,c6c42f70,c03abfe4) at apmcall_debug+0x2d
apm_get_event(c6c42f54) at apm_get_event+0x12
apm_periodic_check(c07c5400,c07c5450,2,0,c07c5400) at apm_periodic_check+0x38
apm_thread(c07c5400) at apm_thread+0x20
End traceback...

dumping to dev 0,1 offset 396196
dump device not ready


panic: wdc_exec_command: polled command not done

Begin traceback...
wdc_exec_command(c07c5cf8,c6c42958) at wdc_exec_command+0xca
wd_flushcache(c07be000,10,c6c42994,c01b2c9d,c07be000) at wd_flushcache+0x4d
wd_shutdown(c07be000) at wd_shutdown+0xd
doshutdownhooks(c6c429c8,c6c429bc,c01bc105,104,0) at doshutdownhooks+0x25
cpu_reboot(104,0,c03159c4,c07be000,1) at cpu_reboot+0x68
panic(c045d460,2,c6c42b4c,c03157d0,1) at panic+0xcd
wddump(1,5115b8,c6c42b14,200,8081d) at wddump+0x1de
cpu_dump(100,c045791b,100,3,2) at cpu_dump+0x101
dumpsys(c6c42d80,c6c42d74,c01bc105,100,0) at dumpsys+0xed
cpu_reboot(100,0,c6c42db4,0,6) at cpu_reboot+0x63
panic(c045791b,e000ffef,4,5,bfbf9) at panic+0xcd
trap() at trap+0x1e5
--- trap (number 6) ---
db_read_bytes(5d6b,4,c6c42e0c,c0585a00,c6c42e48) at db_read_bytes+0x12
db_get_value(5d6b,4,0,0,c6c42f04) at db_get_value+0x18
db_stop_at_pc(c0585a00,c6c42e48) at db_stop_at_pc+0xee
db_trap(5,0,1,c6c42eb4,c07c5400) at db_trap+0x48
kdb_trap(5,0,c6c42eb4) at kdb_trap+0xc6
trap() at trap+0x168
--- trap (number 5) ---
param.c(b,c6c42f54,c6c42f54,c6c42f40,c03abfaa) at     0x5d6b
apmcall_debug(b,c6c42f54,281,c6c42f70,c03abfe4) at apmcall_debug+0x2d
apm_get_event(c6c42f54) at apm_get_event+0x12
apm_periodic_check(c07c5400,c07c5450,2,0,c07c5400) at apm_periodic_check+0x38
apm_thread(c07c5400) at apm_thread+0x20
End traceback...

dumping to dev 0,1 offset 396196
dump device not ready


rebooting...

>Fix:

	Is something wrong with wddump? Is it unreasonable to expect it to
work from a trap in apmcall_debug()?
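
For illustration only, here is a minimal toy model in C of how a polled
dump command could end up queued rather than executed. It is a sketch
under assumptions, not the wd(4)/wdc(4) code: every name in it
(toy_channel, toy_exec, toy_reset_for_dump, toy_dump_write) is
hypothetical. The assumption is that a transfer was still active on the
channel when the dump started, so a new command was placed behind it,
which a polled caller cannot wait for. The reset variant mirrors the
approach described in the commit log in the audit trail (kill the active
transfer and drop waiting ones before issuing the dump request).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model; none of these names are NetBSD APIs. */
enum toy_result { TOY_COMPLETE, TOY_QUEUED };

struct toy_channel {
	bool active;	/* a transfer currently owns the channel */
	int  nwaiting;	/* transfers queued behind the active one */
};

/* Submit a command: it runs at once only if the channel is idle. */
static enum toy_result
toy_exec(struct toy_channel *ch)
{
	if (ch->active) {
		ch->nwaiting++;		/* deferred until the channel is free */
		return TOY_QUEUED;
	}
	return TOY_COMPLETE;		/* polled to completion immediately */
}

/* Dump-time reset: kill the active transfer, drop the waiting ones. */
static void
toy_reset_for_dump(struct toy_channel *ch)
{
	ch->active = false;
	ch->nwaiting = 0;
}

/* Returns 0 if the dump write could be polled, -1 if it was queued. */
static int
toy_dump_write(struct toy_channel *ch, bool reset_first)
{
	if (reset_first)
		toy_reset_for_dump(ch);
	return toy_exec(ch) == TOY_COMPLETE ? 0 : -1;
}

/* Usage example: a busy channel fails without the reset, works with it. */
static void
toy_demo(void)
{
	struct toy_channel ch = { true, 0 };

	assert(toy_dump_write(&ch, false) == -1);	/* queued: the panic case */
	assert(toy_dump_write(&ch, true) == 0);		/* reset first: polled OK */
}
```

In this model the -1 return corresponds to the point where the real
driver panicked with "polled command has been queued"; whether the
actual cause in this report was an outstanding transfer is an
assumption, not something the tracebacks confirm.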
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->bouyer 
Responsible-Changed-By: bouyer 
Responsible-Changed-When: Mon Apr 7 09:36:21 PDT 2003 
Responsible-Changed-Why:  
I'll see how to improve this 
From: "Jaromir Dolecek" <jdolecek@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/11811 CVS commit: [jdolecek-ncq] src/sys/dev
Date: Fri, 16 Jun 2017 20:40:49 +0000

 Module Name:	src
 Committed By:	jdolecek
 Date:		Fri Jun 16 20:40:49 UTC 2017

 Modified Files:
 	src/sys/dev/ata [jdolecek-ncq]: ata.c ata_wdc.c atavar.h wd.c
 	src/sys/dev/ic [jdolecek-ncq]: ahcisata_core.c mvsata.c siisata.c wdc.c
 	src/sys/dev/scsipi [jdolecek-ncq]: atapi_wdc.c

 Log Message:
 adjust reset channel and dump paths
 - channel reset now always kills active transfer, even on dump path, but
   now doesn't touch the queued waiting transfers; also kill_xfer hook is
   always called, so that HBA can free any private xfer resources and thus
   the dump request has chance to work
 - kill_xfer routines now always call ata_deactivate_xfer(); added KASSERT()s
   to ata_free_xfer() to expect deactivated xfer
 - when called during channel reset before dump, ata_kill_active() drops
   any queued waiting transfers without processing
 - do not (re)queue any transfers in wddone() when dumping
 - kill AT_RST_NOCMD flag

 This should also hopefully fix the 'polled command has been queued' panic
 as reported in:
 PR kern/11811 by John Hawkinson
 PR kern/47041 by Taylor R Campbell
 PR kern/51979 by Martin Husemann

 dump tested working with piixide(4) and ahci(4). mvsata(4) dump times out,
 but otherwise tested working, will be fixed separately. siisata(4) mechanically
 changed and not tested.


 To generate a diff of this commit:
 cvs rdiff -u -r1.132.8.8 -r1.132.8.9 src/sys/dev/ata/ata.c
 cvs rdiff -u -r1.105.6.3 -r1.105.6.4 src/sys/dev/ata/ata_wdc.c
 cvs rdiff -u -r1.92.8.8 -r1.92.8.9 src/sys/dev/ata/atavar.h
 cvs rdiff -u -r1.428.2.15 -r1.428.2.16 src/sys/dev/ata/wd.c
 cvs rdiff -u -r1.57.6.12 -r1.57.6.13 src/sys/dev/ic/ahcisata_core.c
 cvs rdiff -u -r1.35.6.10 -r1.35.6.11 src/sys/dev/ic/mvsata.c
 cvs rdiff -u -r1.30.4.15 -r1.30.4.16 src/sys/dev/ic/siisata.c
 cvs rdiff -u -r1.283.2.4 -r1.283.2.5 src/sys/dev/ic/wdc.c
 cvs rdiff -u -r1.123.4.4 -r1.123.4.5 src/sys/dev/scsipi/atapi_wdc.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Wed, 26 Jul 2017 17:22:28 +0000
State-Changed-Why:
If you're still there and have thoughts on testing this issue at this
point, happy to hear them; otherwise I'll close the PR in a while.


Responsible-Changed-From-To: bouyer->jdolecek
Responsible-Changed-By: jdolecek@NetBSD.org
Responsible-Changed-When: Sat, 07 Oct 2017 17:46:47 +0000
Responsible-Changed-Why:
Possibly fixed on -current with NCQ merge. Can you retest?


State-Changed-From-To: feedback->closed
State-Changed-By: jdolecek@NetBSD.org
State-Changed-When: Sat, 16 Jun 2018 10:40:20 +0000
State-Changed-Why:
This should have been fixed with NCQ merge (and associated fixes).
Feedback timeout. Thanks for report.


>Unformatted:
