NetBSD Problem Report #56982

From tsutsui@ceres.dti.ne.jp  Sat Aug 27 20:17:29 2022
Return-Path: <tsutsui@ceres.dti.ne.jp>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 185031A923A
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 27 Aug 2022 20:17:29 +0000 (UTC)
Message-Id: <202208272017.27RKHItq024774@ceres.dti.ne.jp>
Date: Sun, 28 Aug 2022 05:17:18 +0900 (JST)
From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Reply-To: tsutsui@ceres.dti.ne.jp
To: gnats-bugs@NetBSD.org
Cc: tsutsui@ceres.dti.ne.jp
Subject: mutex error (locking against myself) in wdc(4) NOIRQ case?
X-Send-Pr-Version: 3.95

>Number:         56982
>Category:       kern
>Synopsis:       mutex error (locking against myself) in wdc(4) NOIRQ case?
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Aug 27 20:20:01 +0000 2022
>Closed-Date:    Sat Dec 31 04:27:50 +0000 2022
>Last-Modified:  Sat Dec 31 04:27:50 +0000 2022
>Originator:     Izumi Tsutsui
>Release:        NetBSD 9.3 (patched per PR/56403)
>Organization:
>Environment:
Architecture: m68k, but maybe all related devices
Machine: mac68k, but maybe all related devices
>Description:
As noted in port-mac68k/56973, wdc(4) on NetBSD 9.3 has a problem
in the NOIRQ case.

After applying the fixes for it from PR/56403, I got the following
mutex panic:

---
[   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
[   1.0000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
[   1.0000000]     2018, 2019, 2020, 2021, 2022
[   1.0000000]     The NetBSD Foundation, Inc.  All rights reserved.
[   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
[   1.0000000]     The Regents of the University of California.  All rights reserved.

[   1.0000000] NetBSD 9.3 (INSTALL) #1: Sun Aug 28 03:42:18 JST 2022
[   1.0000000] 	tsutsui@mirage:/s/netbsd-9/src/sys/arch/mac68k/compile/obj.mac68k/INSTALL
[   1.0000000] Apple Macintosh Quadra 630  (68040)
[   1.0000000] cpu: delay factor 1059
[   1.0000000] fpu: emulator
[   1.0000000] total memory = 36864 KB
[   1.0000000] avail memory = 30168 KB
[   1.0000000] mrg: 'Quadra630 ROMs' ROM glue, tracing off, debug off, silent traps
[   1.0000000] mrg: I/O map kludge for ROMs that use hardware addresses directly.
[   1.0000000] mainbus0 (root)
[   1.0000000] obio0 at mainbus0
[   1.0000000] esp0 at obio0 addr 0 (quick): address 0x5a8000: NCR53C96, 16MHz, SCSI ID 7
[   1.0000000] scsibus0 at esp0: 8 targets, 8 luns per target
[   1.0000000] wdc0 at obio0 (Quadra/Performa series IDE interface)
[   1.0000000] atabus0 at wdc0 channel 0
[   1.0000000] adb0 at obio0
[   1.0000000] asc0 at obio0: Apple Sound Chip
[   1.0000000] intvid0 at obio0 @ f9001000: Valkyrie video subsystem
[   1.0000000] intvid0: 832 x 624, 256 color
[   1.0000000] macfb0 at intvid0
[   1.0000000] wsdisplay0 at macfb0 (kbdmux ignored)
[   1.0000000] sn0 at obio0: integrated SONIC Ethernet adapter
[   1.0000000] sn0: Ethernet address 08:00:07:9f:07:c6
[   1.0000000] iwm0 at obio0: Apple GCR floppy disk controller
[   1.0000000] iwm: Chip revision not supported (-77)
[   1.0000000] zsc0 at obio0 chip type 0 
[   1.0000000] zsc0 channel 0: d_speed   9600 DCD clk 0 CTS clk 0
[   1.0000000] zstty0 at zsc0 channel 0 (console i/o)
[   1.0000000] zsc0 channel 1: d_speed   9600 DCD clk 0 CTS clk 0
[   1.0000000] zstty1 at zsc0 channel 1
[   1.0000000] nubus0 at mainbus0
[   1.0083991] scsibus0: waiting 2 seconds for devices to settle...
[   1.0853573] adb0 (direct, Cuda): 2 targets
[   1.1437621] aed0 at adb0 addr 0: ADB Event device
[   1.2012555] akbd0 at adb0 addr 2: keyboard II (Japanese layout)
[   1.2729173] wskbd0 at akbd0 (mux ignored)
[   1.3225433] ams0 at adb0 addr 3: 1-button, 100 dpi mouse
[   1.4127240] wsmouse0 at ams0 (mux ignored)
[   3.1517508] sd0 at scsibus0 target 0 lun 0: <Logitec, LHD-U32H/E, 105S> disk fixed
[   3.2517469] sd0: 30533 MB, 62037 cyl, 16 head, 63 sec, 512 bytes/sect x 62533296 sectors
[   3.3685872] sd0: async, 8-bit transfers
[   3.8685298] cd0 at scsibus0 target 3 lun 0: <MATSHITA, CD-ROM CR-8008, 8.0e> cdrom removable
[   3.9736653] cd0: sync (248.00ns offset 15), 8-bit (4.032MB/s) transfers
[   7.1520444] wd0 at atabus0 drive 0
[   7.2187327] wd0: <IBM-DALA-3360>
[   7.2688515] wd0: 348 MB, 929 cyl, 16 head, 48 sec, 512 bytes/sect x 713472 sectors
[   7.2850956] boot device: sd0
[   7.3351658] root on md0a dumps on md0b
[   7.3896797] root file system type: ffs
[   7.4423464] kern.module.path=/stand/mac68k/9.3/modules
[   7.5184557] PRAM time does not appear to have been read correctly.
[   7.5964773] PRAM: 0x83da4f80, macos_boottime: 0xa6f18d0f.
Terminal type? [vt220] 
Erase set to backspace.
erase ^?, werase ^W, kill ^U, intr ^C

 NetBSD/mac68k 9.3

 This menu-driven tool is designed to help you install NetBSD to a hard disk,
 or upgrade an existing NetBSD system, with a minimum of work.
 In the following menus type the reference letter (a, b, c, ...) to select an
 item, or type CTRL+N/CTRL+P to select the next/previous item.
 The arrow keys and Page-up/Page-down may also work.
 Activate the current selection from the menu by typing the enter key.


 Thank you for using NetBSD!

[snip]

 Ok, let's upgrade NetBSD on your hard disk.  As always, this will change
 information on your hard disk.  You should have made a full backup before
 this procedure!  Do you really want to upgrade NetBSD?  (This is your last  
 warning before this procedure starts modifying your disks.)  


                               +---------------+
                               | Yes or no?    |
                               |               |
                               | a: No         |
                               |[  26.1879870] Mutex error: mutex_vector_enter,4
84: locking against myself     +---------------+

[  26.1879870] lock address : 0x0000000000a97148
[  26.1879870] current cpu  :                  0
[  26.1879870] current lwp  : 0x0000000000a2ee20
[  26.1879870] owner field  : 0x0000000000058000 wait/spin:                0/1

[  26.1879870] panic: lock error: Mutex: mutex_vector_enter,484: locking against
 myself: lock 0xa97148 cpu 0 lwp 0xa2ee20
[  26.1879870] cpu0: Begin traceback...
[  26.1879870] ?(?)
[  26.1879870] db_panic(a97148,1f6c14,1e4,4e0054,51758d8) at 0
[  26.1879870] vpanic(1fca5a,51758e4,5175924,131414,1fca5a) + 13a
[  26.1879870] panic(1fca5a,1f6bd0,1f6c14,1e4,1f6b23) + c
[  26.1879870] lockdebug_abort(1f6c14,1e4,a97148,4e0054,1f6b23) + 84
[  26.1879870] mutex_abort(?)
[  26.1879870] eventswitch(1f6c14,1e4,a97148,1f6b23) + 2c
[  26.1879870] mutex_spin_enter(a97148,4e92b8,a99f30,0,3ea16) + 6e
[  26.1879870] wddone(a32688,a99f30,98a0a0,a99f30) + 256
[  26.1879870] wdc_ata_bio_done(?)
[  26.1879870] wdcprobe1(98a098,a99f30,98a0a0) + 62
[  26.1879870] wdc_ata_bio_intr(?)
[  26.1879870] wdcprobe1(98a098,a99f30,0,1,98a098) + 92
[  26.1879870] wdc_ata_bio_poll(98a098,a99f30,0,98a008,1c1094) + 1c
[  26.1879870] ata_xfer_start(a99f30) + 114
[  26.1879870] atastart(?)
[  26.1879870] ata_exec_xfer(98a098,a99f30,a99f30,a97008,5175ae8) + 374
[  26.1879870] wdc_ata_bio(9e5008,a99f30) + 62
[  26.1879870] wdstart1(?)
[  26.1879870] smc91cxx_mii_writereg(a97008,a9cc20,a99f30,a970a0,12b462) + 208
[  26.1879870] wd_diskstart(a32688,a9cc20,a970a0,9ece28,c) + ce
[  26.1879870] dk_start(?)
[  26.1879870] dk_strategy(a97008,a9cc20) + d6
[  26.1879870] wdstrategy(a9cc20,4200,3102,3102,0) + 4c
[  26.1879870] readdisklabel(0,3102,399a8,9c2e08,996158) + 7e
[  26.1879870] dk_getdisklabel(a97008,0,3102) + 7e
[  26.1879870] dk_open(a97008,0,3102,1,2000,a2ee20) + dc
[  26.1879870] wdopen(?)
[  26.1879870] cdev_open(0,3102,1,2000,a2ee20) + 8e
[  26.1879870] spec_open(5175d34) + 118
[  26.1879870] VOP_OPEN(a30bd8,1,9d5ee0) + 46
[  26.1879870] vn_open(5175e64,1,0) + 18e
[  26.1879870] do_open(a2ee20,0,992c68,0,0,5175f00) + 96
[  26.1879870] do_sys_openat(?)
[  26.1879870] fifo_open(a2ee20,ffffff9c,ffff7f64,0,0,5175f00) + ee
[  26.1879870] sys_open(a2ee20,5175f38,5175f30,45,0) + 1e
[  26.1879870] syscall_plain(5,a2ee20,5175fb4,ffff7f64,0) + 82
[  26.1879870] syscall(5) + 70
[  26.1879870] trap0() + e
[  26.1879870] cpu0: End traceback...
Stopped in pid 9.1 (sysinst) at netbsd:cpu_Debugger+0x6:        unlk    a6
db> 

---

>How-To-Repeat:
As above, boot a NetBSD/mac68k INSTALL kernel (which has options DIAGNOSTIC)
on an LC630 or another IDE-based 68k Mac.

Maybe other NOIRQ wdc(4) variants are affected?
https://github.com/NetBSD/src/commit/d034868f5308ed012da088cd2a6112443061753a

Note that installation works with an INSTALL kernel built without options
DIAGNOSTIC. (Not sure why mac68k INSTALL keeps DIAGNOSTIC even on releases.)

>Fix:
Check the mutex handling around wdc_ata_bio_done() or wddone()?

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: tsutsui@ceres.dti.ne.jp
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/56982: mutex error (locking against myself) in wdc(4) NOIRQ case?
Date: Sat, 27 Aug 2022 22:00:28 +0000

 > [  26.1879870] mutex_spin_enter(a97148,4e92b8,a99f30,0,3ea16) + 6e
 > [  26.1879870] wddone(a32688,a99f30,98a0a0,a99f30) + 256

 Can you get a line number for wddone+0x256?

 $ gdb ./netbsd.gdb
 (gdb) info line *(wddone+0x256)

 Can you also show the full lockdebug info?  It looks like it got
 garbled in the console output.

 db> show lock 0x0000000000a97148

 This should display two lines of particular interest (with different
 addresses, obviously):

 initialized  : 0x0000000000abcdef
 ...
 last locked* : 0x0000000000123456 unlocked : 0x0000000abcd0123

 Can you show line numbers for the `initialized' and `last locked'
 addresses?

 (gdb) info line *(0x0000000000abcdef)
 (gdb) info line *(0x0000000000123456)

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Cc: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>,
 Taylor R Campbell <riastradh@NetBSD.org>
Subject: Re: kern/56982: mutex error (locking against myself) in wdc(4) NOIRQ
 case?
Date: Sun, 28 Aug 2022 08:20:42 +0900

 wdc(4) with NOIRQ worked on a DIAGNOSTIC kernel for hpcsh/current a while ago.

 My guess is that this is because wdc(4) for mac68k is too hackish. As tsutsui@
 pointed out in port-mac68k/56973, this driver registers an interrupt
 handler, *AND* it also enables polling mode (ATAC_CAP_NOIRQ).

 I don't know exactly what goes on, but I *imagine* that sometimes an interrupt
 is raised and sometimes it isn't, cf.:

 http://cvsweb.netbsd.org/bsdweb.cgi/src/sys/arch/mac68k/obio/wdc_obio.c#rev1.1

 MI wdc(4)/ata(4) drivers may get confused, which results in inconsistent
 mutex states in the end.

 The real fix should be to make the driver work in interrupt mode. But
 if that is too hard, can we switch to full polling mode instead? (IMO, the MI
 wdc(4)/ata(4) drivers should not become more complicated...)

 Thanks,
 rin

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: riastradh@NetBSD.org
Cc: gnats-bugs@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: kern/56982: mutex error (locking against myself) in wdc(4) NOIRQ
	 case?
Date: Sun, 28 Aug 2022 20:17:24 +0900

 > > [  26.1879870] mutex_spin_enter(a97148,4e92b8,a99f30,0,3ea16) + 6e
 > > [  26.1879870] wddone(a32688,a99f30,98a0a0,a99f30) + 256
 > 
 > Can you get a line number for wddone+0x256?

 This was an INSTALL kernel without LOCKDEBUG,
 so I've tried NetBSD/mac68k 9.3 GENERIC
 + options DIAGNOSTIC, DEBUG, LOCKDEBUG + makeoptions DEBUG="-g"
 + ata NOIRQ fixes:
  https://github.com/NetBSD/src/commit/d034868f5308ed012da088cd2a6112443061753a
 + wd(4) attach verbose message fix:
  https://github.com/NetBSD/src/commit/4a0ce12e9902cb0dfb2f2d8ca9e2d665028704c0
 + sys/arch/m68k/m68k/m68k_trap.c r1.3 for DEBUG builds:
  https://github.com/NetBSD/src/commit/9fa6e849aae6c36d6840abd3ec9e23a528c914f0

 ---
 [   1.0000000] Loaded initial symtab at 0x427c2c, strtab at 0x4854ac, # entries 23929
 [   1.0000000] Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
 [   1.0000000]     2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017,
 [   1.0000000]     2018, 2019, 2020, 2021, 2022
 [   1.0000000]     The NetBSD Foundation, Inc.  All rights reserved.
 [   1.0000000] Copyright (c) 1982, 1986, 1989, 1991, 1993
 [   1.0000000]     The Regents of the University of California.  All rights reserved.

 [   1.0000000] NetBSD 9.3 (DEBUG) #0: Sun Aug 28 19:39:11 JST 2022
 [   1.0000000]  tsutsui@mirage:/s/netbsd-9/src/sys/arch/mac68k/compile/DEBUG
 [   1.0000000] Apple Macintosh Quadra 630  (68040)
  :
 [   1.0000000] wdc0 at obio0 (Quadra/Performa series IDE interface)
 [   1.0000000] atabus0 at wdc0 channel 0
  :
 [   7.8156563] wd0 at atabus0 drive 0
 [   7.8990036] wd0: <IBM-DALA-3360>
 [   7.9393950] wd0: 348 MB, 929 cyl, 16 head, 48 sec, 512 bytes/sect x 713472 sectors
  :
 # disklabel wd0
 [  30.5354939] Mutex error: mutex_vector_enter,477: locking against myself

 [  30.5354939] lock address : 0x0000000000a1b948 type     :               spin
 [  30.5354939] initialized  : 0x0000000000047dac
 [  30.5354939] shared holds :                  0 exclusive:                  1
 [  30.5354939] shares wanted:                  0 exclusive:                  1
 [  30.5354939] current cpu  :                  0 last held:                  0
 [  30.5354939] current lwp  : 0x0000000000a2a600 last held: 0x0000000000a2a600
 [  30.5354939] last locked* : 0x0000000000047592 unlocked : 000000000000000000
 [  30.5354939] owner field  : 0x0000000000058000 wait/spin:                0/1

 [  30.5354939] panic: LOCKDEBUG: Mutex error: mutex_vector_enter,477: locking against myself
 [  30.5354939] cpu0: Begin traceback...
 [  30.5354939] ?(?)
 [  30.5354939] db_panic(ffff2600,600,365283,938580,52358ec) at 0
 [  30.5354939] vpanic(36bec8,52358f8,5235928,22ae62,36bec8) + 162
 [  30.5354939] panic(36bec8,365231,365283,1dd,365163) + c
 [  30.5354939] lockdebug_abort1(?)
 [  30.5354939] vmem_alloc(365283,1dd,938580,ffff2600,365163,1) + b0
 [  30.5354939] lockdebug_wantlock(365283,1dd,a1b948,46938,0) + 11a
 [  30.5354939] mutex_enter(a1b948,3fbbb0,a1df30,0,8d60a0) + 68
 [  30.5354939] wddone(99c688,a1df30,8d60a0,a1df30) + 2c4
 [  30.5354939] wdc_ata_bio_done(?)
 [  30.5354939] ncr53c9x_intr(8d6098,a1df30,8d60a0) + 62
 [  30.5354939] wdc_ata_bio_intr(?)
 [  30.5354939] ncr53c9x_intr(8d6098,a1df30,0,1,a1df30) + 1d4
 [  30.5354939] wdc_ata_bio_poll(8d6098,a1df30,0,1,4caf4) + 1c
 [  30.5354939] ata_xfer_start(a1df30) + 112
 [  30.5354939] atastart(?)
 [  30.5354939] ata_exec_xfer(8d6098,a1df30,a1df30,a1b808,5235b14) + 364
 [  30.5354939] wdc_ata_bio(99a008,a1df30) + 62
 [  30.5354939] wdstart1(?)
 [  30.5354939] ncr53c9x_msgin(a1b808,a20aa0,a1df30,a1b8a0,1f675a) + 212
 [  30.5354939] wd_diskstart(99c688,a20aa0,a1b8a0,963990,c) + c4
 [  30.5354939] dk_start(?)
 [  30.5354939] dk_strategy(a1b808,a20aa0) + ec
 [  30.5354939] wdstrategy(a20aa0,4200,2,c,0) + 52
 [  30.5354939] readdisklabel.part.1(?)
 [  30.5354939] readdisklabel(0,3102,473e0,90fbc8,8e21c8) + 62
 [  30.5354939] dk_getdisklabel(a1b808,0,3102) + 7e
 [  30.5354939] dk_open(a1b808,0,3102,1,2000,a2a600) + ce
 [  30.5354939] wdopen(?)
 [  30.5354939] cdev_open(0,3102,1,2000,a2a600) + 8e
 [  30.5354939] spec_open(5235d4c,377f34,a47dd4,1,921ee0) + 13c
 [  30.5354939] VOP_OPEN(a47dd4,1,921ee0) + 2c
 [  30.5354939] vn_open(5235e68,1,0) + 286
 [  30.5354939] do_open(a2a600,0,8df100,0,0,5235efc) + 92
 [  30.5354939] do_sys_openat(?)
 [  30.5354939] sys_fsetxattr(a2a600,ffffff9c,f104,0,0,5235efc) + e2
 [  30.5354939] sys_open(a2a600,5235f38,5235f30,19,0) + 1e
 [  30.5354939] syscall_plain(5,a2a600,5235fb4,f104,0) + d2
 [  30.5354939] syscall(5) + 70
 [  30.5354939] trap0() + e
 [  30.5354939] cpu0: End traceback...
 Stopped in pid 5.1 (disklabel) at       netbsd:cpu_Debugger+0x6:        unlk    a6
 db>
 ---

 > $ gdb ./netbsd.gdb
 > (gdb) info line *(wddone+0x256)

 ---
 (gdb) info line *(wddone+0x2c4)
 Line 1017 of "../../../../dev/ata/wd.c" starts at address 0x46934 <wddone+706>
    and ends at 0x46938 <wddone+710>.
 (gdb) 
 ---

 This is here:
  https://nxr.netbsd.org/xref/src/sys/dev/ata/wd.c?r=1.452.2.2#1017
 ---
     862 static void
     863 wddone(device_t self, struct ata_xfer *xfer)
     864 {
      :
    1015 	ata_free_xfer(wd->drvp->chnl_softc, xfer);
    1016 
    1017 	mutex_enter(&wd->sc_lock);
    1018 	wd->inflight--;
    1019 	mutex_exit(&wd->sc_lock);
    1020 	dk_done(dksc, bp);
    1021 	dk_start(dksc, NULL);
    1022 }
 ---

 > Can you also show the full lockdebug info?  It looks like it got
 > garbled in the console output.
 > 
 > db> show lock 0x0000000000a97148

 ---
 db> show lock 0x0000000000a1b948
 lock address : 0x0000000000a1b948 type     :               spin
 initialized  : 0x0000000000047dac
 shared holds :                  0 exclusive:                  1
 shares wanted:                  0 exclusive:                  1
 current cpu  :                  0 last held:                  0
 current lwp  : 0x0000000000a2a600 last held: 0x0000000000a2a600
 last locked* : 0x0000000000047592 unlocked : 000000000000000000
 owner field  : 0x0000000000058000 wait/spin:                0/1
 db>
 ---

 > This should display two lines of particular interest (with different
 > addresses, obviously):
 > 
 > initialized  : 0x0000000000abcdef
 > ...
 > last locked* : 0x0000000000123456 unlocked : 0x0000000abcd0123
 > 
 > Can you show line numbers for the `initialized' and `last locked'
 > addresses?
 > 
 > (gdb) info line *(0x0000000000abcdef)
 > (gdb) info line *(0x0000000000123456)

 >> initialized  : 0x0000000000047dac
 ---
 (gdb) info line *(0x0000000000047dac)
 Line 323 of "../../../../dev/ata/wd.c" starts at address 0x47dac <wdattach+46>
    and ends at 0x47db0 <wdattach+50>.
 ---

 Maybe this one:
  https://nxr.netbsd.org/xref/src/sys/dev/ata/wd.c?r=1.452.2.2#319
 ---
     305 static void
     306 wdattach(device_t parent, device_t self, void *aux)
     307 {
      :
     319 	mutex_init(&wd->sc_lock, MUTEX_DEFAULT, IPL_BIO);
     320 #ifdef WD_SOFTBADSECT
     321 	SLIST_INIT(&wd->sc_bslist);
     322 #endif
     323 	wd->atabus = adev->adev_bustype;
 ---


 >> last locked* : 0x0000000000047592
 ---
 (gdb) info line *(0x0000000000047592)
 Line 789 of "../../../../dev/ata/wd.c"
    starts at address 0x47592 <wd_diskstart+36>
    and ends at 0x4759a <wd_diskstart+44>.
 ---

 This also looks like this one:
  https://nxr.netbsd.org/xref/src/sys/dev/ata/wd.c?r=1.452.2.2#787
 ---
     775 static int
     776 wd_diskstart(device_t dev, struct buf *bp)
     777 {
     778 	struct wd_softc *wd = device_private(dev);
     779 #ifdef ATADEBUG
     780 	struct dk_softc *dksc = &wd->sc_dksc;
     781 #endif
     782 	struct ata_xfer *xfer;
     783 	struct ata_channel *chp;
     784 	unsigned openings;
     785 	int ticks;
     786 
     787 	mutex_enter(&wd->sc_lock);
     788 
     789 	chp = wd->drvp->chnl_softc;
     790 
     791 	ata_channel_lock(chp);
 ---
 Izumi Tsutsui

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@netbsd.org, riastradh@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: kern/56982: mutex error (locking against myself) in wdc(4) NOIRQ case?
Date: Sun, 28 Aug 2022 20:28:46 +0900

 > wdc(4) with NOIRQ worked on DIAGNOSTIC kernel for hpcsh/current a while ago.

 Hmm. Maybe I should also check HEAD instead of patched 9.3..

 > My guess is: This is because wdc(4) for mac68k is too hackish. As tsutsui@
 > pointed out in port-mac68k/56973, this driver registers interrupt
 > handler, *AND* it also enables polling mode (ATAC_CAP_NOIRQ).
 > 
 > I don't know what exactly goes on, but I *imagine* that sometime interrupt
 > is raised, and sometimes isn't, c.f.:

 I don't think it's likely. AFAICT the IDE interrupt has never been
 triggered at least on my LC630, and Linux/m68k also removed these
 interrupt register checks:
  https://lore.kernel.org/linux-ide/11a56b3317df3bb2ddc15fd29b40b6820e9c7444.1623287706.git.fthain@linux-m68k.org/
  >> This was tested on my Quadra 630. I haven't tested it on my PowerBook 150
  >> because I don't have a RAM adapter board for it. It appears that the
  >> hardware I tested doesn't need macide_clear_irq() or macide_test_irq().
  >> If it did, the generic driver would not have worked. It's possible that
  >> those routines are needed for the PowerBook 150 but we can cross that
  >> bridge if and when we come to it.

 ---
 Izumi Tsutsui

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: gnats-bugs@netbsd.org, riastradh@NetBSD.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/56982: mutex error (locking against myself) in wdc(4)
 NOIRQ case?
Date: Sun, 28 Aug 2022 21:00:03 +0900

 On 2022/08/28 20:28, Izumi Tsutsui wrote:
 >> wdc(4) with NOIRQ worked on DIAGNOSTIC kernel for hpcsh/current a while ago.
 > 
 > Hmm. Maybe I should also check HEAD instead of patched 9.3..

 I've checked that a -current kernel as of today with DIAGNOSTIC boots
 successfully into multiuser on a Jornada 690. (Older) dmesg is:

 https://dmesgd.nycbug.org/index.cgi?do=view&id=6287

 >> My guess is: This is because wdc(4) for mac68k is too hackish. As tsutsui@
 >> pointed out in port-mac68k/56973, this driver registers interrupt
 >> handler, *AND* it also enables polling mode (ATAC_CAP_NOIRQ).
 >>
 >> I don't know what exactly goes on, but I *imagine* that sometime interrupt
 >> is raised, and sometimes isn't, c.f.:
 > 
 > I don't think it's likely. AFAICT the IDE interrupt has never been
 > triggered at least on my LC630, and Linux/m68k also removed these
 > interrupt register checks:
 >   https://lore.kernel.org/linux-ide/11a56b3317df3bb2ddc15fd29b40b6820e9c7444.1623287706.git.fthain@linux-m68k.org/
 >   >> This was tested on my Quadra 630. I haven't tested it on my PowerBook 150
 >   >> because I don't have a RAM adapter board for it. It appears that the
 >   >> hardware I tested doesn't need macide_clear_irq() or macide_test_irq().
 >   >> If it did, the generic driver would not have worked. It's possible that
 >   >> those routines are needed for the PowerBook 150 but we can cross that
 >   >> bridge if and when we come to it.

 Thanks for the explanation. It would be useful to have some comments in
 the source code.

 Thanks,
 rin

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: rokuyama.rk@gmail.com
Cc: gnats-bugs@netbsd.org, riastradh@NetBSD.org, tsutsui@ceres.dti.ne.jp
Subject: Re: kern/56982: mutex error (locking against myself) in wdc(4) NOIRQ case?
Date: Sun, 28 Aug 2022 21:55:15 +0900

 > > Hmm. Maybe I should also check HEAD instead of patched 9.3..
 > 
 > I've checked that -current kernel as of today with DIAGNOSTIC boots
 > successfully into multiuser on Jornada 690. (Older) dmesg is:

 It looks like HEAD already has a fix for this:
  https://github.com/NetBSD/src/commit/7a4a932319c396d15ac96ce84780fc0e51048edb
  >> drop wd lock in wdstart1() before calling the ata_bio hook; when called
  >> from ata thread context, that can still need to sleep for wdc attachments
  >> in wdcwait()

 I'll close this PR unless someone wants the NOIRQ changes in netbsd-9.
 (sorry for the noise)

 ---
 Izumi Tsutsui

From: Taylor R Campbell <riastradh@NetBSD.org>
To: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
Cc: rokuyama.rk@gmail.com, gnats-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: kern/56982: mutex error (locking against myself) in wdc(4) NOIRQ case?
Date: Sun, 28 Aug 2022 13:30:02 +0000

 > Date: Sun, 28 Aug 2022 21:55:15 +0900
 > From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
 >
 > It looks HEAD already has a fix for this:
 >  https://github.com/NetBSD/src/commit/7a4a932319c396d15ac96ce84780fc0e51048edb
 >  >> drop wd lock in wdstart1() before calling the ata_bio hook; when called
 >  >> from ata thread context, that can still need to sleep for wdc attachments
 >  >> in wdcwait()
 >
 > I'll close this PR unless someone wants NOIRQ changes to netbsd-9.
 > (sorry for a noise)

 Unless the internal ATA API has changed substantially since netbsd-9,
 this is probably worth pulling up as is if the patch applies cleanly,
 if anyone wants to use netbsd-9 on this hardware.

From: Izumi Tsutsui <tsutsui@ceres.dti.ne.jp>
To: riastradh@NetBSD.org
Cc: rokuyama.rk@gmail.com, gnats-bugs@netbsd.org, tsutsui@ceres.dti.ne.jp
Subject: Re: kern/56982: mutex error (locking against myself) in wdc(4) NOIRQ case?
Date: Wed, 31 Aug 2022 00:43:19 +0900

 > Unless the internal ATA API has changed substantially since netbsd-9,
 > this is probably worth pulling up as is if the patch applies cleanly,
 > if anyone wants to use netbsd-9 on this hardware.

 I wonder if the following changes (for NOIRQ cases) are acceptable:
  https://github.com/NetBSD/src/commit/d034868f5308ed012da088cd2a6112443061753a
 because they changed the return value of the c_poll() function in ata_xfer_ops.

 ---
 Izumi Tsutsui

State-Changed-From-To: open->closed
State-Changed-By: tsutsui@NetBSD.org
State-Changed-When: Sat, 31 Dec 2022 04:27:50 +0000
State-Changed-Why:
Already fixed in -current and pulled up to netbsd-9 via ticket #1557.


>Unformatted:
