NetBSD Problem Report #56737

From www@netbsd.org  Thu Mar  3 02:26:16 2022
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4B38E1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Thu,  3 Mar 2022 02:26:16 +0000 (UTC)
Message-Id: <20220303022614.D65C31A923A@mollari.NetBSD.org>
Date: Thu,  3 Mar 2022 02:26:14 +0000 (UTC)
From: RNESTOR@MAC.COM
Reply-To: RNESTOR@MAC.COM
To: gnats-bugs@NetBSD.org
Subject: WDCTL_RST errors in 9.99.92 and 9.99.93
X-Send-Pr-Version: www-1.0

>Number:         56737
>Category:       kern
>Synopsis:       WDCTL_RST errors in 9.99.92 and 9.99.93
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 03 02:30:00 +0000 2022
>Closed-Date:    Fri Jan 26 07:08:43 +0000 2024
>Last-Modified:  Fri Jan 26 07:08:43 +0000 2024
>Originator:     Bob Nestor
>Release:        -current, both 9.99.92 and 9.99.93
>Organization:
>Environment:
NetBSD testbed.home.net 9.99.93 NetBSD 9.99.93 (GENERIC) #0: Wed Mar  2 12:48:09 CST 2022  root@testbed.home.net:/usr/src/sys/arch/amd64/compile/GENERIC amd64

>Description:
The AHCISATA_EXTRA_DELAY delay discussed in Oct/Nov of 2021 is still an issue with some systems, especially my HP 6200 MT.  Without the option in a kernel, which I believe is now the default since 9.99.90, the system fails to boot because it usually can't identify the boot disk volume.  Inserting the option into a kernel build for both 9.99.92 and 9.99.93 appears to solve the problem.  Not sure if there was a smaller delay added as part of the resolution to the original problem, but if so the delay is still insufficient to allow the HP 6200 to boot reliably.
>How-To-Repeat:
Try booting a GENERIC distribution kernel on an HP 6200 off a SATA disk (either spinning rust or SSD), and most times the boot will fail reporting an "IDENTIFY failed" on the boot volume.  
>Fix:
Add AHCISATA_EXTRA_DELAY option back into the GENERIC build.

>Release-Note:

>Audit-Trail:
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, RNESTOR@MAC.COM
Cc: 
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Sat, 30 Apr 2022 06:26:29 +0900

 Can you share full dmesg for HP 6200 MT?

 Then, I will add your device to the quirk list.

 Thanks,
 rin

From: Robert Nestor <rnestor@mac.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Fri, 29 Apr 2022 17:03:51 -0500

 I can, but I'm no longer sure this is the real problem. I'm seeing the
 same issue with 9.99.96, and adding the delays didn't solve the problem.
 I need to get a console log of the errors from a recent boot, but I need
 a special serial cable to do that, which I have on order. I'll post the
 data when I can get it.

 Sent from my iPhone

 > On Apr 29, 2022, at 4:30 PM, Rin Okuyama <rokuyama.rk@gmail.com> wrote:
 >
 > The following reply was made to PR kern/56737; it has been noted by GNATS.
 >
 > From: Rin Okuyama <rokuyama.rk@gmail.com>
 > To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
 > gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, RNESTOR@MAC.COM
 > Cc:
 > Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
 > Date: Sat, 30 Apr 2022 06:26:29 +0900
 >
 > Can you share full dmesg for HP 6200 MT?
 >
 > Then, I will add your device to the quirk list.
 >
 > Thanks,
 > rin

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 5 Jan 2023 14:49:07 +0100

 Could you try a kernel with the patch below?

 Martin


 Index: ahcisata_pci.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/pci/ahcisata_pci.c,v
 retrieving revision 1.68
 diff -u -p -r1.68 ahcisata_pci.c
 --- ahcisata_pci.c	12 Oct 2022 12:50:02 -0000	1.68
 +++ ahcisata_pci.c	5 Jan 2023 13:46:28 -0000
 @@ -206,6 +206,8 @@ static const struct ahci_pci_quirk ahci_
  	    AHCI_QUIRK_BADPMP },

      /* extra delay */
 +	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_6SERIES_AHCI_1,
 +	    AHCI_QUIRK_EXTRA_DELAY },
  	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_C600_AHCI,
  	    AHCI_QUIRK_EXTRA_DELAY },
  	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_7SER_MO_SATA_AHCI,

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        RNESTOR@MAC.COM
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 5 Jan 2023 15:05:13 +0100

 On Thu, Jan 05, 2023 at 01:50:01PM +0000, Martin Husemann wrote:
 > The following reply was made to PR kern/56737; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
 > Date: Thu, 5 Jan 2023 14:49:07 +0100
 > 
 >  Could you try a kernel with the patch below?
 >  
 >  Martin
 >  
 >  
 >  Index: ahcisata_pci.c
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/dev/pci/ahcisata_pci.c,v
 >  retrieving revision 1.68
 >  diff -u -p -r1.68 ahcisata_pci.c
 >  --- ahcisata_pci.c	12 Oct 2022 12:50:02 -0000	1.68
 >  +++ ahcisata_pci.c	5 Jan 2023 13:46:28 -0000
 >  @@ -206,6 +206,8 @@ static const struct ahci_pci_quirk ahci_
 >   	    AHCI_QUIRK_BADPMP },
 >   
 >       /* extra delay */
 >  +	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_6SERIES_AHCI_1,
 >  +	    AHCI_QUIRK_EXTRA_DELAY },
 >   	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_C600_AHCI,
 >   	    AHCI_QUIRK_EXTRA_DELAY },
 >   	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_7SER_MO_SATA_AHCI,

 Actually I wonder if these quirks (AHCI_QUIRK_EXTRA_DELAY) are right.
 The extra delay may depend on the drive connected to the adapter,
 not the adapter itself.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Martin Husemann <martin@duskware.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 5 Jan 2023 15:28:22 +0100

 On Thu, Jan 05, 2023 at 03:05:13PM +0100, Manuel Bouyer wrote:
 > Actually I wonder if these quirks (AHCI_QUIRK_EXTRA_DELAY) are right.
 > The extra delay may depend on the drive connected to the adapter,
 > not the adapter itself.

 If so that is bad - the missing delay causes IDENTIFY failures.

 We could enable it automatically when the first reset fails after boot,
 or just always do the extra delays everywhere (and maybe have some hardware
 opt out instead).

 Martin
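
 The first option above (turn the extra delay on once a probe has failed)
 could look roughly like the sketch below. It is not driver code: struct
 chan_state, fake_identify(), fake_delay() and probe_with_fallback() are
 hypothetical stand-ins, and the "drive" is a simple model that answers
 IDENTIFY only if it was given a delay first.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-channel state; not the real struct ahci_channel. */
struct chan_state {
	bool extra_delay;	/* sticky: set after the first failed probe */
	int  delays_done;	/* counts simulated delays, for illustration */
};

/* Stand-in for IDENTIFY: a slow drive answers only after a delay. */
static bool
fake_identify(const struct chan_state *cs, bool slow_drive)
{
	return !slow_drive || cs->delays_done > 0;
}

/* Stand-in for ata_delay(); it only records that a delay happened. */
static void
fake_delay(struct chan_state *cs)
{
	cs->delays_done++;
}

/*
 * Probe once without the delay. On failure, enable the delay for this
 * channel and retry once. Returns true if the drive identified.
 */
static bool
probe_with_fallback(struct chan_state *cs, bool slow_drive)
{
	assert(cs != NULL);
	for (int attempt = 0; attempt < 2; attempt++) {
		if (cs->extra_delay)
			fake_delay(cs);
		if (fake_identify(cs, slow_drive))
			return true;
		cs->extra_delay = true;	/* first failure: enable and retry */
	}
	return false;
}
```

 Because the flag stays set, a channel that needed the delay once keeps
 using it on later resets, while channels that never fail pay nothing.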

From: Robert Nestor <rnestor@mac.com>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@netbsd.org,
 kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 5 Jan 2023 09:33:13 -0600

 On Jan 5, 2023, at 8:05 AM, Manuel Bouyer <bouyer@antioche.eu.org> wrote:

 > On Thu, Jan 05, 2023 at 01:50:01PM +0000, Martin Husemann wrote:
 >> The following reply was made to PR kern/56737; it has been noted by GNATS.
 >>
 >> From: Martin Husemann <martin@duskware.de>
 >> To: gnats-bugs@netbsd.org
 >> Cc:
 >> Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
 >> Date: Thu, 5 Jan 2023 14:49:07 +0100
 >>
 >> Could you try a kernel with the patch below?
 >>
 >> Martin
 >>
 >>
 >> Index: ahcisata_pci.c
 >> ===================================================================
 >> RCS file: /cvsroot/src/sys/dev/pci/ahcisata_pci.c,v
 >> retrieving revision 1.68
 >> diff -u -p -r1.68 ahcisata_pci.c
 >> --- ahcisata_pci.c	12 Oct 2022 12:50:02 -0000	1.68
 >> +++ ahcisata_pci.c	5 Jan 2023 13:46:28 -0000
 >> @@ -206,6 +206,8 @@ static const struct ahci_pci_quirk ahci_
 >>  	    AHCI_QUIRK_BADPMP },
 >>
 >>      /* extra delay */
 >> +	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_6SERIES_AHCI_1,
 >> +	    AHCI_QUIRK_EXTRA_DELAY },
 >>  	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_C600_AHCI,
 >>  	    AHCI_QUIRK_EXTRA_DELAY },
 >>  	{ PCI_VENDOR_INTEL, PCI_PRODUCT_INTEL_7SER_MO_SATA_AHCI,
 >
 > Actually I wonder if these quirks (AHCI_QUIRK_EXTRA_DELAY) are right.
 > The extra delay may depend on the drive connected to the adapter,
 > not the adapter itself.
 >
 > --
 > Manuel Bouyer <bouyer@antioche.eu.org>
 >     NetBSD: 26 ans d'experience feront toujours la difference

 In my case it appears that the problem is drive related, not controller
 related.  Slow drives fail to boot while faster drives succeed using the
 same controller.  The good news though is that with Martin's patch
 installed I was able to successfully boot one of my slower drives a
 dozen times without a single failure.  I used a mix of BIOS and UEFI
 boot when testing the patch.

 Thanks to Martin for the quick patch/fix!
 -bob

From: Martin Husemann <martin@duskware.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 5 Jan 2023 16:46:36 +0100

 Manuel,

 I wonder if we should just make the two instances of AHCISATA_DO_EXTRA_DELAY
 in ahci_probe_drive() non-optional and see if that helps some of the affected
 machines - I have only ever seen the issue after a cold poweron (but I am
 not sure if channels are ever stopped or reset unless errors happen).

 Martin

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Martin Husemann <martin@duskware.de>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 5 Jan 2023 17:11:25 +0100

 On Thu, Jan 05, 2023 at 04:46:36PM +0100, Martin Husemann wrote:
 > Manuel,
 > 
 > I wonder if we should just make the two instances of AHCISATA_DO_EXTRA_DELAY
 > in ahci_probe_drive() non-optional and see if that helps some of the affected
 > machines - I have only ever seen the issue after a cold poweron (but I am
 > not sure if channels are ever stopped or reset unless errors happen).

 Yes, I think we should make some of these delays non-optional again.
 I don't have a setup where these delays are needed handy, so I can't really
 test this.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Martin Husemann <martin@duskware.de>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 5 Jan 2023 17:45:32 +0100

 On Thu, Jan 05, 2023 at 05:11:25PM +0100, Manuel Bouyer wrote:
 > Yes, I think we should make some of these delays non-optional again.
 > I don't have a setup where these delays are needed handy, so I can't really
 > test this.

 I have one where it happens but only like once a week or so, not quite good
 enough for testing this.

 Bob, can you test it quicker?
 If so I'll come up with another patch to try.

 Martin

From: Robert Nestor <rnestor@mac.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 5 Jan 2023 11:26:14 -0600

 On Jan 5, 2023, at 10:50 AM, Martin Husemann <martin@duskware.de> wrote:

 > The following reply was made to PR kern/56737; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: Manuel Bouyer <bouyer@antioche.eu.org>
 > Cc: gnats-bugs@netbsd.org
 > Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
 > Date: Thu, 5 Jan 2023 17:45:32 +0100
 >
 > On Thu, Jan 05, 2023 at 05:11:25PM +0100, Manuel Bouyer wrote:
 >> Yes, I think we should make some of these delays non-optional again
 >> I don't have a setup where these delays are needed handy, so I can't really
 >> test this.
 >
 > I have one where it happens but only like once a week or so, not quite good
 > enough for testing this.
 >
 > Bob, can you test it quicker?
 > If so I'll come up with another patch to try.
 >
 > Martin

 If it just involves building a new kernel and testing with the HW on
 hand then, yes I can test with the slow drives I have.  That cycle
 usually only takes me an hour or so to do.  Just let me know what
 patches you'd like me to apply and test with.

 -bob

From: Martin Husemann <martin@duskware.de>
To: Robert Nestor <rnestor@mac.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Fri, 6 Jan 2023 11:09:02 +0100

 On Thu, Jan 05, 2023 at 11:26:14AM -0600, Robert Nestor wrote:
 > If it just involves building a new kernel and testing with the HW on
 > hand then, yes I can test with the slow drives I have.  That cycle
 > usually only takes me an hour or so to do.  Just let me know what
 > patches you'd like me to apply and test with.

 OK, as a first experiment: could you please remove the quirks patch
 and try with only the patch below?

 Thanks,

 Martin

 Index: ahcisata_core.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/ic/ahcisata_core.c,v
 retrieving revision 1.107
 diff -u -p -r1.107 ahcisata_core.c
 --- ahcisata_core.c	1 Aug 2022 07:37:18 -0000	1.107
 +++ ahcisata_core.c	6 Jan 2023 10:07:15 -0000
 @@ -1079,7 +1079,7 @@ ahci_probe_drive(struct ata_channel *chp
  	switch (sata_reset_interface(chp, sc->sc_ahcit, achp->ahcic_scontrol,
  	    achp->ahcic_sstatus, AT_WAIT)) {
  	case SStatus_DET_DEV:
 -		AHCISATA_DO_EXTRA_DELAY(sc, chp, "ahcidv", AT_WAIT);
 +		ata_delay(chp, AHCISATA_EXTRA_DELAY_MS, "ahcidv", AT_WAIT);

  		/* Initial value, used in case the soft reset fails */
  		sig = AHCI_READ(sc, AHCI_P_SIG(chp->ch_channel));
 @@ -1119,10 +1119,10 @@ ahci_probe_drive(struct ata_channel *chp
  		    AHCI_P_IX_OFS | AHCI_P_IX_DPS | AHCI_P_IX_UFS |
  		    AHCI_P_IX_PSS | AHCI_P_IX_DHRS | AHCI_P_IX_SDBS);
  		/*
 -		 * optionally, wait AHCISATA_EXTRA_DELAY_MS msec before
 +		 * wait AHCISATA_EXTRA_DELAY_MS msec before
  		 * actually starting operations
  		 */
 -		AHCISATA_DO_EXTRA_DELAY(sc, chp, "ahciprb", AT_WAIT);
 +		ata_delay(chp, AHCISATA_EXTRA_DELAY_MS, "ahciprb", AT_WAIT);
  		break;

  	default:

From: Robert Nestor <rnestor@mac.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Fri, 6 Jan 2023 08:01:38 -0600

 On Jan 6, 2023, at 4:10 AM, Martin Husemann <martin@duskware.de> wrote:

 > The following reply was made to PR kern/56737; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: Robert Nestor <rnestor@mac.com>
 > Cc: gnats-bugs@netbsd.org
 > Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
 > Date: Fri, 6 Jan 2023 11:09:02 +0100
 >
 > On Thu, Jan 05, 2023 at 11:26:14AM -0600, Robert Nestor wrote:
 >> If it just involves building a new kernel and testing with the HW on
 >> hand then, yes I can test with the slow drives I have.  That cycle
 >> usually only takes me an hour or so to do.  Just let me know what
 >> patches you'd like me to apply and test with.
 >
 > OK, as a first experiment: could you please remove the quirks patch
 > and try with only the patch below?
 >
 > Thanks,
 >
 > Martin
 >
 > Index: ahcisata_core.c
 > ===================================================================
 > RCS file: /cvsroot/src/sys/dev/ic/ahcisata_core.c,v
 > retrieving revision 1.107
 > diff -u -p -r1.107 ahcisata_core.c
 > --- ahcisata_core.c	1 Aug 2022 07:37:18 -0000	1.107
 > +++ ahcisata_core.c	6 Jan 2023 10:07:15 -0000
 > @@ -1079,7 +1079,7 @@ ahci_probe_drive(struct ata_channel *chp
 >  	switch (sata_reset_interface(chp, sc->sc_ahcit, achp->ahcic_scontrol,
 >  	    achp->ahcic_sstatus, AT_WAIT)) {
 >  	case SStatus_DET_DEV:
 > -		AHCISATA_DO_EXTRA_DELAY(sc, chp, "ahcidv", AT_WAIT);
 > +		ata_delay(chp, AHCISATA_EXTRA_DELAY_MS, "ahcidv", AT_WAIT);
 >
 >  		/* Initial value, used in case the soft reset fails */
 >  		sig = AHCI_READ(sc, AHCI_P_SIG(chp->ch_channel));
 > @@ -1119,10 +1119,10 @@ ahci_probe_drive(struct ata_channel *chp
 >  		    AHCI_P_IX_OFS | AHCI_P_IX_DPS | AHCI_P_IX_UFS |
 >  		    AHCI_P_IX_PSS | AHCI_P_IX_DHRS | AHCI_P_IX_SDBS);
 >  		/*
 > -		 * optionally, wait AHCISATA_EXTRA_DELAY_MS msec before
 > +		 * wait AHCISATA_EXTRA_DELAY_MS msec before
 >  		 * actually starting operations
 >  		 */
 > -		AHCISATA_DO_EXTRA_DELAY(sc, chp, "ahciprb", AT_WAIT);
 > +		ata_delay(chp, AHCISATA_EXTRA_DELAY_MS, "ahciprb", AT_WAIT);
 >  		break;
 >
 >  	default:
 >

 I removed the previous patch you sent me to the ahcisata_pci.c file,
 installed this one, rebuilt the kernel and installed it on one of my
 slow disks that has previously exhibited the boot failures. (Verified
 that with the GENERIC distribution kernel on this disk I was still
 seeing boot failures.)  I then did a dozen boots of this new kernel off
 that disk, alternating between UEFI and BIOS for booting.  Didn't
 see a single failure in booting, but did see one kernel crash on a
 "shutdown -r now" between boots.  Unfortunately it didn't take a
 dump, but it seemed to be near where the boot disk was being detached.

 -bob

From: Martin Husemann <martin@duskware.de>
To: Robert Nestor <rnestor@mac.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Fri, 6 Jan 2023 16:33:04 +0100

 On Fri, Jan 06, 2023 at 08:01:38AM -0600, Robert Nestor wrote:

 > Didn't see a single failure in booting, but did see one kernel crash on
 > a "shutdown -r now" between boots.  Unfortunately it didn't take a dump
 > but it seemed to be near where the boot disk was being detached.

 That sounds a bit like one of the other two extra delays - I wonder if we
 should just make them all non-optional. Manuel, what do you think?

 Martin

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
        RNESTOR@MAC.COM
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Fri, 6 Jan 2023 18:30:15 +0100

 On Fri, Jan 06, 2023 at 03:35:01PM +0000, Martin Husemann wrote:
 > The following reply was made to PR kern/56737; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: Robert Nestor <rnestor@mac.com>
 > Cc: gnats-bugs@netbsd.org
 > Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
 > Date: Fri, 6 Jan 2023 16:33:04 +0100
 >
 >  On Fri, Jan 06, 2023 at 08:01:38AM -0600, Robert Nestor wrote:
 >
 >  > Didn't see a single failure in booting, but did see one kernel crash on
 >  > a "shutdown -r now" between boots.  Unfortunately it didn't take a dump
 >  > but it seemed to be near where the boot disk was being detached.
 >  
 >  That sounds a bit like one of the other two extra delays - I wonder if we
 >  should just make them all non-optional. Manuel, what do you think?

 The delays were probably there for a reason, although I don't remember
 exactly why. I don't know how the delays could affect shutdown though
 (I should look again at the code but I don't have time right now)

 Making them all non-optional again is probably the safest.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: David Brownlee <abs@absd.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 31 Aug 2023 19:32:23 +0100

 I am in the process of moving an 8 disk ZFS setup from an older Dell
 T320 with an mpii controller to a newer machine with onboard Intel 600
 SATA and pcie AM1061 SATA cards.

 On a standard netbsd-10 kernel some of the disks (usually, but not
 exclusively 8TB Seagate Barracuda Compute) sometimes fail to probe on
 boot with:

 autoconfiguration error: ahcisata0 port 5: setting WDCTL_RST failed for drive 0
 wd1 at atabus1 drive 0
 autoconfiguration error: ahcisata0 port 5: setting WDCTL_RST failed for drive 0
 wd1: autoconfiguration error: IDENTIFY failed
 wd1: fixing 0 sector size
 wd1: secperunit and ncylinders are zero
 wd1: secperunit and (sectors or tracks) are zero
 autoconfiguration error: wd1: unable to open device, error = 19
 wd1(ahcisata0:5:0): using PIO mode 0

 This happens on both the onboard Intel and AM1061 SATA cards.

 Adding AHCISATA_EXTRA_DELAY appears to allow the disks to reliably
 probe on every boot.

 I think in the short term we should default AHCISATA_EXTRA_DELAY on
 for netbsd-10, and possibly drop AHCI_QUIRK_EXTRA_DELAY as it seems to
 be more drive than controller related.

 It may make sense to have an opt-in faster timeout enabled on
 virtualised interfaces, or potentially if a timeout is too short have
 extra logic to reset the drive and retry the whole process with very
 generous timeouts.

 David
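
 The last idea above (reset the drive and retry the whole probe with more
 generous timeouts) can be sketched as an escalating schedule. This is an
 illustration only: struct fake_drive, reset_and_identify() and
 probe_escalating() are hypothetical names, the drive is a simple model,
 and any wait values passed in are made up rather than taken from the
 driver.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical drive model: ready only this many ms after a reset. */
struct fake_drive {
	int ready_after_ms;
	int resets;		/* how many times the drive was reset */
};

/* Stand-in for reset + wait + IDENTIFY; nonzero if the wait sufficed. */
static int
reset_and_identify(struct fake_drive *d, int wait_ms)
{
	d->resets++;
	return wait_ms >= d->ready_after_ms;
}

/*
 * Try each wait in turn, resetting the drive before every attempt.
 * Returns the wait that worked, or -1 if the drive never identified.
 */
static int
probe_escalating(struct fake_drive *d, const int *waits_ms, size_t n)
{
	assert(d != NULL && waits_ms != NULL);
	for (size_t i = 0; i < n; i++) {
		if (reset_and_identify(d, waits_ms[i]))
			return waits_ms[i];
	}
	return -1;
}
```

 With a schedule that starts at zero, drives that are ready immediately
 see no added probe time, and only drives that fail the short attempts
 pay for the longer ones.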

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: David Brownlee <abs@absd.org>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: kern/56737: WDCTL_RST errors in 9.99.92 and 9.99.93
Date: Thu, 31 Aug 2023 21:08:33 +0200

 On Thu, Aug 31, 2023 at 07:32:23PM +0100, David Brownlee wrote:
 > [...]
 >
 > Adding AHCISATA_EXTRA_DELAY appears to allow the disks to reliably
 > probe on every boot.
 > 
 > I think in the short term we should default AHCISATA_EXTRA_DELAY on
 > for netbsd-10, and possibly drop AHCI_QUIRK_EXTRA_DELAY as it seems to
 > be more drive than controller related.

 Seconded

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc: 
Subject: Re: kern/56737 (WDCTL_RST errors in 9.99.92 and 9.99.93)
Date: Fri, 26 Jan 2024 16:02:54 +0900

 -------- Forwarded Message --------
 Subject: CVS commit: src/sys
 Date: Sun, 10 Sep 2023 14:04:28 +0000
 From: David Brownlee <abs@netbsd.org>
 Reply-To: source-changes-d@NetBSD.org
 To: source-changes-full@NetBSD.org

 Module Name:	src
 Committed By:	abs
 Date:		Sun Sep 10 14:04:28 UTC 2023

 Modified Files:
 	src/sys/conf: files
 	src/sys/dev/ic: ahcisata_core.c ahcisatavar.h
 	src/sys/dev/pci: ahcisata_pci.c

 Log Message:
 Rework AHCISATA_EXTRA_DELAY for kern/56737

 - Remove AHCI_QUIRK_EXTRA_DELAY as issue appears to be drive and
    not controller related
 - Replace AHCISATA_EXTRA_DELAY with AHCISATA_REMOVE_EXTRA_DELAY,
    so defaulting to enabling the extra delay, as the downside of
    slower probing on systems which do not need it is less than having
    other systems intermittently fail to probe and attach drives
 - Also allow disabling extra delay with AHCISATA_EXTRA_DELAY_MS = 0

 We should return to this code to work out which of the extra delays
 are needed, and how long they need to be. It may be that faster
 systems are more likely to trigger the issue (I've only seen it on
 a 13th gen i7-13700, though only tested on a limited set)

 XXX pullup -10


 To generate a diff of this commit:
 cvs rdiff -u -r1.1308 -r1.1309 src/sys/conf/files
 cvs rdiff -u -r1.107 -r1.108 src/sys/dev/ic/ahcisata_core.c
 cvs rdiff -u -r1.27 -r1.28 src/sys/dev/ic/ahcisatavar.h
 cvs rdiff -u -r1.69 -r1.70 src/sys/dev/pci/ahcisata_pci.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
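
 The commit log above describes an opt-out scheme: the delay is on by
 default, AHCISATA_REMOVE_EXTRA_DELAY turns it off, and so does setting
 AHCISATA_EXTRA_DELAY_MS to 0. The sketch below shows one way such gating
 can be written. It is not the committed code (see the cvs rdiff commands
 above for that); extra_delay_ms() and extra_delay_enabled() are
 hypothetical helpers, and the fallback value of 500 is an assumption
 made so that the example builds on its own.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed default, for illustration only; the authoritative value is
 * whatever the driver headers define.
 */
#ifndef AHCISATA_EXTRA_DELAY_MS
#define AHCISATA_EXTRA_DELAY_MS 500
#endif

/* Milliseconds the probe path would wait under the current options. */
static int
extra_delay_ms(void)
{
#ifdef AHCISATA_REMOVE_EXTRA_DELAY
	return 0;			/* opt-out option set */
#else
	return AHCISATA_EXTRA_DELAY_MS;	/* delay is on by default */
#endif
}

/* Hypothetical helper: true when a delay call would actually be made. */
static bool
extra_delay_enabled(void)
{
	int ms = extra_delay_ms();

	assert(ms >= 0);	/* a negative delay makes no sense */
	return ms > 0;		/* AHCISATA_EXTRA_DELAY_MS = 0 also disables it */
}
```

 Building with -DAHCISATA_REMOVE_EXTRA_DELAY or
 -DAHCISATA_EXTRA_DELAY_MS=0 makes extra_delay_enabled() return false in
 this sketch; with neither defined it returns true.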


From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Cc: 
Subject: Re: kern/56737 (WDCTL_RST errors in 9.99.92 and 9.99.93)
Date: Fri, 26 Jan 2024 16:04:21 +0900

 -------- Forwarded Message --------
 Subject: CVS commit: [netbsd-10] src/sys
 Date: Mon, 11 Sep 2023 14:39:21 +0000
 From: Martin Husemann <martin@netbsd.org>
 Reply-To: source-changes-d@NetBSD.org
 To: source-changes-full@NetBSD.org

 Module Name:	src
 Committed By:	martin
 Date:		Mon Sep 11 14:39:21 UTC 2023

 Modified Files:
 	src/sys/conf [netbsd-10]: files
 	src/sys/dev/ic [netbsd-10]: ahcisata_core.c ahcisatavar.h
 	src/sys/dev/pci [netbsd-10]: ahcisata_pci.c

 Log Message:
 Pull up following revision(s) (requested by abs in ticket #366):

 	sys/dev/pci/ahcisata_pci.c: revision 1.70
 	sys/dev/ic/ahcisata_core.c: revision 1.108
 	sys/dev/ic/ahcisatavar.h: revision 1.28
 	sys/conf/files: revision 1.1309

 Rework AHCISATA_EXTRA_DELAY for kern/56737
 - Remove AHCI_QUIRK_EXTRA_DELAY as issue appears to be drive and
    not controller related
 - Replace AHCISATA_EXTRA_DELAY with AHCISATA_REMOVE_EXTRA_DELAY,
    so defaulting to enabling the extra delay, as the downside of
    slower probing on systems which do not need it is less than having
    other systems intermittently fail to probe and attach drives
 - Also allow disabling extra delay with AHCISATA_EXTRA_DELAY_MS = 0

 We should return to this code to work out which of the extra delays
 are needed, and how long they need to be. It may be that faster
 systems are more likely to trigger the issue (I've only seen it on
 a 13th gen i7-13700, though only tested on a limited set)


 To generate a diff of this commit:
 cvs rdiff -u -r1.1304.2.1 -r1.1304.2.2 src/sys/conf/files
 cvs rdiff -u -r1.107 -r1.107.4.1 src/sys/dev/ic/ahcisata_core.c
 cvs rdiff -u -r1.27 -r1.27.4.1 src/sys/dev/ic/ahcisatavar.h
 cvs rdiff -u -r1.68.2.1 -r1.68.2.2 src/sys/dev/pci/ahcisata_pci.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.


State-Changed-From-To: open->closed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Fri, 26 Jan 2024 07:08:43 +0000
State-Changed-Why:
Pulled up to netbsd-10. netbsd-[98] are not affected.


>Unformatted:
