NetBSD Problem Report #57063

From brad@anduin.eldar.org  Mon Oct 17 17:47:14 2022
Return-Path: <brad@anduin.eldar.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B48581A921F
	for <gnats-bugs@gnats.NetBSD.org>; Mon, 17 Oct 2022 17:47:14 +0000 (UTC)
Message-Id: <202210171747.29HHl9vu011620@anduin.eldar.org>
Date: Mon, 17 Oct 2022 13:47:09 -0400 (EDT)
From: brad@anduin.eldar.org
Reply-To: brad@anduin.eldar.org
To: gnats-bugs@NetBSD.org
Subject: Kernel panic in -current in iic_attach
X-Send-Pr-Version: 3.95

>Number:         57063
>Category:       kern
>Synopsis:       Kernel panic in -current in iic_attach
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Oct 17 17:50:00 +0000 2022
>Closed-Date:    Thu Oct 20 07:41:05 +0000 2022
>Last-Modified:  Thu Oct 20 07:41:05 +0000 2022
>Originator:     brad@anduin.eldar.org
>Release:        NetBSD 9.99.101
>Organization:
	eldar.org
>Environment:
System: NetBSD anduin.eldar.org 9.99.101 NetBSD 9.99.101 (XEN3_DOM0) amd64
Architecture: x86_64
Machine: amd64
>Description:

NetBSD-current panics on an ASRock Rack E3C246D2I motherboard with an
Intel Xeon E-2224 CPU.  The panic occurs before the USB keyboard
attaches, so it is next to impossible to use DDB.

A video pulled from the BMC is available here:

http://www.netbd.org/~brad/public_html/video_17-10-2022_13-13-51_part1.avi

As best as I can tell, the panic is in the iic_attach routine for dwiic.

>How-To-Repeat:

Try -current on mentioned motherboard.  NetBSD 9.2_STABLE works fine.

>Fix:

Unknown.  Hopefully something simple.  I am currently building a
kernel without dwiic to see if that doesn't panic.

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: brad@anduin.eldar.org
Cc: gnats-bugs@NetBSD.org, bouyer@NetBSD.org
Subject: Re: kern/57063: Kernel panic in -current in iic_attach
Date: Tue, 18 Oct 2022 10:16:28 +0000

 > kernel: page fault trap, code=0
 > Stopped in pid 0.0 (system) at  netbsd:iic_attach+0x64: movq    %rax,70(%r12)
 > iic_attach() at netbsd:iic_attach

 This is almost certainly because something in dwiic_attach failed, but
 pci_dwiic_attach blithely ignored the failure and barged ahead trying
 to attach a child with uninitialized i2cbus_attach_args having a
 null i2c tag:

 https://nxr.netbsd.org/xref/src/sys/arch/x86/pci/dwiic_pci.c?r=1.8#280

     280 	dwiic_attach(&sc->sc_dwiic);
     281
     282 	config_found(self, &sc->sc_dwiic.sc_iba, iicbus_print, CFARGS_NONE);
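
The failure mode described above can be modeled outside the kernel. The following is a minimal sketch, not the driver code: `struct ctlr`, `struct bus_args`, `ctlr_attach` and `bus_attach` are hypothetical stand-ins for the softc, i2cbus_attach_args, dwiic_attach and the bus front end. It assumes that the attach args are only filled in on success, so a caller that ignores the return value hands a null tag to the child.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for i2cbus_attach_args: only the tag matters here. */
struct bus_args {
	const void *tag;
};

/* Hypothetical stand-in for the controller softc. */
struct ctlr {
	bool hw_ok;		/* models whether hardware init succeeds */
	int tag_storage;	/* object the tag points at after a good attach */
	struct bus_args args;	/* stays zeroed if attach fails */
};

/* Models the controller attach: fills in the args only on success. */
static bool
ctlr_attach(struct ctlr *c)
{
	if (!c->hw_ok)
		return false;	/* args.tag is still NULL at this point */
	c->args.tag = &c->tag_storage;
	return true;
}

/*
 * Models the bus front end.  Returns 0 when children may be attached
 * and -1 when the controller attach failed and they must be skipped.
 */
static int
bus_attach(struct ctlr *c)
{
	if (!ctlr_attach(c))
		return -1;	/* do not pass half-initialized args to a child */
	/* Here c->args.tag is non-NULL and safe for a child to use. */
	return 0;
}
```

With `hw_ok` false, `bus_attach` returns -1 and the child never sees the null tag; the unguarded code path in the excerpt corresponds to calling the child unconditionally.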

From: Brad Spencer <brad@anduin.eldar.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/57063: Kernel panic in -current in iic_attach
Date: Tue, 18 Oct 2022 10:17:44 -0400

 Taylor R Campbell <riastradh@NetBSD.org> writes:

 > The following reply was made to PR kern/57063; it has been noted by GNATS.
 >
 > From: Taylor R Campbell <riastradh@NetBSD.org>
 > To: brad@anduin.eldar.org
 > Cc: gnats-bugs@NetBSD.org, bouyer@NetBSD.org
 > Subject: Re: kern/57063: Kernel panic in -current in iic_attach
 > Date: Tue, 18 Oct 2022 10:16:28 +0000
 >
 >  > kernel: page fault trap, code=0
 >  > Stopped in pid 0.0 (system) at  netbsd:iic_attach+0x64: movq    %rax,70(%r12)
 >  > iic_attach() at netbsd:iic_attach
 >  
 >  This is almost certainly because something in dwiic_attach failed, but
 >  pci_dwiic_attach blithely ignored the failure and barged ahead trying
 >  to attach a child with uninitialized i2cbus_attach_args having a
 >  null i2c tag:
 >  
 >  https://nxr.netbsd.org/xref/src/sys/arch/x86/pci/dwiic_pci.c?r=1.8#280
 >  
 >      280 	dwiic_attach(&sc->sc_dwiic);
 >      281
 >      282 	config_found(self, &sc->sc_dwiic.sc_iba, iicbus_print, CFARGS_NONE);
 >  


 I looked into this some more.  If you leave out dwiic from the kernel,
 the panic goes away, so that confirms that it is dwiic.

 I messed with the code for a while until I really needed the DOM0
 functional and this is what I noted:

 o The driver wants a device that returns 0x44570140 in the
 DW_IC_COMP_TYPE register

 o My device returns 0xf000eef3 in that register, so the driver fails in
 dwiic_init

 o When dwiic_init fails, it returns 1 to dwiic_attach, which in turn
 reports the failure to pci_dwiic_attach.  However, none of the front
 ends for PCI, ACPI or FDT checks for this failure; they all go on to do
 the config anyway, as Taylor noted.

 o If you do the obvious:


 --- dwiic_pci.c.DIST    2021-10-28 06:59:10.187238063 -0400
 +++ dwiic_pci.c 2022-10-18 08:31:42.976592974 -0400
 @@ -277,11 +277,10 @@
                 aprint_verbose_dev(self, "no matching ACPI node\n");
         }

 -       dwiic_attach(&sc->sc_dwiic);
 -
 -       config_found(self, &sc->sc_dwiic.sc_iba, iicbus_print, CFARGS_NONE);
 -
 -       pmf_device_register(self, dwiic_suspend, dwiic_resume);
 +       if (dwiic_attach(&sc->sc_dwiic)) {
 +               config_found(self, &sc->sc_dwiic.sc_iba, iicbus_print, CFARGS_NONE);
 +               pmf_device_register(self, dwiic_suspend, dwiic_resume);
 +       }

  out:
         return;

 you will avoid the panic I send-pr'ed about, but the kernel will panic
 later: the interrupts are apparently already set up, and a crash occurs
 in dwiic_intr (assuming I read the screen correctly).

 o If you comment out the type check in dwiic_init and try to use the
 device I have anyway, you do not appear to get a panic anywhere, but the
 driver reports failures in other ways, indicating that the device I have
 really won't work with the driver:

 [     1.000003] dwiic0 at pci0 dev 21 function 0: I2C controller instance 0
 [     1.000003] dwiic0: interrupting at ioapic0 pin 16
 [     1.000003] dwiic0: failed to disable
 [     1.000003] dwiic0: failed to disable
 [     1.000003] iic0 at dwiic0: I2C bus
 [     1.000003] dwiic1 at pci0 dev 21 function 1autoconfiguration error: : can't map register space
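
The type check mentioned in the notes above amounts to comparing one 32-bit register value against a constant. A minimal sketch, assuming only the two values quoted in this report; `comp_type_supported` and `comp_type_tag` are hypothetical helper names, not functions from the driver:

```c
#include <stdbool.h>
#include <stdint.h>

/* Value the driver expects in DW_IC_COMP_TYPE, as quoted in this report. */
#define COMP_TYPE_EXPECTED	0x44570140u

/* Hypothetical helper: true if the register value is the expected one. */
static bool
comp_type_supported(uint32_t comp_type)
{
	return comp_type == COMP_TYPE_EXPECTED;
}

/*
 * Hypothetical helper: copy the two most significant bytes into a
 * NUL-terminated string.  For the expected value these bytes are
 * 0x44 and 0x57, which are ASCII 'D' and 'W'.
 */
static void
comp_type_tag(uint32_t comp_type, char tag[3])
{
	tag[0] = (char)((comp_type >> 24) & 0xff);
	tag[1] = (char)((comp_type >> 16) & 0xff);
	tag[2] = '\0';
}
```

The value 0xf000eef3 reported for this board fails the comparison, which matches the "failed initializing" path described in the notes.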




 -- 
 Brad Spencer - brad@anduin.eldar.org - KC8VKS - http://anduin.eldar.org

From: Paul Goyette <paul@whooppee.com>
To: Brad Spencer <brad@anduin.eldar.org>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
    netbsd-bugs@netbsd.org
Subject: Re: kern/57063: Kernel panic in -current in iic_attach
Date: Tue, 18 Oct 2022 07:26:37 -0700 (PDT)

 On Tue, 18 Oct 2022, Brad Spencer wrote:

 <big snip>

 > o If you comment out the type check in dwiic_init and try to use the
 > device I have anyway, you do not appear to get a panic anywhere, but the
 > driver reports failures in other ways, indicating that the device I have
 > really won't work with the driver:
 >
 > [     1.000003] dwiic0 at pci0 dev 21 function 0: I2C controller instance 0
 > [     1.000003] dwiic0: interrupting at ioapic0 pin 16
 > [     1.000003] dwiic0: failed to disable
 > [     1.000003] dwiic0: failed to disable
 > [     1.000003] iic0 at dwiic0: I2C bus
 > [     1.000003] dwiic1 at pci0 dev 21 function 1autoconfiguration error: : can't map register space

 Sounds to me like maybe the dwiic driver should make this ``type
 check'' during dwiic_match() and fail there.


 +--------------------+--------------------------+----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:    |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com    |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org  |
 | & Network Engineer |                          | pgoyette99@gmail.com |
 +--------------------+--------------------------+----------------------+

From: Jason Thorpe <thorpej@me.com>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org,
 gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org,
 brad@anduin.eldar.org
Subject: Re: kern/57063: Kernel panic in -current in iic_attach
Date: Tue, 18 Oct 2022 07:51:31 -0700

 > On Oct 18, 2022, at 7:30 AM, Paul Goyette <paul@whooppee.com> wrote:
 >
 > Sounds to me like maybe the dwiic driver should make this ``type
 > check'' during dwiic_match() and fail there.

 “Match” routines for PCI drivers should not map the
 space … they should rely solely on the PCI device ID.

 -- thorpej
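
A match routine of the kind described above can be sketched as a pure table lookup on the IDs read from PCI configuration space, with no register mapping. This is an illustrative model only: `struct pci_id`, `supported_ids` and `id_matches` are hypothetical names, and the product IDs are placeholders rather than real device IDs (0x8086 is the Intel vendor ID).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical vendor/product pair as read from PCI config space. */
struct pci_id {
	uint16_t vendor;
	uint16_t product;
};

/* Placeholder table; the product IDs are made up for illustration. */
static const struct pci_id supported_ids[] = {
	{ 0x8086, 0x0001 },
	{ 0x8086, 0x0002 },
};

/* Decide from the IDs alone; no device registers are mapped or read. */
static bool
id_matches(uint16_t vendor, uint16_t product)
{
	size_t n = sizeof(supported_ids) / sizeof(supported_ids[0]);

	for (size_t i = 0; i < n; i++) {
		if (supported_ids[i].vendor == vendor &&
		    supported_ids[i].product == product)
			return true;
	}
	return false;
}
```

Anything that needs the device's registers, such as the type check, then has to live in the attach path and fail cleanly there.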

From: Paul Goyette <paul@whooppee.com>
To: Jason Thorpe <thorpej@me.com>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/57063: Kernel panic in -current in iic_attach
Date: Tue, 18 Oct 2022 08:28:46 -0700 (PDT)

 > “Match” routines for PCI drivers should not map the
 > space … they should rely solely on the PCI device ID.

 Ah, of course.

 Which means that either the driver needs to learn how to handle
 this particular "type" or the failure path for dwiic_attach()
 needs to more completely disable the device.


 +--------------------+--------------------------+----------------------+
 | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:    |
 | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com    |
 | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org  |
 | & Network Engineer |                          | pgoyette99@gmail.com |
 +--------------------+--------------------------+----------------------+

From: Brad Spencer <brad@anduin.eldar.org>
To: Paul Goyette <paul@whooppee.com>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Subject: Re: kern/57063: Kernel panic in -current in iic_attach
Date: Tue, 18 Oct 2022 14:17:01 -0400

 Paul Goyette <paul@whooppee.com> writes:

 > On Tue, 18 Oct 2022, Brad Spencer wrote:
 >
 > <big snip>
 >
 >> o If you comment out the type check in dwiic_init and try to use the
 >> device I have anyway, you do not appear to get a panic anywhere, but the
 >> driver reports failures in other ways, indicating that the device I have
 >> really won't work with the driver:
 >>
 >> [     1.000003] dwiic0 at pci0 dev 21 function 0: I2C controller instance 0
 >> [     1.000003] dwiic0: interrupting at ioapic0 pin 16
 >> [     1.000003] dwiic0: failed to disable
 >> [     1.000003] dwiic0: failed to disable
 >> [     1.000003] iic0 at dwiic0: I2C bus
 >> [     1.000003] dwiic1 at pci0 dev 21 function 1autoconfiguration error: : can't map register space
 >
 > Sounds to me like maybe the dwiic driver should make this ``type
 > check'' during dwiic_match() and fail there.
 >
 >
 > +--------------------+--------------------------+----------------------+
 > | Paul Goyette       | PGP Key fingerprint:     | E-mail addresses:    |
 > | (Retired)          | FA29 0E3B 35AF E8AE 6651 | paul@whooppee.com    |
 > | Software Developer | 0786 F758 55DE 53BA 7731 | pgoyette@netbsd.org  |
 > | & Network Engineer |                          | pgoyette99@gmail.com |
 > +--------------------+--------------------------+----------------------+


 What Jason said: match routines probably should not do that.

 I can't work on this again right now, but looking at the driver code
 suggests that return values are ignored in a number of places where they
 probably should not be and this is true for all attachment types, as best
 as I can tell.

 I also suspect that someone else, someday, will run into this same thing
 again unless I got REALLY lucky in picking a motherboard with that
 particular chip variant in it (this isn't impossible, the motherboard
 was designed some time ago and it may be the case that the chip I have
 is older than what the driver was meant for).  There is nothing at all
 that suggests that this is limited to a Xen DOM0 kernel, and it will
 probably panic in the same way with GENERIC.




 -- 
 Brad Spencer - brad@anduin.eldar.org - KC8VKS - http://anduin.eldar.org

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57063 CVS commit: src/sys
Date: Wed, 19 Oct 2022 22:28:36 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Wed Oct 19 22:28:36 UTC 2022

 Modified Files:
 	src/sys/arch/x86/pci: dwiic_pci.c
 	src/sys/dev/acpi: dwiic_acpi.c
 	src/sys/dev/fdt: dwiic_fdt.c

 Log Message:
 dwiic(4): Don't try to attach children if dwiic_attach failed.

 PR kern/57063


 To generate a diff of this commit:
 cvs rdiff -u -r1.8 -r1.9 src/sys/arch/x86/pci/dwiic_pci.c
 cvs rdiff -u -r1.9 -r1.10 src/sys/dev/acpi/dwiic_acpi.c
 cvs rdiff -u -r1.4 -r1.5 src/sys/dev/fdt/dwiic_fdt.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 19 Oct 2022 22:35:21 +0000
State-Changed-Why:
candidate fix committed, please test and report back


From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57063 CVS commit: src/sys/dev/ic
Date: Wed, 19 Oct 2022 22:34:10 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Wed Oct 19 22:34:10 UTC 2022

 Modified Files:
 	src/sys/dev/ic: dwiic.c dwiic_var.h

 Log Message:
 dwiic(4): Don't try processing interrupts before attach completes.

 PR kern/57063


 To generate a diff of this commit:
 cvs rdiff -u -r1.8 -r1.9 src/sys/dev/ic/dwiic.c
 cvs rdiff -u -r1.2 -r1.3 src/sys/dev/ic/dwiic_var.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
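
The idea in this log message can be illustrated with a small model. This is a sketch under assumptions and not the committed change: `struct model_softc`, its `attached` flag and `model_intr` are hypothetical, and the sketch simply assumes the handler can consult a flag that is set only once attach has finished.

```c
#include <stdbool.h>

/* Hypothetical softc: just enough state to model the guard. */
struct model_softc {
	bool attached;	/* set only after attach has fully completed */
	int handled;	/* counts interrupts that were actually processed */
};

/*
 * Hypothetical interrupt handler.  Returns 1 if the interrupt was
 * claimed and processed, 0 if it was ignored because the device is
 * not (or not yet) attached.
 */
static int
model_intr(struct model_softc *sc)
{
	if (!sc->attached)
		return 0;	/* never touch device state before attach is done */
	sc->handled++;
	return 1;
}
```

An interrupt that arrives after a failed or unfinished attach is then ignored instead of operating on state that was never set up.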

From: Brad Spencer <brad@anduin.eldar.org>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
        riastradh@NetBSD.org
Subject: Re: kern/57063 (Kernel panic in -current in iic_attach)
Date: Wed, 19 Oct 2022 22:12:57 -0400

 riastradh@NetBSD.org writes:

 > Synopsis: Kernel panic in -current in iic_attach
 >
 > State-Changed-From-To: open->feedback
 > State-Changed-By: riastradh@NetBSD.org
 > State-Changed-When: Wed, 19 Oct 2022 22:35:21 +0000
 > State-Changed-Why:
 > candidate fix committed, please test and report back

 The panic appears to be gone.  Thanks.


 The boot messages look like this:

 dwiic0 at pci0 dev 21 function 0: I2C controller instance 0
 dwiic0: interrupting at ioapic0 pin 16
 dwiic0: autoconfiguration error: failed initializing
 dwiic1 at pci0 dev 21 function 1autoconfiguration error: : can't map register space

 This is an acceptable output that can be improved upon later.






 -- 
 Brad Spencer - brad@anduin.eldar.org - KC8VKS - http://anduin.eldar.org

State-Changed-From-To: feedback->closed
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Thu, 20 Oct 2022 07:41:05 +0000
State-Changed-Why:
panic fixed


>Unformatted:

Copyright © 1994-2022 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.