NetBSD Problem Report #48733

From Wolfgang.Stukenbrock@nagler-company.com  Fri Apr 11 14:35:34 2014
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 3FB3CA5807
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 11 Apr 2014 14:35:34 +0000 (UTC)
Message-Id: <20140411131311.74AF4123B93@test-s0.nagler-company.com>
Date: Fri, 11 Apr 2014 15:13:11 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Subject: deadlock in if_output() with interrupt on KERNEL_LOCK
X-Send-Pr-Version: 3.95

>Number:         48733
>Category:       kern
>Synopsis:       deadlock in if_output() with interrupt on KERNEL_LOCK
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Apr 11 14:40:00 +0000 2014
>Last-Modified:  Mon Jun 02 02:15:02 +0000 2014
>Originator:     Wolfgang Stukenbrock
>Release:        NetBSD 6.1 - MAIN
>Organization:
Dr. Nagler & Company GmbH
>Environment:


System: NetBSD e013 6.1 NetBSD 6.1 (NSW-E013) #64: Tue Oct 22 11:03:30 CEST 2013  wgstuken@s012:/export/NetBSD-6.1/N+C-build/.OBJDIR_amd64/export/NetBSD-6.1/src/sys/arch/amd64/compile/NSW-E013 amd64
Architecture: x86_64
Machine: amd64
>Description:
	Problem located in /src/sys/netinet/ip_output.c.
	Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
	on the interface.
	Now - at least the wm-driver - will call splnet() and splx() inside the output
	routine.
	If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
	is processes in splx() when the level is released again.
	If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
	used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
	So we try to lock it again here -> dead-lock.

	Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
	interfaces, the system lock-up after a short time. Don't ask me, why the if_output
	call takes "to long" with theese two additonal interfaces, but it is reproducable.
	I've analysed this several times with DDB. Most times I've seen an USB-interrupt
	that dead-lock the system.
>How-To-Repeat:
	Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
	time. It is just a question of time till system-dead-lock.
>Fix:
	Fist guess: revert change done from 1.207 to 1.208.
	But I've no idea about side effects.

>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 11 Apr 2014 17:59:09 +0200

 On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
 > >Description:
 > 	Problem located in /src/sys/netinet/ip_output.c.
 > 	Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
 > 	on the interface.
 > 	Now - at least the wm-driver - will call splnet() and splx() inside the output
 > 	routine.
 > 	If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
 > 	is processes in splx() when the level is released again.
 > 	If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
 > 	used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
 > 	So we try to lock it again here -> dead-lock.
 > 
 > 	Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
 > 	interfaces, the system lock-up after a short time. Don't ask me, why the if_output
 > 	call takes "to long" with theese two additonal interfaces, but it is reproducable.
 > 	I've analysed this several times with DDB. Most times I've seen an USB-interrupt
 > 	that dead-lock the system.

 I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
 it can be locked multiple time on the same CPU. So it's not a problem
 that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
 increase the lock count. A splx() on another CPU will wait for the
 KERNEL_LOCK to be relased.

 I think your problem is more likely in the USB stack.
 Maybe one of your new ethernet interface shares an interrupt with the
 USB controller ?


 > >How-To-Repeat:
 > 	Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
 > 	time. It is just a question of time till system-dead-lock.
 > >Fix:
 > 	Fist guess: revert change done from 1.207 to 1.208.
 > 	But I've no idea about side effects.

 Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
 If you revert ip_output 1.208, you'll also have to revert ip_input.c
 1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Sun, 27 Apr 2014 10:21:03 +0200

 Hi, sorry for the delay, but I'm working off-side most time at the moment.

 I have to look onto the system to validate the assumption, that the USB 
 shares the interupt. (I remember that there are some interrupts shared.)
 But I can do this not earlyer as 02.05.2014 or later ...
 And lowering the spl-level does not imply that the interrupt has to be 
 shared with the wm-driver to run into this problem.

 The USB-interupt is not the only one I've seen.
 And I've analysed the stack backtrace that has dead-locked the system 
 and it locks up in the interrupt code while trying to get the KERNEL_LOCK!
 I will have a look at this again, but as far as I remember it definitly 
 locks up in the splx while trying to get the KERNEL_LOCK. splx does not 
 return anymore and I've got the interrupt source by translating the jump 
 address to a symbol that has not been reached anymore.
 (I've still not commented out the LOCK/UNLOCK prior calling the 
 output-routine to validate my analyses - no time till now, sorry - but I 
 will do it as soon as possible.)

 I will send the results of my review as soon as I've be able to do it.

 best regards

 W. Stukenbrock

 Manuel Bouyer wrote:

 > The following reply was made to PR kern/48733; it has been noted by GNATS.
 > 
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
 > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 > Date: Fri, 11 Apr 2014 17:59:09 +0200
 > 
 >  On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
 >  > >Description:
 >  > 	Problem located in /src/sys/netinet/ip_output.c.
 >  > 	Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
 >  > 	on the interface.
 >  > 	Now - at least the wm-driver - will call splnet() and splx() inside the output
 >  > 	routine.
 >  > 	If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
 >  > 	is processes in splx() when the level is released again.
 >  > 	If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
 >  > 	used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
 >  > 	So we try to lock it again here -> dead-lock.
 >  > 
 >  > 	Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
 >  > 	interfaces, the system lock-up after a short time. Don't ask me, why the if_output
 >  > 	call takes "to long" with theese two additonal interfaces, but it is reproducable.
 >  > 	I've analysed this several times with DDB. Most times I've seen an USB-interrupt
 >  > 	that dead-lock the system.
 >  
 >  I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
 >  it can be locked multiple time on the same CPU. So it's not a problem
 >  that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
 >  increase the lock count. A splx() on another CPU will wait for the
 >  KERNEL_LOCK to be relased.
 >  
 >  I think your problem is more likely in the USB stack.
 >  Maybe one of your new ethernet interface shares an interrupt with the
 >  USB controller ?
 >  
 >  
 >  > >How-To-Repeat:
 >  > 	Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
 >  > 	time. It is just a question of time till system-dead-lock.
 >  > >Fix:
 >  > 	Fist guess: revert change done from 1.207 to 1.208.
 >  > 	But I've no idea about side effects.
 >  
 >  Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
 >  If you revert ip_output 1.208, you'll also have to revert ip_input.c
 >  1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
 >  
 >  -- 
 >  Manuel Bouyer <bouyer@antioche.eu.org>
 >       NetBSD: 26 ans d'experience feront toujours la difference
 >  --
 >  
 > 
 > 
 > Received: from DBXPR07MB317.eurprd07.prod.outlook.com (10.141.12.139) by
 >  DBXPR07MB319.eurprd07.prod.outlook.com (10.141.12.141) with Microsoft SMTP
 >  Server (TLS) id 15.0.918.8 via Mailbox Transport; Fri, 11 Apr 2014 16:00:11
 >  +0000
 > Received: from DBXPR07CA001.eurprd07.prod.outlook.com (10.255.191.159) by
 >  DBXPR07MB317.eurprd07.prod.outlook.com (10.141.12.139) with Microsoft SMTP
 >  Server (TLS) id 15.0.918.8; Fri, 11 Apr 2014 16:00:10 +0000
 > Received: from DB3FFO11FD011.protection.gbl (2a01:111:f400:7e04::177) by
 >  DBXPR07CA001.outlook.office365.com (2a01:111:e400:9800::31) with Microsoft
 >  SMTP Server (TLS) id 15.0.918.8 via Frontend Transport; Fri, 11 Apr 2014
 >  16:00:09 +0000
 > Received: from e002.nagler-company.com (212.185.86.227) by
 >  DB3FFO11FD011.mail.protection.outlook.com (10.47.216.167) with Microsoft SMTP
 >  Server (TLS) id 15.0.918.6 via Frontend Transport; Fri, 11 Apr 2014 16:00:08
 >  +0000
 > Received: from mollari.NetBSD.org (mollari.netbsd.org [149.20.53.80])
 > 	by e002.nagler-company.com (8.14.7/8.14.7) with ESMTP id s3BG0378005931
 > 	for <Wolfgang.Stukenbrock@nagler-company.com>; Fri, 11 Apr 2014 18:00:06 +0200 (CEST)
 > Received: by mollari.NetBSD.org (Postfix, from userid 31008)
 > 	id C23A5A5828; Fri, 11 Apr 2014 16:00:01 +0000 (UTC)
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > To: <kern-bug-people@netbsd.org>, <gnats-admin@netbsd.org>,
 > 	<netbsd-bugs@netbsd.org>, <Wolfgang.Stukenbrock@nagler-company.com>
 > Reply-To: <gnats-bugs@NetBSD.org>
 > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 > References: <pr-kern-48733@gnats.netbsd.org>
 >   <20140411131311.74AF4123B93@test-s0.nagler-company.com>
 > X-Gnats-Was-Stupid: no
 > CC:
 > Message-ID: <20140411160001.C23A5A5828@mollari.NetBSD.org>
 > Date: Fri, 11 Apr 2014 16:00:01 +0000
 > Return-Path: gnats@NetBSD.org
 > X-EOPAttributedMessage: 0
 > X-MS-Exchange-Organization-MessageDirectionality: Incoming
 > X-Forefront-Antispam-Report: CIP:212.185.86.227;CTRY:DE;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(979002)(6009001)(428001)(50944004)(51704005)(24454002)(199002)(189002)(16796002)(46386002)(74502001)(70486001)(50466002)(87836001)(46102001)(90896003)(77982001)(52956003)(54356999)(50986999)(76176999)(53806999)(81542001)(81342001)(74662001)(45336002)(43066001)(83072002)(85852003)(6806004)(19580395003)(83322001)(42186004)(44976005)(80976001)(19580405001)(76482001)(99396002)(4396001)(80022001)(33656001)(20776003)(47776003)(48376002)(2201001)(79102001)(92726001)(42882001)(90966001)(969003)(989001)(999001)(1009001)(1019001);DIR:INB;SFP:;SCL:1;SRVR:DBXPR07MB317;H:e002.nagler-company.com;FPR:BF74F31D.9C06D725.B1F32CB3.4CE95053.203B9;PTR:e002.nagler-company.com;A:1;MX:1;LANG:en;
 > Content-Type: text/plain
 > X-MS-Exchange-Organization-Network-Message-Id: ae18e11e-2fa8-44eb-8125-08d123b63ff9
 > X-MS-Exchange-Organization-AVStamp-Service: 1.0
 > Received-SPF: None (: NetBSD.org does not designate permitted sender hosts)
 > X-MS-Exchange-Organization-SCL: 1
 > X-MS-Exchange-Organization-AuthSource: DB3FFO11FD011.protection.gbl
 > X-MS-Exchange-Organization-AuthAs: Anonymous
 > MIME-Version: 1.0
 > 
 > The following reply was made to PR kern/48733; it has been noted by GNATS.
 > 
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
 > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 > Date: Fri, 11 Apr 2014 17:59:09 +0200
 > 
 >  On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
 >  > >Description:
 >  > 	Problem located in /src/sys/netinet/ip_output.c.
 >  > 	Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
 >  > 	on the interface.
 >  > 	Now - at least the wm-driver - will call splnet() and splx() inside the output
 >  > 	routine.
 >  > 	If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
 >  > 	is processes in splx() when the level is released again.
 >  > 	If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
 >  > 	used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
 >  > 	So we try to lock it again here -> dead-lock.
 >  > 
 >  > 	Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
 >  > 	interfaces, the system lock-up after a short time. Don't ask me, why the if_output
 >  > 	call takes "to long" with theese two additonal interfaces, but it is reproducable.
 >  > 	I've analysed this several times with DDB. Most times I've seen an USB-interrupt
 >  > 	that dead-lock the system.
 >  
 >  I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
 >  it can be locked multiple time on the same CPU. So it's not a problem
 >  that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
 >  increase the lock count. A splx() on another CPU will wait for the
 >  KERNEL_LOCK to be relased.
 >  
 >  I think your problem is more likely in the USB stack.
 >  Maybe one of your new ethernet interface shares an interrupt with the
 >  USB controller ?
 >  
 >  
 >  > >How-To-Repeat:
 >  > 	Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
 >  > 	time. It is just a question of time till system-dead-lock.
 >  > >Fix:
 >  > 	Fist guess: revert change done from 1.207 to 1.208.
 >  > 	But I've no idea about side effects.
 >  
 >  Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
 >  If you revert ip_output 1.208, you'll also have to revert ip_input.c
 >  1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
 >  
 >  -- 
 >  Manuel Bouyer <bouyer@antioche.eu.org>
 >       NetBSD: 26 ans d'experience feront toujours la difference
 >  --
 >  
 > 


 -- 


 Dr. Nagler & Company GmbH
 Hauptstraße 9
 92253 Schnaittenbach

 Tel. +49 9622/71 97-42
 Fax +49 9622/71 97-50

 Wolfgang.Stukenbrock@nagler-company.com
 http://www.nagler-company.com


 Hauptsitz: Schnaittenbach
 Handelregister: Amberg HRB
 Gerichtsstand: Amberg
 Steuernummer: 201/118/51825
 USt.-ID-Nummer: DE 273143997
 Geschäftsführer: Dr. Martin Nagler


From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 09 May 2014 14:18:18 +0200

 Hi again,

 I've tried to analyse the stackframes etc. a little bit more - but now I 
 need some help ...

 After entering DDB I see the following:

 db{0}> ps /l
 PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
 666  >   1 7   0         0   fffffe822c504b20               ftpd
 638      1 3   1        80   fffffe822e047280                csh ttyraw
 646      1 3   1        80   fffffe822dae06e0             bounce kqueue
 505      1 3   1        80   fffffe822dae02c0               smtp lockf
 615      1 3   0        80   fffffe822c504700               smtp kqueue
 632      1 3   0        80   fffffe822c605b60    trivial-rewrite kqueue
 639      1 3   1        80   fffffe81330ce240            cleanup kqueue
 690      1 3   1        80   fffffe81330cea80               qmgr kqueue
 644      1 3   0        80   fffffe81330ce660             pickup kqueue
 691      1 3   1        80   fffffe810f80f220              login wait
 507      1 3   0        80   fffffe822c8c6300              inetd kqueue
 628      1 3   1        80   fffffe822c8c6720             master kqueue
 437      1 3   0        80   fffffe822e047ac0            choparp select
 381      1 3   0        80   fffffe822c8c6b40             powerd kqueue
 337      1 3   0        80   fffffe822df43260             logger pipe_rd
 333      1 3   0        80   fffffe822df0b2a0                 sh wait
 183  >   1 7   1         0   fffffe822df0bae0            syslogd
 1        1 3   1        80   fffffe810f1d21a0               init wait
 0       66 3   1       200   fffffe822e0476a0            physiod physiod
 0       65 3   0       200   fffffe810f80f640           aiodoned aiodoned
 0       64 3   1       200   fffffe810f506200            ioflush syncer
 0       63 3   0       200   fffffe810f80fa60           pgdaemon pgdaemon
 0       60 3   0       200   fffffe810f506620            raidio1 raidiow
 0       59 3   0       200   fffffe810f506a40              raid1 rfnodeq
 0       58 3   1       200   fffffe810f21c1e0            raidio0 raidiow
 0       57 3   1       200   fffffe810f21c600              raid0 rfnodeq
 0       56 3   1       200   fffffe810f10d160            swcrthr swcr_wait
 0       55 3   1       200   fffffe810f1dda00          cryptoret crypto_w
 0       53 3   1       200   fffffe810f10d9a0               usb6 usbevt
 0       52 3   1       200   fffffe810f14e9c0               usb4 usbevt
 0       51 3   1       200   fffffe810f0cc560               usb7 usbevt
 0       50 3   1       200   fffffe810f10d580               usb2 usbevt
 0       49 3   1       200   fffffe810f0cc140               usb5 usbevt
 0       48 3   1       200   fffffe810f08a120               usb3 usbevt
 0       47 3   1       200   fffffe810f0cc980               usb0 usbevt
 0       46 3   1       200   fffffe810f21ca20         usbtask-dr usbtsk
 0       45 3   1       200   fffffe810f1dd1c0         usbtask-hc usbtsk
 0       44 3   1       200   fffffe810f1dd5e0               usb1 usbevt
 0       42 3   1       200   fffffe810f1d25c0              unpgc unpgc
 0       41 3   1       200   fffffe810f1d29e0        vmem_rehash vmem_rehash
 0       40 3   0       200   fffffe810f14e180          coretemp1 coretemp1
 0       39 3   1       200   fffffe810f14e5a0          coretemp0 coretemp0
 0       30 3   1       200   fffffe810f08a540            atabus5 atath
 0       29 3   0       200   fffffe810f08a960            atabus4 atath
 0       28 3   0       200   fffffe810f01e100               iic0 iicintr
 0       27 3   1       200   fffffe810f01e520            atabus3 atath
 0       26 3   1       200   fffffe810f01e940             atabus2 atath
 0       25 3   0       200   fffffe810ea580e0            atabus1 atath
 0       24 3   0       200   fffffe810ea58500            atabus0 atath
 0       23 3   0       200   fffffe810ea58920               apm0 apmev
 0       22 3   1       200   fffffe810e9f30c0            xcall/1 xcall
 0       21 1   1       200   fffffe810e9f34e0          softser/1
 0       20 3   1       200   fffffe810e9f3900          softclk/1 tstile
 0       19 1   1       200   fffffe810e9b50a0          softbio/1
 0       18 1   1       200   fffffe810e9b54c0          softnet/1
 0       17 1   1       201   fffffe810e9b58e0             idle/1
 0       16 3   0       200   fffffe822e11b080             sysmon smtaskq
 0       15 3   0       200   fffffe822e11b4a0         pmfsuspend pmfsuspend
 0       14 3   0       200   fffffe822e11b8c0           pmfevent pmfevent
 0       13 3   0       200   fffffe822ef24060         sopendfree sopendfr
 0       12 3   0       200   fffffe822ef24480           nfssilly nfssilly
 0       11 3   1       200   fffffe822ef248a0            cachegc cachegc
 0       10 3   1       200   fffffe822f72a040              vrele vrele
 0        9 3   1       200   fffffe822f72a460             vdrain vdrain
 0        8 3   0       200   fffffe822f72a880          modunload mod_unld
 0        7 3   0       200   fffffe822f736020            xcall/0 xcall
 0    >   6 7   0       200   fffffe822f736440          softser/0
 0        5 1   0       200   fffffe822f736860          softclk/0
 0        4 1   0       200   fffffe822f73f000          softbio/0
 0    >   3 7   0       200   fffffe822f73f420          softnet/0
 0        2 1   0       201   fffffe822f73f840             idle/0
 0        1 3   1       200   ffffffff80655480            swapper uvm


 OK - I thing the kernel lwp 6 and 3 are most relevant, but first the 
 other two processes marked with '>':

 db{0}> bt/a fffffe822c504b20
 trace: pid 666 lid 1 at 0xfffffe810f8a9b40
 soo_write() at netbsd:soo_write+0x2c
 dofilewrite() at netbsd:dofilewrite+0x7d
 sys_write() at netbsd:sys_write+0x62
 syscall() at netbsd:syscall+0xac

 No surprise - I'm running an ftp to the system to trigger the problem.
 ftpd is gooing to send some data.

 db{0}> bt/a fffffe822df0bae0
 trace: pid 183 lid 1 at 0xfffffe810f82abe0
 kevent1() at netbsd:kevent1+0x5c0
 sys___kevent50() at netbsd:sys___kevent50+0x33
 syscall() at netbsd:syscall+0xac

 Whatever syslogd is gooing to do - I do not think it is relevant.
 I think it is woken by a message added by either my print statements or 
 the unknown "interrupt" routine (see below).

 db{0}> bt/a fffffe822f73f420
 trace: pid 0 lid 3 at 0xfffffe810e967760
 ether_output() at netbsd:ether_output+0x2b6
 ip_output() at netbsd:ip_output+0xa8f
 tcp_output() at netbsd:tcp_output+0x1698
 tcp_input() at netbsd:tcp_input+0x15d9
 ip_input() at netbsd:ip_input+0x3ef
 ipintr() at netbsd:ipintr+0x109
 softint_dispatch() at netbsd:softint_dispatch+0xd9
 DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
 Xsoftintr() at netbsd:Xsoftintr+0x4f
 --- interrupt ---
 0:

 That is the part that is gooing to send a packet. I see the printout in 
 ip_output prior calling 'ifp->if_output()' - not the one behind.
 The location pointed to by the backtrace in ether_output() is the call 
 to "return ifq_enqueue(...)". I also see the printout I've added in 
 front of this call, but not the one behind.
 In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the 
 wm-driver - in this routine and the printout in front of the splx(s) at 
 the end of the routine - not the printout behind it.
 This is the localtion where the deadlock happens while processing other 
 interrupts in Xspllower.
 This always looks the same ....



 db{0}> bt/a fffffe822f736440
 trace: pid 0 lid 6 at 0xfffffe810e9739c8
 breakpoint() at netbsd:breakpoint+0x5
 comintr() at netbsd:comintr+0x518
 Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
 --- interrupt ---
 bus_space_read_4() at netbsd:bus_space_read_4+0xa
 intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
 Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
 --- interrupt ---
 Xspllower() at netbsd:Xspllower+0xe
 DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
 Xsoftintr() at netbsd:Xsoftintr+0x4f
 --- interrupt ---
 0:


 Hmmm - not shure about it ...
 It looks like that during processing one pending interrupt in Xspllower 
 at the end of that routine an interrupt came im that takes the 
 KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in 
 bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt 
 and the location where the DDB-enter occures in bus_space_read_4() is 
 just random ????
 The comintr looks like the break-interrupt on the serial console of the 
 system to enter DDB to me.
 Any idea to find out what interrupt routine it is???

 Where should I go on with debugging?
 Accedently the trace-back is not 100% the same in the interrupt-part 
 every time. (But it happens always during the splx(s) in ifq_enqueue() 
 and the other two processes also have the same backtrace.)

 The system has frosen at this point. If I try to reboot it from DDB it 
 gets lots of packet interrupts from the interfaces and freezes again 
 after all are processed. Break and reboot again will reboot the system.
 But I think this information is not relevant.

 Thanks in advance for help.


 By the way:
 The '/a' for the trace command is not documented in DDB manuals and the 
 online help of DDB !!!!!
 I've found it by a look into the sources while trying to get the 
 backtraces for the kernel-threads.
 Perhaps that should be changed ...
 Should I open a PR for this?

 Best regards

 W. Stukenbrock


 Wolfgang Stukenbrock wrote:

 > The following reply was made to PR kern/48733; it has been noted by GNATS.
 > 
 > From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 > Date: Sun, 27 Apr 2014 10:21:03 +0200
 > 
 >  Hi, sorry for the delay, but I'm working off-side most time at the moment.
 >  
 >  I have to look onto the system to validate the assumption, that the USB 
 >  shares the interupt. (I remember that there are some interrupts shared.)
 >  But I can do this not earlyer as 02.05.2014 or later ...
 >  And lowering the spl-level does not imply that the interrupt has to be 
 >  shared with the wm-driver to run into this problem.
 >  
 >  The USB-interupt is not the only one I've seen.
 >  And I've analysed the stack backtrace that has dead-locked the system 
 >  and it locks up in the interrupt code while trying to get the KERNEL_LOCK!
 >  I will have a look at this again, but as far as I remember it definitly 
 >  locks up in the splx while trying to get the KERNEL_LOCK. splx does not 
 >  return anymore and I've got the interrupt source by translating the jump 
 >  address to a symbol that has not been reached anymore.
 >  (I've still not commented out the LOCK/UNLOCK prior calling the 
 >  output-routine to validate my analyses - no time till now, sorry - but I 
 >  will do it as soon as possible.)
 >  
 >  I will send the results of my review as soon as I've be able to do it.
 >  
 >  best regards
 >  
 >  W. Stukenbrock
 >  
 >  Manuel Bouyer wrote:
 >  
 >  > The following reply was made to PR kern/48733; it has been noted by GNATS.
 >  > 
 >  > From: Manuel Bouyer <bouyer@antioche.eu.org>
 >  > To: gnats-bugs@NetBSD.org
 >  > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
 >  > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 >  > Date: Fri, 11 Apr 2014 17:59:09 +0200
 >  > 
 >  >  On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
 >  >  > >Description:
 >  >  > 	Problem located in /src/sys/netinet/ip_output.c.
 >  >  > 	Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
 >  >  > 	on the interface.
 >  >  > 	Now - at least the wm-driver - will call splnet() and splx() inside the output
 >  >  > 	routine.
 >  >  > 	If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
 >  >  > 	is processes in splx() when the level is released again.
 >  >  > 	If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
 >  >  > 	used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
 >  >  > 	So we try to lock it again here -> dead-lock.
 >  >  > 
 >  >  > 	Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
 >  >  > 	interfaces, the system lock-up after a short time. Don't ask me, why the if_output
 >  >  > 	call takes "to long" with theese two additonal interfaces, but it is reproducable.
 >  >  > 	I've analysed this several times with DDB. Most times I've seen an USB-interrupt
 >  >  > 	that dead-lock the system.
 >  >  
 >  >  I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
 >  >  it can be locked multiple time on the same CPU. So it's not a problem
 >  >  that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
 >  >  increase the lock count. A splx() on another CPU will wait for the
 >  >  KERNEL_LOCK to be relased.
 >  >  
 >  >  I think your problem is more likely in the USB stack.
 >  >  Maybe one of your new ethernet interface shares an interrupt with the
 >  >  USB controller ?
 >  >  
 >  >  
 >  >  > >How-To-Repeat:
 >  >  > 	Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
 >  >  > 	time. It is just a question of time till system-dead-lock.
 >  >  > >Fix:
 >  >  > 	Fist guess: revert change done from 1.207 to 1.208.
 >  >  > 	But I've no idea about side effects.
 >  >  
 >  >  Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
 >  >  If you revert ip_output 1.208, you'll also have to revert ip_input.c
 >  >  1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
 >  >  
 >  >  -- 
 >  >  Manuel Bouyer <bouyer@antioche.eu.org>
 >  >       NetBSD: 26 ans d'experience feront toujours la difference
 >  >  --
 >  >  
 >  > 
 >  > 
 >  > Received: from DBXPR07MB317.eurprd07.prod.outlook.com (10.141.12.139) by
 >  >  DBXPR07MB319.eurprd07.prod.outlook.com (10.141.12.141) with Microsoft SMTP
 >  >  Server (TLS) id 15.0.918.8 via Mailbox Transport; Fri, 11 Apr 2014 16:00:11
 >  >  +0000
 >  > Received: from DBXPR07CA001.eurprd07.prod.outlook.com (10.255.191.159) by
 >  >  DBXPR07MB317.eurprd07.prod.outlook.com (10.141.12.139) with Microsoft SMTP
 >  >  Server (TLS) id 15.0.918.8; Fri, 11 Apr 2014 16:00:10 +0000
 >  > Received: from DB3FFO11FD011.protection.gbl (2a01:111:f400:7e04::177) by
 >  >  DBXPR07CA001.outlook.office365.com (2a01:111:e400:9800::31) with Microsoft
 >  >  SMTP Server (TLS) id 15.0.918.8 via Frontend Transport; Fri, 11 Apr 2014
 >  >  16:00:09 +0000
 >  > Received: from e002.nagler-company.com (212.185.86.227) by
 >  >  DB3FFO11FD011.mail.protection.outlook.com (10.47.216.167) with Microsoft SMTP
 >  >  Server (TLS) id 15.0.918.6 via Frontend Transport; Fri, 11 Apr 2014 16:00:08
 >  >  +0000
 >  > Received: from mollari.NetBSD.org (mollari.netbsd.org [149.20.53.80])
 >  > 	by e002.nagler-company.com (8.14.7/8.14.7) with ESMTP id s3BG0378005931
 >  > 	for <Wolfgang.Stukenbrock@nagler-company.com>; Fri, 11 Apr 2014 18:00:06 +0200 (CEST)
 >  > Received: by mollari.NetBSD.org (Postfix, from userid 31008)
 >  > 	id C23A5A5828; Fri, 11 Apr 2014 16:00:01 +0000 (UTC)
 >  > From: Manuel Bouyer <bouyer@antioche.eu.org>
 >  > To: <kern-bug-people@netbsd.org>, <gnats-admin@netbsd.org>,
 >  > 	<netbsd-bugs@netbsd.org>, <Wolfgang.Stukenbrock@nagler-company.com>
 >  > Reply-To: <gnats-bugs@NetBSD.org>
 >  > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 >  > References: <pr-kern-48733@gnats.netbsd.org>
 >  >   <20140411131311.74AF4123B93@test-s0.nagler-company.com>
 >  > X-Gnats-Was-Stupid: no
 >  > CC:
 >  > Message-ID: <20140411160001.C23A5A5828@mollari.NetBSD.org>
 >  > Date: Fri, 11 Apr 2014 16:00:01 +0000
 >  > Return-Path: gnats@NetBSD.org
 >  > X-EOPAttributedMessage: 0
 >  > X-MS-Exchange-Organization-MessageDirectionality: Incoming
 >  > X-Forefront-Antispam-Report: CIP:212.185.86.227;CTRY:DE;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(979002)(6009001)(428001)(50944004)(51704005)(24454002)(199002)(189002)(16796002)(46386002)(74502001)(70486001)(50466002)(87836001)(46102001)(90896003)(77982001)(52956003)(54356999)(50986999)(76176999)(53806999)(81542001)(81342001)(74662001)(45336002)(43066001)(83072002)(85852003)(6806004)(19580395003)(83322001)(42186004)(44976005)(80976001)(19580405001)(76482001)(99396002)(4396001)(80022001)(33656001)(20776003)(47776003)(48376002)(2201001)(79102001)(92726001)(42882001)(90966001)(969003)(989001)(999001)(1009001)(1019001);DIR:INB;SFP:;SCL:1;SRVR:DBXPR07MB317;H:e002.nagler-company.com;FPR:BF74F31D.9C06D725.B1F32CB3.4CE95053.203B9;PTR:e002.nagler-company.com;A:1;MX:1;LANG:en;
 >  > Content-Type: text/plain
 >  > X-MS-Exchange-Organization-Network-Message-Id: ae18e11e-2fa8-44eb-8125-08d123b63ff9
 >  > X-MS-Exchange-Organization-AVStamp-Service: 1.0
 >  > Received-SPF: None (: NetBSD.org does not designate permitted sender hosts)
 >  > X-MS-Exchange-Organization-SCL: 1
 >  > X-MS-Exchange-Organization-AuthSource: DB3FFO11FD011.protection.gbl
 >  > X-MS-Exchange-Organization-AuthAs: Anonymous
 >  > MIME-Version: 1.0
 >  > 
 >  > The following reply was made to PR kern/48733; it has been noted by GNATS.
 >  > 
 >  > From: Manuel Bouyer <bouyer@antioche.eu.org>
 >  > To: gnats-bugs@NetBSD.org
 >  > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
 >  > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 >  > Date: Fri, 11 Apr 2014 17:59:09 +0200
 >  > 
 >  >  On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
 >  >  > >Description:
 >  >  > 	Problem located in /src/sys/netinet/ip_output.c.
 >  >  > 	Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
 >  >  > 	on the interface.
 >  >  > 	Now - at least the wm-driver - will call splnet() and splx() inside the output
 >  >  > 	routine.
 >  >  > 	If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
 >  >  > 	is processes in splx() when the level is released again.
 >  >  > 	If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
 >  >  > 	used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
 >  >  > 	So we try to lock it again here -> dead-lock.
 >  >  > 
 >  >  > 	Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
 >  >  > 	interfaces, the system lock-up after a short time. Don't ask me, why the if_output
 >  >  > 	call takes "to long" with theese two additonal interfaces, but it is reproducable.
 >  >  > 	I've analysed this several times with DDB. Most times I've seen an USB-interrupt
 >  >  > 	that dead-lock the system.
 >  >  
 >  >  I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
 >  >  it can be locked multiple time on the same CPU. So it's not a problem
 >  >  that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
 >  >  increase the lock count. A splx() on another CPU will wait for the
 >  >  KERNEL_LOCK to be relased.
 >  >  
 >  >  I think your problem is more likely in the USB stack.
 >  >  Maybe one of your new ethernet interface shares an interrupt with the
 >  >  USB controller ?
 >  >  
 >  >  
 >  >  > >How-To-Repeat:
 >  >  > 	Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
 >  >  > 	time. It is just a question of time till system-dead-lock.
 >  >  > >Fix:
 >  >  > 	Fist guess: revert change done from 1.207 to 1.208.
 >  >  > 	But I've no idea about side effects.
 >  >  
 >  >  Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
 >  >  If you revert ip_output 1.208, you'll also have to revert ip_input.c
 >  >  1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
 >  >  
 >  >  -- 
 >  >  Manuel Bouyer <bouyer@antioche.eu.org>
 >  >       NetBSD: 26 ans d'experience feront toujours la difference
 >  >  --
 >  >  
 >  > 
 >  
 >  
 >  -- 
 >  
 >  
 >  Dr. Nagler & Company GmbH
 >  Hauptstraße 9
 >  92253 Schnaittenbach
 >  
 >  Tel. +49 9622/71 97-42
 >  Fax +49 9622/71 97-50
 >  
 >  Wolfgang.Stukenbrock@nagler-company.com
 >  http://www.nagler-company.com
 >  
 >  
 >  Hauptsitz: Schnaittenbach
 >  Handelregister: Amberg HRB
 >  Gerichtsstand: Amberg
 >  Steuernummer: 201/118/51825
 >  USt.-ID-Nummer: DE 273143997
 >  Geschäftsführer: Dr. Martin Nagler
 >  
 >  
 > 


 -- 


 Dr. Nagler & Company GmbH
 Hauptstraße 9
 92253 Schnaittenbach

 Tel. +49 9622/71 97-42
 Fax +49 9622/71 97-50

 Wolfgang.Stukenbrock@nagler-company.com
 http://www.nagler-company.com


 Hauptsitz: Schnaittenbach
 Handelregister: Amberg HRB
 Gerichtsstand: Amberg
 Steuernummer: 201/118/51825
 USt.-ID-Nummer: DE 273143997
 Geschäftsführer: Dr. Martin Nagler


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
        Wolfgang.Stukenbrock@nagler-company.com
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Mon, 12 May 2014 14:23:46 +0200

 On Fri, May 09, 2014 at 12:20:00PM +0000, Wolfgang Stukenbrock wrote:
 >  [...]
 >  db{0}> bt/a fffffe822f73f420
 >  trace: pid 0 lid 3 at 0xfffffe810e967760
 >  ether_output() at netbsd:ether_output+0x2b6
 >  ip_output() at netbsd:ip_output+0xa8f
 >  tcp_output() at netbsd:tcp_output+0x1698
 >  tcp_input() at netbsd:tcp_input+0x15d9
 >  ip_input() at netbsd:ip_input+0x3ef
 >  ipintr() at netbsd:ipintr+0x109
 >  softint_dispatch() at netbsd:softint_dispatch+0xd9
 >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
 >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  --- interrupt ---
 >  0:
 >  
 >  That is the part that is gooing to send a packet. I see the printout in 
 >  ip_output prior calling 'ifp->if_output()' - not the one behind.
 >  The location pointed to by the backtrace in ether_output() is the call 
 >  to "return ifq_enqueue(...)". I also see the printout I've added in 
 >  front of this call, but not the one behind.
 >  In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the 
 >  wm-driver - in this routine and the printout in front of the splx(s) at 
 >  the end of the routine - not the printout behind it.
 >  This is the localtion where the deadlock happens while processing other 
 >  interrupts in Xspllower.
 >  This always looks the same ....

 ether_output() is called with the KERNEL_LOCK held, so at this point cpu0
 already owns KERNEL_LOCK, it won't spin trying to grab it again.
 You can confirm this by printing curcpu()->ci_biglock_count.
 Did you try a kernel with options LOCKDEBUG ?

 What's possible here is a loop trying to process the same interrupt
 forever.

 >  
 >  
 >  
 >  db{0}> bt/a fffffe822f736440
 >  trace: pid 0 lid 6 at 0xfffffe810e9739c8
 >  breakpoint() at netbsd:breakpoint+0x5
 >  comintr() at netbsd:comintr+0x518
 >  Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
 >  --- interrupt ---
 >  bus_space_read_4() at netbsd:bus_space_read_4+0xa
 >  intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
 >  Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
 >  --- interrupt ---
 >  Xspllower() at netbsd:Xspllower+0xe
 >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
 >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  --- interrupt ---
 >  0:
 >  
 >  
 >  Hmmm - not shure about it ...
 >  It looks like that during processing one pending interrupt in Xspllower 
 >  at the end of that routine an interrupt came im that takes the 
 >  KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in 
 >  bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt 
 >  and the location where the DDB-enter occures in bus_space_read_4() is 
 >  just random ????
 >  The comintr looks like the break-interrupt on the serial console of the 
 >  system to enter DDB to me.

 it is.

 >  Any idea to find out what interrupt routine it is???

 dmesg could point to the problem; the interrupt we're looking for is
 level-triggered on pin 6 (so maybe "irq 6")

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 16 May 2014 10:20:53 +0200

 Hi again,

 I've already tried LOCK_DEBUG - no change in behaviour and I also don't 
 see additional usable information. (But now I'm nearly shure, it is not 
 the KERNEL_LOCK - see below ...)

 I've still not added the requested curcpu()->ci_biglock_count print, but 
 I've searched for the irq 6 or something like this.
 The "surprising" result:

 s011a# grep irq /var/run/dmesg.boot
 pckbc0 at acpi0 (KBC0, PNP0303) (kbd port): io 0x60,0x64 irq 1
 pckbc1 at acpi0 (MSE0, PNP0F13) (aux port): irq 12
 com0 at acpi0 (COM1, PNP0501-1): io 0x3f8-0x3ff irq 4
 com1 at acpi0 (COM2, PNP0501-2): io 0x2f8-0x2ff irq 3
 pckbc0: using irq 1 for kbd slot
 s011a# grep interrupt !$
 grep interrupt /var/run/dmesg.boot
 acpi0: SCI interrupting at int 9
 wm0: interrupting at ioapic1 pin 0
 wm1: interrupting at ioapic1 pin 1
 Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09) 
 at pci1 dev 0 function 1 not configured
 uhci0: interrupting at ioapic0 pin 16
 uhci1: interrupting at ioapic0 pin 17
 uhci2: interrupting at ioapic0 pin 18
 ehci0: interrupting at ioapic0 pin 18
 wm2: interrupting at ioapic0 pin 16
 wm3: interrupting at ioapic0 pin 17
 wm4: interrupting at ioapic0 pin 18
 wm5: interrupting at ioapic0 pin 19
 uhci3: interrupting at ioapic0 pin 23
 uhci4: interrupting at ioapic0 pin 22
 uhci5: interrupting at ioapic0 pin 21
 ehci1: interrupting at ioapic0 pin 23
 pciide0: using ioapic0 pin 23 for native-PCI interrupt
 piixide0: primary channel interrupting at ioapic0 pin 14
 piixide0: secondary channel interrupting at ioapic0 pin 15
 ichsmb0: interrupting at ioapic0 pin 17
 piixide1: using ioapic0 pin 18 for native-PCI interrupt
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 s011a# grep ioapic !$
 grep ioapic /var/run/dmesg.boot
 ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 20, 24 pins
 ioapic1 at mainbus0 apid 3: pa 0xfecc0000, version 20, 24 pins
 wm0: interrupting at ioapic1 pin 0
 wm1: interrupting at ioapic1 pin 1
 uhci0: interrupting at ioapic0 pin 16
 uhci1: interrupting at ioapic0 pin 17
 uhci2: interrupting at ioapic0 pin 18
 ehci0: interrupting at ioapic0 pin 18
 wm2: interrupting at ioapic0 pin 16
 wm3: interrupting at ioapic0 pin 17
 wm4: interrupting at ioapic0 pin 18
 wm5: interrupting at ioapic0 pin 19
 uhci3: interrupting at ioapic0 pin 23
 uhci4: interrupting at ioapic0 pin 22
 uhci5: interrupting at ioapic0 pin 21
 ehci1: interrupting at ioapic0 pin 23
 pciide0: using ioapic0 pin 23 for native-PCI interrupt
 piixide0: primary channel interrupting at ioapic0 pin 14
 piixide0: secondary channel interrupting at ioapic0 pin 15
 ichsmb0: interrupting at ioapic0 pin 17
 piixide1: using ioapic0 pin 18 for native-PCI interrupt




 Hmmm - no irq6 or pin 6 !?!?!?!
 The first greating line is in /var/run/dmesg.boot, so the result is 
 "complete".
 The board is reported as 'Supermicro X7SBi-LN4 (0123456789)'.
 The two "additonal" interfaces (wm0 and wm1) are on ioapic1 - no other 
 HW is on that APIC.



 The search for not-configured devices say:
 s011a# grep confi !$
 grep confi /var/run/dmesg.boot
 MBRD (PNP0C02) at acpi0 not configured
 FWH (INT0800) at acpi0 not configured
 pci0 at mainbus0 bus 0: configuration mode 1
 Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09) 
 at pci1 dev 0 function 1 not configured
 drm at vga0 not configured
 ichlpcib0: TCO (watchdog) timer configured.
 piixide0: primary channel configured to compatibility mode
 piixide0: secondary channel configured to compatibility mode
 Intel 82801I Thermal Controller (miscellaneous DASP, revision 0x02) at 
 pci0 dev 31 function 6 not configured


 Not really a lot - other systems have more unconfigured devices ....
 Normaly I'm testing all available drivers on a "new" HW, so I'm nearly 
 100% shure that theese devices are not covered by any driver.

 After that research it looks like there is something "suddenly" 
 interrupting without a driver for it.
 The time it take till "freeze" varies, so a "normal timer timeout" does 
 not sound reasonable.
 And why does this occure only after installing two additional interfaces 
 (wm0 and wm1) that use the second APIC? The system does not show the 
 problem without them.

 pci0 at mainbus0 bus 0: configuration mode 1
 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 pchb0 at pci0 dev 0 function 0: Intel product 0x29f0 (rev. 0x01)
 ppb0 at pci0 dev 1 function 0: Intel product 0x29f1 (rev. 0x01)
 ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
 pci1 at ppb0 bus 1
 pci1: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb1 at pci1 dev 0 function 0: Intel product 0x032c (rev. 0x09)
 ppb1: PCI Express 1.0 <PCI-E to PCI/PCI-X Bridge>
 pci2 at ppb1 bus 2
 pci2: i/o space, memory space enabled, rd/line, wr/inv ok
 wm0 at pci2 dev 1 function 0: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
 wm0: interrupting at ioapic1 pin 0
 wm0: 64-bit 133MHz PCIX bus
 wm0: NVM dump:
 1b00 0621 2a12 0430 ffff ffff ffff ffff
 c414 2104 460b 1179 8086 1079 8086 34e8
 000c 1079 0000 2102 10c8 ffff ffff ffff
 ffff ffff ffff ffff ffff ffff ffff ffff
 c30c ff63 5004 2102 00c8 ffff ffff ffff
 ffff ffff ffff ffff ffff ffff ffff 0602
 ffff ffff ffff ffff ffff ffff ffff ffff
 ffff ffff ffff ffff ffff ffff ffff 6be8
 wm0: 256 word (8 address bits) MicroWire EEPROM
 wm0: Ethernet address 00:1b:21:06:12:2a
 makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
 makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 wm1 at pci2 dev 1 function 1: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
 wm1: interrupting at ioapic1 pin 1
 wm1: 64-bit 133MHz PCIX bus
 wm1: NVM dump:
 1b00 0621 2a12 0430 ffff ffff ffff ffff
 c414 2104 460b 1179 8086 1079 8086 34e8
 000c 1079 0000 2102 10c8 ffff ffff ffff
 ffff ffff ffff ffff ffff ffff ffff ffff
 c30c ff63 5004 2102 00c8 ffff ffff ffff
 ffff ffff ffff ffff ffff ffff ffff 0602
 ffff ffff ffff ffff ffff ffff ffff ffff
 ffff ffff ffff ffff ffff ffff ffff 6be8
 wm1: 256 word (8 address bits) MicroWire EEPROM
 wm1: Ethernet address 00:1b:21:06:12:2b
 makphy1 at wm1 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
 makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto


 Question: Are unused interrupts enabled? And if "yes", why?
 I thought I've seen some code that will print a warning about this and 
 tries to reset the interrupt, but I'm not 100% shure about the 
 netbsd-version I've looked at.

 Any hints to go on with debugging?
 It would take me a lot of time to "understand" the interrupt setup on 
 the amd64 architecture ...

 best regards

 W. Stukenbrock


 Manuel Bouyer wrote:

 > The following reply was made to PR kern/48733; it has been noted by GNATS.
 > 
 > From: Manuel Bouyer <bouyer@antioche.eu.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
 >         Wolfgang.Stukenbrock@nagler-company.com
 > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 > Date: Mon, 12 May 2014 14:23:46 +0200
 > 
 >  On Fri, May 09, 2014 at 12:20:00PM +0000, Wolfgang Stukenbrock wrote:
 >  >  [...]
 >  >  db{0}> bt/a fffffe822f73f420
 >  >  trace: pid 0 lid 3 at 0xfffffe810e967760
 >  >  ether_output() at netbsd:ether_output+0x2b6
 >  >  ip_output() at netbsd:ip_output+0xa8f
 >  >  tcp_output() at netbsd:tcp_output+0x1698
 >  >  tcp_input() at netbsd:tcp_input+0x15d9
 >  >  ip_input() at netbsd:ip_input+0x3ef
 >  >  ipintr() at netbsd:ipintr+0x109
 >  >  softint_dispatch() at netbsd:softint_dispatch+0xd9
 >  >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
 >  >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  >  --- interrupt ---
 >  >  0:
 >  >  
 >  >  That is the part that is gooing to send a packet. I see the printout in 
 >  >  ip_output prior calling 'ifp->if_output()' - not the one behind.
 >  >  The location pointed to by the backtrace in ether_output() is the call 
 >  >  to "return ifq_enqueue(...)". I also see the printout I've added in 
 >  >  front of this call, but not the one behind.
 >  >  In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the 
 >  >  wm-driver - in this routine and the printout in front of the splx(s) at 
 >  >  the end of the routine - not the printout behind it.
 >  >  This is the localtion where the deadlock happens while processing other 
 >  >  interrupts in Xspllower.
 >  >  This always looks the same ....
 >  
 >  ether_output() is called with the KERNEL_LOCK held, so at this point cpu0
 >  already owns KERNEL_LOCK, it won't spin trying to grab it again.
 >  You can confirm this by printing curcpu()->ci_biglock_count.
 >  Did you try a kernel with options LOCKDEBUG ?
 >  
 >  What's possible here is a loop trying to process the same interrupt
 >  forever.
 >  
 >  >  
 >  >  
 >  >  
 >  >  db{0}> bt/a fffffe822f736440
 >  >  trace: pid 0 lid 6 at 0xfffffe810e9739c8
 >  >  breakpoint() at netbsd:breakpoint+0x5
 >  >  comintr() at netbsd:comintr+0x518
 >  >  Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
 >  >  --- interrupt ---
 >  >  bus_space_read_4() at netbsd:bus_space_read_4+0xa
 >  >  intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
 >  >  Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
 >  >  --- interrupt ---
 >  >  Xspllower() at netbsd:Xspllower+0xe
 >  >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
 >  >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  >  --- interrupt ---
 >  >  0:
 >  >  
 >  >  
 >  >  Hmmm - not shure about it ...
 >  >  It looks like that during processing one pending interrupt in Xspllower 
 >  >  at the end of that routine an interrupt came im that takes the 
 >  >  KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in 
 >  >  bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt 
 >  >  and the location where the DDB-enter occures in bus_space_read_4() is 
 >  >  just random ????
 >  >  The comintr looks like the break-interrupt on the serial console of the 
 >  >  system to enter DDB to me.
 >  
 >  it is.
 >  
 >  >  Any idea to find out what interrupt routine it is???
 >  
 >  dmesg could point to the problem; the interrupt we're looking for is
 >  level-triggered on pin 6 (so maybe "irq 6")
 >  
 >  -- 
 >  Manuel Bouyer <bouyer@antioche.eu.org>
 >       NetBSD: 26 ans d'experience feront toujours la difference
 >  --
 >  
 > 


 -- 


 Dr. Nagler & Company GmbH
 Hauptstraße 9
 92253 Schnaittenbach

 Tel. +49 9622/71 97-42
 Fax +49 9622/71 97-50

 Wolfgang.Stukenbrock@nagler-company.com
 http://www.nagler-company.com


 Hauptsitz: Schnaittenbach
 Handelregister: Amberg HRB
 Gerichtsstand: Amberg
 Steuernummer: 201/118/51825
 USt.-ID-Nummer: DE 273143997
 Geschäftsführer: Dr. Martin Nagler


From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 16 May 2014 12:58:17 +0200

 Hi again,

 I've found a "hidden" option "MPVERBOSE" and enabled it.
 Why the hell are some options missing in the GENERIC (or ALL - if 
 present) config file?
 What is the expected way to find them ????

 Now I've some additional information about the ioapic setup.
 Full output of dmesg.boot until interrupts are enabled follow:

 s011a# cat /var/run/dmesg.boot
 Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
      2006, 2007, 2008, 2009, 2010, 2011, 2012
      The NetBSD Foundation, Inc.  All rights reserved.
 Copyright (c) 1982, 1986, 1989, 1991, 1993
      The Regents of the University of California.  All rights reserved.

 NetBSD 6.1 (NSW-S011) #0: Fri May 16 12:33:18 CEST 2014
          root@s011a:/usr/src/sys/arch/amd64/compile/NSW-S011
 total memory = 8189 MB
 avail memory = 7945 MB
 timecounter: Timecounters tick every 10.000 msec
 timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
 Supermicro X7SBi-LN4 (0123456789)
 mainbus0 (root)
 mainbus0: scanning 0x9dc00 to 0x9dff0 for MP signature
 mainbus0: scanning 0x9d800 to 0x9dbf0 for MP signature
 mainbus0: scanning 0xf0000 to 0xffff0 for MP signature
 mainbus0: MP floating pointer found in bios at 0xf6570
 mainbus0: MP config table at 0x9e0a1, 484 bytes long
 cpu0 at mainbus0 apid 0cpu0: prelint0 
 0x700<vector=0x0,delmode=0x7,dest=0x0> 0x0<target=0x0>
 cpu0: prelint1 0x400<vector=0x0,delmode=0x4,dest=0x0> 0x0<target=0x0>
 cpu0: timer0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
 cpu0: pcint0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
 cpu0: lint0 0x700<vector=0x0,delmode=0x7,dest=0x0> 0x0<target=0x0>
 cpu0: lint1 0x400<vector=0x0,delmode=0x4,dest=0x0> 0x0<target=0x0>
 cpu0: err0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
 : Intel(R) Xeon(R) CPU           L3110  @ 3.00GHz, id 0x1067a
 cpu0: idle lwp at 0xfffffe822f746840, idle sp at 0xfffffe810e967c50
 cpu1 at mainbus0 apid 1: Intel(R) Xeon(R) CPU           L3110  @ 
 3.00GHz, id 0x1067a
 cpu1: idle lwp at 0xfffffe810e9b78e0, idle sp at 0xfffffe810e994c50
 ioapic0 at mainbus0 apid 2: pa 0xfec00000, virtual wire mode, version 
 20, 24 pins
 ioapic1 at mainbus0 apid 3: pa 0xfecc0000, virtual wire mode, version 
 20, 24 pins
 acpi0 at mainbus0: Intel ACPICA 20110623
 acpi0: X/RSDT: OemId <PTLTD ,    XSDT  ,06040000>, AslId < LTP,00000000>
 ioapic0: int9 0x8060<vector=0x60,delmode=0x0,level,dest=0x0> 0x0<target=0x0>
 mpacpi: found root PCI bus 0
 mpacpi: found subordinate bus 0
 mpacpi: found subordinate bus 0
 mpacpi: found subordinate bus 0
 mpacpi: found subordinate bus 0
 mpacpi: found subordinate bus 0
 mpacpi: found subordinate bus 0
 mpacpi: found subordinate bus 0
 mpacpi: 8 PCI busses
 mpacpi: ISA interrupt override 0 -> 2 (1/1)
 mpacpi: ISA interrupt override 9 -> 9 (1/3)
 mpacpi: configuring PCI bus 0 int routing
 mpacpi: configuring PCI bus 1 int routing
 mpacpi: configuring PCI bus 2 int routing
 mpacpi: configuring PCI bus 5 int routing
 mpacpi: configuring PCI bus 7 int routing
 mpacpi: configuring PCI bus 9 int routing
 mpacpi: configuring PCI bus 11 int routing
 mpacpi: configuring PCI bus 17 int routing
 ioapic0: pin 2 attached to isa0 irq 0 (type 0x0<type=0x0> flags 
 0x5<pol=0x1=Act Hi,trig=0x1=Edge>)
 ioapic0: pin 1 attached to isa0 irq 1 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 3 attached to isa0 irq 3 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)^M
 ioapic0: pin 4 attached to isa0 irq 4 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 5 attached to isa0 irq 5 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 6 attached to isa0 irq 6 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 7 attached to isa0 irq 7 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 8 attached to isa0 irq 8 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 9 attached to isa0 irq 9 (type 0x0<type=0x0> flags 
 0xd<pol=0x1=Act Hi,trig=0x3=Level>)
 ioapic0: pin 10 attached to isa0 irq 10 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 11 attached to isa0 irq 11 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 12 attached to isa0 irq 12 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 13 attached to isa0 irq 13 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 14 attached to isa0 irq 14 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 15 attached to isa0 irq 15 (type 0x0<type=0x0> flags 
 0x0<pol=0x0,trig=0x0>)
 local apic: pin 1 attached to NMI (type 0x1<type=0x1=NMI> flags 
 0x0<pol=0x0,trig=0x0>)
 local apic: pin 1 attached to NMI (type 0x1<type=0x1=NMI> flags 
 0x0<pol=0x0,trig=0x0>)
 ioapic0: pin 16 attached to pci0 device 1 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci0 device 2 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci0 device 3 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 17 attached to pci0 device 3 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 18 attached to pci0 device 3 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 19 attached to pci0 device 3 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci0 device 6 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci0 device 25 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci0 device 26 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 17 attached to pci0 device 26 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 18 attached to pci0 device 26 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci0 device 27 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci0 device 28 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 17 attached to pci0 device 28 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 18 attached to pci0 device 28 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 19 attached to pci0 device 28 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 23 attached to pci0 device 29 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 22 attached to pci0 device 29 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 21 attached to pci0 device 29 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 19 attached to pci0 device 29 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci0 device 31 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 17 attached to pci0 device 31 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 18 attached to pci0 device 31 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 19 attached to pci0 device 31 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic1: pin 0 attached to pci2 device 1 INT_A (type 0x0<type=0x0> flags 
 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic1: pin 1 attached to pci2 device 1 INT_B (type 0x0<type=0x0> flags 
 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic1: pin 2 attached to pci2 device 1 INT_C (type 0x0<type=0x0> flags 
 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic1: pin 3 attached to pci2 device 1 INT_D (type 0x0<type=0x0> flags 
 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic1: pin 4 attached to pci2 device 2 INT_A (type 0x0<type=0x0> flags 
 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic1: pin 5 attached to pci2 device 2 INT_B (type 0x0<type=0x0> flags 
 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic1: pin 4 attached to pci2 device 2 INT_C (type 0x0<type=0x0> flags 
 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic1: pin 5 attached to pci2 device 2 INT_D (type 0x0<type=0x0> flags 
 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci5 device 0 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 17 attached to pci5 device 0 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 18 attached to pci5 device 0 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 19 attached to pci5 device 0 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 17 attached to pci7 device 0 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 18 attached to pci7 device 0 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 19 attached to pci7 device 0 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci7 device 0 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 18 attached to pci9 device 0 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 19 attached to pci9 device 0 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci9 device 0 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 17 attached to pci9 device 0 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 19 attached to pci11 device 0 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 16 attached to pci11 device 0 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 17 attached to pci11 device 0 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 18 attached to pci11 device 0 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 20 attached to pci17 device 0 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 21 attached to pci17 device 0 INT_B (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 22 attached to pci17 device 0 INT_C (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 23 attached to pci17 device 0 INT_D (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 22 attached to pci17 device 3 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 ioapic0: pin 23 attached to pci17 device 4 INT_A (type 0x0<type=0x0> 
 flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
 acpi0: SCI interrupting at int 9
 timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
 hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
 timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
 MBRD (PNP0C02) at acpi0 not configured
 pcppi0 at acpi0 (SPKR, PNP0800): io 0x61
 sysbeep0 at pcppi0
 attimer0 at acpi0 (TIMR, PNP0100): io 0x40-0x43,0x50-0x53
 FWH (INT0800) at acpi0 not configured
 pckbc0 at acpi0 (KBC0, PNP0303) (kbd port): io 0x60,0x64 irq 1
 pckbc1 at acpi0 (MSE0, PNP0F13) (aux port): irq 12
 com0 at acpi0 (COM1, PNP0501-1): io 0x3f8-0x3ff irq 4
 com0: ns16550a, working fifo
 com0: console
 ioapic0: int4 0x81<vector=0x81,delmode=0x0,dest=0x0> 0x0<target=0x0>
 com1 at acpi0 (COM2, PNP0501-2): io 0x2f8-0x2ff irq 3
 com1: ns16550a, working fifo
 ioapic0: int3 0x82<vector=0x82,delmode=0x0,dest=0x0> 0x0<target=0x0>
 acpibut0 at acpi0 (PWRB, PNP0C0C): ACPI Power Button
 apm0 at acpi0: Power Management spec V1.2
 attimer0: attached to pcppi0
 pckbd0 at pckbc0 (kbd slot)
 ioapic0: int1 0x61<vector=0x61,delmode=0x0,dest=0x0> 0x0<target=0x0>
 pckbc0: using irq 1 for kbd slot
 wskbd0 at pckbd0 (mux ignored)
 pci0 at mainbus0 bus 0: configuration mode 1
 mainbus0: added to list as bus 0
 pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 pchb0 at pci0 dev 0 function 0: Intel product 0x29f0 (rev. 0x01)
 ppb0 at pci0 dev 1 function 0: Intel product 0x29f1 (rev. 0x01)
 ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
 pci1 at ppb0 bus 1
 ppb0: added to list as bus 1
 pci1: i/o space, memory space enabled, rd/line, wr/inv ok
 ppb1 at pci1 dev 0 function 0: Intel product 0x032c (rev. 0x09)
 ppb1: PCI Express 1.0 <PCI-E to PCI/PCI-X Bridge>
 pci2 at ppb1 bus 2
 ppb1: added to list as bus 2
 pci2: i/o space, memory space enabled, rd/line, wr/inv ok
 wm0 at pci2 dev 1 function 0: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
 ioapic1: int0 0xa062<vector=0x62,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 wm0: interrupting at ioapic1 pin 0
 wm0: 64-bit 133MHz PCIX bus
 wm0: 256 word (8 address bits) MicroWire EEPROM
 wm0: Ethernet address 00:1b:21:06:12:2a
 makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
 makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 wm1 at pci2 dev 1 function 1: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
 ioapic1: int1 0xa063<vector=0x63,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 wm1: interrupting at ioapic1 pin 1
 wm1: 64-bit 133MHz PCIX bus
 wm1: 256 word (8 address bits) MicroWire EEPROM
 wm1: Ethernet address 00:1b:21:06:12:2b
 makphy1 at wm1 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
 makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09) 
 at pci1 dev 0 function 1 not configured
 uhci0 at pci0 dev 26 function 0: Intel 82801I USB UHCI Controller (rev. 
 0x02)
 ioapic0: int16 0xa064<vector=0x64,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 uhci0: interrupting at ioapic0 pin 16
 usb0 at uhci0: USB revision 1.0
 uhci1 at pci0 dev 26 function 1: Intel 82801I USB UHCI Controller (rev. 
 0x02)
 ioapic0: int17 0xa065<vector=0x65,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 uhci1: interrupting at ioapic0 pin 17
 usb1 at uhci1: USB revision 1.0
 uhci2 at pci0 dev 26 function 2: Intel 82801I USB UHCI Controller (rev. 
 0x02)
 ioapic0: int18 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 uhci2: interrupting at ioapic0 pin 18
 usb2 at uhci2: USB revision 1.0
 ehci0 at pci0 dev 26 function 7: Intel 82801I USB EHCI Controller (rev. 
 0x02)
 ioapic0: int18 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 ehci0: interrupting at ioapic0 pin 18
 ehci0: EHCI version 1.0
 ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
 usb3 at ehci0: USB revision 2.0
 ppb2 at pci0 dev 28 function 0: Intel 82801I PCI Express Port #1 (rev. 0x02)
 ppb2: PCI Express 1.0 <Root Port of PCI-E Root Complex>
 pci3 at ppb2 bus 5
 ppb2: added to list as bus 5
 pci3: i/o space, memory space enabled, rd/line, wr/inv ok
 wm2 at pci3 dev 0 function 0: Intel i82573E IAMT (rev. 0x03)
 ioapic0: int16 0xa064<vector=0x64,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 wm2: interrupting at ioapic0 pin 16
 wm2: PCI-Express bus
 wm2: 256 word (8 address bits) SPI EEPROM
 wm2: Ethernet address 00:30:48:fd:4f:8a
 makphy2 at wm2 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
 makphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 ppb3 at pci0 dev 28 function 1: Intel 82801I PCI Express Port #2 (rev. 0x02)
 ppb3: PCI Express 1.0 <Root Port of PCI-E Root Complex>
 pci4 at ppb3 bus 7
 ppb3: added to list as bus 7
 pci4: i/o space, memory space enabled, rd/line, wr/inv ok
 wm3 at pci4 dev 0 function 0: Intel i82573L Gigabit Ethernet (rev. 0x00)
 ioapic0: int17 0xa065<vector=0x65,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 wm3: interrupting at ioapic0 pin 17
 wm3: PCI-Express bus
 wm3: 256 word (8 address bits) SPI EEPROM
 wm3: Ethernet address 00:30:48:fd:4f:8b
 makphy3 at wm3 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
 makphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 ppb4 at pci0 dev 28 function 2: Intel 82801I PCI Express Port #3 (rev. 0x02)
 ppb4: PCI Express 1.0 <Root Port of PCI-E Root Complex>
 pci5 at ppb4 bus 9
 ppb4: added to list as bus 9
 pci5: i/o space, memory space enabled, rd/line, wr/inv ok
 wm4 at pci5 dev 0 function 0: Intel i82573L Gigabit Ethernet (rev. 0x00)
 ioapic0: int18 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 wm4: interrupting at ioapic0 pin 18
 wm4: PCI-Express bus
 wm4: 256 word (8 address bits) SPI EEPROM
 wm4: Ethernet address 00:30:48:fd:4f:8c
 makphy4 at wm4 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
 makphy4: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 ppb5 at pci0 dev 28 function 3: Intel 82801I PCI Express Port #4 (rev. 0x02)
 ppb5: PCI Express 1.0 <Root Port of PCI-E Root Complex>
 pci6 at ppb5 bus 11
 ppb5: added to list as bus 11
 pci6: i/o space, memory space enabled, rd/line, wr/inv ok
 wm5 at pci6 dev 0 function 0: Intel i82573L Gigabit Ethernet (rev. 0x00)
 ioapic0: int19 0xa067<vector=0x67,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 wm5: interrupting at ioapic0 pin 19
 wm5: PCI-Express bus
 wm5: 256 word (8 address bits) SPI EEPROM
 wm5: Ethernet address 00:30:48:fd:4f:8d
 makphy5 at wm5 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
 makphy5: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 1000baseT-FDX, auto
 uhci3 at pci0 dev 29 function 0: Intel 82801I USB UHCI Controller (rev. 
 0x02)
 ioapic0: int23 0xa068<vector=0x68,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 uhci3: interrupting at ioapic0 pin 23
 usb4 at uhci3: USB revision 1.0
 uhci4 at pci0 dev 29 function 1: Intel 82801I USB UHCI Controller (rev. 
 0x02)
 ioapic0: int22 0xa069<vector=0x69,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 uhci4: interrupting at ioapic0 pin 22
 usb5 at uhci4: USB revision 1.0
 uhci5 at pci0 dev 29 function 2: Intel 82801I USB UHCI Controller (rev. 
 0x02)
 ioapic0: int21 0xa06a<vector=0x6a,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 uhci5: interrupting at ioapic0 pin 21
 usb6 at uhci5: USB revision 1.0
 ehci1 at pci0 dev 29 function 7: Intel 82801I USB EHCI Controller (rev. 
 0x02)
 ioapic0: int23 0xa068<vector=0x68,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 ehci1: interrupting at ioapic0 pin 23
 ehci1: EHCI version 1.0
 ehci1: companion controllers, 2 ports each: uhci3 uhci4 uhci5
 usb7 at ehci1: USB revision 2.0
 ppb6 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0x92)
 pci7 at ppb6 bus 17
 ppb6: added to list as bus 17
 pci7: i/o space, memory space enabled
 vga0 at pci7 dev 3 function 0: ATI Technologies ES1000 (rev. 0x02)
 wsdisplay0 at vga0 (kbdmux ignored)
 drm at vga0 not configured
 pciide0 at pci7 dev 4 function 0: Integrated Technology Express product 
 0x8213 (rev. 0x00)
 pciide0: bus-master DMA support present, but unused (no driver support)
 pciide0: primary channel wired to native-PCI mode
 ioapic0: int23 0xa068<vector=0x68,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 pciide0: using ioapic0 pin 23 for native-PCI interrupt
 atabus0 at pciide0 channel 0
 pciide0: secondary channel wired to native-PCI mode
 atabus1 at pciide0 channel 1
 ichlpcib0 at pci0 dev 31 function 0: Intel 82801IR LPC Interface Bridge 
 (rev. 0x02)
 timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
 ichlpcib0: 24-bit timer
 ichlpcib0: TCO (watchdog) timer configured.
 gpio0 at ichlpcib0: 64 pins
 piixide0 at pci0 dev 31 function 2: Intel 82801I Serial ATA Controller 
 (ICH9) (rev. 0x02)
 piixide0: bus-master DMA support present
 piixide0: primary channel configured to compatibility mode
 ioapic0: int14 0x6b<vector=0x6b,delmode=0x0,dest=0x0> 0x0<target=0x0>
 piixide0: primary channel interrupting at ioapic0 pin 14
 atabus2 at piixide0 channel 0
 piixide0: secondary channel configured to compatibility mode
 ioapic0: int15 0x6c<vector=0x6c,delmode=0x0,dest=0x0> 0x0<target=0x0>
 piixide0: secondary channel interrupting at ioapic0 pin 15
 atabus3 at piixide0 channel 1
 ichsmb0 at pci0 dev 31 function 3: Intel 82801I SMBus Controller (rev. 0x02)
 ioapic0: int17 0xa065<vector=0x65,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 ichsmb0: interrupting at ioapic0 pin 17
 iic0 at ichsmb0: I2C bus
 spdmem0 at iic0 addr 0x50
 spdmem0: DDR2 SDRAM, data ECC, 2GB, 800MHz (PC2-6400)
 spdmem0: 14 rows, 10 cols, 2 ranks, 8 banks/chip, 2.50ns cycle time
 spdmem0: tAA-tRCD-tRP-tRAS: 5-5-5-18
 spdmem0: voltage SSTL 1.8V, refresh time 7.8us (self-refreshing)
 spdmem1 at iic0 addr 0x51
 spdmem1: DDR2 SDRAM, data ECC, 2GB, 800MHz (PC2-6400)
 spdmem1: 14 rows, 10 cols, 2 ranks, 8 banks/chip, 2.50ns cycle time
 spdmem1: tAA-tRCD-tRP-tRAS: 5-5-5-18
 spdmem1: voltage SSTL 1.8V, refresh time 7.8us (self-refreshing)
 spdmem2 at iic0 addr 0x52
 spdmem2: DDR2 SDRAM, data ECC, 2GB, 800MHz (PC2-6400)
 spdmem2: 14 rows, 10 cols, 2 ranks, 8 banks/chip, 2.50ns cycle time
 spdmem2: tAA-tRCD-tRP-tRAS: 5-5-5-18
 spdmem2: voltage SSTL 1.8V, refresh time 7.8us (self-refreshing)
 spdmem3 at iic0 addr 0x53
 spdmem3: DDR2 SDRAM, data ECC, 2GB, 800MHz (PC2-6400)
 spdmem3: 14 rows, 10 cols, 2 ranks, 8 banks/chip, 2.50ns cycle time
 spdmem3: tAA-tRCD-tRP-tRAS: 5-5-5-18
 spdmem3: voltage SSTL 1.8V, refresh time 7.8us (self-refreshing)
 piixide1 at pci0 dev 31 function 5: Intel 82801I Serial ATA Controller 
 (ICH9) (rev. 0x02)
 piixide1: bus-master DMA support present
 piixide1: primary channel wired to native-PCI mode
 ioapic0: int18 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0> 
 0x0<target=0x0>
 piixide1: using ioapic0 pin 18 for native-PCI interrupt
 atabus4 at piixide1 channel 0
 piixide1: secondary channel wired to native-PCI mode
 atabus5 at piixide1 channel 1
 Intel 82801I Thermal Controller (miscellaneous DASP, revision 0x02) at 
 pci0 dev 31 function 6 not configured
 isa0 at ichlpcib0
 lm0 at isa0 port 0x290-0x297
 lm0: Using default temp sensors
 lm0: Winbond W83627HF Hardware monitor
 acpicpu0 at cpu0: ACPI CPU
 acpicpu0: C1: HLT, lat   0 us, pow     0 mW
 acpicpu0: P0: FFH, lat  10 us, pow 35000 mW, 3000 MHz
 acpicpu0: P1: FFH, lat  10 us, pow 28000 mW, 2667 MHz
 acpicpu0: P2: FFH, lat  10 us, pow 22000 mW, 2333 MHz
 acpicpu0: P3: FFH, lat  10 us, pow 16000 mW, 2000 MHz
 acpicpu0: T0: FFH, lat   1 us, pow 16000 mW, 100 %
 acpicpu0: T1: FFH, lat   1 us, pow 14000 mW,  88 %
 acpicpu0: T2: FFH, lat   1 us, pow 12000 mW,  75 %
 acpicpu0: T3: FFH, lat   1 us, pow 10000 mW,  63 %
 acpicpu0: T4: FFH, lat   1 us, pow  8000 mW,  50 %
 acpicpu0: T5: FFH, lat   1 us, pow  6000 mW,  38 %
 acpicpu0: T6: FFH, lat   1 us, pow  4000 mW,  25 %
 acpicpu0: T7: FFH, lat   1 us, pow  2000 mW,  13 %
 coretemp0 at cpu0: thermal sensor, 1 C resolution
 acpicpu1 at cpu1: ACPI CPU
 coretemp1 at cpu1: thermal sensor, 1 C resolution
 timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 cpu1: prelint0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 
 0x0<target=0x0>
 cpu1: prelint1 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 
 0x0<target=0x0>
 cpu1: timer0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
 cpu1: pcint0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
 cpu1: lint0 0x10700<vector=0x0,delmode=0x7,masked,dest=0x0> 0x0<target=0x0>
 cpu1: lint1 0x400<vector=0x0,delmode=0x4,dest=0x0> 0x0<target=0x0>


 One interesting thing is, that only pin 0 to 5 of ioapic1 are printed, 
 but ioapic1 is reported to have 24 pins ...
 May this lead to some "uninitialized" pin-setup and perhaps suprious 
 interrupts?

 W. Stukenbrock

 Wolfgang Stukenbrock wrote:

 > The following reply was made to PR kern/48733; it has been noted by GNATS.
 > 
 > From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
 > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 > Date: Fri, 16 May 2014 10:20:53 +0200
 > 
 >  Hi again,
 >  
 >  I've already tried LOCK_DEBUG - no change in behaviour and I also don't 
 >  see additional usable information. (But now I'm nearly shure, it is not 
 >  the KERNEL_LOCK - see below ...)
 >  
 >  I've still not added the requested curcpu()->ci_biglock_count print, but 
 >  I've searched for the irq 6 or something like this.
 >  The "surprising" result:
 >  
 >  s011a# grep irq /var/run/dmesg.boot
 >  pckbc0 at acpi0 (KBC0, PNP0303) (kbd port): io 0x60,0x64 irq 1
 >  pckbc1 at acpi0 (MSE0, PNP0F13) (aux port): irq 12
 >  com0 at acpi0 (COM1, PNP0501-1): io 0x3f8-0x3ff irq 4
 >  com1 at acpi0 (COM2, PNP0501-2): io 0x2f8-0x2ff irq 3
 >  pckbc0: using irq 1 for kbd slot
 >  s011a# grep interrupt !$
 >  grep interrupt /var/run/dmesg.boot
 >  acpi0: SCI interrupting at int 9
 >  wm0: interrupting at ioapic1 pin 0
 >  wm1: interrupting at ioapic1 pin 1
 >  Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09) 
 >  at pci1 dev 0 function 1 not configured
 >  uhci0: interrupting at ioapic0 pin 16
 >  uhci1: interrupting at ioapic0 pin 17
 >  uhci2: interrupting at ioapic0 pin 18
 >  ehci0: interrupting at ioapic0 pin 18
 >  wm2: interrupting at ioapic0 pin 16
 >  wm3: interrupting at ioapic0 pin 17
 >  wm4: interrupting at ioapic0 pin 18
 >  wm5: interrupting at ioapic0 pin 19
 >  uhci3: interrupting at ioapic0 pin 23
 >  uhci4: interrupting at ioapic0 pin 22
 >  uhci5: interrupting at ioapic0 pin 21
 >  ehci1: interrupting at ioapic0 pin 23
 >  pciide0: using ioapic0 pin 23 for native-PCI interrupt
 >  piixide0: primary channel interrupting at ioapic0 pin 14
 >  piixide0: secondary channel interrupting at ioapic0 pin 15
 >  ichsmb0: interrupting at ioapic0 pin 17
 >  piixide1: using ioapic0 pin 18 for native-PCI interrupt
 >  timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
 >  s011a# grep ioapic !$
 >  grep ioapic /var/run/dmesg.boot
 >  ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 20, 24 pins
 >  ioapic1 at mainbus0 apid 3: pa 0xfecc0000, version 20, 24 pins
 >  wm0: interrupting at ioapic1 pin 0
 >  wm1: interrupting at ioapic1 pin 1
 >  uhci0: interrupting at ioapic0 pin 16
 >  uhci1: interrupting at ioapic0 pin 17
 >  uhci2: interrupting at ioapic0 pin 18
 >  ehci0: interrupting at ioapic0 pin 18
 >  wm2: interrupting at ioapic0 pin 16
 >  wm3: interrupting at ioapic0 pin 17
 >  wm4: interrupting at ioapic0 pin 18
 >  wm5: interrupting at ioapic0 pin 19
 >  uhci3: interrupting at ioapic0 pin 23
 >  uhci4: interrupting at ioapic0 pin 22
 >  uhci5: interrupting at ioapic0 pin 21
 >  ehci1: interrupting at ioapic0 pin 23
 >  pciide0: using ioapic0 pin 23 for native-PCI interrupt
 >  piixide0: primary channel interrupting at ioapic0 pin 14
 >  piixide0: secondary channel interrupting at ioapic0 pin 15
 >  ichsmb0: interrupting at ioapic0 pin 17
 >  piixide1: using ioapic0 pin 18 for native-PCI interrupt
 >  
 >  
 >  
 >  
 >  Hmmm - no irq6 or pin 6 !?!?!?!
 >  The first greating line is in /var/run/dmesg.boot, so the result is 
 >  "complete".
 >  The board is reported as 'Supermicro X7SBi-LN4 (0123456789)'.
 >  The two "additonal" interfaces (wm0 and wm1) are on ioapic1 - no other 
 >  HW is on that APIC.
 >  
 >  
 >  
 >  The search for not-configured devices say:
 >  s011a# grep confi !$
 >  grep confi /var/run/dmesg.boot
 >  MBRD (PNP0C02) at acpi0 not configured
 >  FWH (INT0800) at acpi0 not configured
 >  pci0 at mainbus0 bus 0: configuration mode 1
 >  Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09) 
 >  at pci1 dev 0 function 1 not configured
 >  drm at vga0 not configured
 >  ichlpcib0: TCO (watchdog) timer configured.
 >  piixide0: primary channel configured to compatibility mode
 >  piixide0: secondary channel configured to compatibility mode
 >  Intel 82801I Thermal Controller (miscellaneous DASP, revision 0x02) at 
 >  pci0 dev 31 function 6 not configured
 >  
 >  
 >  Not really a lot - other systems have more unconfigured devices ....
 >  Normaly I'm testing all available drivers on a "new" HW, so I'm nearly 
 >  100% shure that theese devices are not covered by any driver.
 >  
 >  After that research it looks like there is something "suddenly" 
 >  interrupting without a driver for it.
 >  The time it take till "freeze" varies, so a "normal timer timeout" does 
 >  not sound reasonable.
 >  And why does this occure only after installing two additional interfaces 
 >  (wm0 and wm1) that use the second APIC? The system does not show the 
 >  problem without them.
 >  
 >  pci0 at mainbus0 bus 0: configuration mode 1
 >  pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
 >  pchb0 at pci0 dev 0 function 0: Intel product 0x29f0 (rev. 0x01)
 >  ppb0 at pci0 dev 1 function 0: Intel product 0x29f1 (rev. 0x01)
 >  ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
 >  pci1 at ppb0 bus 1
 >  pci1: i/o space, memory space enabled, rd/line, wr/inv ok
 >  ppb1 at pci1 dev 0 function 0: Intel product 0x032c (rev. 0x09)
 >  ppb1: PCI Express 1.0 <PCI-E to PCI/PCI-X Bridge>
 >  pci2 at ppb1 bus 2
 >  pci2: i/o space, memory space enabled, rd/line, wr/inv ok
 >  wm0 at pci2 dev 1 function 0: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
 >  wm0: interrupting at ioapic1 pin 0
 >  wm0: 64-bit 133MHz PCIX bus
 >  wm0: NVM dump:
 >  1b00 0621 2a12 0430 ffff ffff ffff ffff
 >  c414 2104 460b 1179 8086 1079 8086 34e8
 >  000c 1079 0000 2102 10c8 ffff ffff ffff
 >  ffff ffff ffff ffff ffff ffff ffff ffff
 >  c30c ff63 5004 2102 00c8 ffff ffff ffff
 >  ffff ffff ffff ffff ffff ffff ffff 0602
 >  ffff ffff ffff ffff ffff ffff ffff ffff
 >  ffff ffff ffff ffff ffff ffff ffff 6be8
 >  wm0: 256 word (8 address bits) MicroWire EEPROM
 >  wm0: Ethernet address 00:1b:21:06:12:2a
 >  makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
 >  makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 >  1000baseT-FDX, auto
 >  wm1 at pci2 dev 1 function 1: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
 >  wm1: interrupting at ioapic1 pin 1
 >  wm1: 64-bit 133MHz PCIX bus
 >  wm1: NVM dump:
 >  1b00 0621 2a12 0430 ffff ffff ffff ffff
 >  c414 2104 460b 1179 8086 1079 8086 34e8
 >  000c 1079 0000 2102 10c8 ffff ffff ffff
 >  ffff ffff ffff ffff ffff ffff ffff ffff
 >  c30c ff63 5004 2102 00c8 ffff ffff ffff
 >  ffff ffff ffff ffff ffff ffff ffff 0602
 >  ffff ffff ffff ffff ffff ffff ffff ffff
 >  ffff ffff ffff ffff ffff ffff ffff 6be8
 >  wm1: 256 word (8 address bits) MicroWire EEPROM
 >  wm1: Ethernet address 00:1b:21:06:12:2b
 >  makphy1 at wm1 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
 >  makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT, 
 >  1000baseT-FDX, auto
 >  
 >  
 >  Question: Are unused interrupts enabled? And if "yes", why?
 >  I thought I've seen some code that will print a warning about this and 
 >  tries to reset the interrupt, but I'm not 100% shure about the 
 >  netbsd-version I've looked at.
 >  
 >  Any hints to go on with debugging?
 >  It would take me a lot of time to "understand" the interrupt setup on 
 >  the amd64 architecture ...
 >  
 >  best regards
 >  
 >  W. Stukenbrock
 >  
 >  
 >  Manuel Bouyer wrote:
 >  
 >  > The following reply was made to PR kern/48733; it has been noted by GNATS.
 >  > 
 >  > From: Manuel Bouyer <bouyer@antioche.eu.org>
 >  > To: gnats-bugs@NetBSD.org
 >  > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
 >  >         Wolfgang.Stukenbrock@nagler-company.com
 >  > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
 >  > Date: Mon, 12 May 2014 14:23:46 +0200
 >  > 
 >  >  On Fri, May 09, 2014 at 12:20:00PM +0000, Wolfgang Stukenbrock wrote:
 >  >  >  [...]
 >  >  >  db{0}> bt/a fffffe822f73f420
 >  >  >  trace: pid 0 lid 3 at 0xfffffe810e967760
 >  >  >  ether_output() at netbsd:ether_output+0x2b6
 >  >  >  ip_output() at netbsd:ip_output+0xa8f
 >  >  >  tcp_output() at netbsd:tcp_output+0x1698
 >  >  >  tcp_input() at netbsd:tcp_input+0x15d9
 >  >  >  ip_input() at netbsd:ip_input+0x3ef
 >  >  >  ipintr() at netbsd:ipintr+0x109
 >  >  >  softint_dispatch() at netbsd:softint_dispatch+0xd9
 >  >  >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
 >  >  >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  >  >  --- interrupt ---
 >  >  >  0:
 >  >  >  
 >  >  >  That is the part that is gooing to send a packet. I see the printout in 
 >  >  >  ip_output prior calling 'ifp->if_output()' - not the one behind.
 >  >  >  The location pointed to by the backtrace in ether_output() is the call 
 >  >  >  to "return ifq_enqueue(...)". I also see the printout I've added in 
 >  >  >  front of this call, but not the one behind.
 >  >  >  In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the 
 >  >  >  wm-driver - in this routine and the printout in front of the splx(s) at 
 >  >  >  the end of the routine - not the printout behind it.
 >  >  >  This is the localtion where the deadlock happens while processing other 
 >  >  >  interrupts in Xspllower.
 >  >  >  This always looks the same ....
 >  >  
 >  >  ether_output() is called with the KERNEL_LOCK held, so at this point cpu0
 >  >  already owns KERNEL_LOCK, it won't spin trying to grab it again.
 >  >  You can confirm this by printing curcpu()->ci_biglock_count.
 >  >  Did you try a kernel with options LOCKDEBUG ?
 >  >  
 >  >  What's possible here is a loop trying to process the same interrupt
 >  >  forever.
 >  >  
 >  >  >  
 >  >  >  
 >  >  >  
 >  >  >  db{0}> bt/a fffffe822f736440
 >  >  >  trace: pid 0 lid 6 at 0xfffffe810e9739c8
 >  >  >  breakpoint() at netbsd:breakpoint+0x5
 >  >  >  comintr() at netbsd:comintr+0x518
 >  >  >  Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
 >  >  >  --- interrupt ---
 >  >  >  bus_space_read_4() at netbsd:bus_space_read_4+0xa
 >  >  >  intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
 >  >  >  Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
 >  >  >  --- interrupt ---
 >  >  >  Xspllower() at netbsd:Xspllower+0xe
 >  >  >  DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
 >  >  >  Xsoftintr() at netbsd:Xsoftintr+0x4f
 >  >  >  --- interrupt ---
 >  >  >  0:
 >  >  >  
 >  >  >  
 >  >  >  Hmmm - not shure about it ...
 >  >  >  It looks like that during processing one pending interrupt in Xspllower 
 >  >  >  at the end of that routine an interrupt came im that takes the 
 >  >  >  KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in 
 >  >  >  bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt 
 >  >  >  and the location where the DDB-enter occures in bus_space_read_4() is 
 >  >  >  just random ????
 >  >  >  The comintr looks like the break-interrupt on the serial console of the 
 >  >  >  system to enter DDB to me.
 >  >  
 >  >  it is.
 >  >  
 >  >  >  Any idea to find out what interrupt routine it is???
 >  >  
 >  >  dmesg could point to the problem; the interrupt we're looking for is
 >  >  level-triggered on pin 6 (so maybe "irq 6")
 >  >  
 >  >  -- 
 >  >  Manuel Bouyer <bouyer@antioche.eu.org>
 >  >       NetBSD: 26 ans d'experience feront toujours la difference
 >  >  --
 >  >  
 >  > 
 >  
 >  
 >  -- 
 >  
 >  
 >  Dr. Nagler & Company GmbH
 >  Hauptstraße 9
 >  92253 Schnaittenbach
 >  
 >  Tel. +49 9622/71 97-42
 >  Fax +49 9622/71 97-50
 >  
 >  Wolfgang.Stukenbrock@nagler-company.com
 >  http://www.nagler-company.com
 >  
 >  
 >  Hauptsitz: Schnaittenbach
 >  Handelregister: Amberg HRB
 >  Gerichtsstand: Amberg
 >  Steuernummer: 201/118/51825
 >  USt.-ID-Nummer: DE 273143997
 >  Geschäftsführer: Dr. Martin Nagler
 >  
 >  
 > 


 -- 


 Dr. Nagler & Company GmbH
 Hauptstraße 9
 92253 Schnaittenbach

 Tel. +49 9622/71 97-42
 Fax +49 9622/71 97-50

 Wolfgang.Stukenbrock@nagler-company.com
 http://www.nagler-company.com


 Hauptsitz: Schnaittenbach
 Handelregister: Amberg HRB
 Gerichtsstand: Amberg
 Steuernummer: 201/118/51825
 USt.-ID-Nummer: DE 273143997
 Geschäftsführer: Dr. Martin Nagler


From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48733 CVS commit: src/sys/arch
Date: Mon, 2 Jun 2014 02:11:52 +0000

 Module Name:	src
 Committed By:	dholland
 Date:		Mon Jun  2 02:11:52 UTC 2014

 Modified Files:
 	src/sys/arch/amd64/conf: ALL
 	src/sys/arch/i386/conf: ALL

 Log Message:
 add MPVERBOSE, noted missing in PR 48733.


 To generate a diff of this commit:
 cvs rdiff -u -r1.7 -r1.8 src/sys/arch/amd64/conf/ALL
 cvs rdiff -u -r1.374 -r1.375 src/sys/arch/i386/conf/ALL

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Mon, 2 Jun 2014 02:12:21 +0000

 On Fri, May 16, 2014 at 11:00:01AM +0000, Wolfgang Stukenbrock wrote:
  >  I've found a "hidden" option "MPVERBOSE" and enabled it.
  >  Why the hell are some options missing in the GENERIC (or ALL - if 
  >  present) config file?
  >  What is the expected way to find them ????

 They aren't supposed to exist... but entropy is always increasing.

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.