NetBSD Problem Report #48733
From Wolfgang.Stukenbrock@nagler-company.com Fri Apr 11 14:35:34 2014
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher ECDHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 3FB3CA5807
for <gnats-bugs@gnats.NetBSD.org>; Fri, 11 Apr 2014 14:35:34 +0000 (UTC)
Message-Id: <20140411131311.74AF4123B93@test-s0.nagler-company.com>
Date: Fri, 11 Apr 2014 15:13:11 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Subject: deadlock in if_output() with interrupt on KERNEL_LOCK
X-Send-Pr-Version: 3.95
>Number: 48733
>Category: kern
>Synopsis: deadlock in if_output() with interrupt on KERNEL_LOCK
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Apr 11 14:40:00 +0000 2014
>Last-Modified: Mon Jun 02 02:15:02 +0000 2014
>Originator: Wolfgang Stukenbrock
>Release: NetBSD 6.1 - MAIN
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD e013 6.1 NetBSD 6.1 (NSW-E013) #64: Tue Oct 22 11:03:30 CEST 2013 wgstuken@s012:/export/NetBSD-6.1/N+C-build/.OBJDIR_amd64/export/NetBSD-6.1/src/sys/arch/amd64/compile/NSW-E013 amd64
Architecture: x86_64
Machine: amd64
>Description:
Problem located in /src/sys/netinet/ip_output.c.
Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
on the interface.
Now - at least the wm-driver - will call splnet() and splx() inside the output
routine.
If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
is processes in splx() when the level is released again.
If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
So we try to lock it again here -> dead-lock.
Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
interfaces, the system lock-up after a short time. Don't ask me, why the if_output
call takes "to long" with theese two additonal interfaces, but it is reproducable.
I've analysed this several times with DDB. Most times I've seen an USB-interrupt
that dead-lock the system.
>How-To-Repeat:
Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
time. It is just a question of time till system-dead-lock.
>Fix:
Fist guess: revert change done from 1.207 to 1.208.
But I've no idea about side effects.
>Audit-Trail:
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 11 Apr 2014 17:59:09 +0200
On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
> >Description:
> Problem located in /src/sys/netinet/ip_output.c.
> Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
> on the interface.
> Now - at least the wm-driver - will call splnet() and splx() inside the output
> routine.
> If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
> is processes in splx() when the level is released again.
> If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
> used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
> So we try to lock it again here -> dead-lock.
>
> Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
> interfaces, the system lock-up after a short time. Don't ask me, why the if_output
> call takes "to long" with theese two additonal interfaces, but it is reproducable.
> I've analysed this several times with DDB. Most times I've seen an USB-interrupt
> that dead-lock the system.
I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
it can be locked multiple time on the same CPU. So it's not a problem
that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
increase the lock count. A splx() on another CPU will wait for the
KERNEL_LOCK to be relased.
I think your problem is more likely in the USB stack.
Maybe one of your new ethernet interface shares an interrupt with the
USB controller ?
> >How-To-Repeat:
> Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
> time. It is just a question of time till system-dead-lock.
> >Fix:
> Fist guess: revert change done from 1.207 to 1.208.
> But I've no idea about side effects.
Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
If you revert ip_output 1.208, you'll also have to revert ip_input.c
1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Sun, 27 Apr 2014 10:21:03 +0200
Hi, sorry for the delay, but I'm working off-side most time at the moment.
I have to look onto the system to validate the assumption, that the USB
shares the interupt. (I remember that there are some interrupts shared.)
But I can do this not earlyer as 02.05.2014 or later ...
And lowering the spl-level does not imply that the interrupt has to be
shared with the wm-driver to run into this problem.
The USB-interupt is not the only one I've seen.
And I've analysed the stack backtrace that has dead-locked the system
and it locks up in the interrupt code while trying to get the KERNEL_LOCK!
I will have a look at this again, but as far as I remember it definitly
locks up in the splx while trying to get the KERNEL_LOCK. splx does not
return anymore and I've got the interrupt source by translating the jump
address to a symbol that has not been reached anymore.
(I've still not commented out the LOCK/UNLOCK prior calling the
output-routine to validate my analyses - no time till now, sorry - but I
will do it as soon as possible.)
I will send the results of my review as soon as I've be able to do it.
best regards
W. Stukenbrock
Manuel Bouyer wrote:
> The following reply was made to PR kern/48733; it has been noted by GNATS.
>
> From: Manuel Bouyer <bouyer@antioche.eu.org>
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
> Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> Date: Fri, 11 Apr 2014 17:59:09 +0200
>
> On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
> > >Description:
> > Problem located in /src/sys/netinet/ip_output.c.
> > Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
> > on the interface.
> > Now - at least the wm-driver - will call splnet() and splx() inside the output
> > routine.
> > If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
> > is processes in splx() when the level is released again.
> > If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
> > used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
> > So we try to lock it again here -> dead-lock.
> >
> > Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
> > interfaces, the system lock-up after a short time. Don't ask me, why the if_output
> > call takes "to long" with theese two additonal interfaces, but it is reproducable.
> > I've analysed this several times with DDB. Most times I've seen an USB-interrupt
> > that dead-lock the system.
>
> I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
> it can be locked multiple time on the same CPU. So it's not a problem
> that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
> increase the lock count. A splx() on another CPU will wait for the
> KERNEL_LOCK to be relased.
>
> I think your problem is more likely in the USB stack.
> Maybe one of your new ethernet interface shares an interrupt with the
> USB controller ?
>
>
> > >How-To-Repeat:
> > Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
> > time. It is just a question of time till system-dead-lock.
> > >Fix:
> > Fist guess: revert change done from 1.207 to 1.208.
> > But I've no idea about side effects.
>
> Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
> If you revert ip_output 1.208, you'll also have to revert ip_input.c
> 1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
>
> --
> Manuel Bouyer <bouyer@antioche.eu.org>
> NetBSD: 26 ans d'experience feront toujours la difference
> --
>
>
>
> Received: from DBXPR07MB317.eurprd07.prod.outlook.com (10.141.12.139) by
> DBXPR07MB319.eurprd07.prod.outlook.com (10.141.12.141) with Microsoft SMTP
> Server (TLS) id 15.0.918.8 via Mailbox Transport; Fri, 11 Apr 2014 16:00:11
> +0000
> Received: from DBXPR07CA001.eurprd07.prod.outlook.com (10.255.191.159) by
> DBXPR07MB317.eurprd07.prod.outlook.com (10.141.12.139) with Microsoft SMTP
> Server (TLS) id 15.0.918.8; Fri, 11 Apr 2014 16:00:10 +0000
> Received: from DB3FFO11FD011.protection.gbl (2a01:111:f400:7e04::177) by
> DBXPR07CA001.outlook.office365.com (2a01:111:e400:9800::31) with Microsoft
> SMTP Server (TLS) id 15.0.918.8 via Frontend Transport; Fri, 11 Apr 2014
> 16:00:09 +0000
> Received: from e002.nagler-company.com (212.185.86.227) by
> DB3FFO11FD011.mail.protection.outlook.com (10.47.216.167) with Microsoft SMTP
> Server (TLS) id 15.0.918.6 via Frontend Transport; Fri, 11 Apr 2014 16:00:08
> +0000
> Received: from mollari.NetBSD.org (mollari.netbsd.org [149.20.53.80])
> by e002.nagler-company.com (8.14.7/8.14.7) with ESMTP id s3BG0378005931
> for <Wolfgang.Stukenbrock@nagler-company.com>; Fri, 11 Apr 2014 18:00:06 +0200 (CEST)
> Received: by mollari.NetBSD.org (Postfix, from userid 31008)
> id C23A5A5828; Fri, 11 Apr 2014 16:00:01 +0000 (UTC)
> From: Manuel Bouyer <bouyer@antioche.eu.org>
> To: <kern-bug-people@netbsd.org>, <gnats-admin@netbsd.org>,
> <netbsd-bugs@netbsd.org>, <Wolfgang.Stukenbrock@nagler-company.com>
> Reply-To: <gnats-bugs@NetBSD.org>
> Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> References: <pr-kern-48733@gnats.netbsd.org>
> <20140411131311.74AF4123B93@test-s0.nagler-company.com>
> X-Gnats-Was-Stupid: no
> CC:
> Message-ID: <20140411160001.C23A5A5828@mollari.NetBSD.org>
> Date: Fri, 11 Apr 2014 16:00:01 +0000
> Return-Path: gnats@NetBSD.org
> X-EOPAttributedMessage: 0
> X-MS-Exchange-Organization-MessageDirectionality: Incoming
> X-Forefront-Antispam-Report: CIP:212.185.86.227;CTRY:DE;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(979002)(6009001)(428001)(50944004)(51704005)(24454002)(199002)(189002)(16796002)(46386002)(74502001)(70486001)(50466002)(87836001)(46102001)(90896003)(77982001)(52956003)(54356999)(50986999)(76176999)(53806999)(81542001)(81342001)(74662001)(45336002)(43066001)(83072002)(85852003)(6806004)(19580395003)(83322001)(42186004)(44976005)(80976001)(19580405001)(76482001)(99396002)(4396001)(80022001)(33656001)(20776003)(47776003)(48376002)(2201001)(79102001)(92726001)(42882001)(90966001)(969003)(989001)(999001)(1009001)(1019001);DIR:INB;SFP:;SCL:1;SRVR:DBXPR07MB317;H:e002.nagler-company.com;FPR:BF74F31D.9C06D725.B1F32CB3.4CE95053.203B9;PTR:e002.nagler-company.com;A:1;MX:1;LANG:en;
> Content-Type: text/plain
> X-MS-Exchange-Organization-Network-Message-Id: ae18e11e-2fa8-44eb-8125-08d123b63ff9
> X-MS-Exchange-Organization-AVStamp-Service: 1.0
> Received-SPF: None (: NetBSD.org does not designate permitted sender hosts)
> X-MS-Exchange-Organization-SCL: 1
> X-MS-Exchange-Organization-AuthSource: DB3FFO11FD011.protection.gbl
> X-MS-Exchange-Organization-AuthAs: Anonymous
> MIME-Version: 1.0
>
> The following reply was made to PR kern/48733; it has been noted by GNATS.
>
> From: Manuel Bouyer <bouyer@antioche.eu.org>
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
> Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> Date: Fri, 11 Apr 2014 17:59:09 +0200
>
> On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
> > >Description:
> > Problem located in /src/sys/netinet/ip_output.c.
> > Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
> > on the interface.
> > Now - at least the wm-driver - will call splnet() and splx() inside the output
> > routine.
> > If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
> > is processes in splx() when the level is released again.
> > If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
> > used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
> > So we try to lock it again here -> dead-lock.
> >
> > Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
> > interfaces, the system lock-up after a short time. Don't ask me, why the if_output
> > call takes "to long" with theese two additonal interfaces, but it is reproducable.
> > I've analysed this several times with DDB. Most times I've seen an USB-interrupt
> > that dead-lock the system.
>
> I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
> it can be locked multiple time on the same CPU. So it's not a problem
> that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
> increase the lock count. A splx() on another CPU will wait for the
> KERNEL_LOCK to be relased.
>
> I think your problem is more likely in the USB stack.
> Maybe one of your new ethernet interface shares an interrupt with the
> USB controller ?
>
>
> > >How-To-Repeat:
> > Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
> > time. It is just a question of time till system-dead-lock.
> > >Fix:
> > Fist guess: revert change done from 1.207 to 1.208.
> > But I've no idea about side effects.
>
> Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
> If you revert ip_output 1.208, you'll also have to revert ip_input.c
> 1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
>
> --
> Manuel Bouyer <bouyer@antioche.eu.org>
> NetBSD: 26 ans d'experience feront toujours la difference
> --
>
>
--
Dr. Nagler & Company GmbH
Hauptstraße 9
92253 Schnaittenbach
Tel. +49 9622/71 97-42
Fax +49 9622/71 97-50
Wolfgang.Stukenbrock@nagler-company.com
http://www.nagler-company.com
Hauptsitz: Schnaittenbach
Handelregister: Amberg HRB
Gerichtsstand: Amberg
Steuernummer: 201/118/51825
USt.-ID-Nummer: DE 273143997
Geschäftsführer: Dr. Martin Nagler
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 09 May 2014 14:18:18 +0200
Hi again,
I've tried to analyse the stackframes etc. a little bit more - but now I
need some help ...
After entering DDB I see the following:
db{0}> ps /l
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
666 > 1 7 0 0 fffffe822c504b20 ftpd
638 1 3 1 80 fffffe822e047280 csh ttyraw
646 1 3 1 80 fffffe822dae06e0 bounce kqueue
505 1 3 1 80 fffffe822dae02c0 smtp lockf
615 1 3 0 80 fffffe822c504700 smtp kqueue
632 1 3 0 80 fffffe822c605b60 trivial-rewrite kqueue
639 1 3 1 80 fffffe81330ce240 cleanup kqueue
690 1 3 1 80 fffffe81330cea80 qmgr kqueue
644 1 3 0 80 fffffe81330ce660 pickup kqueue
691 1 3 1 80 fffffe810f80f220 login wait
507 1 3 0 80 fffffe822c8c6300 inetd kqueue
628 1 3 1 80 fffffe822c8c6720 master kqueue
437 1 3 0 80 fffffe822e047ac0 choparp select
381 1 3 0 80 fffffe822c8c6b40 powerd kqueue
337 1 3 0 80 fffffe822df43260 logger pipe_rd
333 1 3 0 80 fffffe822df0b2a0 sh wait
183 > 1 7 1 0 fffffe822df0bae0 syslogd
1 1 3 1 80 fffffe810f1d21a0 init wait
0 66 3 1 200 fffffe822e0476a0 physiod physiod
0 65 3 0 200 fffffe810f80f640 aiodoned aiodoned
0 64 3 1 200 fffffe810f506200 ioflush syncer
0 63 3 0 200 fffffe810f80fa60 pgdaemon pgdaemon
0 60 3 0 200 fffffe810f506620 raidio1 raidiow
0 59 3 0 200 fffffe810f506a40 raid1 rfnodeq
0 58 3 1 200 fffffe810f21c1e0 raidio0 raidiow
0 57 3 1 200 fffffe810f21c600 raid0 rfnodeq
0 56 3 1 200 fffffe810f10d160 swcrthr swcr_wait
0 55 3 1 200 fffffe810f1dda00 cryptoret crypto_w
0 53 3 1 200 fffffe810f10d9a0 usb6 usbevt
0 52 3 1 200 fffffe810f14e9c0 usb4 usbevt
0 51 3 1 200 fffffe810f0cc560 usb7 usbevt
0 50 3 1 200 fffffe810f10d580 usb2 usbevt
0 49 3 1 200 fffffe810f0cc140 usb5 usbevt
0 48 3 1 200 fffffe810f08a120 usb3 usbevt
0 47 3 1 200 fffffe810f0cc980 usb0 usbevt
0 46 3 1 200 fffffe810f21ca20 usbtask-dr usbtsk
0 45 3 1 200 fffffe810f1dd1c0 usbtask-hc usbtsk
0 44 3 1 200 fffffe810f1dd5e0 usb1 usbevt
0 42 3 1 200 fffffe810f1d25c0 unpgc unpgc
0 41 3 1 200 fffffe810f1d29e0 vmem_rehash vmem_rehash
0 40 3 0 200 fffffe810f14e180 coretemp1 coretemp1
0 39 3 1 200 fffffe810f14e5a0 coretemp0 coretemp0
0 30 3 1 200 fffffe810f08a540 atabus5 atath
0 29 3 0 200 fffffe810f08a960 atabus4 atath
0 28 3 0 200 fffffe810f01e100 iic0 iicintr
0 27 3 1 200 fffffe810f01e520 atabus3 atath
0 26 3 1 200 fffffe810f01e940 atabus2 atath
0 25 3 0 200 fffffe810ea580e0 atabus1 atath
0 24 3 0 200 fffffe810ea58500 atabus0 atath
0 23 3 0 200 fffffe810ea58920 apm0 apmev
0 22 3 1 200 fffffe810e9f30c0 xcall/1 xcall
0 21 1 1 200 fffffe810e9f34e0 softser/1
0 20 3 1 200 fffffe810e9f3900 softclk/1 tstile
0 19 1 1 200 fffffe810e9b50a0 softbio/1
0 18 1 1 200 fffffe810e9b54c0 softnet/1
0 17 1 1 201 fffffe810e9b58e0 idle/1
0 16 3 0 200 fffffe822e11b080 sysmon smtaskq
0 15 3 0 200 fffffe822e11b4a0 pmfsuspend pmfsuspend
0 14 3 0 200 fffffe822e11b8c0 pmfevent pmfevent
0 13 3 0 200 fffffe822ef24060 sopendfree sopendfr
0 12 3 0 200 fffffe822ef24480 nfssilly nfssilly
0 11 3 1 200 fffffe822ef248a0 cachegc cachegc
0 10 3 1 200 fffffe822f72a040 vrele vrele
0 9 3 1 200 fffffe822f72a460 vdrain vdrain
0 8 3 0 200 fffffe822f72a880 modunload mod_unld
0 7 3 0 200 fffffe822f736020 xcall/0 xcall
0 > 6 7 0 200 fffffe822f736440 softser/0
0 5 1 0 200 fffffe822f736860 softclk/0
0 4 1 0 200 fffffe822f73f000 softbio/0
0 > 3 7 0 200 fffffe822f73f420 softnet/0
0 2 1 0 201 fffffe822f73f840 idle/0
0 1 3 1 200 ffffffff80655480 swapper uvm
OK - I thing the kernel lwp 6 and 3 are most relevant, but first the
other two processes marked with '>':
db{0}> bt/a fffffe822c504b20
trace: pid 666 lid 1 at 0xfffffe810f8a9b40
soo_write() at netbsd:soo_write+0x2c
dofilewrite() at netbsd:dofilewrite+0x7d
sys_write() at netbsd:sys_write+0x62
syscall() at netbsd:syscall+0xac
No surprise - I'm running an ftp to the system to trigger the problem.
ftpd is gooing to send some data.
db{0}> bt/a fffffe822df0bae0
trace: pid 183 lid 1 at 0xfffffe810f82abe0
kevent1() at netbsd:kevent1+0x5c0
sys___kevent50() at netbsd:sys___kevent50+0x33
syscall() at netbsd:syscall+0xac
Whatever syslogd is gooing to do - I do not think it is relevant.
I think it is woken by a message added by either my print statements or
the unknown "interrupt" routine (see below).
db{0}> bt/a fffffe822f73f420
trace: pid 0 lid 3 at 0xfffffe810e967760
ether_output() at netbsd:ether_output+0x2b6
ip_output() at netbsd:ip_output+0xa8f
tcp_output() at netbsd:tcp_output+0x1698
tcp_input() at netbsd:tcp_input+0x15d9
ip_input() at netbsd:ip_input+0x3ef
ipintr() at netbsd:ipintr+0x109
softint_dispatch() at netbsd:softint_dispatch+0xd9
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
That is the part that is gooing to send a packet. I see the printout in
ip_output prior calling 'ifp->if_output()' - not the one behind.
The location pointed to by the backtrace in ether_output() is the call
to "return ifq_enqueue(...)". I also see the printout I've added in
front of this call, but not the one behind.
In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the
wm-driver - in this routine and the printout in front of the splx(s) at
the end of the routine - not the printout behind it.
This is the localtion where the deadlock happens while processing other
interrupts in Xspllower.
This always looks the same ....
db{0}> bt/a fffffe822f736440
trace: pid 0 lid 6 at 0xfffffe810e9739c8
breakpoint() at netbsd:breakpoint+0x5
comintr() at netbsd:comintr+0x518
Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
--- interrupt ---
bus_space_read_4() at netbsd:bus_space_read_4+0xa
intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
--- interrupt ---
Xspllower() at netbsd:Xspllower+0xe
DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
Xsoftintr() at netbsd:Xsoftintr+0x4f
--- interrupt ---
0:
Hmmm - not shure about it ...
It looks like that during processing one pending interrupt in Xspllower
at the end of that routine an interrupt came im that takes the
KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in
bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt
and the location where the DDB-enter occures in bus_space_read_4() is
just random ????
The comintr looks like the break-interrupt on the serial console of the
system to enter DDB to me.
Any idea to find out what interrupt routine it is???
Where should I go on with debugging?
Accedently the trace-back is not 100% the same in the interrupt-part
every time. (But it happens always during the splx(s) in ifq_enqueue()
and the other two processes also have the same backtrace.)
The system has frosen at this point. If I try to reboot it from DDB it
gets lots of packet interrupts from the interfaces and freezes again
after all are processed. Break and reboot again will reboot the system.
But I think this information is not relevant.
Thanks in advance for help.
By the way:
The '/a' for the trace command is not documented in DDB manuals and the
online help of DDB !!!!!
I've found it by a look into the sources while trying to get the
backtraces for the kernel-threads.
Perhaps that should be changed ...
Should I open a PR for this?
Best regards
W. Stukenbrock
Wolfgang Stukenbrock wrote:
> The following reply was made to PR kern/48733; it has been noted by GNATS.
>
> From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
> Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> Date: Sun, 27 Apr 2014 10:21:03 +0200
>
> Hi, sorry for the delay, but I'm working off-side most time at the moment.
>
> I have to look onto the system to validate the assumption, that the USB
> shares the interupt. (I remember that there are some interrupts shared.)
> But I can do this not earlyer as 02.05.2014 or later ...
> And lowering the spl-level does not imply that the interrupt has to be
> shared with the wm-driver to run into this problem.
>
> The USB-interupt is not the only one I've seen.
> And I've analysed the stack backtrace that has dead-locked the system
> and it locks up in the interrupt code while trying to get the KERNEL_LOCK!
> I will have a look at this again, but as far as I remember it definitly
> locks up in the splx while trying to get the KERNEL_LOCK. splx does not
> return anymore and I've got the interrupt source by translating the jump
> address to a symbol that has not been reached anymore.
> (I've still not commented out the LOCK/UNLOCK prior calling the
> output-routine to validate my analyses - no time till now, sorry - but I
> will do it as soon as possible.)
>
> I will send the results of my review as soon as I've be able to do it.
>
> best regards
>
> W. Stukenbrock
>
> Manuel Bouyer wrote:
>
> > The following reply was made to PR kern/48733; it has been noted by GNATS.
> >
> > From: Manuel Bouyer <bouyer@antioche.eu.org>
> > To: gnats-bugs@NetBSD.org
> > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
> > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> > Date: Fri, 11 Apr 2014 17:59:09 +0200
> >
> > On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
> > > >Description:
> > > Problem located in /src/sys/netinet/ip_output.c.
> > > Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
> > > on the interface.
> > > Now - at least the wm-driver - will call splnet() and splx() inside the output
> > > routine.
> > > If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
> > > is processes in splx() when the level is released again.
> > > If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
> > > used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
> > > So we try to lock it again here -> dead-lock.
> > >
> > > Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
> > > interfaces, the system lock-up after a short time. Don't ask me, why the if_output
> > > call takes "to long" with theese two additonal interfaces, but it is reproducable.
> > > I've analysed this several times with DDB. Most times I've seen an USB-interrupt
> > > that dead-lock the system.
> >
> > I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
> > it can be locked multiple time on the same CPU. So it's not a problem
> > that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
> > increase the lock count. A splx() on another CPU will wait for the
> > KERNEL_LOCK to be relased.
> >
> > I think your problem is more likely in the USB stack.
> > Maybe one of your new ethernet interface shares an interrupt with the
> > USB controller ?
> >
> >
> > > >How-To-Repeat:
> > > Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
> > > time. It is just a question of time till system-dead-lock.
> > > >Fix:
> > > Fist guess: revert change done from 1.207 to 1.208.
> > > But I've no idea about side effects.
> >
> > Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
> > If you revert ip_output 1.208, you'll also have to revert ip_input.c
> > 1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
> >
> > --
> > Manuel Bouyer <bouyer@antioche.eu.org>
> > NetBSD: 26 ans d'experience feront toujours la difference
> > --
> >
> >
> >
> > Received: from DBXPR07MB317.eurprd07.prod.outlook.com (10.141.12.139) by
> > DBXPR07MB319.eurprd07.prod.outlook.com (10.141.12.141) with Microsoft SMTP
> > Server (TLS) id 15.0.918.8 via Mailbox Transport; Fri, 11 Apr 2014 16:00:11
> > +0000
> > Received: from DBXPR07CA001.eurprd07.prod.outlook.com (10.255.191.159) by
> > DBXPR07MB317.eurprd07.prod.outlook.com (10.141.12.139) with Microsoft SMTP
> > Server (TLS) id 15.0.918.8; Fri, 11 Apr 2014 16:00:10 +0000
> > Received: from DB3FFO11FD011.protection.gbl (2a01:111:f400:7e04::177) by
> > DBXPR07CA001.outlook.office365.com (2a01:111:e400:9800::31) with Microsoft
> > SMTP Server (TLS) id 15.0.918.8 via Frontend Transport; Fri, 11 Apr 2014
> > 16:00:09 +0000
> > Received: from e002.nagler-company.com (212.185.86.227) by
> > DB3FFO11FD011.mail.protection.outlook.com (10.47.216.167) with Microsoft SMTP
> > Server (TLS) id 15.0.918.6 via Frontend Transport; Fri, 11 Apr 2014 16:00:08
> > +0000
> > Received: from mollari.NetBSD.org (mollari.netbsd.org [149.20.53.80])
> > by e002.nagler-company.com (8.14.7/8.14.7) with ESMTP id s3BG0378005931
> > for <Wolfgang.Stukenbrock@nagler-company.com>; Fri, 11 Apr 2014 18:00:06 +0200 (CEST)
> > Received: by mollari.NetBSD.org (Postfix, from userid 31008)
> > id C23A5A5828; Fri, 11 Apr 2014 16:00:01 +0000 (UTC)
> > From: Manuel Bouyer <bouyer@antioche.eu.org>
> > To: <kern-bug-people@netbsd.org>, <gnats-admin@netbsd.org>,
> > <netbsd-bugs@netbsd.org>, <Wolfgang.Stukenbrock@nagler-company.com>
> > Reply-To: <gnats-bugs@NetBSD.org>
> > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> > References: <pr-kern-48733@gnats.netbsd.org>
> > <20140411131311.74AF4123B93@test-s0.nagler-company.com>
> > X-Gnats-Was-Stupid: no
> > CC:
> > Message-ID: <20140411160001.C23A5A5828@mollari.NetBSD.org>
> > Date: Fri, 11 Apr 2014 16:00:01 +0000
> > Return-Path: gnats@NetBSD.org
> > X-EOPAttributedMessage: 0
> > X-MS-Exchange-Organization-MessageDirectionality: Incoming
> > X-Forefront-Antispam-Report: CIP:212.185.86.227;CTRY:DE;IPV:NLI;EFV:NLI;SFV:NSPM;SFS:(979002)(6009001)(428001)(50944004)(51704005)(24454002)(199002)(189002)(16796002)(46386002)(74502001)(70486001)(50466002)(87836001)(46102001)(90896003)(77982001)(52956003)(54356999)(50986999)(76176999)(53806999)(81542001)(81342001)(74662001)(45336002)(43066001)(83072002)(85852003)(6806004)(19580395003)(83322001)(42186004)(44976005)(80976001)(19580405001)(76482001)(99396002)(4396001)(80022001)(33656001)(20776003)(47776003)(48376002)(2201001)(79102001)(92726001)(42882001)(90966001)(969003)(989001)(999001)(1009001)(1019001);DIR:INB;SFP:;SCL:1;SRVR:DBXPR07MB317;H:e002.nagler-company.com;FPR:BF74F31D.9C06D725.B1F32CB3.4CE95053.203B9;PTR:e002.nagler-company.com;A:1;MX:1;LANG:en;
> > Content-Type: text/plain
> > X-MS-Exchange-Organization-Network-Message-Id: ae18e11e-2fa8-44eb-8125-08d123b63ff9
> > X-MS-Exchange-Organization-AVStamp-Service: 1.0
> > Received-SPF: None (: NetBSD.org does not designate permitted sender hosts)
> > X-MS-Exchange-Organization-SCL: 1
> > X-MS-Exchange-Organization-AuthSource: DB3FFO11FD011.protection.gbl
> > X-MS-Exchange-Organization-AuthAs: Anonymous
> > MIME-Version: 1.0
> >
> > The following reply was made to PR kern/48733; it has been noted by GNATS.
> >
> > From: Manuel Bouyer <bouyer@antioche.eu.org>
> > To: gnats-bugs@NetBSD.org
> > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
> > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> > Date: Fri, 11 Apr 2014 17:59:09 +0200
> >
> > On Fri, Apr 11, 2014 at 02:40:00PM +0000, Wolfgang.Stukenbrock@nagler-company.com wrote:
> > > >Description:
> > > Problem located in /src/sys/netinet/ip_output.c.
> > > Since file revision 1.208 the Kernel-Lock is locked prior calling if_output
> > > on the interface.
> > > Now - at least the wm-driver - will call splnet() and splx() inside the output
> > > routine.
> > > If any interrupt occurs in between splnet() and splx(), the interrupt is delayed and
> > > is processes in splx() when the level is released again.
> > > If such an interrupt is e.g. not MP-SAFE, the call stup in intr_biglock_wrapper() is
> > > used to call the interrupt routine and that one will lock the KERNEL-LOCK again.
> > > So we try to lock it again here -> dead-lock.
> > >
> > > Our system runs fine with 4 8257x interfaces, but after adding 2 additional 8254x
> > > interfaces, the system lock-up after a short time. Don't ask me, why the if_output
> > > call takes "to long" with theese two additonal interfaces, but it is reproducable.
> > > I've analysed this several times with DDB. Most times I've seen an USB-interrupt
> > > that dead-lock the system.
> >
> > I think your analsys is wrong. the KERNEL_LOCK is special in the sense that
> > it can be locked multiple time on the same CPU. So it's not a problem
> > that splx() on the same CPU tries to get KERNEL_LOCK again, it will just
> > increase the lock count. A splx() on another CPU will wait for the
> > KERNEL_LOCK to be relased.
> >
> > I think your problem is more likely in the USB stack.
> > Maybe one of your new ethernet interface shares an interrupt with the
> > USB controller ?
> >
> >
> > > >How-To-Repeat:
> > > Run a lot of trafic over wm-interfaces and do shomething e.g. on USB at the same
> > > time. It is just a question of time till system-dead-lock.
> > > >Fix:
> > > Fist guess: revert change done from 1.207 to 1.208.
> > > But I've no idea about side effects.
> >
> > Very bad: the output queues are protected by the KERNEL_LOCK and splnet().
> > If you revert ip_output 1.208, you'll also have to revert ip_input.c
> > 1.286 and 1.285, so that the whole IP stack runs under the KERNEL_LOCK again.
> >
> > --
> > Manuel Bouyer <bouyer@antioche.eu.org>
> > NetBSD: 26 ans d'experience feront toujours la difference
> > --
> >
> >
>
>
> --
>
>
> Dr. Nagler & Company GmbH
> Hauptstraße 9
> 92253 Schnaittenbach
>
> Tel. +49 9622/71 97-42
> Fax +49 9622/71 97-50
>
> Wolfgang.Stukenbrock@nagler-company.com
> http://www.nagler-company.com
>
>
> Hauptsitz: Schnaittenbach
> Handelregister: Amberg HRB
> Gerichtsstand: Amberg
> Steuernummer: 201/118/51825
> USt.-ID-Nummer: DE 273143997
> Geschäftsführer: Dr. Martin Nagler
>
>
>
--
Dr. Nagler & Company GmbH
Hauptstraße 9
92253 Schnaittenbach
Tel. +49 9622/71 97-42
Fax +49 9622/71 97-50
Wolfgang.Stukenbrock@nagler-company.com
http://www.nagler-company.com
Hauptsitz: Schnaittenbach
Handelregister: Amberg HRB
Gerichtsstand: Amberg
Steuernummer: 201/118/51825
USt.-ID-Nummer: DE 273143997
Geschäftsführer: Dr. Martin Nagler
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
Wolfgang.Stukenbrock@nagler-company.com
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Mon, 12 May 2014 14:23:46 +0200
On Fri, May 09, 2014 at 12:20:00PM +0000, Wolfgang Stukenbrock wrote:
> [...]
> db{0}> bt/a fffffe822f73f420
> trace: pid 0 lid 3 at 0xfffffe810e967760
> ether_output() at netbsd:ether_output+0x2b6
> ip_output() at netbsd:ip_output+0xa8f
> tcp_output() at netbsd:tcp_output+0x1698
> tcp_input() at netbsd:tcp_input+0x15d9
> ip_input() at netbsd:ip_input+0x3ef
> ipintr() at netbsd:ipintr+0x109
> softint_dispatch() at netbsd:softint_dispatch+0xd9
> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
> Xsoftintr() at netbsd:Xsoftintr+0x4f
> --- interrupt ---
> 0:
>
> That is the part that is gooing to send a packet. I see the printout in
> ip_output prior calling 'ifp->if_output()' - not the one behind.
> The location pointed to by the backtrace in ether_output() is the call
> to "return ifq_enqueue(...)". I also see the printout I've added in
> front of this call, but not the one behind.
> In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the
> wm-driver - in this routine and the printout in front of the splx(s) at
> the end of the routine - not the printout behind it.
> This is the localtion where the deadlock happens while processing other
> interrupts in Xspllower.
> This always looks the same ....
ether_output() is called with the KERNEL_LOCK held, so at this point cpu0
already owns KERNEL_LOCK, it won't spin trying to grab it again.
You can confirm this by printing curcpu()->ci_biglock_count.
Did you try a kernel with options LOCKDEBUG ?
What's possible here is a loop trying to process the same interrupt
forever.
>
>
>
> db{0}> bt/a fffffe822f736440
> trace: pid 0 lid 6 at 0xfffffe810e9739c8
> breakpoint() at netbsd:breakpoint+0x5
> comintr() at netbsd:comintr+0x518
> Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
> --- interrupt ---
> bus_space_read_4() at netbsd:bus_space_read_4+0xa
> intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
> Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
> --- interrupt ---
> Xspllower() at netbsd:Xspllower+0xe
> DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
> Xsoftintr() at netbsd:Xsoftintr+0x4f
> --- interrupt ---
> 0:
>
>
> Hmmm - not shure about it ...
> It looks like that during processing one pending interrupt in Xspllower
> at the end of that routine an interrupt came im that takes the
> KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in
> bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt
> and the location where the DDB-enter occures in bus_space_read_4() is
> just random ????
> The comintr looks like the break-interrupt on the serial console of the
> system to enter DDB to me.
it is.
> Any idea to find out what interrupt routine it is???
dmesg could point to the problem; the interrupt we're looking for is
level-triggered on pin 6 (so maybe "irq 6")
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 ans d'experience feront toujours la difference
--
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 16 May 2014 10:20:53 +0200
Hi again,
I've already tried LOCK_DEBUG - no change in behaviour and I also don't
see additional usable information. (But now I'm nearly shure, it is not
the KERNEL_LOCK - see below ...)
I've still not added the requested curcpu()->ci_biglock_count print, but
I've searched for the irq 6 or something like this.
The "surprising" result:
s011a# grep irq /var/run/dmesg.boot
pckbc0 at acpi0 (KBC0, PNP0303) (kbd port): io 0x60,0x64 irq 1
pckbc1 at acpi0 (MSE0, PNP0F13) (aux port): irq 12
com0 at acpi0 (COM1, PNP0501-1): io 0x3f8-0x3ff irq 4
com1 at acpi0 (COM2, PNP0501-2): io 0x2f8-0x2ff irq 3
pckbc0: using irq 1 for kbd slot
s011a# grep interrupt !$
grep interrupt /var/run/dmesg.boot
acpi0: SCI interrupting at int 9
wm0: interrupting at ioapic1 pin 0
wm1: interrupting at ioapic1 pin 1
Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09)
at pci1 dev 0 function 1 not configured
uhci0: interrupting at ioapic0 pin 16
uhci1: interrupting at ioapic0 pin 17
uhci2: interrupting at ioapic0 pin 18
ehci0: interrupting at ioapic0 pin 18
wm2: interrupting at ioapic0 pin 16
wm3: interrupting at ioapic0 pin 17
wm4: interrupting at ioapic0 pin 18
wm5: interrupting at ioapic0 pin 19
uhci3: interrupting at ioapic0 pin 23
uhci4: interrupting at ioapic0 pin 22
uhci5: interrupting at ioapic0 pin 21
ehci1: interrupting at ioapic0 pin 23
pciide0: using ioapic0 pin 23 for native-PCI interrupt
piixide0: primary channel interrupting at ioapic0 pin 14
piixide0: secondary channel interrupting at ioapic0 pin 15
ichsmb0: interrupting at ioapic0 pin 17
piixide1: using ioapic0 pin 18 for native-PCI interrupt
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
s011a# grep ioapic !$
grep ioapic /var/run/dmesg.boot
ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 20, 24 pins
ioapic1 at mainbus0 apid 3: pa 0xfecc0000, version 20, 24 pins
wm0: interrupting at ioapic1 pin 0
wm1: interrupting at ioapic1 pin 1
uhci0: interrupting at ioapic0 pin 16
uhci1: interrupting at ioapic0 pin 17
uhci2: interrupting at ioapic0 pin 18
ehci0: interrupting at ioapic0 pin 18
wm2: interrupting at ioapic0 pin 16
wm3: interrupting at ioapic0 pin 17
wm4: interrupting at ioapic0 pin 18
wm5: interrupting at ioapic0 pin 19
uhci3: interrupting at ioapic0 pin 23
uhci4: interrupting at ioapic0 pin 22
uhci5: interrupting at ioapic0 pin 21
ehci1: interrupting at ioapic0 pin 23
pciide0: using ioapic0 pin 23 for native-PCI interrupt
piixide0: primary channel interrupting at ioapic0 pin 14
piixide0: secondary channel interrupting at ioapic0 pin 15
ichsmb0: interrupting at ioapic0 pin 17
piixide1: using ioapic0 pin 18 for native-PCI interrupt
Hmmm - no irq6 or pin 6 !?!?!?!
The first greating line is in /var/run/dmesg.boot, so the result is
"complete".
The board is reported as 'Supermicro X7SBi-LN4 (0123456789)'.
The two "additonal" interfaces (wm0 and wm1) are on ioapic1 - no other
HW is on that APIC.
The search for not-configured devices say:
s011a# grep confi !$
grep confi /var/run/dmesg.boot
MBRD (PNP0C02) at acpi0 not configured
FWH (INT0800) at acpi0 not configured
pci0 at mainbus0 bus 0: configuration mode 1
Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09)
at pci1 dev 0 function 1 not configured
drm at vga0 not configured
ichlpcib0: TCO (watchdog) timer configured.
piixide0: primary channel configured to compatibility mode
piixide0: secondary channel configured to compatibility mode
Intel 82801I Thermal Controller (miscellaneous DASP, revision 0x02) at
pci0 dev 31 function 6 not configured
Not really a lot - other systems have more unconfigured devices ....
Normaly I'm testing all available drivers on a "new" HW, so I'm nearly
100% shure that theese devices are not covered by any driver.
After that research it looks like there is something "suddenly"
interrupting without a driver for it.
The time it take till "freeze" varies, so a "normal timer timeout" does
not sound reasonable.
And why does this occure only after installing two additional interfaces
(wm0 and wm1) that use the second APIC? The system does not show the
problem without them.
pci0 at mainbus0 bus 0: configuration mode 1
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: Intel product 0x29f0 (rev. 0x01)
ppb0 at pci0 dev 1 function 0: Intel product 0x29f1 (rev. 0x01)
ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
ppb1 at pci1 dev 0 function 0: Intel product 0x032c (rev. 0x09)
ppb1: PCI Express 1.0 <PCI-E to PCI/PCI-X Bridge>
pci2 at ppb1 bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci2 dev 1 function 0: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
wm0: interrupting at ioapic1 pin 0
wm0: 64-bit 133MHz PCIX bus
wm0: NVM dump:
1b00 0621 2a12 0430 ffff ffff ffff ffff
c414 2104 460b 1179 8086 1079 8086 34e8
000c 1079 0000 2102 10c8 ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
c30c ff63 5004 2102 00c8 ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff 0602
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff 6be8
wm0: 256 word (8 address bits) MicroWire EEPROM
wm0: Ethernet address 00:1b:21:06:12:2a
makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
wm1 at pci2 dev 1 function 1: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
wm1: interrupting at ioapic1 pin 1
wm1: 64-bit 133MHz PCIX bus
wm1: NVM dump:
1b00 0621 2a12 0430 ffff ffff ffff ffff
c414 2104 460b 1179 8086 1079 8086 34e8
000c 1079 0000 2102 10c8 ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff ffff
c30c ff63 5004 2102 00c8 ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff 0602
ffff ffff ffff ffff ffff ffff ffff ffff
ffff ffff ffff ffff ffff ffff ffff 6be8
wm1: 256 word (8 address bits) MicroWire EEPROM
wm1: Ethernet address 00:1b:21:06:12:2b
makphy1 at wm1 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
Question: Are unused interrupts enabled? And if "yes", why?
I thought I've seen some code that will print a warning about this and
tries to reset the interrupt, but I'm not 100% shure about the
netbsd-version I've looked at.
Any hints to go on with debugging?
It would take me a lot of time to "understand" the interrupt setup on
the amd64 architecture ...
best regards
W. Stukenbrock
Manuel Bouyer wrote:
> The following reply was made to PR kern/48733; it has been noted by GNATS.
>
> From: Manuel Bouyer <bouyer@antioche.eu.org>
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
> Wolfgang.Stukenbrock@nagler-company.com
> Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> Date: Mon, 12 May 2014 14:23:46 +0200
>
> On Fri, May 09, 2014 at 12:20:00PM +0000, Wolfgang Stukenbrock wrote:
> > [...]
> > db{0}> bt/a fffffe822f73f420
> > trace: pid 0 lid 3 at 0xfffffe810e967760
> > ether_output() at netbsd:ether_output+0x2b6
> > ip_output() at netbsd:ip_output+0xa8f
> > tcp_output() at netbsd:tcp_output+0x1698
> > tcp_input() at netbsd:tcp_input+0x15d9
> > ip_input() at netbsd:ip_input+0x3ef
> > ipintr() at netbsd:ipintr+0x109
> > softint_dispatch() at netbsd:softint_dispatch+0xd9
> > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
> > Xsoftintr() at netbsd:Xsoftintr+0x4f
> > --- interrupt ---
> > 0:
> >
> > That is the part that is gooing to send a packet. I see the printout in
> > ip_output prior calling 'ifp->if_output()' - not the one behind.
> > The location pointed to by the backtrace in ether_output() is the call
> > to "return ifq_enqueue(...)". I also see the printout I've added in
> > front of this call, but not the one behind.
> > In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the
> > wm-driver - in this routine and the printout in front of the splx(s) at
> > the end of the routine - not the printout behind it.
> > This is the localtion where the deadlock happens while processing other
> > interrupts in Xspllower.
> > This always looks the same ....
>
> ether_output() is called with the KERNEL_LOCK held, so at this point cpu0
> already owns KERNEL_LOCK, it won't spin trying to grab it again.
> You can confirm this by printing curcpu()->ci_biglock_count.
> Did you try a kernel with options LOCKDEBUG ?
>
> What's possible here is a loop trying to process the same interrupt
> forever.
>
> >
> >
> >
> > db{0}> bt/a fffffe822f736440
> > trace: pid 0 lid 6 at 0xfffffe810e9739c8
> > breakpoint() at netbsd:breakpoint+0x5
> > comintr() at netbsd:comintr+0x518
> > Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
> > --- interrupt ---
> > bus_space_read_4() at netbsd:bus_space_read_4+0xa
> > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
> > Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
> > --- interrupt ---
> > Xspllower() at netbsd:Xspllower+0xe
> > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
> > Xsoftintr() at netbsd:Xsoftintr+0x4f
> > --- interrupt ---
> > 0:
> >
> >
> > Hmmm - not shure about it ...
> > It looks like that during processing one pending interrupt in Xspllower
> > at the end of that routine an interrupt came im that takes the
> > KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in
> > bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt
> > and the location where the DDB-enter occures in bus_space_read_4() is
> > just random ????
> > The comintr looks like the break-interrupt on the serial console of the
> > system to enter DDB to me.
>
> it is.
>
> > Any idea to find out what interrupt routine it is???
>
> dmesg could point to the problem; the interrupt we're looking for is
> level-triggered on pin 6 (so maybe "irq 6")
>
> --
> Manuel Bouyer <bouyer@antioche.eu.org>
> NetBSD: 26 ans d'experience feront toujours la difference
> --
>
>
--
Dr. Nagler & Company GmbH
Hauptstraße 9
92253 Schnaittenbach
Tel. +49 9622/71 97-42
Fax +49 9622/71 97-50
Wolfgang.Stukenbrock@nagler-company.com
http://www.nagler-company.com
Hauptsitz: Schnaittenbach
Handelregister: Amberg HRB
Gerichtsstand: Amberg
Steuernummer: 201/118/51825
USt.-ID-Nummer: DE 273143997
Geschäftsführer: Dr. Martin Nagler
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Fri, 16 May 2014 12:58:17 +0200
Hi again,
I've found a "hidden" option "MPVERBOSE" and enabled it.
Why the hell are some options missing in the GENERIC (or ALL - if
present) config file?
What is the expected way to find them ????
Now I've some additional information about the ioapic setup.
Full output of dmesg.boot until interrupts are enabled follow:
s011a# cat /var/run/dmesg.boot
Copyright (c) 1996, 1997, 1998, 1999, 2000, 2001, 2002, 2003, 2004, 2005,
2006, 2007, 2008, 2009, 2010, 2011, 2012
The NetBSD Foundation, Inc. All rights reserved.
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
NetBSD 6.1 (NSW-S011) #0: Fri May 16 12:33:18 CEST 2014
root@s011a:/usr/src/sys/arch/amd64/compile/NSW-S011
total memory = 8189 MB
avail memory = 7945 MB
timecounter: Timecounters tick every 10.000 msec
timecounter: Timecounter "i8254" frequency 1193182 Hz quality 100
Supermicro X7SBi-LN4 (0123456789)
mainbus0 (root)
mainbus0: scanning 0x9dc00 to 0x9dff0 for MP signature
mainbus0: scanning 0x9d800 to 0x9dbf0 for MP signature
mainbus0: scanning 0xf0000 to 0xffff0 for MP signature
mainbus0: MP floating pointer found in bios at 0xf6570
mainbus0: MP config table at 0x9e0a1, 484 bytes long
cpu0 at mainbus0 apid 0cpu0: prelint0
0x700<vector=0x0,delmode=0x7,dest=0x0> 0x0<target=0x0>
cpu0: prelint1 0x400<vector=0x0,delmode=0x4,dest=0x0> 0x0<target=0x0>
cpu0: timer0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
cpu0: pcint0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
cpu0: lint0 0x700<vector=0x0,delmode=0x7,dest=0x0> 0x0<target=0x0>
cpu0: lint1 0x400<vector=0x0,delmode=0x4,dest=0x0> 0x0<target=0x0>
cpu0: err0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
: Intel(R) Xeon(R) CPU L3110 @ 3.00GHz, id 0x1067a
cpu0: idle lwp at 0xfffffe822f746840, idle sp at 0xfffffe810e967c50
cpu1 at mainbus0 apid 1: Intel(R) Xeon(R) CPU L3110 @
3.00GHz, id 0x1067a
cpu1: idle lwp at 0xfffffe810e9b78e0, idle sp at 0xfffffe810e994c50
ioapic0 at mainbus0 apid 2: pa 0xfec00000, virtual wire mode, version
20, 24 pins
ioapic1 at mainbus0 apid 3: pa 0xfecc0000, virtual wire mode, version
20, 24 pins
acpi0 at mainbus0: Intel ACPICA 20110623
acpi0: X/RSDT: OemId <PTLTD , XSDT ,06040000>, AslId < LTP,00000000>
ioapic0: int9 0x8060<vector=0x60,delmode=0x0,level,dest=0x0> 0x0<target=0x0>
mpacpi: found root PCI bus 0
mpacpi: found subordinate bus 0
mpacpi: found subordinate bus 0
mpacpi: found subordinate bus 0
mpacpi: found subordinate bus 0
mpacpi: found subordinate bus 0
mpacpi: found subordinate bus 0
mpacpi: found subordinate bus 0
mpacpi: 8 PCI busses
mpacpi: ISA interrupt override 0 -> 2 (1/1)
mpacpi: ISA interrupt override 9 -> 9 (1/3)
mpacpi: configuring PCI bus 0 int routing
mpacpi: configuring PCI bus 1 int routing
mpacpi: configuring PCI bus 2 int routing
mpacpi: configuring PCI bus 5 int routing
mpacpi: configuring PCI bus 7 int routing
mpacpi: configuring PCI bus 9 int routing
mpacpi: configuring PCI bus 11 int routing
mpacpi: configuring PCI bus 17 int routing
ioapic0: pin 2 attached to isa0 irq 0 (type 0x0<type=0x0> flags
0x5<pol=0x1=Act Hi,trig=0x1=Edge>)
ioapic0: pin 1 attached to isa0 irq 1 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 3 attached to isa0 irq 3 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)^M
ioapic0: pin 4 attached to isa0 irq 4 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 5 attached to isa0 irq 5 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 6 attached to isa0 irq 6 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 7 attached to isa0 irq 7 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 8 attached to isa0 irq 8 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 9 attached to isa0 irq 9 (type 0x0<type=0x0> flags
0xd<pol=0x1=Act Hi,trig=0x3=Level>)
ioapic0: pin 10 attached to isa0 irq 10 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 11 attached to isa0 irq 11 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 12 attached to isa0 irq 12 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 13 attached to isa0 irq 13 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 14 attached to isa0 irq 14 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 15 attached to isa0 irq 15 (type 0x0<type=0x0> flags
0x0<pol=0x0,trig=0x0>)
local apic: pin 1 attached to NMI (type 0x1<type=0x1=NMI> flags
0x0<pol=0x0,trig=0x0>)
local apic: pin 1 attached to NMI (type 0x1<type=0x1=NMI> flags
0x0<pol=0x0,trig=0x0>)
ioapic0: pin 16 attached to pci0 device 1 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci0 device 2 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci0 device 3 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 17 attached to pci0 device 3 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 18 attached to pci0 device 3 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 19 attached to pci0 device 3 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci0 device 6 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci0 device 25 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci0 device 26 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 17 attached to pci0 device 26 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 18 attached to pci0 device 26 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci0 device 27 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci0 device 28 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 17 attached to pci0 device 28 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 18 attached to pci0 device 28 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 19 attached to pci0 device 28 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 23 attached to pci0 device 29 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 22 attached to pci0 device 29 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 21 attached to pci0 device 29 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 19 attached to pci0 device 29 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci0 device 31 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 17 attached to pci0 device 31 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 18 attached to pci0 device 31 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 19 attached to pci0 device 31 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic1: pin 0 attached to pci2 device 1 INT_A (type 0x0<type=0x0> flags
0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic1: pin 1 attached to pci2 device 1 INT_B (type 0x0<type=0x0> flags
0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic1: pin 2 attached to pci2 device 1 INT_C (type 0x0<type=0x0> flags
0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic1: pin 3 attached to pci2 device 1 INT_D (type 0x0<type=0x0> flags
0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic1: pin 4 attached to pci2 device 2 INT_A (type 0x0<type=0x0> flags
0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic1: pin 5 attached to pci2 device 2 INT_B (type 0x0<type=0x0> flags
0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic1: pin 4 attached to pci2 device 2 INT_C (type 0x0<type=0x0> flags
0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic1: pin 5 attached to pci2 device 2 INT_D (type 0x0<type=0x0> flags
0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci5 device 0 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 17 attached to pci5 device 0 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 18 attached to pci5 device 0 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 19 attached to pci5 device 0 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 17 attached to pci7 device 0 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 18 attached to pci7 device 0 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 19 attached to pci7 device 0 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci7 device 0 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 18 attached to pci9 device 0 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 19 attached to pci9 device 0 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci9 device 0 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 17 attached to pci9 device 0 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 19 attached to pci11 device 0 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 16 attached to pci11 device 0 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 17 attached to pci11 device 0 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 18 attached to pci11 device 0 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 20 attached to pci17 device 0 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 21 attached to pci17 device 0 INT_B (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 22 attached to pci17 device 0 INT_C (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 23 attached to pci17 device 0 INT_D (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 22 attached to pci17 device 3 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
ioapic0: pin 23 attached to pci17 device 4 INT_A (type 0x0<type=0x0>
flags 0xf<pol=0x3=Act Lo,trig=0x3=Level>)
acpi0: SCI interrupting at int 9
timecounter: Timecounter "ACPI-Fast" frequency 3579545 Hz quality 1000
hpet0 at acpi0: high precision event timer (mem 0xfed00000-0xfed00400)
timecounter: Timecounter "hpet0" frequency 14318180 Hz quality 2000
MBRD (PNP0C02) at acpi0 not configured
pcppi0 at acpi0 (SPKR, PNP0800): io 0x61
sysbeep0 at pcppi0
attimer0 at acpi0 (TIMR, PNP0100): io 0x40-0x43,0x50-0x53
FWH (INT0800) at acpi0 not configured
pckbc0 at acpi0 (KBC0, PNP0303) (kbd port): io 0x60,0x64 irq 1
pckbc1 at acpi0 (MSE0, PNP0F13) (aux port): irq 12
com0 at acpi0 (COM1, PNP0501-1): io 0x3f8-0x3ff irq 4
com0: ns16550a, working fifo
com0: console
ioapic0: int4 0x81<vector=0x81,delmode=0x0,dest=0x0> 0x0<target=0x0>
com1 at acpi0 (COM2, PNP0501-2): io 0x2f8-0x2ff irq 3
com1: ns16550a, working fifo
ioapic0: int3 0x82<vector=0x82,delmode=0x0,dest=0x0> 0x0<target=0x0>
acpibut0 at acpi0 (PWRB, PNP0C0C): ACPI Power Button
apm0 at acpi0: Power Management spec V1.2
attimer0: attached to pcppi0
pckbd0 at pckbc0 (kbd slot)
ioapic0: int1 0x61<vector=0x61,delmode=0x0,dest=0x0> 0x0<target=0x0>
pckbc0: using irq 1 for kbd slot
wskbd0 at pckbd0 (mux ignored)
pci0 at mainbus0 bus 0: configuration mode 1
mainbus0: added to list as bus 0
pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
pchb0 at pci0 dev 0 function 0: Intel product 0x29f0 (rev. 0x01)
ppb0 at pci0 dev 1 function 0: Intel product 0x29f1 (rev. 0x01)
ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
pci1 at ppb0 bus 1
ppb0: added to list as bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
ppb1 at pci1 dev 0 function 0: Intel product 0x032c (rev. 0x09)
ppb1: PCI Express 1.0 <PCI-E to PCI/PCI-X Bridge>
pci2 at ppb1 bus 2
ppb1: added to list as bus 2
pci2: i/o space, memory space enabled, rd/line, wr/inv ok
wm0 at pci2 dev 1 function 0: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
ioapic1: int0 0xa062<vector=0x62,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
wm0: interrupting at ioapic1 pin 0
wm0: 64-bit 133MHz PCIX bus
wm0: 256 word (8 address bits) MicroWire EEPROM
wm0: Ethernet address 00:1b:21:06:12:2a
makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
wm1 at pci2 dev 1 function 1: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
ioapic1: int1 0xa063<vector=0x63,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
wm1: interrupting at ioapic1 pin 1
wm1: 64-bit 133MHz PCIX bus
wm1: 256 word (8 address bits) MicroWire EEPROM
wm1: Ethernet address 00:1b:21:06:12:2b
makphy1 at wm1 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09)
at pci1 dev 0 function 1 not configured
uhci0 at pci0 dev 26 function 0: Intel 82801I USB UHCI Controller (rev.
0x02)
ioapic0: int16 0xa064<vector=0x64,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
uhci0: interrupting at ioapic0 pin 16
usb0 at uhci0: USB revision 1.0
uhci1 at pci0 dev 26 function 1: Intel 82801I USB UHCI Controller (rev.
0x02)
ioapic0: int17 0xa065<vector=0x65,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
uhci1: interrupting at ioapic0 pin 17
usb1 at uhci1: USB revision 1.0
uhci2 at pci0 dev 26 function 2: Intel 82801I USB UHCI Controller (rev.
0x02)
ioapic0: int18 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
uhci2: interrupting at ioapic0 pin 18
usb2 at uhci2: USB revision 1.0
ehci0 at pci0 dev 26 function 7: Intel 82801I USB EHCI Controller (rev.
0x02)
ioapic0: int18 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
ehci0: interrupting at ioapic0 pin 18
ehci0: EHCI version 1.0
ehci0: companion controllers, 2 ports each: uhci0 uhci1 uhci2
usb3 at ehci0: USB revision 2.0
ppb2 at pci0 dev 28 function 0: Intel 82801I PCI Express Port #1 (rev. 0x02)
ppb2: PCI Express 1.0 <Root Port of PCI-E Root Complex>
pci3 at ppb2 bus 5
ppb2: added to list as bus 5
pci3: i/o space, memory space enabled, rd/line, wr/inv ok
wm2 at pci3 dev 0 function 0: Intel i82573E IAMT (rev. 0x03)
ioapic0: int16 0xa064<vector=0x64,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
wm2: interrupting at ioapic0 pin 16
wm2: PCI-Express bus
wm2: 256 word (8 address bits) SPI EEPROM
wm2: Ethernet address 00:30:48:fd:4f:8a
makphy2 at wm2 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
makphy2: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
ppb3 at pci0 dev 28 function 1: Intel 82801I PCI Express Port #2 (rev. 0x02)
ppb3: PCI Express 1.0 <Root Port of PCI-E Root Complex>
pci4 at ppb3 bus 7
ppb3: added to list as bus 7
pci4: i/o space, memory space enabled, rd/line, wr/inv ok
wm3 at pci4 dev 0 function 0: Intel i82573L Gigabit Ethernet (rev. 0x00)
ioapic0: int17 0xa065<vector=0x65,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
wm3: interrupting at ioapic0 pin 17
wm3: PCI-Express bus
wm3: 256 word (8 address bits) SPI EEPROM
wm3: Ethernet address 00:30:48:fd:4f:8b
makphy3 at wm3 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
makphy3: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
ppb4 at pci0 dev 28 function 2: Intel 82801I PCI Express Port #3 (rev. 0x02)
ppb4: PCI Express 1.0 <Root Port of PCI-E Root Complex>
pci5 at ppb4 bus 9
ppb4: added to list as bus 9
pci5: i/o space, memory space enabled, rd/line, wr/inv ok
wm4 at pci5 dev 0 function 0: Intel i82573L Gigabit Ethernet (rev. 0x00)
ioapic0: int18 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
wm4: interrupting at ioapic0 pin 18
wm4: PCI-Express bus
wm4: 256 word (8 address bits) SPI EEPROM
wm4: Ethernet address 00:30:48:fd:4f:8c
makphy4 at wm4 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
makphy4: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
ppb5 at pci0 dev 28 function 3: Intel 82801I PCI Express Port #4 (rev. 0x02)
ppb5: PCI Express 1.0 <Root Port of PCI-E Root Complex>
pci6 at ppb5 bus 11
ppb5: added to list as bus 11
pci6: i/o space, memory space enabled, rd/line, wr/inv ok
wm5 at pci6 dev 0 function 0: Intel i82573L Gigabit Ethernet (rev. 0x00)
ioapic0: int19 0xa067<vector=0x67,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
wm5: interrupting at ioapic0 pin 19
wm5: PCI-Express bus
wm5: 256 word (8 address bits) SPI EEPROM
wm5: Ethernet address 00:30:48:fd:4f:8d
makphy5 at wm5 phy 1: Marvell 88E1111 Gigabit PHY, rev. 2
makphy5: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
1000baseT-FDX, auto
uhci3 at pci0 dev 29 function 0: Intel 82801I USB UHCI Controller (rev.
0x02)
ioapic0: int23 0xa068<vector=0x68,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
uhci3: interrupting at ioapic0 pin 23
usb4 at uhci3: USB revision 1.0
uhci4 at pci0 dev 29 function 1: Intel 82801I USB UHCI Controller (rev.
0x02)
ioapic0: int22 0xa069<vector=0x69,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
uhci4: interrupting at ioapic0 pin 22
usb5 at uhci4: USB revision 1.0
uhci5 at pci0 dev 29 function 2: Intel 82801I USB UHCI Controller (rev.
0x02)
ioapic0: int21 0xa06a<vector=0x6a,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
uhci5: interrupting at ioapic0 pin 21
usb6 at uhci5: USB revision 1.0
ehci1 at pci0 dev 29 function 7: Intel 82801I USB EHCI Controller (rev.
0x02)
ioapic0: int23 0xa068<vector=0x68,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
ehci1: interrupting at ioapic0 pin 23
ehci1: EHCI version 1.0
ehci1: companion controllers, 2 ports each: uhci3 uhci4 uhci5
usb7 at ehci1: USB revision 2.0
ppb6 at pci0 dev 30 function 0: Intel 82801BA Hub-PCI Bridge (rev. 0x92)
pci7 at ppb6 bus 17
ppb6: added to list as bus 17
pci7: i/o space, memory space enabled
vga0 at pci7 dev 3 function 0: ATI Technologies ES1000 (rev. 0x02)
wsdisplay0 at vga0 (kbdmux ignored)
drm at vga0 not configured
pciide0 at pci7 dev 4 function 0: Integrated Technology Express product
0x8213 (rev. 0x00)
pciide0: bus-master DMA support present, but unused (no driver support)
pciide0: primary channel wired to native-PCI mode
ioapic0: int23 0xa068<vector=0x68,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
pciide0: using ioapic0 pin 23 for native-PCI interrupt
atabus0 at pciide0 channel 0
pciide0: secondary channel wired to native-PCI mode
atabus1 at pciide0 channel 1
ichlpcib0 at pci0 dev 31 function 0: Intel 82801IR LPC Interface Bridge
(rev. 0x02)
timecounter: Timecounter "ichlpcib0" frequency 3579545 Hz quality 1000
ichlpcib0: 24-bit timer
ichlpcib0: TCO (watchdog) timer configured.
gpio0 at ichlpcib0: 64 pins
piixide0 at pci0 dev 31 function 2: Intel 82801I Serial ATA Controller
(ICH9) (rev. 0x02)
piixide0: bus-master DMA support present
piixide0: primary channel configured to compatibility mode
ioapic0: int14 0x6b<vector=0x6b,delmode=0x0,dest=0x0> 0x0<target=0x0>
piixide0: primary channel interrupting at ioapic0 pin 14
atabus2 at piixide0 channel 0
piixide0: secondary channel configured to compatibility mode
ioapic0: int15 0x6c<vector=0x6c,delmode=0x0,dest=0x0> 0x0<target=0x0>
piixide0: secondary channel interrupting at ioapic0 pin 15
atabus3 at piixide0 channel 1
ichsmb0 at pci0 dev 31 function 3: Intel 82801I SMBus Controller (rev. 0x02)
ioapic0: int17 0xa065<vector=0x65,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
ichsmb0: interrupting at ioapic0 pin 17
iic0 at ichsmb0: I2C bus
spdmem0 at iic0 addr 0x50
spdmem0: DDR2 SDRAM, data ECC, 2GB, 800MHz (PC2-6400)
spdmem0: 14 rows, 10 cols, 2 ranks, 8 banks/chip, 2.50ns cycle time
spdmem0: tAA-tRCD-tRP-tRAS: 5-5-5-18
spdmem0: voltage SSTL 1.8V, refresh time 7.8us (self-refreshing)
spdmem1 at iic0 addr 0x51
spdmem1: DDR2 SDRAM, data ECC, 2GB, 800MHz (PC2-6400)
spdmem1: 14 rows, 10 cols, 2 ranks, 8 banks/chip, 2.50ns cycle time
spdmem1: tAA-tRCD-tRP-tRAS: 5-5-5-18
spdmem1: voltage SSTL 1.8V, refresh time 7.8us (self-refreshing)
spdmem2 at iic0 addr 0x52
spdmem2: DDR2 SDRAM, data ECC, 2GB, 800MHz (PC2-6400)
spdmem2: 14 rows, 10 cols, 2 ranks, 8 banks/chip, 2.50ns cycle time
spdmem2: tAA-tRCD-tRP-tRAS: 5-5-5-18
spdmem2: voltage SSTL 1.8V, refresh time 7.8us (self-refreshing)
spdmem3 at iic0 addr 0x53
spdmem3: DDR2 SDRAM, data ECC, 2GB, 800MHz (PC2-6400)
spdmem3: 14 rows, 10 cols, 2 ranks, 8 banks/chip, 2.50ns cycle time
spdmem3: tAA-tRCD-tRP-tRAS: 5-5-5-18
spdmem3: voltage SSTL 1.8V, refresh time 7.8us (self-refreshing)
piixide1 at pci0 dev 31 function 5: Intel 82801I Serial ATA Controller
(ICH9) (rev. 0x02)
piixide1: bus-master DMA support present
piixide1: primary channel wired to native-PCI mode
ioapic0: int18 0xa066<vector=0x66,delmode=0x0,actlo,level,dest=0x0>
0x0<target=0x0>
piixide1: using ioapic0 pin 18 for native-PCI interrupt
atabus4 at piixide1 channel 0
piixide1: secondary channel wired to native-PCI mode
atabus5 at piixide1 channel 1
Intel 82801I Thermal Controller (miscellaneous DASP, revision 0x02) at
pci0 dev 31 function 6 not configured
isa0 at ichlpcib0
lm0 at isa0 port 0x290-0x297
lm0: Using default temp sensors
lm0: Winbond W83627HF Hardware monitor
acpicpu0 at cpu0: ACPI CPU
acpicpu0: C1: HLT, lat 0 us, pow 0 mW
acpicpu0: P0: FFH, lat 10 us, pow 35000 mW, 3000 MHz
acpicpu0: P1: FFH, lat 10 us, pow 28000 mW, 2667 MHz
acpicpu0: P2: FFH, lat 10 us, pow 22000 mW, 2333 MHz
acpicpu0: P3: FFH, lat 10 us, pow 16000 mW, 2000 MHz
acpicpu0: T0: FFH, lat 1 us, pow 16000 mW, 100 %
acpicpu0: T1: FFH, lat 1 us, pow 14000 mW, 88 %
acpicpu0: T2: FFH, lat 1 us, pow 12000 mW, 75 %
acpicpu0: T3: FFH, lat 1 us, pow 10000 mW, 63 %
acpicpu0: T4: FFH, lat 1 us, pow 8000 mW, 50 %
acpicpu0: T5: FFH, lat 1 us, pow 6000 mW, 38 %
acpicpu0: T6: FFH, lat 1 us, pow 4000 mW, 25 %
acpicpu0: T7: FFH, lat 1 us, pow 2000 mW, 13 %
coretemp0 at cpu0: thermal sensor, 1 C resolution
acpicpu1 at cpu1: ACPI CPU
coretemp1 at cpu1: thermal sensor, 1 C resolution
timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
cpu1: prelint0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0>
0x0<target=0x0>
cpu1: prelint1 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0>
0x0<target=0x0>
cpu1: timer0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
cpu1: pcint0 0x10000<vector=0x0,delmode=0x0,masked,dest=0x0> 0x0<target=0x0>
cpu1: lint0 0x10700<vector=0x0,delmode=0x7,masked,dest=0x0> 0x0<target=0x0>
cpu1: lint1 0x400<vector=0x0,delmode=0x4,dest=0x0> 0x0<target=0x0>
One interesting thing is, that only pin 0 to 5 of ioapic1 are printed,
but ioapic1 is reported to have 24 pins ...
May this lead to some "uninitialized" pin-setup and perhaps suprious
interrupts?
W. Stukenbrock
Wolfgang Stukenbrock wrote:
> The following reply was made to PR kern/48733; it has been noted by GNATS.
>
> From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
> To: gnats-bugs@NetBSD.org
> Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
> Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> Date: Fri, 16 May 2014 10:20:53 +0200
>
> Hi again,
>
> I've already tried LOCK_DEBUG - no change in behaviour and I also don't
> see additional usable information. (But now I'm nearly shure, it is not
> the KERNEL_LOCK - see below ...)
>
> I've still not added the requested curcpu()->ci_biglock_count print, but
> I've searched for the irq 6 or something like this.
> The "surprising" result:
>
> s011a# grep irq /var/run/dmesg.boot
> pckbc0 at acpi0 (KBC0, PNP0303) (kbd port): io 0x60,0x64 irq 1
> pckbc1 at acpi0 (MSE0, PNP0F13) (aux port): irq 12
> com0 at acpi0 (COM1, PNP0501-1): io 0x3f8-0x3ff irq 4
> com1 at acpi0 (COM2, PNP0501-2): io 0x2f8-0x2ff irq 3
> pckbc0: using irq 1 for kbd slot
> s011a# grep interrupt !$
> grep interrupt /var/run/dmesg.boot
> acpi0: SCI interrupting at int 9
> wm0: interrupting at ioapic1 pin 0
> wm1: interrupting at ioapic1 pin 1
> Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09)
> at pci1 dev 0 function 1 not configured
> uhci0: interrupting at ioapic0 pin 16
> uhci1: interrupting at ioapic0 pin 17
> uhci2: interrupting at ioapic0 pin 18
> ehci0: interrupting at ioapic0 pin 18
> wm2: interrupting at ioapic0 pin 16
> wm3: interrupting at ioapic0 pin 17
> wm4: interrupting at ioapic0 pin 18
> wm5: interrupting at ioapic0 pin 19
> uhci3: interrupting at ioapic0 pin 23
> uhci4: interrupting at ioapic0 pin 22
> uhci5: interrupting at ioapic0 pin 21
> ehci1: interrupting at ioapic0 pin 23
> pciide0: using ioapic0 pin 23 for native-PCI interrupt
> piixide0: primary channel interrupting at ioapic0 pin 14
> piixide0: secondary channel interrupting at ioapic0 pin 15
> ichsmb0: interrupting at ioapic0 pin 17
> piixide1: using ioapic0 pin 18 for native-PCI interrupt
> timecounter: Timecounter "clockinterrupt" frequency 100 Hz quality 0
> s011a# grep ioapic !$
> grep ioapic /var/run/dmesg.boot
> ioapic0 at mainbus0 apid 2: pa 0xfec00000, version 20, 24 pins
> ioapic1 at mainbus0 apid 3: pa 0xfecc0000, version 20, 24 pins
> wm0: interrupting at ioapic1 pin 0
> wm1: interrupting at ioapic1 pin 1
> uhci0: interrupting at ioapic0 pin 16
> uhci1: interrupting at ioapic0 pin 17
> uhci2: interrupting at ioapic0 pin 18
> ehci0: interrupting at ioapic0 pin 18
> wm2: interrupting at ioapic0 pin 16
> wm3: interrupting at ioapic0 pin 17
> wm4: interrupting at ioapic0 pin 18
> wm5: interrupting at ioapic0 pin 19
> uhci3: interrupting at ioapic0 pin 23
> uhci4: interrupting at ioapic0 pin 22
> uhci5: interrupting at ioapic0 pin 21
> ehci1: interrupting at ioapic0 pin 23
> pciide0: using ioapic0 pin 23 for native-PCI interrupt
> piixide0: primary channel interrupting at ioapic0 pin 14
> piixide0: secondary channel interrupting at ioapic0 pin 15
> ichsmb0: interrupting at ioapic0 pin 17
> piixide1: using ioapic0 pin 18 for native-PCI interrupt
>
>
>
>
> Hmmm - no irq6 or pin 6 !?!?!?!
> The first greating line is in /var/run/dmesg.boot, so the result is
> "complete".
> The board is reported as 'Supermicro X7SBi-LN4 (0123456789)'.
> The two "additonal" interfaces (wm0 and wm1) are on ioapic1 - no other
> HW is on that APIC.
>
>
>
> The search for not-configured devices say:
> s011a# grep confi !$
> grep confi /var/run/dmesg.boot
> MBRD (PNP0C02) at acpi0 not configured
> FWH (INT0800) at acpi0 not configured
> pci0 at mainbus0 bus 0: configuration mode 1
> Intel product 0x0326 (interrupt system, interface 0x20, revision 0x09)
> at pci1 dev 0 function 1 not configured
> drm at vga0 not configured
> ichlpcib0: TCO (watchdog) timer configured.
> piixide0: primary channel configured to compatibility mode
> piixide0: secondary channel configured to compatibility mode
> Intel 82801I Thermal Controller (miscellaneous DASP, revision 0x02) at
> pci0 dev 31 function 6 not configured
>
>
> Not really a lot - other systems have more unconfigured devices ....
> Normaly I'm testing all available drivers on a "new" HW, so I'm nearly
> 100% shure that theese devices are not covered by any driver.
>
> After that research it looks like there is something "suddenly"
> interrupting without a driver for it.
> The time it take till "freeze" varies, so a "normal timer timeout" does
> not sound reasonable.
> And why does this occure only after installing two additional interfaces
> (wm0 and wm1) that use the second APIC? The system does not show the
> problem without them.
>
> pci0 at mainbus0 bus 0: configuration mode 1
> pci0: i/o space, memory space enabled, rd/line, rd/mult, wr/inv ok
> pchb0 at pci0 dev 0 function 0: Intel product 0x29f0 (rev. 0x01)
> ppb0 at pci0 dev 1 function 0: Intel product 0x29f1 (rev. 0x01)
> ppb0: PCI Express 2.0 <Root Port of PCI-E Root Complex>
> pci1 at ppb0 bus 1
> pci1: i/o space, memory space enabled, rd/line, wr/inv ok
> ppb1 at pci1 dev 0 function 0: Intel product 0x032c (rev. 0x09)
> ppb1: PCI Express 1.0 <PCI-E to PCI/PCI-X Bridge>
> pci2 at ppb1 bus 2
> pci2: i/o space, memory space enabled, rd/line, wr/inv ok
> wm0 at pci2 dev 1 function 0: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
> wm0: interrupting at ioapic1 pin 0
> wm0: 64-bit 133MHz PCIX bus
> wm0: NVM dump:
> 1b00 0621 2a12 0430 ffff ffff ffff ffff
> c414 2104 460b 1179 8086 1079 8086 34e8
> 000c 1079 0000 2102 10c8 ffff ffff ffff
> ffff ffff ffff ffff ffff ffff ffff ffff
> c30c ff63 5004 2102 00c8 ffff ffff ffff
> ffff ffff ffff ffff ffff ffff ffff 0602
> ffff ffff ffff ffff ffff ffff ffff ffff
> ffff ffff ffff ffff ffff ffff ffff 6be8
> wm0: 256 word (8 address bits) MicroWire EEPROM
> wm0: Ethernet address 00:1b:21:06:12:2a
> makphy0 at wm0 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
> makphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> 1000baseT-FDX, auto
> wm1 at pci2 dev 1 function 1: Intel i82546GB 1000BASE-T Ethernet (rev. 0x03)
> wm1: interrupting at ioapic1 pin 1
> wm1: 64-bit 133MHz PCIX bus
> wm1: NVM dump:
> 1b00 0621 2a12 0430 ffff ffff ffff ffff
> c414 2104 460b 1179 8086 1079 8086 34e8
> 000c 1079 0000 2102 10c8 ffff ffff ffff
> ffff ffff ffff ffff ffff ffff ffff ffff
> c30c ff63 5004 2102 00c8 ffff ffff ffff
> ffff ffff ffff ffff ffff ffff ffff 0602
> ffff ffff ffff ffff ffff ffff ffff ffff
> ffff ffff ffff ffff ffff ffff ffff 6be8
> wm1: 256 word (8 address bits) MicroWire EEPROM
> wm1: Ethernet address 00:1b:21:06:12:2b
> makphy1 at wm1 phy 1: Marvell 88E1011 Gigabit PHY, rev. 5
> makphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseT,
> 1000baseT-FDX, auto
>
>
> Question: Are unused interrupts enabled? And if "yes", why?
> I thought I've seen some code that will print a warning about this and
> tries to reset the interrupt, but I'm not 100% shure about the
> netbsd-version I've looked at.
>
> Any hints to go on with debugging?
> It would take me a lot of time to "understand" the interrupt setup on
> the amd64 architecture ...
>
> best regards
>
> W. Stukenbrock
>
>
> Manuel Bouyer wrote:
>
> > The following reply was made to PR kern/48733; it has been noted by GNATS.
> >
> > From: Manuel Bouyer <bouyer@antioche.eu.org>
> > To: gnats-bugs@NetBSD.org
> > Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org,
> > Wolfgang.Stukenbrock@nagler-company.com
> > Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
> > Date: Mon, 12 May 2014 14:23:46 +0200
> >
> > On Fri, May 09, 2014 at 12:20:00PM +0000, Wolfgang Stukenbrock wrote:
> > > [...]
> > > db{0}> bt/a fffffe822f73f420
> > > trace: pid 0 lid 3 at 0xfffffe810e967760
> > > ether_output() at netbsd:ether_output+0x2b6
> > > ip_output() at netbsd:ip_output+0xa8f
> > > tcp_output() at netbsd:tcp_output+0x1698
> > > tcp_input() at netbsd:tcp_input+0x15d9
> > > ip_input() at netbsd:ip_input+0x3ef
> > > ipintr() at netbsd:ipintr+0x109
> > > softint_dispatch() at netbsd:softint_dispatch+0xd9
> > > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e967d70
> > > Xsoftintr() at netbsd:Xsoftintr+0x4f
> > > --- interrupt ---
> > > 0:
> > >
> > > That is the part that is gooing to send a packet. I see the printout in
> > > ip_output prior calling 'ifp->if_output()' - not the one behind.
> > > The location pointed to by the backtrace in ether_output() is the call
> > > to "return ifq_enqueue(...)". I also see the printout I've added in
> > > front of this call, but not the one behind.
> > > In ifq_enqueue() I see the output of the call to 'ifp->if_start' - the
> > > wm-driver - in this routine and the printout in front of the splx(s) at
> > > the end of the routine - not the printout behind it.
> > > This is the localtion where the deadlock happens while processing other
> > > interrupts in Xspllower.
> > > This always looks the same ....
> >
> > ether_output() is called with the KERNEL_LOCK held, so at this point cpu0
> > already owns KERNEL_LOCK, it won't spin trying to grab it again.
> > You can confirm this by printing curcpu()->ci_biglock_count.
> > Did you try a kernel with options LOCKDEBUG ?
> >
> > What's possible here is a loop trying to process the same interrupt
> > forever.
> >
> > >
> > >
> > >
> > > db{0}> bt/a fffffe822f736440
> > > trace: pid 0 lid 6 at 0xfffffe810e9739c8
> > > breakpoint() at netbsd:breakpoint+0x5
> > > comintr() at netbsd:comintr+0x518
> > > Xintr_ioapic_edge1() at netbsd:Xintr_ioapic_edge1+0xea
> > > --- interrupt ---
> > > bus_space_read_4() at netbsd:bus_space_read_4+0xa
> > > intr_biglock_wrapper() at netbsd:intr_biglock_wrapper+0x3b
> > > Xintr_ioapic_level6() at netbsd:Xintr_ioapic_level6+0xf2
> > > --- interrupt ---
> > > Xspllower() at netbsd:Xspllower+0xe
> > > DDB lost frame for netbsd:Xsoftintr+0x4f, trying 0xfffffe810e973d70
> > > Xsoftintr() at netbsd:Xsoftintr+0x4f
> > > --- interrupt ---
> > > 0:
> > >
> > >
> > > Hmmm - not shure about it ...
> > > It looks like that during processing one pending interrupt in Xspllower
> > > at the end of that routine an interrupt came im that takes the
> > > KERNEL_LOCK in intr_biglock_wrapper() again and do what? Hangup in
> > > bus_space_read_4() ???? Busy-loop for whatever reason in that interrupt
> > > and the location where the DDB-enter occures in bus_space_read_4() is
> > > just random ????
> > > The comintr looks like the break-interrupt on the serial console of the
> > > system to enter DDB to me.
> >
> > it is.
> >
> > > Any idea to find out what interrupt routine it is???
> >
> > dmesg could point to the problem; the interrupt we're looking for is
> > level-triggered on pin 6 (so maybe "irq 6")
> >
> > --
> > Manuel Bouyer <bouyer@antioche.eu.org>
> > NetBSD: 26 ans d'experience feront toujours la difference
> > --
> >
> >
>
>
> --
>
>
> Dr. Nagler & Company GmbH
> Hauptstraße 9
> 92253 Schnaittenbach
>
> Tel. +49 9622/71 97-42
> Fax +49 9622/71 97-50
>
> Wolfgang.Stukenbrock@nagler-company.com
> http://www.nagler-company.com
>
>
> Hauptsitz: Schnaittenbach
> Handelregister: Amberg HRB
> Gerichtsstand: Amberg
> Steuernummer: 201/118/51825
> USt.-ID-Nummer: DE 273143997
> Geschäftsführer: Dr. Martin Nagler
>
>
>
--
Dr. Nagler & Company GmbH
Hauptstraße 9
92253 Schnaittenbach
Tel. +49 9622/71 97-42
Fax +49 9622/71 97-50
Wolfgang.Stukenbrock@nagler-company.com
http://www.nagler-company.com
Hauptsitz: Schnaittenbach
Handelregister: Amberg HRB
Gerichtsstand: Amberg
Steuernummer: 201/118/51825
USt.-ID-Nummer: DE 273143997
Geschäftsführer: Dr. Martin Nagler
From: "David A. Holland" <dholland@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48733 CVS commit: src/sys/arch
Date: Mon, 2 Jun 2014 02:11:52 +0000
Module Name: src
Committed By: dholland
Date: Mon Jun 2 02:11:52 UTC 2014
Modified Files:
src/sys/arch/amd64/conf: ALL
src/sys/arch/i386/conf: ALL
Log Message:
add MPVERBOSE, noted missing in PR 48733.
To generate a diff of this commit:
cvs rdiff -u -r1.7 -r1.8 src/sys/arch/amd64/conf/ALL
cvs rdiff -u -r1.374 -r1.375 src/sys/arch/i386/conf/ALL
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/48733: deadlock in if_output() with interrupt on KERNEL_LOCK
Date: Mon, 2 Jun 2014 02:12:21 +0000
On Fri, May 16, 2014 at 11:00:01AM +0000, Wolfgang Stukenbrock wrote:
> I've found a "hidden" option "MPVERBOSE" and enabled it.
> Why the hell are some options missing in the GENERIC (or ALL - if
> present) config file?
> What is the expected way to find them ????
They aren't supposed to exist... but entropy is always increasing.
--
David A. Holland
dholland@netbsd.org
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.