NetBSD Problem Report #41918
From kardel@pip.acrys.com Sat Aug 22 11:39:08 2009
Return-Path: <kardel@pip.acrys.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
by www.NetBSD.org (Postfix) with ESMTP id A627763C29A
for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Aug 2009 11:39:08 +0000 (UTC)
Message-Id: <200908221032.n7MAW2kf002414@pip.acrys.com>
Date: Sat, 22 Aug 2009 12:32:02 +0200 (MEST)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
X-Send-Pr-Version: 3.95
>Number: 41918
>Category: kern
>Synopsis: pppoe reconnects break with panic
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Aug 22 11:40:00 +0000 2009
>Closed-Date: Mon Jun 03 06:40:22 +0000 2019
>Last-Modified: Mon Jun 03 06:40:22 +0000 2019
>Originator: Frank Kardel
>Release: NetBSD 5.99.15
>Organization:
>Environment:
System: NetBSD pip 5.99.15 NetBSD 5.99.15 (PIP) #0: Sat Aug 22 11:09:52 MEST 2009 kardel@pip:/src/obj.i386/sys/arch/i386/compile/PIP i386
Architecture: i386
Machine: i386
>Description:
pppoe reconnect reliably panic the kernel.
hand copied stack trace (because of kern/41095 still being there):
panic:kernel diagnostic assertion "!cpu_softintr_p()" failed: file ".../sys/kern/subr_kmem.c", line 194
kmem_alloc
kmem_zalloc
in6_ifattach
in6_if_up
sppp_lcp_tlu
sppp_cp_input
sppp_input
pppoe_softintr_handler
softint_dispatch
The violation of calling kmem_zalloc() (likely in in6_ifaddrs_schedule()) from a softint seems clear. Possibly pppoe shouldn't call in6_ifattach from softint.
Currently this leads to a reliable panic every 24h with german DSL providers that think forceful disconnection every 24h is a bright idea.
>How-To-Repeat:
pick a -current (e. g. 20090822060000Z) kernel. fire up pppoe, let it connect, disconnect network cable, wait for lcp echo to time out (phase=dead), re-connect network cable, wait for reconnect via pppoe, enjoy panic.
>Fix:
may change the way pppoe handles the re-connect.
>Release-Note:
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 00:32:58 +0200
On Sat, Aug 22, 2009 at 11:40:00AM +0000, kardel@pip.acrys.com wrote:
> The violation of calling kmem_zalloc() (likely in
> in6_ifaddrs_schedule()) from a softint seems clear.
> Possibly pppoe shouldn't call in6_ifattach from softint.
Pppoe does not call in6_ifattach, but if_up().
I don't see why this should not be allowed from a softint (so IMHO the upper
layers should be fixed instead), but if we decide it isn't allowed to do so,
we could move it to a full thread context, though I don't like that either.
The way IFF_UP is used in this code certainly is ... uhm ... "creative",
so completely reworking it is worth considering, but not as simple
a task at it looks at first glance.
Martin
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: dyoung@netbsd.org
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()"
... subr_kmem.c
Date: Sun, 23 Aug 2009 08:43:16 +0200
Besides that there is always room for improvement, the problem occurred
just recently - a feeled 1-2weeks ago (I noticed it with my -current
update of last week).
So it is likely the deferral of the setting of link-local and loopback
addresses to a thread. The workqueue setup code uses kmem_zalloc that
imports the non softintr
requirement into in6_ifattach() (in6_ifattach.c:1.83 2009-08-13). Using
malloc() instead might be an option (in6_createmkludge()) does that.
Frank
Martin Husemann wrote:
> The following reply was made to PR kern/41918; it has been noted by GNATS.
>
> From: Martin Husemann <martin@duskware.de>
> To: gnats-bugs@NetBSD.org
> Cc:
> Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
> Date: Sun, 23 Aug 2009 00:32:58 +0200
>
> On Sat, Aug 22, 2009 at 11:40:00AM +0000, kardel@pip.acrys.com wrote:
> > The violation of calling kmem_zalloc() (likely in
> > in6_ifaddrs_schedule()) from a softint seems clear.
> > Possibly pppoe shouldn't call in6_ifattach from softint.
>
> Pppoe does not call in6_ifattach, but if_up().
> I don't see why this should not be allowed from a softint (so IMHO the upper
> layers should be fixed instead), but if we decide it isn't allowed to do so,
> we could move it to a full thread context, though I don't like that either.
>
> The way IFF_UP is used in this code certainly is ... uhm ... "creative",
> so completely reworking it is worth considering, but not as simple
> a task at it looks at first glance.
>
> Martin
>
>
From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: dyoung@netbsd.org
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()"
... subr_kmem.c
Date: Sun, 23 Aug 2009 08:52:58 +0200
After reading the comment in mld6.c the malloc() there is seemingly
executed at a save place to avoid calling malloc() in an interrupt context.
So we seem to have a real issue here with in6_attach being called in
interrupt contexts and having (currently) a need for dynamic allocations.
Frank
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, kardel@pip.acrys.com
Cc:
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 06:54:35 -0400
On Aug 23, 6:45am, kardel@netbsd.org (Frank Kardel) wrote:
-- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
| Besides that there is always room for improvement, the problem occurred
| just recently - a feeled 1-2weeks ago (I noticed it with my -current
| update of last week).
| So it is likely the deferral of the setting of link-local and loopback
| addresses to a thread. The workqueue setup code uses kmem_zalloc that
| imports the non softintr
| requirement into in6_ifattach() (in6_ifattach.c:1.83 2009-08-13). Using
| malloc() instead might be an option (in6_createmkludge()) does that.
|
| Frank
That also seems to break if_iwn.c during boot.
christos
From: David Young <dyoung@pobox.com>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
kardel@pip.acrys.com
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion
"!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 12:44:23 -0500
On Sun, Aug 23, 2009 at 06:54:35AM -0400, Christos Zoulas wrote:
> On Aug 23, 6:45am, kardel@netbsd.org (Frank Kardel) wrote:
> -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
>
> | Besides that there is always room for improvement, the problem occurred
> | just recently - a feeled 1-2weeks ago (I noticed it with my -current
> | update of last week).
> | So it is likely the deferral of the setting of link-local and loopback
> | addresses to a thread. The workqueue setup code uses kmem_zalloc that
> | imports the non softintr
> | requirement into in6_ifattach() (in6_ifattach.c:1.83 2009-08-13). Using
> | malloc() instead might be an option (in6_createmkludge()) does that.
> |
> | Frank
>
> That also seems to break if_iwn.c during boot.
I don't know if it is related, but iwn(4) and pppoe(4) are both doing
"funny" things with IFF_UP. It is inexcusable in the case of iwn(4),
which should modify IFF_RUNNING, instead. I don't understand how/why
pppoe(4) messes with IFF_UP, yet, but it probably should not.
Dave
--
David Young OJC Technologies
dyoung@ojctech.com Urbana, IL * (217) 278-3933
From: Martin Husemann <martin@duskware.de>
To: David Young <dyoung@pobox.com>, Christos Zoulas <christos@zoulas.com>,
gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
kardel@pip.acrys.com
Cc:
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 20:10:25 +0200
On Sun, Aug 23, 2009 at 12:44:23PM -0500, David Young wrote:
> which should modify IFF_RUNNING, instead. I don't understand how/why
> pppoe(4) messes with IFF_UP, yet, but it probably should not.
Hysterical raisinins mostly. This part was originally done for ISDN connections.
It prevented routing further packets down the interface while the connection
was being established (which might take quite some time) - preventing a
sure imediate quueue overflow and packet loss, basically causing retransmits
and often have the (expensive) ISDN connection's idle timeout fire before
the retry.
I agree it is ugly, I'm open to suggestions how to do it differently, but
we should probably disuss that outside of gnats separately.
Martin
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, kardel@pip.acrys.com
Cc:
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 15:29:07 -0400
On Aug 23, 5:45pm, dyoung@pobox.com (David Young) wrote:
-- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
| I don't know if it is related, but iwn(4) and pppoe(4) are both doing
| "funny" things with IFF_UP. It is inexcusable in the case of iwn(4),
| which should modify IFF_RUNNING, instead. I don't understand how/why
| pppoe(4) messes with IFF_UP, yet, but it probably should not.
|
What's funny about it? It just clears it on error like all the other
wireless drivers do. Can you please explain what you mean?
christos
From: David Young <dyoung@pobox.com>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
kardel@pip.acrys.com
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion
"!cpu_softintr_p()" ... subr_kmem.c
Date: Mon, 24 Aug 2009 12:15:09 -0500
On Sun, Aug 23, 2009 at 03:29:07PM -0400, Christos Zoulas wrote:
> On Aug 23, 5:45pm, dyoung@pobox.com (David Young) wrote:
> -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
>
> | I don't know if it is related, but iwn(4) and pppoe(4) are both doing
> | "funny" things with IFF_UP. It is inexcusable in the case of iwn(4),
> | which should modify IFF_RUNNING, instead. I don't understand how/why
> | pppoe(4) messes with IFF_UP, yet, but it probably should not.
> |
>
> What's funny about it? It just clears it on error like all the other
> wireless drivers do. Can you please explain what you mean?
iwn(4) clears IFF_UP in no fewer than five places, three times in
interrupt handlers, once in the watchdog routine, and once in the ioctl
routine. I know that all wireless drivers don't do that.
IFF_UP means "administratively enabled." It expresses the
administrator's intent that the interface should operate. We leave
it to other flags and variables to indicate whether the interface can
actually operate: IFF_RUNNING and media state.
It looks to me like iwn(4) should clear IFF_RUNNING when the radio is
inactive, but it should never clear IFF_UP.
Dave
--
David Young OJC Technologies
dyoung@ojctech.com Urbana, IL * (217) 278-3933
From: christos@zoulas.com (Christos Zoulas)
To: David Young <dyoung@pobox.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, kardel@pip.acrys.com
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Mon, 24 Aug 2009 13:58:23 -0400
On Aug 24, 12:15pm, dyoung@pobox.com (David Young) wrote:
-- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
| On Sun, Aug 23, 2009 at 03:29:07PM -0400, Christos Zoulas wrote:
| > On Aug 23, 5:45pm, dyoung@pobox.com (David Young) wrote:
| > -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
| >
| > | I don't know if it is related, but iwn(4) and pppoe(4) are both doing
| > | "funny" things with IFF_UP. It is inexcusable in the case of iwn(4),
| > | which should modify IFF_RUNNING, instead. I don't understand how/why
| > | pppoe(4) messes with IFF_UP, yet, but it probably should not.
| > |
| >
| > What's funny about it? It just clears it on error like all the other
| > wireless drivers do. Can you please explain what you mean?
|
| iwn(4) clears IFF_UP in no fewer than five places, three times in
| interrupt handlers, once in the watchdog routine, and once in the ioctl
| routine. I know that all wireless drivers don't do that.
Well, it is no worse than if_ipw, if_iwi, if_wpi... They are the same
code basically.
| IFF_UP means "administratively enabled." It expresses the
| administrator's intent that the interface should operate. We leave
| it to other flags and variables to indicate whether the interface can
| actually operate: IFF_RUNNING and media state.
|
| It looks to me like iwn(4) should clear IFF_RUNNING when the radio is
| inactive, but it should never clear IFF_UP.
Then we need to change [in the pci directory only]:
if_de.c: sc->tulip_if.if_flags &= ~(IFF_UP|IFF_RUNNING);
if_ipw.c: ic->ic_ifp->if_flags &= ~IFF_UP;
if_ipw.c: sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
if_ipw.c: ifp->if_flags &= ~IFF_UP;
if_ipw.c:fail: ifp->if_flags &= ~IFF_UP;
if_iwi.c: sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
if_iwi.c: sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
if_iwi.c: ifp->if_flags &= ~IFF_UP;
if_iwi.c:fail: ifp->if_flags &= ~IFF_UP;
if_iwn.c: ifp->if_flags &= ~IFF_UP;
if_iwn.c: ifp->if_flags &= ~IFF_UP;
if_iwn.c: sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
if_iwn.c: ifp->if_flags &= ~IFF_UP;
if_iwn.c: ifp->if_flags &= ~IFF_UP;
if_lmc.c: sc->ifp->if_flags &= ~IFF_UP; /* down */
if_lmc.c: sc->ifp->if_flags &= ~IFF_UP; /* down */
if_wpi.c: ifp->if_flags &= ~IFF_UP;
if_wpi.c: sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
if_wpi.c: ifp->if_flags &= ~IFF_UP;
And also:
cxgb_main.c: ifp->if_flags |= IFF_UP;
if_de.c: ifp->if_flags |= IFF_UP;
if_lmc.c: sc->ifp->if_flags |= IFF_UP; /* up and not running */
if_lmc.c: sc->ifp->if_flags |= IFF_UP; /* a Unix tradition */
if_nfe.c: ifp->if_flags |= IFF_UP;
if_ti.c: ifp->if_flags |= IFF_UP;
if_txp.c: ifp->if_flags |= IFF_UP;
But if we do that, what can set IFF_RUNNING again? I.e. how do we reset
the card?
christos
State-Changed-From-To: open->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Mon, 03 Jun 2019 06:40:22 +0000
State-Changed-Why:
no change since 2009
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.