NetBSD Problem Report #41918

From kardel@pip.acrys.com  Sat Aug 22 11:39:08 2009
Return-Path: <kardel@pip.acrys.com>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id A627763C29A
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 22 Aug 2009 11:39:08 +0000 (UTC)
Message-Id: <200908221032.n7MAW2kf002414@pip.acrys.com>
Date: Sat, 22 Aug 2009 12:32:02 +0200 (MEST)
From: kardel@netbsd.org
Reply-To: kardel@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
X-Send-Pr-Version: 3.95

>Number:         41918
>Category:       kern
>Synopsis:       pppoe reconnects break with panic
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Aug 22 11:40:00 +0000 2009
>Closed-Date:    Mon Jun 03 06:40:22 +0000 2019
>Last-Modified:  Mon Jun 03 06:40:22 +0000 2019
>Originator:     Frank Kardel
>Release:        NetBSD 5.99.15
>Organization:

>Environment:


System: NetBSD pip 5.99.15 NetBSD 5.99.15 (PIP) #0: Sat Aug 22 11:09:52 MEST 2009 kardel@pip:/src/obj.i386/sys/arch/i386/compile/PIP i386
Architecture: i386
Machine: i386
>Description:
	pppoe reconnect reliably panic the kernel.
	hand copied stack trace (because of kern/41095 still being there):
	panic:kernel diagnostic assertion "!cpu_softintr_p()" failed: file ".../sys/kern/subr_kmem.c", line 194
	kmem_alloc
	kmem_zalloc
	in6_ifattach
	in6_if_up
	sppp_lcp_tlu
	sppp_cp_input
	sppp_input
	pppoe_softintr_handler
	softint_dispatch

	The violation of calling kmem_zalloc() (likely in in6_ifaddrs_schedule()) from a softint seems clear. Possibly pppoe shouldn't call in6_ifattach from softint.
	Currently this leads to a reliable panic every 24h with german DSL providers that think forceful disconnection every 24h is a bright idea.
>How-To-Repeat:
	pick a -current (e. g. 20090822060000Z) kernel. fire up pppoe, let it connect, disconnect network cable, wait for lcp echo to time out (phase=dead), re-connect network cable, wait for reconnect via pppoe, enjoy panic.

>Fix:
	may change the way pppoe handles the re-connect.

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 00:32:58 +0200

 On Sat, Aug 22, 2009 at 11:40:00AM +0000, kardel@pip.acrys.com wrote:
 > The violation of calling kmem_zalloc() (likely in
 > in6_ifaddrs_schedule()) from a softint seems clear.
 > Possibly pppoe shouldn't call in6_ifattach from softint.

 Pppoe does not call in6_ifattach, but if_up().
 I don't see why this should not be allowed from a softint (so IMHO the upper
 layers should be fixed instead), but if we decide it isn't allowed to do so,
 we could move it to a full thread context, though I don't like that either.

 The way IFF_UP is used in this code certainly is ... uhm ... "creative", 
 so completely reworking it is worth considering, but not as simple
 a task at it looks at first glance.

 Martin

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: dyoung@netbsd.org
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()"
 ... subr_kmem.c
Date: Sun, 23 Aug 2009 08:43:16 +0200

 Besides that there is always room for improvement, the problem occurred 
 just recently - a feeled 1-2weeks ago (I noticed it with my -current 
 update of last week).
 So it is likely the deferral of the setting of link-local and loopback 
 addresses to a thread. The workqueue setup code uses kmem_zalloc that 
 imports the non softintr
 requirement into in6_ifattach() (in6_ifattach.c:1.83 2009-08-13). Using 
 malloc() instead might be an option (in6_createmkludge()) does that.

 Frank

 Martin Husemann wrote:
 > The following reply was made to PR kern/41918; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
 > Date: Sun, 23 Aug 2009 00:32:58 +0200
 >
 >  On Sat, Aug 22, 2009 at 11:40:00AM +0000, kardel@pip.acrys.com wrote:
 >  > The violation of calling kmem_zalloc() (likely in
 >  > in6_ifaddrs_schedule()) from a softint seems clear.
 >  > Possibly pppoe shouldn't call in6_ifattach from softint.
 >  
 >  Pppoe does not call in6_ifattach, but if_up().
 >  I don't see why this should not be allowed from a softint (so IMHO the upper
 >  layers should be fixed instead), but if we decide it isn't allowed to do so,
 >  we could move it to a full thread context, though I don't like that either.
 >  
 >  The way IFF_UP is used in this code certainly is ... uhm ... "creative", 
 >  so completely reworking it is worth considering, but not as simple
 >  a task at it looks at first glance.
 >  
 >  Martin
 >  
 >   

From: Frank Kardel <kardel@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: dyoung@netbsd.org
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()"
 ... subr_kmem.c
Date: Sun, 23 Aug 2009 08:52:58 +0200

 After reading the comment in mld6.c the malloc() there is seemingly 
 executed at a save place to avoid calling malloc() in an interrupt context.
 So we seem to have a real issue here with in6_attach being called in 
 interrupt contexts and having (currently) a need for dynamic allocations.

 Frank

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, kardel@pip.acrys.com
Cc: 
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 06:54:35 -0400

 On Aug 23,  6:45am, kardel@netbsd.org (Frank Kardel) wrote:
 -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "

 |  Besides that there is always room for improvement, the problem occurred 
 |  just recently - a feeled 1-2weeks ago (I noticed it with my -current 
 |  update of last week).
 |  So it is likely the deferral of the setting of link-local and loopback 
 |  addresses to a thread. The workqueue setup code uses kmem_zalloc that 
 |  imports the non softintr
 |  requirement into in6_ifattach() (in6_ifattach.c:1.83 2009-08-13). Using 
 |  malloc() instead might be an option (in6_createmkludge()) does that.
 |  
 |  Frank

 That also seems to break if_iwn.c during boot.

 christos

From: David Young <dyoung@pobox.com>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	kardel@pip.acrys.com
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion
	"!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 12:44:23 -0500

 On Sun, Aug 23, 2009 at 06:54:35AM -0400, Christos Zoulas wrote:
 > On Aug 23,  6:45am, kardel@netbsd.org (Frank Kardel) wrote:
 > -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
 > 
 > |  Besides that there is always room for improvement, the problem occurred 
 > |  just recently - a feeled 1-2weeks ago (I noticed it with my -current 
 > |  update of last week).
 > |  So it is likely the deferral of the setting of link-local and loopback 
 > |  addresses to a thread. The workqueue setup code uses kmem_zalloc that 
 > |  imports the non softintr
 > |  requirement into in6_ifattach() (in6_ifattach.c:1.83 2009-08-13). Using 
 > |  malloc() instead might be an option (in6_createmkludge()) does that.
 > |  
 > |  Frank
 > 
 > That also seems to break if_iwn.c during boot.

 I don't know if it is related, but iwn(4) and pppoe(4) are both doing
 "funny" things with IFF_UP.  It is inexcusable in the case of iwn(4),
 which should modify IFF_RUNNING, instead.  I don't understand how/why
 pppoe(4) messes with IFF_UP, yet, but it probably should not.

 Dave

 -- 
 David Young             OJC Technologies
 dyoung@ojctech.com      Urbana, IL * (217) 278-3933

From: Martin Husemann <martin@duskware.de>
To: David Young <dyoung@pobox.com>, Christos Zoulas <christos@zoulas.com>,
	gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	kardel@pip.acrys.com
Cc: 
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 20:10:25 +0200

 On Sun, Aug 23, 2009 at 12:44:23PM -0500, David Young wrote:
 > which should modify IFF_RUNNING, instead.  I don't understand how/why
 > pppoe(4) messes with IFF_UP, yet, but it probably should not.

 Hysterical raisinins mostly. This part was originally done for ISDN connections.
 It prevented routing further packets down the interface while the connection
 was being established (which might take quite some time) - preventing a
 sure imediate quueue overflow and packet loss, basically causing retransmits
 and often have the (expensive) ISDN connection's idle timeout fire before
 the retry.

 I agree it is ugly, I'm open to suggestions how to do it differently, but
 we should probably disuss that outside of gnats separately.

 Martin

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, kardel@pip.acrys.com
Cc: 
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Sun, 23 Aug 2009 15:29:07 -0400

 On Aug 23,  5:45pm, dyoung@pobox.com (David Young) wrote:
 -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "

 |  I don't know if it is related, but iwn(4) and pppoe(4) are both doing
 |  "funny" things with IFF_UP.  It is inexcusable in the case of iwn(4),
 |  which should modify IFF_RUNNING, instead.  I don't understand how/why
 |  pppoe(4) messes with IFF_UP, yet, but it probably should not.
 |  

 What's funny about it? It just clears it on error like all the other
 wireless drivers do. Can you please explain what you mean?

 christos

From: David Young <dyoung@pobox.com>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	kardel@pip.acrys.com
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion
	"!cpu_softintr_p()" ... subr_kmem.c
Date: Mon, 24 Aug 2009 12:15:09 -0500

 On Sun, Aug 23, 2009 at 03:29:07PM -0400, Christos Zoulas wrote:
 > On Aug 23,  5:45pm, dyoung@pobox.com (David Young) wrote:
 > -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
 > 
 > |  I don't know if it is related, but iwn(4) and pppoe(4) are both doing
 > |  "funny" things with IFF_UP.  It is inexcusable in the case of iwn(4),
 > |  which should modify IFF_RUNNING, instead.  I don't understand how/why
 > |  pppoe(4) messes with IFF_UP, yet, but it probably should not.
 > |  
 > 
 > What's funny about it? It just clears it on error like all the other
 > wireless drivers do. Can you please explain what you mean?

 iwn(4) clears IFF_UP in no fewer than five places, three times in
 interrupt handlers, once in the watchdog routine, and once in the ioctl
 routine.  I know that all wireless drivers don't do that.

 IFF_UP means "administratively enabled."  It expresses the
 administrator's intent that the interface should operate.  We leave
 it to other flags and variables to indicate whether the interface can
 actually operate: IFF_RUNNING and media state.

 It looks to me like iwn(4) should clear IFF_RUNNING when the radio is
 inactive, but it should never clear IFF_UP.

 Dave

 -- 
 David Young             OJC Technologies
 dyoung@ojctech.com      Urbana, IL * (217) 278-3933

From: christos@zoulas.com (Christos Zoulas)
To: David Young <dyoung@pobox.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, kardel@pip.acrys.com
Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "!cpu_softintr_p()" ... subr_kmem.c
Date: Mon, 24 Aug 2009 13:58:23 -0400

 On Aug 24, 12:15pm, dyoung@pobox.com (David Young) wrote:
 -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "

 | On Sun, Aug 23, 2009 at 03:29:07PM -0400, Christos Zoulas wrote:
 | > On Aug 23,  5:45pm, dyoung@pobox.com (David Young) wrote:
 | > -- Subject: Re: kern/41918: pppoe BROKEN: panic:kernel diagnostic assertion "
 | > 
 | > |  I don't know if it is related, but iwn(4) and pppoe(4) are both doing
 | > |  "funny" things with IFF_UP.  It is inexcusable in the case of iwn(4),
 | > |  which should modify IFF_RUNNING, instead.  I don't understand how/why
 | > |  pppoe(4) messes with IFF_UP, yet, but it probably should not.
 | > |  
 | > 
 | > What's funny about it? It just clears it on error like all the other
 | > wireless drivers do. Can you please explain what you mean?
 | 
 | iwn(4) clears IFF_UP in no fewer than five places, three times in
 | interrupt handlers, once in the watchdog routine, and once in the ioctl
 | routine.  I know that all wireless drivers don't do that.

 Well, it is no worse than if_ipw, if_iwi, if_wpi... They are the same
 code basically.

 | IFF_UP means "administratively enabled."  It expresses the
 | administrator's intent that the interface should operate.  We leave
 | it to other flags and variables to indicate whether the interface can
 | actually operate: IFF_RUNNING and media state.
 | 
 | It looks to me like iwn(4) should clear IFF_RUNNING when the radio is
 | inactive, but it should never clear IFF_UP.

 Then we need to change [in the pci directory only]:

     if_de.c:                sc->tulip_if.if_flags &= ~(IFF_UP|IFF_RUNNING);
     if_ipw.c:               ic->ic_ifp->if_flags &= ~IFF_UP;
     if_ipw.c:               sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
     if_ipw.c:                       ifp->if_flags &= ~IFF_UP;
     if_ipw.c:fail:  ifp->if_flags &= ~IFF_UP;

     if_iwi.c:               sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
     if_iwi.c:               sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
     if_iwi.c:                       ifp->if_flags &= ~IFF_UP;
     if_iwi.c:fail:  ifp->if_flags &= ~IFF_UP;

     if_iwn.c:                       ifp->if_flags &= ~IFF_UP;
     if_iwn.c:                               ifp->if_flags &= ~IFF_UP;
     if_iwn.c:               sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
     if_iwn.c:                       ifp->if_flags &= ~IFF_UP;
     if_iwn.c:                               ifp->if_flags &= ~IFF_UP;

     if_lmc.c:  sc->ifp->if_flags &= ~IFF_UP; /* down */
     if_lmc.c:  sc->ifp->if_flags &= ~IFF_UP; /* down */

     if_wpi.c:                               ifp->if_flags &= ~IFF_UP;
     if_wpi.c:               sc->sc_ic.ic_ifp->if_flags &= ~IFF_UP;
     if_wpi.c:                       ifp->if_flags &= ~IFF_UP;

 And also:

     cxgb_main.c:            ifp->if_flags |= IFF_UP;
     if_de.c:		ifp->if_flags |= IFF_UP;
     if_lmc.c:		sc->ifp->if_flags |= IFF_UP; /* up and not running */
     if_lmc.c:		sc->ifp->if_flags |= IFF_UP; /* a Unix tradition */
     if_nfe.c:               ifp->if_flags |= IFF_UP;
     if_ti.c:                ifp->if_flags |= IFF_UP;
     if_txp.c:               ifp->if_flags |= IFF_UP;

 But if we do that, what can set IFF_RUNNING again? I.e. how do we reset
 the card?

 christos

State-Changed-From-To: open->closed
State-Changed-By: kardel@NetBSD.org
State-Changed-When: Mon, 03 Jun 2019 06:40:22 +0000
State-Changed-Why:
no change since 2009


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.