NetBSD Problem Report #48935
From www@NetBSD.org Sat Jun 21 10:58:23 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 27C20A6541
for <gnats-bugs@gnats.NetBSD.org>; Sat, 21 Jun 2014 10:58:23 +0000 (UTC)
Message-Id: <20140621105821.AE8A2A6545@mollari.NetBSD.org>
Date: Sat, 21 Jun 2014 10:58:21 +0000 (UTC)
From: tilman@code-monkey.de
Reply-To: tilman@code-monkey.de
To: gnats-bugs@NetBSD.org
Subject: ppp: ipcp gets stuck in stopped state
X-Send-Pr-Version: www-1.0
>Number: 48935
>Category: kern
>Synopsis: ppp: ipcp gets stuck in stopped state
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jun 21 11:00:01 +0000 2014
>Last-Modified: Wed Jul 16 18:05:00 +0000 2014
>Originator: Tilman Sauerbeck
>Release: 6.1.4_PATCH
>Organization:
>Environment:
NetBSD ganesha 6.1.4_PATCH NetBSD 6.1.4_PATCH (SHEEVAPLUG) #0: Sat Jun 21 12:26:35 CEST 2014 tilman@brimstone:/tmp/netbsd/src/sys/arch/evbarm/compile/obj/SHEEVAPLUG evbarm
>Description:
I'm using a DSL modem to connect to the internet via pppoe. My ISP closes the connection if it has been up for 24 hours.
The problem is that, occasionally, re-establishing the connection fails.
I have tracked down the problem:
During the connection attempt, a timeout occurs while IPCP is being configured (i.e. sppp_to_event() is called for IPCP with rst_counter being 0).
This causes IPCP's tlf hook to run, puts IPCP in the stopped state, and runs sppp_lcp_check_and_close(). The latter function then finds that no NCPs are started any more and calls lcp.Close(). This in turn leads to sppp_lcp_tld() being called, which currently only brings down those CPs that are marked "started" in lcp.protos.
The issue is that at this point, IPCP is *not* included in the lcp.protos bitmask and thus is not brought down and closed but will remain in the stopped state.
>How-To-Repeat:
Wait for a timeout to happen while IPCP is being configured.
>Fix:
Something like this might help. I haven't been able to reproduce a timeout with this patch, though, so I don't know for sure whether it fixes the problem:
diff --git a/sys/net/if_spppsubr.c b/sys/net/if_spppsubr.c
index 590bc50..c04f5b2 100644
--- a/sys/net/if_spppsubr.c
+++ b/sys/net/if_spppsubr.c
@@ -2591,7 +2591,6 @@ sppp_lcp_tld(struct sppp *sp)
 {
 	STDDCL;
 	int i;
-	uint32_t mask;
 
 	sp->pp_phase = SPPP_PHASE_TERMINATE;
 
@@ -2606,9 +2605,13 @@ sppp_lcp_tld(struct sppp *sp)
 	 * the Close second to prevent the upper layers from sending
 	 * ``a flurry of terminate-request packets'', as the RFC
 	 * describes it.
+	 *
+	 * Note that we ignore lcp.protos here: this prevents CPs
+	 * from remaining in the stopped state forever.
 	 */
-	for (i = 0, mask = 1; i < IDX_COUNT; i++, mask <<= 1)
-		if ((sp->lcp.protos & mask) && ((cps[i])->flags & CP_LCP) == 0) {
+	for (i = 0; i < IDX_COUNT; i++)
+		if ((sp->state[cps[i]->protoidx] != STATE_INITIAL) &&
+		    ((cps[i])->flags & CP_LCP) == 0) {
 			(cps[i])->Down(sp);
 			(cps[i])->Close(sp);
 		}
>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
Date: Sat, 21 Jun 2014 15:45:58 +0200
On Sat, Jun 21, 2014 at 11:00:01AM +0000, tilman@code-monkey.de wrote
> The issue is that at this point, IPCP is *not* included in the
> lcp.protos bitmask and thus is not brought down and closed but will
> remain in the stopped state.
I wonder if we should instead do more initialization in sppp_lcp_init().
Martin
From: Tilman Sauerbeck <tilman@code-monkey.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, martin@duskware.de
Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
Date: Tue, 15 Jul 2014 17:26:12 +0200
Martin Husemann [2014-06-21 13:50]:
> On Sat, Jun 21, 2014 at 11:00:01AM +0000, tilman@code-monkey.de wrote
> > The issue is that at this point, IPCP is *not* included in the
> > lcp.protos bitmask and thus is not brought down and closed but will
> > remain in the stopped state.
>
> I wonder if we should instead do more initialization in sppp_lcp_init().
Can you elaborate? I don't yet see what you have in mind.
Or are you suggesting to introduce another field next to lcp.protos
that would store the "active" protocols?
Thanks,
Tilman
From: Martin Husemann <martin@duskware.de>
To: Tilman Sauerbeck <tilman@code-monkey.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
Date: Tue, 15 Jul 2014 20:56:54 +0200
On Tue, Jul 15, 2014 at 05:26:12PM +0200, Tilman Sauerbeck wrote:
> > I wonder if we should instead do more initialization in sppp_lcp_init().
>
> Can you elaborate? I don't yet see what you have in mind.
> Or are you suggesting to introduce another field next to lcp.protos
> that would store the "active" protocols?
No, I mean when sppp_lcp_init() runs, there can not be any active protocols,
so it should probably just clear lcp.protos (and maybe a few other variables).
Martin
From: Tilman Sauerbeck <tilman@code-monkey.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, martin@duskware.de
Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
Date: Wed, 16 Jul 2014 20:01:51 +0200
Martin Husemann [2014-07-15 19:00]:
> On Tue, Jul 15, 2014 at 05:26:12PM +0200, Tilman Sauerbeck wrote:
> > > I wonder if we should instead do more initialization in sppp_lcp_init().
> >
> > Can you elaborate? I don't yet see what you have in mind.
> > Or are you suggesting to introduce another field next to lcp.protos
> > that would store the "active" protocols?
>
> No, I mean when sppp_lcp_init() runs, there can not be any active protocols,
> so it should probably just clear lcp.protos (and maybe a few other variables).
I don't get how that would help in the scenario where I'm seeing the
bug: what happens in my setup is that I'm creating the pppoe device
once, when the system is coming up. At that time, sppp_lcp_init()
is also called.
However, it is not called again when the connection is brought up or down.
So making sppp_lcp_init() do more work would have no effect at all
because it simply isn't run again.
Thanks,
Tilman