NetBSD Problem Report #48935

From www@NetBSD.org  Sat Jun 21 10:58:23 2014
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 27C20A6541
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 21 Jun 2014 10:58:23 +0000 (UTC)
Message-Id: <20140621105821.AE8A2A6545@mollari.NetBSD.org>
Date: Sat, 21 Jun 2014 10:58:21 +0000 (UTC)
From: tilman@code-monkey.de
Reply-To: tilman@code-monkey.de
To: gnats-bugs@NetBSD.org
Subject: ppp: ipcp gets stuck in stopped state
X-Send-Pr-Version: www-1.0

>Number:         48935
>Category:       kern
>Synopsis:       ppp: ipcp gets stuck in stopped state
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jun 21 11:00:01 +0000 2014
>Last-Modified:  Wed Jul 16 18:05:00 +0000 2014
>Originator:     Tilman Sauerbeck
>Release:        6.1.4_PATCH
>Organization:
>Environment:
NetBSD ganesha 6.1.4_PATCH NetBSD 6.1.4_PATCH (SHEEVAPLUG) #0: Sat Jun 21 12:26:35 CEST 2014  tilman@brimstone:/tmp/netbsd/src/sys/arch/evbarm/compile/obj/SHEEVAPLUG evbarm
>Description:
I'm using a DSL modem to connect to the internet via PPPoE. My ISP closes the connection once it has been up for 24 hours.

The problem is that, on rare occasions, re-establishing the connection fails.

I have tracked down the problem:
During the connection attempt, a timeout occurs while IPCP is being configured (that is, sppp_to_event() is called for IPCP with rst_counter being 0).
This causes IPCP's tlf hook to run, puts IPCP in the stopped state, and runs sppp_lcp_check_and_close(). The latter function then finds that no NCPs are started any more and calls lcp.Close(). This in turn leads to sppp_lcp_tld() being called, which currently brings down only those CPs that are marked "started" in lcp.protos.

The issue is that at this point, IPCP is *not* included in the lcp.protos bitmask and thus is not brought down and closed but will remain in the stopped state.
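
To illustrate the sequence above, here is a small userland toy model of the mask-based loop. It is only a sketch: the toy_* names, the reduced state set and the "set state to initial" shortcut for Down/Close are assumptions made for the illustration, not the kernel code in if_spppsubr.c.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only; names loosely mirror if_spppsubr.c but are hypothetical. */
enum { IDX_LCP, IDX_IPCP, IDX_IPV6CP, IDX_COUNT };
enum { STATE_INITIAL, STATE_STOPPED, STATE_OPENED };

struct toy_sppp {
	uint32_t protos;        /* bit i set: CP i is marked "started" */
	int state[IDX_COUNT];   /* simplified per-CP state */
};

/*
 * Models the loop in the unpatched sppp_lcp_tld(): only CPs whose bit
 * is set in protos are brought down. Down followed by Close is modelled
 * as "go back to the initial state".
 */
static void
toy_lcp_tld_mask(struct toy_sppp *sp)
{
	uint32_t mask = 1;
	int i;

	assert(IDX_COUNT <= 32);
	for (i = 0; i < IDX_COUNT; i++, mask <<= 1)
		if ((sp->protos & mask) && i != IDX_LCP)
			sp->state[i] = STATE_INITIAL;
}

/*
 * IPCP has timed out and sits in the stopped state; return the state
 * it ends up in after the mask-based loop ran with the given protos.
 */
int
toy_ipcp_state_after_tld(uint32_t protos)
{
	struct toy_sppp sp = { .protos = protos };

	sp.state[IDX_LCP] = STATE_OPENED;
	sp.state[IDX_IPCP] = STATE_STOPPED;
	toy_lcp_tld_mask(&sp);
	return sp.state[IDX_IPCP];
}
```

In this model, calling toy_ipcp_state_after_tld() with only the LCP bit set leaves IPCP in STATE_STOPPED, which corresponds to the stuck situation described above. With the IPCP bit also set, IPCP returns to STATE_INITIAL.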

>How-To-Repeat:
Wait for a timeout to occur while IPCP is being configured.
>Fix:
Something like the following might help. I haven't been able to reproduce a timeout with this patch applied, though, so I don't know for sure whether it fixes the problem:

diff --git a/sys/net/if_spppsubr.c b/sys/net/if_spppsubr.c
index 590bc50..c04f5b2 100644
--- a/sys/net/if_spppsubr.c
+++ b/sys/net/if_spppsubr.c
@@ -2591,7 +2591,6 @@ sppp_lcp_tld(struct sppp *sp)
 {
 	STDDCL;
 	int i;
-	uint32_t mask;

 	sp->pp_phase = SPPP_PHASE_TERMINATE;

@@ -2606,9 +2605,13 @@ sppp_lcp_tld(struct sppp *sp)
 	 * the Close second to prevent the upper layers from sending
 	 * ``a flurry of terminate-request packets'', as the RFC
 	 * describes it.
+	 *
+	 * Note that we ignore lcp.protos here: this prevents CPs
+	 * from remaining in the stopped state forever.
 	 */
-	for (i = 0, mask = 1; i < IDX_COUNT; i++, mask <<= 1)
-		if ((sp->lcp.protos & mask) && ((cps[i])->flags & CP_LCP) == 0) {
+	for (i = 0; i < IDX_COUNT; i++)
+		if ((sp->state[cps[i]->protoidx] != STATE_INITIAL) &&
+		    ((cps[i])->flags & CP_LCP) == 0) {
 			(cps[i])->Down(sp);
 			(cps[i])->Close(sp);
 		}
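
The intent of the patched loop can be sketched in a userland toy model as follows. The toy_* names and the simplified state handling are assumptions for illustration only. In the real state machine, Down and Close are separate events; the model collapses them into a single reset to the initial state.

```c
#include <assert.h>

/* Toy model only; names loosely mirror if_spppsubr.c but are hypothetical. */
enum { IDX_LCP, IDX_IPCP, IDX_IPV6CP, IDX_COUNT };
enum { STATE_INITIAL, STATE_STOPPED, STATE_OPENED };

struct toy_cp {
	int is_lcp;             /* stands in for the CP_LCP flag */
};

static const struct toy_cp toy_cps[IDX_COUNT] = { { 1 }, { 0 }, { 0 } };

struct toy_sppp {
	int state[IDX_COUNT];   /* simplified per-CP state */
	int closed_count;       /* number of CPs that were reset */
};

/* Stands in for the Down() + Close() pair of a CP. */
static void
toy_down_close(struct toy_sppp *sp, int idx)
{
	sp->state[idx] = STATE_INITIAL;
	sp->closed_count++;
}

/*
 * Models the patched loop: every non-LCP CP that is not in the
 * initial state is brought down, regardless of any "started" bitmask.
 */
static void
toy_lcp_tld_state(struct toy_sppp *sp)
{
	int i;

	for (i = 0; i < IDX_COUNT; i++)
		if (sp->state[i] != STATE_INITIAL && !toy_cps[i].is_lcp)
			toy_down_close(sp, i);
}

/* Run the loop for the given NCP states; return how many CPs were reset. */
int
toy_count_reset(int ipcp_state, int ipv6cp_state)
{
	struct toy_sppp sp = {
		.state = { STATE_OPENED, ipcp_state, ipv6cp_state },
		.closed_count = 0
	};

	toy_lcp_tld_state(&sp);
	assert(sp.state[IDX_LCP] == STATE_OPENED);   /* LCP itself is skipped */
	assert(sp.state[IDX_IPCP] == STATE_INITIAL);
	return sp.closed_count;
}
```

In this model a stopped IPCP is reset even though no bitmask marks it as started, while CPs that are already in the initial state are left alone.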

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
Date: Sat, 21 Jun 2014 15:45:58 +0200

 On Sat, Jun 21, 2014 at 11:00:01AM +0000, tilman@code-monkey.de wrote
 > The issue is that at this point, IPCP is *not* included in the
 > lcp.protos bitmask and thus is not brought down and closed but will
 > remain in the stopped state.

 I wonder if we should instead do more initialization in sppp_lcp_init().

 Martin

From: Tilman Sauerbeck <tilman@code-monkey.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, martin@duskware.de
Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
Date: Tue, 15 Jul 2014 17:26:12 +0200

 Martin Husemann [2014-06-21 13:50]:
 > The following reply was made to PR kern/48935; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
 > Date: Sat, 21 Jun 2014 15:45:58 +0200
 > 
 >  On Sat, Jun 21, 2014 at 11:00:01AM +0000, tilman@code-monkey.de wrote
 >  > The issue is that at this point, IPCP is *not* included in the
 >  > lcp.protos bitmask and thus is not brought down and closed but will
 >  > remain in the stopped state.
 >  
 >  I wonder if we should instead do more initialization in sppp_lcp_init().

 Can you elaborate? I don't yet see what you have in mind.
 Or are you suggesting to introduce another field next to lcp.protos
 that would store the "active" protocols?

 Thanks,
 Tilman

From: Martin Husemann <martin@duskware.de>
To: Tilman Sauerbeck <tilman@code-monkey.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
Date: Tue, 15 Jul 2014 20:56:54 +0200

 On Tue, Jul 15, 2014 at 05:26:12PM +0200, Tilman Sauerbeck wrote:
 > >  I wonder if we should instead do more initialization in sppp_lcp_init().
 > 
 > Can you elaborate? I don't yet see what you have in mind.
 > Or are you suggesting to introduce another field next to lcp.protos
 > that would store the "active" protocols?

 No, I mean when sppp_lcp_init() runs, there cannot be any active protocols,
 so it should probably just clear lcp.protos (and maybe a few other variables).

 Martin

From: Tilman Sauerbeck <tilman@code-monkey.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, martin@duskware.de
Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
Date: Wed, 16 Jul 2014 20:01:51 +0200

 Martin Husemann [2014-07-15 19:00]:
 > The following reply was made to PR kern/48935; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: Tilman Sauerbeck <tilman@code-monkey.de>
 > Cc: gnats-bugs@NetBSD.org
 > Subject: Re: kern/48935: ppp: ipcp gets stuck in stopped state
 > Date: Tue, 15 Jul 2014 20:56:54 +0200
 > 
 >  On Tue, Jul 15, 2014 at 05:26:12PM +0200, Tilman Sauerbeck wrote:
 >  > >  I wonder if we should instead do more initialization in sppp_lcp_init().
 >  > 
 >  > Can you elaborate? I don't yet see what you have in mind.
 >  > Or are you suggesting to introduce another field next to lcp.protos
 >  > that would store the "active" protocols?
 >  
 >  No, I mean when sppp_lcp_init() runs, there cannot be any active protocols,
 >  so it should probably just clear lcp.protos (and maybe a few other variables).

 I don't get how that would help in the scenario where I'm seeing the
 bug: what happens in my setup is that I'm creating the pppoe device
 once, when the system is coming up. At that time, sppp_lcp_init()
 is also called.
 However, it is not called again when the connection is brought up or down. 

 So making sppp_lcp_init() do more work would have no effect at all
 because it simply isn't run again.

 Thanks,
 Tilman
