NetBSD Problem Report #46411

From Wolfgang.Stukenbrock@nagler-company.com  Fri May  4 12:49:10 2012
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 8591563B915
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  4 May 2012 12:49:10 +0000 (UTC)
Message-Id: <20120504124918.789095F5F2@e010.nagler-company.com>
Date: Fri,  4 May 2012 14:49:18 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Subject: isdnd(8) crashes if dialout failed
X-Send-Pr-Version: 3.95

>Number:         46411
>Category:       bin
>Synopsis:       isdnd(8) crashes if dialout failed
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    martin
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 04 12:50:00 +0000 2012
>Closed-Date:    Sun Sep 23 13:41:57 +0000 2018
>Last-Modified:  Sun Sep 23 13:41:57 +0000 2018
>Originator:     Wolfgang Stukenbrock
>Release:        NetBSD 5.1.2
>Organization:
Dr. Nagler & Company GmbH

>Environment:


System: NetBSD e010 5.1.2 NetBSD 5.1.2 (NSW-svc-ISDN) #1: Fri May 4 13:27:01 CEST 2012 wgstuken@e010:/usr/src/sys/arch/i386/compile/NSW-svc-ISDN i386
Architecture: i386
Machine: i386
>Description:
	I'm trying to get my AVM-B1 card up and running and stumbled over
	the following problem.
	I've setup ippp0 for test purpose to get the system to call "something"
	because it looks like I've a problem with out telephon system.
	I've started isdnd with max debugging "-d 0xffff" and got the following
	output:

May  4 13:56:11 e010 isdnd[589]: DBG time OK! - last 1336132570:221936 now 1336132571:232045
May  4 13:56:12 e010 isdnd[589]: DBG time OK! - last 1336132571:232413 now 1336132572:242547
May  4 13:56:13 e010 isdnd[589]: DBG time OK! - last 1336132572:242919 now 1336132573:253088
May  4 13:56:14 e010 isdnd[589]: DBG time OK! - last 1336132573:253447 now 1336132574:263551
May  4 13:56:15 e010 isdnd[589]: DBG time OK! - last 1336132574:264946 now 1336132575:274059
May  4 13:56:16 e010 isdnd[589]: DBG time OK! - last 1336132575:275427 now 1336132576:284568
May  4 13:56:17 e010 isdnd[589]: DBG time OK! - last 1336132576:285926 now 1336132577:295055
May  4 13:56:18 e010 isdnd[589]: DBG time OK! - last 1336132577:296430 now 1336132578:305557
May  4 13:56:19 e010 isdnd[589]: DBG time OK! - last 1336132578:306918 now 1336132579:316071
May  4 13:56:19 e010 isdnd[589]: DBG setup_dialout: entry I4BPPP ok!
May  4 13:56:19 e010 isdnd[589]: DBG find_by_device_for_dialout: found entry -1!
May  4 13:56:19 e010 isdnd[589]: DBG msg_dialout: dial req from ippp0
May  4 13:56:19 e010 isdnd[589]: DBG FSM event [msg-dialout]: [idle => dialing]
May  4 13:56:19 e010 isdnd[589]: DBG F_DIAL: local dial out request
May  4 13:56:19 e010 isdnd[589]: DBG select_first_dialno: only one no, no = 42
May  4 13:56:19 e010 isdnd[589]: DBG rate=90 sec/unit (day=5, beg=9:00, end=18:2.2d, current=0:13)
May  4 13:56:19 e010 isdnd[589]: CHD 00003 I4BPPP rate 90 sec/unit (rate)
May  4 13:56:19 e010 isdnd[589]: DBG sendm_connect_req: ctrl = 0, chan = -1
May  4 13:56:19 e010 isdnd[589]: DBG decr_free_channels: ctrl 0, now 1 chan free
May  4 13:56:19 e010 isdnd[589]: CHD 00003 I4BPPP dialing out from 767 to 42
May  4 13:56:19 e010 isdnd[589]: DBG time < 1 - last 1336132579:317485 now 1336132579:589002
May  4 13:56:19 e010 isdnd[589]: DBG set_channel_busy: controller [0] channel B0 set to BUSY!
May  4 13:56:19 e010 isdnd[589]: CHD 00003 I4BPPP outgoing call proceeding (ctl 0, ch 0)
May  4 13:56:19 e010 isdnd[589]: DBG time < 1 - last 1336132579:317485 now 1336132579:593085
May  4 13:56:20 e010 isdnd[589]: DBG time OK! - last 1336132579:317485 now 1336132580:596698
May  4 13:56:20 e010 isdnd[589]: DBG rate=90 sec/unit (day=5, beg=9:00, end=18:2.2d, current=0:13)
May  4 13:56:21 e010 isdnd[589]: DBG time OK! - last 1336132580:598106 now 1336132581:607199
May  4 13:56:21 e010 isdnd[589]: DBG rate=90 sec/unit (day=5, beg=9:00, end=18:2.2d, current=0:13)
May  4 13:56:22 e010 isdnd[589]: DBG time OK! - last 1336132581:608592 now 1336132582:617700
May  4 13:56:22 e010 isdnd[589]: DBG rate=90 sec/unit (day=5, beg=9:00, end=18:2.2d, current=0:13)
May  4 13:56:23 e010 isdnd[589]: DBG time OK! - last 1336132582:619075 now 1336132583:628201
May  4 13:56:23 e010 isdnd[589]: DBG rate=90 sec/unit (day=5, beg=9:00, end=18:2.2d, current=0:13)
May  4 13:56:24 e010 isdnd[589]: DBG time OK! - last 1336132583:629579 now 1336132584:654219
May  4 13:56:24 e010 isdnd[589]: DBG rate=90 sec/unit (day=5, beg=9:00, end=18:2.2d, current=0:13)
May  4 13:56:24 e010 isdnd[589]: CHD 00003 I4BPPP outgoing call disconnected (remote)
May  4 13:56:24 e010 isdnd[589]: CHD 00003 I4BPPP cause 0: normal call clearing (I4B)
May  4 13:56:24 e010 isdnd[589]: DBG set_channel_idle: controller [0] channel B0 set to IDLE!
May  4 13:56:24 e010 isdnd[589]: DBG incr_free_channels: ctrl 0, now 2 chan free
May  4 13:56:24 e010 isdnd[589]: ERR next_state: newstate > N_STATES
May  4 13:56:24 e010 isdnd[589]: DMN fatal error, daemon terminating, exitval = 1

	OK - the dial fails, but the state machine of isdnd seems to blow up in
	this situation forcing isdnd to quit.
	Not very nice.

	The config entry is the one from the sample config file, I only changed the four numbers.
	You can see them above. (767 and 42)

	I've added additional output where the exit happens and got "isdnd: event 2 cur 1 new 17 MAX 17"
	Code part:
		newstate = state_tab[event][currstate].newstate;
		if (newstate >= N_STATES)
		{
		  logit(LL_ERR, "next_state: newstate > N_STATES");
		  logit(LL_ERR, "event %d cur %d new %d MAX %d", event, currstate, newstate, N_STATES);
		  error_exit(1, "next_state: newstate > N_STATES");
		}

	This must be an EV_MDI event - at least this is index two in the state_tab.
	This is defined as 
		{{F_ill, ST_ILL},
		{F_DFL, ST_SUSE},
		{F_ill, ST_ILL},
		{F_DFL, ST_SUSE},
		{F_DFL, ST_SUSE}, 
		{F_ill, ST_ILL},
		{F_ill, ST_ILL},
		{F_ACBW,ST_ACB_WAITDIAL},
		{F_ill, ST_ILL},
		{F_ACBR, ST_SUSE},
		{F_ACBR,ST_SUSE},
		{F_IDIS,ST_IDLE},
		{F_IDIS,ST_IDLE},
		{F_MDI, ST_IDLE},
		{F_ill, ST_ILL},
		{F_MDI, ST_IDLE},
		{F_ill, ST_ILL}},

	This selects ST_SUSE (and function F_DFL) as new state.
	ST_SUSE is defined as 17 in isdnd.h - comment say: "subroutine sets new state"

	OK - here seems to be a problem around line 404 in fsm.c ...
	ST_SUSE should be a valid state while processing is done, so "if (newstate > N_STATES)"
	or "if (newstate >= N_STATES && newstate != ST_SUSE)" should fix the problem.
>How-To-Repeat:
	The crash happens all the time and an internal error is reported. So if you setup
	something where the dial fails, it should happen again.
>Fix:
	If my analyses above are correct, just change ">=" into ">" and avoid program exit
	if ST_SUSE is reached.

	At least isdnd now does no longer crash here and I endup in some retry until if
	final fails. That looks correct to me.

	But I've killed isdnd while redial was pending and now got warnings like
	"WRN msg_disconnect_ind: cdid not found"
	and the dialing state is never left anymore.

	I'm not shure if this is related to the fix above.
	It looks link another problem and isdnd seems not to initialize everything correctly
	after restart.

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: bin-bug-people->martin
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Fri, 04 May 2012 13:09:31 +0000
Responsible-Changed-Why:
Take ;-)


State-Changed-From-To: open->closed
State-Changed-By: maxv@NetBSD.org
State-Changed-When: Sun, 23 Sep 2018 13:41:57 +0000
State-Changed-Why:
isdnd was removed


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.