NetBSD Problem Report #41724

From buhrow@lothlorien.nfbcal.org  Tue Jul 14 17:19:48 2009
Return-Path: <buhrow@lothlorien.nfbcal.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id B977263BADF
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 14 Jul 2009 17:19:48 +0000 (UTC)
Message-Id: <200907141719.n6EHJmJw000608@lothlorien.nfbcal.org>
Date: Tue, 14 Jul 2009 10:19:48 -0700 (PDT)
From: Brian Buhrow <buhrow@lothlorien.nfbcal.org>
To: gnats-bugs@gnats.NetBSD.org
Subject: Using the IPMI watchdog timer under NetBSD 3.x, 4.x and 5.x

>Number:         41724
>Category:       kern
>Synopsis:       The watchdog tickler in the ipmi(4) driver needs to honor the polling lock when accessing the BMC device itself.
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 14 17:20:00 +0000 2009
>Originator:     Brian Buhrow
>Release:        NetBSD 4.0_STABLE
>Organization:

>Environment:


System: NetBSD nfbcal.nfbcal.org 4.0_STABLE NetBSD 4.0_STABLE (RBL_MP) #0: Tue Mar 4 19:23:27 PST 2008 buhrow@asterisk.nfbcal.org:/usr/src/sys/arch/i386/compile/RBL_MP i386
Architecture: x86
Machine: i386
>Description:

	The ipmi(4) driver supports the environmental sensors of the BMC
management board as well as the hardware watchdog timer on the same board.
The watchdog functions access the BMC independently of the environmental
sensor functions and do not take the lock that the sensor polling code
uses.  As a result, the two features end up fighting for access to the BMC,
resulting in less reliable sensor readings and occasional unintended
triggering of the hardware watchdog timer.
In addition, under NetBSD-3, errors in tickling the hardware timer can
result in uvm_faults, leading to extreme instability when the watchdog
timer is enabled.
Symptoms of the problem look like:

Jul 11 04:30:57 www1 /netbsd: ipmi: watchdog tickle returned 0x19
Jul 11 04:30:57 www1 wdogctl[12]: unable to tickle watchdog timer ipmi0: Input/output error
Jul 11 04:58:50 www1 /netbsd: ipmi: watchdog tickle returned 0xffffffff
Jul 11 04:58:50 www1 wdogctl[12]: unable to tickle watchdog timer ipmi0: Input/output error
Jul 11 05:04:21 www1 /netbsd: ipmi: watchdog tickle returned 0xffffffff
Jul 11 05:04:21 www1 wdogctl[12]: unable to tickle watchdog timer ipmi0: Input/output error
Jul 11 05:09:53 www1 /netbsd: ipmi: watchdog tickle returned 0xffffffff
Jul 11 05:09:53 www1 wdogctl[12]: unable to tickle watchdog timer ipmi0: Input/output error
Jul 11 05:15:24 www1 /netbsd: ipmi: watchdog tickle returned 0xffffffff
Jul 11 05:20:56 www1 /netbsd: ipmi: watchdog tickle returned 0x19

>How-To-Repeat:

	To reproduce the problem, do the following:

1.  Find a machine running NetBSD-3 or newer with an ipmi(4)-compatible BMC
board in it.

2.  Enable wdogctl in /etc/rc.conf.
I use something like:
wdogctl=YES	wdogctl_flags="-u -p 30 ipmi0"

3.  Run /etc/rc.d/wdogctl start

4.  Wait for the errors to roll in.

5.  Wait longer for the panics and spontaneous reboots.


>Fix:


	I've cooked up two patches for this problem, one per release family;
each fixes it nicely on its target.  The first is for NetBSD-5.x and later.
The second is for NetBSD-3.x and 4.x.  The second patch, shown below, is
against NetBSD-3.1 sources, but applies cleanly to NetBSD-4.x sources as well.
	The NetBSD-5 patch should apply to the -current sources as well,
though I haven't tested that.
	If these could be pulled into the 5.x and 4.x trees, that would be
most appreciated.  If any fixes are still being made to the 3.x tree, pulling
the patch into it would be great too, but I imagine that tree is closed.

<NetBSD-5.x patch>
Index: ipmi.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/ipmi.c,v
retrieving revision 1.21.2.7
diff -u -r1.21.2.7 ipmi.c
--- ipmi.c	23 Dec 2008 03:44:17 -0000	1.21.2.7
+++ ipmi.c	14 Jul 2009 15:48:15 -0000
@@ -1870,6 +1870,8 @@
 	else
 		sc->sc_wdog.smw_period = smwdog->smw_period;

+	if (!cold)
+		mutex_enter(&sc->sc_lock);
 	s = splsoftclock();
 	/* see if we can properly task to the watchdog */
 	rc = ipmi_sendcmd(sc, BMC_SA, BMC_LUN, APP_NETFN,
@@ -1878,6 +1880,8 @@
 	if (rc) {
 		printf("ipmi: APP_GET_WATCHDOG_TIMER returned 0x%x\n", rc);
 		splx(s);
+		if (!cold)
+			mutex_exit(&sc->sc_lock);
 		return EIO;
 	}

@@ -1896,6 +1900,8 @@
 	    APP_SET_WATCHDOG_TIMER, sizeof(swdog), &swdog);
 	rc = ipmi_recvcmd(sc, 0, &len, NULL);
 	splx(s);
+	if (!cold)
+		mutex_exit(&sc->sc_lock);
 	if (rc) {
 		printf("ipmi: APP_SET_WATCHDOG_TIMER returned 0x%x\n", rc);
 		return EIO;
@@ -1910,12 +1916,16 @@
 	struct ipmi_softc	*sc = smwdog->smw_cookie;
 	int			s, rc, len;

+	if (!cold)
+		mutex_enter(&sc->sc_lock);
 	s = splsoftclock();
 	/* tickle the watchdog */
 	rc = ipmi_sendcmd(sc, BMC_SA, BMC_LUN, APP_NETFN,
 	    APP_RESET_WATCHDOG, 0, NULL);
 	rc = ipmi_recvcmd(sc, 0, &len, NULL);
 	splx(s);
+	if (!cold)
+		mutex_exit(&sc->sc_lock);
 	if (rc) {
 		printf("ipmi: watchdog tickle returned 0x%x\n", rc);
 		return EIO;


<NetBSD-3.x and 4.x patch>
Index: ipmi.c
===================================================================
RCS file: /cvsroot/src/sys/arch/x86/x86/ipmi.c,v
retrieving revision 1.4.8.2
diff -u -r1.4.8.2 ipmi.c
--- ipmi.c	15 Oct 2007 21:50:32 -0000	1.4.8.2
+++ ipmi.c	14 Jul 2009 15:33:29 -0000
@@ -1869,6 +1869,8 @@
 	else
 		sc->sc_wdog.smw_period = smwdog->smw_period;

+	if (!cold)
+		(void) lockmgr(&sc->sc_lock, LK_EXCLUSIVE, NULL);
 	s = splsoftclock();
 	/* see if we can properly task to the watchdog */
 	rc = ipmi_sendcmd(sc, BMC_SA, BMC_LUN, APP_NETFN,
@@ -1878,6 +1880,8 @@
 		printf("%s: APP_GET_WATCHDOG_TIMER returned 0x%x\n",
 		    DEVNAME(sc), rc);
 		splx(s);
+		if (!cold)
+			(void) lockmgr(&sc->sc_lock, LK_RELEASE, NULL);
 		return EIO;
 	}

@@ -1896,6 +1900,8 @@
 	    APP_SET_WATCHDOG_TIMER, sizeof(swdog), &swdog);
 	rc = ipmi_recvcmd(sc, 0, &len, NULL);
 	splx(s);
+	if (!cold)
+		(void) lockmgr(&sc->sc_lock, LK_RELEASE, NULL);
 	if (rc) {
 		printf("%s: APP_SET_WATCHDOG_TIMER returned 0x%x\n",
 		    DEVNAME(sc), rc);
@@ -1911,12 +1917,16 @@
 	struct ipmi_softc	*sc = smwdog->smw_cookie;
 	int			s, rc, len;

+	if (!cold)
+		(void) lockmgr(&sc->sc_lock, LK_EXCLUSIVE, NULL);
 	s = splsoftclock();
 	/* tickle the watchdog */
 	rc = ipmi_sendcmd(sc, BMC_SA, BMC_LUN, APP_NETFN,
 	    APP_RESET_WATCHDOG, 0, NULL);
 	rc = ipmi_recvcmd(sc, 0, &len, NULL);
 	splx(s);
+	if (!cold)
+		(void) lockmgr(&sc->sc_lock, LK_RELEASE, NULL);
 	if (rc) {
 		printf("%s: watchdog tickle returned 0x%x\n", DEVNAME(sc), rc);
 		return EIO;

>Unformatted:
 doesn't work

 From: buhrow
 Reply-To: buhrow
 X-send-pr-version: 3.95
