NetBSD Problem Report #46833

From www@NetBSD.org  Fri Aug 24 20:22:29 2012
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 968C863B882
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 24 Aug 2012 20:22:29 +0000 (UTC)
Message-Id: <20120824202228.C252463B85F@www.NetBSD.org>
Date: Fri, 24 Aug 2012 20:22:28 +0000 (UTC)
From: ftigeot@wolfpond.org
Reply-To: ftigeot@wolfpond.org
To: gnats-bugs@NetBSD.org
Subject: NetBSD 6.0_BETA2 shutdowns under load
X-Send-Pr-Version: www-1.0

>Number:         46833
>Category:       port-amd64
>Synopsis:       NetBSD 6.0_BETA2 shutdowns under load
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-amd64-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Aug 24 20:25:00 +0000 2012
>Closed-Date:    Mon Oct 07 07:49:24 +0000 2013
>Last-Modified:  Mon Oct 07 07:49:24 +0000 2013
>Originator:     Francois Tigeot
>Release:        NetBSD 6.0_BETA2
>Organization:
>Environment:
NetBSD netbsd.zefyris.com 6.0_BETA2 NetBSD 6.0_BETA2 (GENERIC) amd64
>Description:
I left a NetBSD 6.0-BETA2 system running PostgreSQL benchmarks. It
powered-off by itself with this message on console:

CRITICAL TEMPERATURE! SHUTTING DOWN

Hardware is a 1U dual Xeon X5650 system with 24 GB RAM (Dell R410).

Machine only gets slightly warm to touch under load; no other operating
system shows this behavior.


>How-To-Repeat:
Run pgbench in select mode with a large database and at least 20 client processes.
I can get NetBSD to shut down in a matter of minutes 100% of the time
with this configuration:

- PostgreSQL 9.2-beta3
- Pgbench with a scale 800 database (~= 11GB)
- pgbench -h 127.0.0.1 -j 1 -c 24 -T 600 -S bench

(The last command runs 24 client processes for 10 minutes.)
>Fix:

>Release-Note:

>Audit-Trail:
From: "Jeremy C. Reed" <reed@reedmedia.net>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Fri, 24 Aug 2012 15:38:59 -0500 (CDT)

 On Fri, 24 Aug 2012, ftigeot@wolfpond.org wrote:

 > CRITICAL TEMPERATURE! SHUTTING DOWN

 I assume you have powerd running.  Do you also have /etc/envsys.conf 
 configured?  What does this show "grep temp[0-9] /var/run/dmesg.boot"? 
 And what does "envstat" show before and during tests?

From: Bernd Ernesti <netbsd@lists.veego.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Fri, 24 Aug 2012 22:37:24 +0200

 On Fri, Aug 24, 2012 at 08:25:00PM +0000, ftigeot@wolfpond.org wrote:
 > I left a NetBSD 6.0-BETA2 system running PostgreSQL benchmarks. It
 > powered-off by itself with this message on console:
 > 
 > CRITICAL TEMPERATURE! SHUTTING DOWN

 What does the envstat command print?

 Bernd

From: Jukka Ruohonen <jruohonen@iki.fi>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 06:06:54 +0300

 On Fri, Aug 24, 2012 at 08:25:00PM +0000, ftigeot@wolfpond.org wrote:
 > I left a NetBSD 6.0-BETA2 system running PostgreSQL benchmarks. It
 > powered-off by itself with this message on console:
 > 
 > CRITICAL TEMPERATURE! SHUTTING DOWN

 This is my fault. Edit the script in:

 	/etc/powerd/scripts/sensor_temperature

 by commenting out the shutdown command. The rationale for the command was
 that there are quite a few laptops that really require the option. In other
 words, these systems will heat so much that they will hit the in-cpu reset.
 In such cases a graceful shutdown is desirable.

 - Jukka.

From: Francois Tigeot <ftigeot@wolfpond.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 09:47:49 +0200

 On Fri, Aug 24, 2012 at 08:40:04PM +0000, Jeremy C. Reed wrote:
 > The following reply was made to PR port-amd64/46833; it has been noted by GNATS.
 > 
 > From: "Jeremy C. Reed" <reed@reedmedia.net>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
 > Date: Fri, 24 Aug 2012 15:38:59 -0500 (CDT)
 > 
 >  On Fri, 24 Aug 2012, ftigeot@wolfpond.org wrote:
 >  
 >  > CRITICAL TEMPERATURE! SHUTTING DOWN
 >  
 >  I assume you have powerd running.  Do you also have /etc/envsys.conf 
 >  configured?  What does this show "grep temp[0-9] /var/run/dmesg.boot"? 
 >  And what does "envstat" show before and during tests?

 Just to be clear, this is a default NetBSD installation from a recent
 snapshot ISO; I have not configured anything with regard to power management
 (and I didn't expect to have to on a non-laptop machine).

 - a /usr/sbin/powerd daemon is indeed running
 - /etc/envsys.conf only contains commented-out lines
 - dmesg messages:
   coretemp0 at cpu0: thermal sensor, 1 C resolution
   coretemp1 at cpu1: thermal sensor, 1 C resolution
   coretemp2 at cpu2: thermal sensor, 1 C resolution
   coretemp3 at cpu3: thermal sensor, 1 C resolution
   coretemp4 at cpu4: thermal sensor, 1 C resolution
   coretemp5 at cpu5: thermal sensor, 1 C resolution
   coretemp6 at cpu6: thermal sensor, 1 C resolution
   coretemp7 at cpu7: thermal sensor, 1 C resolution
   coretemp8 at cpu8: thermal sensor, 1 C resolution
   coretemp9 at cpu9: thermal sensor, 1 C resolution
   coretemp10 at cpu10: thermal sensor, 1 C resolution
   coretemp11 at cpu11: thermal sensor, 1 C resolution

 I have attached files with envstat output just after power up and
 just after getting the CRITICAL TEMPERATURE message.

 -- 
 Francois Tigeot

 Attachment: envstat.freshboot.txt

                        Current  CritMax  WarnMax  WarnMin  CritMin  Unit
 [coretemp0]
    cpu0 temperature:    35.000                                     degC
 [coretemp1]
    cpu1 temperature:    37.000                                     degC
 [coretemp10]
   cpu10 temperature:    36.000                                     degC
 [coretemp11]
   cpu11 temperature:    36.000                                     degC
 [coretemp2]
    cpu2 temperature:    29.000                                     degC
 [coretemp3]
    cpu3 temperature:    40.000                                     degC
 [coretemp4]
    cpu4 temperature:    33.000                                     degC
 [coretemp5]
    cpu5 temperature:    34.000                                     degC
 [coretemp6]
    cpu6 temperature:    32.000                                     degC
 [coretemp7]
    cpu7 temperature:    31.000                                     degC
 [coretemp8]
    cpu8 temperature:    34.000                                     degC
 [coretemp9]
    cpu9 temperature:    32.000                                     degC
 [ipmi0]
            Voltage1:   235.002                                         V
             Voltage:       N/A
           Intrusion:      TRUE
             Status1:     FALSE
              Status:     FALSE
      FAN MOD 6B RPM:      5400                                1920  RPM
      FAN MOD 6A RPM:      7920                                2640  RPM
      FAN MOD 5B RPM:      5400                                1920  RPM
      FAN MOD 5A RPM:      7920                                2640  RPM
      FAN MOD 4B RPM:      8880                                1920  RPM
      FAN MOD 4A RPM:     11400                                2640  RPM
      FAN MOD 3B RPM:      5880                                1920  RPM
      FAN MOD 3A RPM:      7680                                2640  RPM
      FAN MOD 2B RPM:      6000                                1920  RPM
      FAN MOD 2A RPM:      7800                                2640  RPM
      FAN MOD 1B RPM:      6000                                1920  RPM
      FAN MOD 1A RPM:      7680                                2640  RPM
         Planar Temp:    43.184   95.407   90.385    8.034    3.013 degC
       Ambient Temp2:    27.116   47.201   42.180    8.034    3.013 degC
               Temp6:    43.184   47.201   42.180    8.034    3.013 degC
               Temp5:    35.150   47.201   42.180    8.034    3.013 degC
               Temp4:    29.124                                     degC
       Ambient Temp1:    25.107                                     degC
        Ambient Temp:       N/A
               Temp3:    38.163                                     degC
               Temp2:       N/A
               Temp1:   -60.257   90.385   85.364                   degC
                Temp:   -64.274   90.385   85.364                   degC

 Attachment: envstat.critical.txt

                        Current  CritMax  WarnMax  WarnMin  CritMin  Unit
 [coretemp0]
    cpu0 temperature:    68.000                                     degC
 [coretemp1]
    cpu1 temperature:    59.000                                     degC
 [coretemp10]
   cpu10 temperature:    62.000                                     degC
 [coretemp11]
   cpu11 temperature:    66.000                                     degC
 [coretemp2]
    cpu2 temperature:    61.000                                     degC
 [coretemp3]
    cpu3 temperature:    62.000                                     degC
 [coretemp4]
    cpu4 temperature:    67.000                                     degC
 [coretemp5]
    cpu5 temperature:    58.000                                     degC
 [coretemp6]
    cpu6 temperature:    59.000                                     degC
 [coretemp7]
    cpu7 temperature:    60.000                                     degC
 [coretemp8]
    cpu8 temperature:    62.000                                     degC
 [coretemp9]
    cpu9 temperature:    60.000                                     degC
 [ipmi0]
            Voltage1:   235.002                                         V
             Voltage:       N/A
           Intrusion:      TRUE
             Status1:     FALSE
              Status:     FALSE
      FAN MOD 6B RPM:      5400                                1920  RPM
      FAN MOD 6A RPM:      7920                                2640  RPM
      FAN MOD 5B RPM:      5400                                1920  RPM
      FAN MOD 5A RPM:      7920                                2640  RPM
      FAN MOD 4B RPM:      8760                                1920  RPM
      FAN MOD 4A RPM:     11400                                2640  RPM
      FAN MOD 3B RPM:      5880                                1920  RPM
      FAN MOD 3A RPM:      7800                                2640  RPM
      FAN MOD 2B RPM:      6000                                1920  RPM
      FAN MOD 2A RPM:      7680                                2640  RPM
      FAN MOD 1B RPM:      6000                                1920  RPM
      FAN MOD 1A RPM:      8040                                2640  RPM
         Planar Temp:    50.214   95.407   90.385    8.034    3.013 degC
       Ambient Temp2:    27.116   47.201   42.180    8.034    3.013 degC
               Temp6:    43.184   47.201   42.180    8.034    3.013 degC
               Temp5:    48.205   47.201   42.180    8.034    3.013 degC
               Temp4:    32.137                                     degC
       Ambient Temp1:    25.107                                     degC
        Ambient Temp:       N/A
               Temp3:    44.188                                     degC
               Temp2:       N/A
               Temp1:   -32.137   90.385   85.364                   degC
                Temp:   -32.137   90.385   85.364                   degC

From: Francois Tigeot <ftigeot@wolfpond.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 09:52:44 +0200

 On Sat, Aug 25, 2012 at 03:10:06AM +0000, Jukka Ruohonen wrote:
 > The following reply was made to PR port-amd64/46833; it has been noted by GNATS.
 > 
 > From: Jukka Ruohonen <jruohonen@iki.fi>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
 > Date: Sat, 25 Aug 2012 06:06:54 +0300
 > 
 >  On Fri, Aug 24, 2012 at 08:25:00PM +0000, ftigeot@wolfpond.org wrote:
 >  > I left a NetBSD 6.0-BETA2 system running PostgreSQL benchmarks. It
 >  > powered-off by itself with this message on console:
 >  > 
 >  > CRITICAL TEMPERATURE! SHUTTING DOWN
 >  
 >  This is my fault. Edit the script in:
 >  
 >  	/etc/powerd/scripts/sensor_temperature
 >  
 >  by commenting out the shutdown command. The rationale for the command was
 >  that there are quite a few laptops that really require the option. In other
 >  words, these systems will heat so much that they will hit the in-cpu reset.
 >  In such cases a graceful shutdown is desirable.

 I see two problems here:
 - the machine is a server, not a laptop
 - temperature never gets critical.

 What about asking the user if power management needs to be enabled or not at
 installation time ?

 -- 
 Francois Tigeot

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 08:16:28 +0000 (UTC)

 ftigeot@wolfpond.org (Francois Tigeot) writes:

 Hi,

 >                        Current  CritMax  WarnMax  WarnMin  CritMin  Unit
 >               Temp6:    43.184   47.201   42.180    8.034    3.013 degC
 >               Temp5:    48.205   47.201   42.180    8.034    3.013 degC

 Temp6 exceeds WarnMax
 Temp5 exceeds CritMax

 powerd will shut down the machine when a sensor goes 'critical'.

 Maybe the sensors do not read out correctly or NetBSD assumes a wrong
 conversion function. Can you verify that your server really isn't
 running too hot? Often you can see the sensor readouts in BIOS or
 through IPMI.

 N.B. the thresholds look as if they were tuned for operation in a real
 air-conditioned and cooled computer center. Shutting down the machine
 when the cooling fails seems to be reasonable to me.
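
The threshold comparison described above can be illustrated with a minimal Python sketch. It is not the kernel's or powerd's actual logic: the function name `classify`, the state labels, and the choice of inclusive comparisons are assumptions made for illustration. Only the numbers come from the envstat output in this report.

```python
from typing import Optional

def classify(current: float,
             crit_max: Optional[float] = None,
             warn_max: Optional[float] = None,
             warn_min: Optional[float] = None,
             crit_min: Optional[float] = None) -> str:
    """Return a state label for one reading; None means no limit is set.

    Labels and inclusive comparisons are illustrative assumptions.
    """
    if crit_max is not None and current >= crit_max:
        return "critical-over"
    if crit_min is not None and current <= crit_min:
        return "critical-under"
    if warn_max is not None and current >= warn_max:
        return "warning-over"
    if warn_min is not None and current <= warn_min:
        return "warning-under"
    return "normal"

# (Current, CritMax, WarnMax, WarnMin, CritMin) in degC, copied from
# the envstat.critical.txt attachment in this report.
readings = {
    "Temp6": (43.184, 47.201, 42.180, 8.034, 3.013),
    "Temp5": (48.205, 47.201, 42.180, 8.034, 3.013),
    "Planar Temp": (50.214, 95.407, 90.385, 8.034, 3.013),
}

for name, values in readings.items():
    print(f"{name}: {classify(*values)}")
```

With these values Temp6 lands in the warning range, Temp5 in the critical range, and Planar Temp stays normal, which matches the reading of the table given above.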

 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Francois Tigeot <ftigeot@wolfpond.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 11:16:00 +0200

 Hi,

 On Sat, Aug 25, 2012 at 08:20:05AM +0000, Michael van Elst wrote:
 > The following reply was made to PR port-amd64/46833; it has been noted by GNATS.
 > 
 > From: mlelstv@serpens.de (Michael van Elst)
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
 > Date: Sat, 25 Aug 2012 08:16:28 +0000 (UTC)
 > 
 >  ftigeot@wolfpond.org (Francois Tigeot) writes:
 >  
 >  >                        Current  CritMax  WarnMax  WarnMin  CritMin  Unit
 >  >               Temp6:    43.184   47.201   42.180    8.034    3.013 degC
 >  >               Temp5:    48.205   47.201   42.180    8.034    3.013 degC
 >  
 >  Temp6 exceeds WarnMax
 >  Temp5 exceeds CritMax
 >  
 >  powerd will shut down the machine when a sensor goes 'critical'.
 >  
 >  Maybe the sensors do not read out correctly or NetBSD assumes a wrong
 >  conversion function. Can you verify that your server really isn't
 >  running too hot? Often you can see the sensor readouts in BIOS or
 >  through IPMI.

 Fan speeds vary automatically according to temperature; they were far
 from running at full speed when powerd decided to shut down the system,
 and believe me they're *loud*.
 It's impossible to miss the sound when the machine gets hot and really
 starts pumping air.

 This Xeon box has been running without any issue under far heavier loads
 with Linux and other *BSD systems. Never got a complaint, not even a
 beep or a warning led.

 BIOS setup doesn't show anything wrt environment sensors and I haven't
 found a working ipmi client in pkgsrc yet.

 >  N.B. the thresholds look like being tuned for operation in a real
 >  air-conditioned and cooled computer center. Shutting down the machine
 >  when the cooling fails seems to be reasonable to me.

 The server is sitting on a test bench and not in a regular machine room, but
 this shouldn't make too much difference. Ambient temperature is 24C.
 What are these Temp sensors supposed to monitor anyway? Some report minus
 30C values, which seems about right for Siberia and not machine rooms...

 -- 
 Francois Tigeot

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 13:31:50 +0200

 On Sat, Aug 25, 2012 at 07:50:05AM +0000, Francois Tigeot wrote:
 >  [ipmi0]
 [..]
 >          Planar Temp:    43.184   95.407   90.385    8.034    3.013 degC
 >        Ambient Temp2:    27.116   47.201   42.180    8.034    3.013 degC
 >                Temp6:    43.184   47.201   42.180    8.034    3.013 degC
 >                Temp5:    35.150   47.201   42.180    8.034    3.013 degC

 The critical/warning thresholds all look completely bogus, I can believe
 the actual temperatures. Bugs in the SMBIOS?

 Martin

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 11:34:06 +0000 (UTC)

 ftigeot@wolfpond.org (Francois Tigeot) writes:

 >What are these Temp sensors supposed to monitor anyway ? Some report minus
 >30C values, which seems about right for Siberia and not machine rooms...

 Temp5/Temp6 seem to measure temperature inside the chassis, but not really
 close to power regulators or the CPU.

 An invalid value such as -30C usually means a read-out of zero, i.e.
 there is no sensor signal because the machine isn't equipped with that
 sensor.

 So, to me this all works as designed, but you are willing to accept
 more than the fairly conservative BIOS thresholds allow.
 This is fine and you can adjust the behaviour of NetBSD by configuring
 your own thresholds in /etc/envsys.conf.


 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 12:03:44 +0000 (UTC)

 martin@duskware.de (Martin Husemann) writes:

 > >          Planar Temp:    43.184   95.407   90.385    8.034    3.013 degC
 > >        Ambient Temp2:    27.116   47.201   42.180    8.034    3.013 degC
 > >                Temp6:    43.184   47.201   42.180    8.034    3.013 degC
 > >                Temp5:    35.150   47.201   42.180    8.034    3.013 degC
 > 
 > The critical/warning thresholds look all completely bogus, I can believe
 > the actual temperatures. Bugs in the SMBIOS?

 The thresholds look fine to me. Lots of machines have upper thresholds
 in the 35-45C range and lower thresholds in the 5-10C range. The normal
 ambient temperature for such systems is something between 16C and 24C.

 Fujitsu Primergy TX300 S6      lc=1 lw=6 uw=37 uc=42
 Dell PowerEdge R610            lc=3 lw=8 uw=42 uc=47
 HP ProLiant DL380 G7           no lower thresholds, uc=41

 Other systems do not measure "Ambient" values but "System" or "Planar" values
 which is more something like the CPU or voltage regulator temperature.

 Dell PowerEdge R610            lc=3 lw=8 uw=92 uc=97
 SGI Altix XE500                no lower thresholds,  uw=81, uc=82

 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sat, 25 Aug 2012 14:42:16 +0200

 On Sat, Aug 25, 2012 at 12:05:03PM +0000, Michael van Elst wrote:
 >  Other systems do not measure "Ambient" values but "System" or "Planar" values
 >  which is more something like the CPU or voltage regulator temperature.
 >  
 >  Dell PowerEdge R610            lc=3 lw=8 uw=92 uc=97
 >  SGI Altix XE500                no lower thresholds,  uw=81, uc=82

 Ah, ok - I've only seen systems of this variety (and never with lower bounds).

 Martin

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Sun, 26 Aug 2012 18:57:08 +0200

 On Sat, Aug 25, 2012 at 11:35:02AM +0000, Martin Husemann wrote:
 > The following reply was made to PR port-amd64/46833; it has been noted by GNATS.
 > 
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
 > Date: Sat, 25 Aug 2012 13:31:50 +0200
 > 
 >  On Sat, Aug 25, 2012 at 07:50:05AM +0000, Francois Tigeot wrote:
 >  >  [ipmi0]
 >  [..]
 >  >          Planar Temp:    43.184   95.407   90.385    8.034    3.013 degC
 >  >        Ambient Temp2:    27.116   47.201   42.180    8.034    3.013 degC
 >  >                Temp6:    43.184   47.201   42.180    8.034    3.013 degC
 >  >                Temp5:    35.150   47.201   42.180    8.034    3.013 degC
 >  
 >  The critical/warning thresholds look all completely bogus, I can believe
 >  the actual temperatures. Bugs in the SMBIOS?

 I have these values on a Dell poweredge 2950:
                        Current  CritMax  CritMin  CritCap     Unit
       Ambient Temp:     26.111   47.201    3.013              degC
 And a recent supermicro server:
                        Current  CritMax  WarnMax  WarnMin  CritMin  Unit
           PCH Temp@:    43.184   95.407   90.385   -5.021   -8.034 degC
         Peripheral :    28.120   77.330   75.321   -5.021   -7.030 degC
         System Temp:    23.098   77.330   75.321   -5.021   -7.030 degC

 So Francois's values don't look so bogus to me.
 It would be interesting to know what temp5 and temp6 are connected to.
 Maybe their limits are set too low.

 Francois, ipmitool from pkgsrc is working over the network if you have
 the BMC IP address configured. If you have a drac card, you should also
 be able to find temperatures here.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Mark Davies <mark@ecs.vuw.ac.nz>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 13:23:45 +1200

 On Mon, 27 Aug 2012, you wrote:
 > So Francois's values don't look so bogus to me.
 > It would be interesting to know what temp5 and temp6 are connected
 > to. Maybe their limits are set too low.
 > 
 > Francois, ipmitool from pkgsrc is working over the network if you
 > have the BMC IP address configured. If you have a drac card, you
 > should also be able to find temperatures here.

 Not sure if this helps at all but all my poweredge r610's have 
 complained from boot
   ipmi0: critical over limit on 'Temp6'
 since I've had them (a full envstat is below)
 I've never worried about it as it didn't seem to actually be an issue
 over several years of running, and when you look at temperatures in
 the DRAC it doesn't even list Temp6, so I assumed it was a bogus sensor.

 [ipmi0]
           Voltage1:   230.984                                        V
            Voltage:   230.984                                        V
              Temp6:    56.240   47.201   42.180    8.034    3.013 degC
              Temp5:       N/A
              Temp4:    23.098                                     degC
          Intrusion:      TRUE
            Status1:     FALSE
             Status:     FALSE
     FAN MOD 6B RPM:      6720                                1920  RPM
     FAN MOD 5B RPM:      3480                                1920  RPM
     FAN MOD 4B RPM:      3480                                1920  RPM
     FAN MOD 3B RPM:      3480                                1920  RPM
     FAN MOD 2B RPM:      3480                                1920  RPM
     FAN MOD 1B RPM:      3480                                1920  RPM
     FAN MOD 6A RPM:      6720                                1920  RPM
     FAN MOD 5A RPM:      4680                                1920  RPM
     FAN MOD 4A RPM:      4560                                1920  RPM
     FAN MOD 3A RPM:      4560                                1920  RPM
     FAN MOD 2A RPM:      4560                                1920  RPM
     FAN MOD 1A RPM:      4560                                1920  RPM
        Planar Temp:    34.146   97.415   92.394    8.034    3.013 degC
      Ambient Temp2:    17.073   47.201   42.180    8.034    3.013 degC
      Ambient Temp1:    17.073                                     degC
              Temp3:    28.120                                     degC
       Ambient Temp:    18.077                                     degC
              Temp2:    26.111                                     degC
              Temp1:    50.214   90.385   85.364                   degC
               Temp:   -62.265   90.385   85.364                   degC


 Now that, with 6.0_RC1, powerd starts automatically and immediately
 shuts the machine down, I've shut it up with the following override in
 envsys.conf:

 ipmi0 {
    sensor2 { critical-max = 90C; warning-max = 85C; }
 }

 cheers
 mark
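
For anyone adapting the override above to several machines, here is a small Python sketch that renders the same envsys.conf fragment. The helper name `render_override` is hypothetical, and the sensor index is an assumption that must be checked per machine: in the listing above Temp6 is the third ipmi0 entry, which is consistent with `sensor2` under zero-based numbering, but that mapping is inferred here rather than confirmed.

```python
def render_override(device: str, sensor: str,
                    critical_max_c: int, warning_max_c: int) -> str:
    """Render an envsys.conf block in the layout used in the message above.

    Hypothetical helper: only the output layout comes from this report.
    """
    if warning_max_c >= critical_max_c:
        raise ValueError("warning-max should be below critical-max")
    return (
        f"{device} {{\n"
        f"   {sensor} {{ critical-max = {critical_max_c}C; "
        f"warning-max = {warning_max_c}C; }}\n"
        f"}}\n"
    )

# Reproduces the override quoted above for ipmi0/sensor2.
print(render_override("ipmi0", "sensor2", 90, 85), end="")
```

Raising the limits this way silences the event rather than explaining it, so it is worth confirming first that the sensor in question is not reporting a real condition.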

From: Francois Tigeot <ftigeot@wolfpond.org>
To: Manuel Bouyer <bouyer@antioche.eu.org>
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 08:49:56 +0200

 On Sun, Aug 26, 2012 at 06:57:08PM +0200, Manuel Bouyer wrote:
 > 
 > Francois, ipmitool from pkgsrc is working over the network if you have
 > the BMC IP address configured. If you have a drac card, you should also
 > be able to find temperatures here.

 The part of "ipmitool sensor" output with temp data:

 Temp             | na         | degrees C  | na    | na        | na        | na        | 85.000    | 90.000    | na        
 Temp             | na         | degrees C  | na    | na        | na        | na        | 85.000    | 90.000    | na        
 Temp             | na         | degrees C  | na    | na        | na        | na        | na        | na        | na        
 Temp             | na         | degrees C  | na    | na        | na        | na        | na        | na        | na        
 Ambient Temp     | na         | degrees C  | na    | na        | na        | na        | na        | na        | na        
 Ambient Temp     | na         | degrees C  | na    | na        | na        | na        | na        | na        | na        
 Temp             | na         | degrees C  | na    | na        | na        | na        | na        | na        | na        
 Temp             | na         | degrees C  | na    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 Temp             | na         | degrees C  | na    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 Ambient Temp     | 25.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 Planar Temp      | na         | degrees C  | na    | na        | 3.000     | 8.000     | 90.000    | 95.000    | na        

 -- 
 Francois Tigeot
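
To compare these rows with the envstat thresholds, the pipe-separated output can be split into named fields. The sketch below is in Python and assumes the usual `ipmitool sensor` column order (name, reading, unit, status, then lower non-recoverable, lower critical, lower non-critical, upper non-critical, upper critical, upper non-recoverable). The short column keys and the function name `parse_row` are chosen here for illustration, and only numeric threshold rows like the ones above are handled.

```python
from typing import Optional

# Assumed column order of `ipmitool sensor`; keys are illustrative.
COLUMNS = ("name", "value", "unit", "status",
           "lnr", "lcr", "lnc", "unc", "ucr", "unr")
NUMERIC = ("value", "lnr", "lcr", "lnc", "unc", "ucr", "unr")

def to_number(field: str) -> Optional[float]:
    """Convert a field to float, treating 'na' as missing."""
    return None if field == "na" else float(field)

def parse_row(line: str) -> dict:
    """Split one output row into a dict keyed by COLUMNS."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = dict(zip(COLUMNS, fields))
    for key in NUMERIC:
        row[key] = to_number(row[key])
    return row

# Two rows copied from the output above.
SAMPLE = [
    "Ambient Temp     | 25.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        ",
    "Planar Temp      | na         | degrees C  | na    | na        | 3.000     | 8.000     | 90.000    | 95.000    | na        ",
]

for line in SAMPLE:
    row = parse_row(line)
    print(row["name"], row["value"], row["unc"], row["ucr"])
```

In this data the upper non-critical and upper critical columns (42 and 47, 90 and 95) line up with the WarnMax and CritMax columns envstat printed (42.180 and 47.201, 90.385 and 95.407), apart from a small difference in the converted values.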

From: Francois Tigeot <ftigeot@wolfpond.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 08:59:45 +0200

 On Mon, Aug 27, 2012 at 01:25:02AM +0000, Mark Davies wrote:
 >  
 >  Not sure if this helps at all but all my poweredge r610's have 
 >  complained from boot
 >    ipmi0: critical over limit on 'Temp6'
 >  since I've had them (a full envstat is below)
 >  I've never worried about it as it didn't seem to actually be an issue, 
 >  over several years of running, and when you look at temperatures in 
 >  the DRAC it doesn't even list Temp6 so assumed it was a bogus sensor.

 What most makes me think the reports are bogus in my case is that the
 actual hardware doesn't give a damn:

 - When temperature really starts to get hot, fans start spinning faster
   and louder. much louder.

 - But when powerd reports critical temperature, the fans keep running at
   minimal speed.

 -- 
 Francois Tigeot

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 07:02:28 +0000

 On Mon, Aug 27, 2012 at 07:00:13AM +0000, Francois Tigeot wrote:
  >  What makes me think the most the reports are bogus in my case is the
  >  actual hardware doesn't give a damn:
  >  
  >  - When temperature really starts to get hot, fans start spinning faster
  >    and louder. much louder.
  >  
  >  - But when powerd reports critical temperature, the fans keep running at
  >    minimal speed.

 ...it is also possible that the problem is that we aren't doing the
 right thing with the fans.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Francois Tigeot <ftigeot@wolfpond.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 09:09:45 +0200

 On Mon, Aug 27, 2012 at 07:05:04AM +0000, David Holland wrote:
 > The following reply was made to PR port-amd64/46833; it has been noted by GNATS.
 > 
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
 > Date: Mon, 27 Aug 2012 07:02:28 +0000
 > 
 >  On Mon, Aug 27, 2012 at 07:00:13AM +0000, Francois Tigeot wrote:
 >   >  What makes me think the most the reports are bogus in my case is the
 >   >  actual hardware doesn't give a damn:
 >   >  
 >   >  - When temperature really starts to get hot, fans start spinning faster
 >   >    and louder. much louder.
 >   >  
 >   >  - But when powerd reports critical temperature, the fans keep running at
 >   >    minimal speed.
 >  
 >  ...it is also possible that the problem is that we aren't doing the
 >  right thing with the fans.

 Now that I think about it, I only remember fans getting loud with non-NetBSD
 operating systems.

 Does this mean NetBSD's powerd takes over fan handling from the firmware or
 whatever manages them by default ?

 -- 
 Francois Tigeot

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 07:22:05 +0000

 On Mon, Aug 27, 2012 at 07:10:08AM +0000, Francois Tigeot wrote:
  >>  ...it is also possible that the problem is that we aren't doing the
  >>  right thing with the fans.
  >  
  > Now that I think about it, I only remember fans getting loud with non-NetBSD
  > operating systems.
  >  
  > Does this mean NetBSD's powerd takes over fan handling from the firmware or
  > whatever manages them by default ?

 It is (presumably) ACPI. Other people can tell you more about that
 than I can...

 -- 
 David A. Holland
 dholland@netbsd.org

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Francois Tigeot <ftigeot@wolfpond.org>
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 09:54:00 +0200

 On Mon, Aug 27, 2012 at 08:59:45AM +0200, Francois Tigeot wrote:
 > On Mon, Aug 27, 2012 at 01:25:02AM +0000, Mark Davies wrote:
 > >  
 > >  Not sure if this helps at all but all my poweredge r610's have 
 > >  complained from boot
 > >    ipmi0: critical over limit on 'Temp6'
 > >  since I've had them (a full envstat is below)
 > >  I've never worried about it as it didn't seem to actually be an issue, 
 > >  over several years of running, and when you look at temperatures in 
 > >  the DRAC it doesn't even list Temp6 so assumed it was a bogus sensor.
 > 
 > What makes me think the most the reports are bogus in my case is the
 > actual hardware doesn't give a damn:

 Another option is that the limits reported by hardware are bogus,
 and the BIOS/firmware doesn't care about them.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: Francois Tigeot <ftigeot@wolfpond.org>
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
        gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 09:58:18 +0200

 On Mon, Aug 27, 2012 at 08:49:56AM +0200, Francois Tigeot wrote:
 > The part of "ipmitool sensor" output with temp data:
 > 
 > [...]
 > Temp             | na         | degrees C  | na    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 > Temp             | na         | degrees C  | na    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        

 I didn't see the sensors that cause problems in envstat; the closest matches
 would be these two (the limits match). It looks like the values
 are ignored by ipmitool. NetBSD should probably ignore them too;
 now we need to find out why ipmitool ignores the values and NetBSD does not ...

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 09:59:43 +0200

 On Mon, Aug 27, 2012 at 07:05:04AM +0000, David Holland wrote:
 >  On Mon, Aug 27, 2012 at 07:00:13AM +0000, Francois Tigeot wrote:
 >   >  What makes me think the most the reports are bogus in my case is the
 >   >  actual hardware doesn't give a damn:
 >   >  
 >   >  - When temperature really starts to get hot, fans start spinning faster
 >   >    and louder. much louder.
 >   >  
 >   >  - But when powerd reports critical temperature, the fans keep running at
 >   >    minimal speed.
 >  
 >  ...it is also possible that the problem is that we aren't doing the
 >  right thing with the fans.

 On all the Dell hardware I have, the fans are not controlled by the OS
 (after a cooling outage I can confirm that the fans get really loud
 when temperature rises :)

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Jukka Ruohonen <jruohonen@iki.fi>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 21:22:11 +0300

 On Sat, Aug 25, 2012 at 09:52:44AM +0200, Francois Tigeot wrote:
 > I see two problems here:
 > - the machine is a server, not a laptop
 > - temperature never gets critical.
 > 
 > What about asking the user if power management needs to be enabled or not
 > at installation time ?

 This is not really power management per se. But I agree, and I am sure
 I will find time to rework powerd(8) to follow an rc.conf(5)-like YES/NO scheme
 for most of the knobs.

 But everyone, how do we resolve this now and here? Should the commit be
 reverted? It seems that ATM the default is "right for some machines, and
 wrong for some machines".

 - Jukka.

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Mon, 27 Aug 2012 23:31:36 +0000 (UTC)

 bouyer@antioche.eu.org (Manuel Bouyer) writes:

 >On Mon, Aug 27, 2012 at 08:59:45AM +0200, Francois Tigeot wrote:
 >> On Mon, Aug 27, 2012 at 01:25:02AM +0000, Mark Davies wrote:
 >> >  
 >> >  Not sure if this helps at all but all my poweredge r610's have 
 >> >  complained from boot
 >> >    ipmi0: critical over limit on 'Temp6'
 >> >  since I've had them (a full envstat is below)
 >> >  I've never worried about it as it didn't seem to actually be an issue, 
 >> >  over several years of running, and when you look at temperatures in 
 >> >  the DRAC it doesn't even list Temp6 so assumed it was a bogus sensor.
 >> 
 >> What makes me think the most the reports are bogus in my case is the
 >> actual hardware doesn't give a damn:

 >Another option is that the limits reported by hardware are bogus,
 >and the BIOS/firmware doesn't care about them.

 When you query a poweredge r610 with IPMI, it reports 7 sensors
 named 'Temp', all are in the state 'Disabled' and ipmitool says that
 no readout is available (but some thresholds are).

 The only temperature sensor reporting something is one of the three
 'Ambient Temp' sensors.

 Temp             | na         | degrees C  | na    | na        | na        | na        | 85.000    | 90.000    | na        
 Temp             | na         | degrees C  | na    | na        | na        | na        | 85.000    | 90.000    | na        
 Temp             | na         | degrees C  | na    | 64.000    | na        | -128.000  | -128.000  | na        | na        
 Ambient Temp     | na         | degrees C  | na    | 64.000    | na        | -128.000  | -128.000  | na        | na        
 Temp             | na         | degrees C  | na    | 64.000    | na        | -128.000  | -128.000  | na        | na        
 Ambient Temp     | na         | degrees C  | na    | 64.000    | na        | -128.000  | -128.000  | na        | na        
 Ambient Temp     | 21.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 Planar Temp      | na         | degrees C  | na    | na        | 3.000     | 8.000     | 92.000    | 97.000    | na        
 Temp             | na         | degrees C  | na    | na        | na        | na        | na        | na        | na        
 Temp             | na         | degrees C  | na    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 Temp             | na         | degrees C  | na    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        

 Temp             | 01h | ns  |  3.1 | Disabled
 Temp             | 02h | ns  |  3.2 | Disabled
 Temp             | 05h | ns  | 10.1 | Disabled
 Ambient Temp     | 07h | ns  | 10.1 | Disabled
 Temp             | 06h | ns  | 10.2 | Disabled
 Ambient Temp     | 08h | ns  | 10.2 | Disabled
 Ambient Temp     | 0Eh | ok  |  7.1 | 21 degrees C
 Planar Temp      | 0Fh | ns  |  7.1 | Disabled
 CPU Temp Interf  | 76h | ns  |  7.1 | Disabled
 Temp             | 0Ah | ns  |  8.1 | Disabled
 Temp             | 0Bh | ns  |  8.1 | Disabled
 Temp             | 0Ch | ns  |  8.1 | Disabled

 Entities 3.1/3.2 are the CPUs
 Entities 10.1/10.2 are the power supplies
 Entity 7.1 is the system board
 Entity 8.1 is unspecified


 So it looks like our driver doesn't ignore the disabled state correctly.


 -- 
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, gnats-admin@NetBSD.org,
        netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Tue, 28 Aug 2012 09:48:13 +0200

 On Mon, Aug 27, 2012 at 11:35:02PM +0000, Michael van Elst wrote:
 >  When you query a poweredge r610 with IPMI, it reports 7 sensors
 >  named 'Temp', all are in the state 'Disabled' and ipmitool says that
 >  no readout is available (but some thresholds are).
 >  
 >  The only temperature sensor reporting something is one of the three
 >  'Ambient Temp' sensors.
 >  
 >  Temp             | na         | degrees C  | na    | na        | na        | na        | 85.000    | 90.000    | na        
 >  Temp             | na         | degrees C  | na    | na        | na        | na        | 85.000    | 90.000    | na        
 >  Temp             | na         | degrees C  | na    | 64.000    | na        | -128.000  | -128.000  | na        | na        
 >  Ambient Temp     | na         | degrees C  | na    | 64.000    | na        | -128.000  | -128.000  | na        | na        
 >  Temp             | na         | degrees C  | na    | 64.000    | na        | -128.000  | -128.000  | na        | na        
 >  Ambient Temp     | na         | degrees C  | na    | 64.000    | na        | -128.000  | -128.000  | na        | na        
 >  Ambient Temp     | 21.000     | degrees C  | ok    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 >  Planar Temp      | na         | degrees C  | na    | na        | 3.000     | 8.000     | 92.000    | 97.000    | na        
 >  Temp             | na         | degrees C  | na    | na        | na        | na        | na        | na        | na        
 >  Temp             | na         | degrees C  | na    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 >  Temp             | na         | degrees C  | na    | na        | 3.000     | 8.000     | 42.000    | 47.000    | na        
 >  
 >  Temp             | 01h | ns  |  3.1 | Disabled
 >  Temp             | 02h | ns  |  3.2 | Disabled
 >  Temp             | 05h | ns  | 10.1 | Disabled
 >  Ambient Temp     | 07h | ns  | 10.1 | Disabled
 >  Temp             | 06h | ns  | 10.2 | Disabled
 >  Ambient Temp     | 08h | ns  | 10.2 | Disabled
 >  Ambient Temp     | 0Eh | ok  |  7.1 | 21 degrees C
 >  Planar Temp      | 0Fh | ns  |  7.1 | Disabled
 >  CPU Temp Interf  | 76h | ns  |  7.1 | Disabled
 >  Temp             | 0Ah | ns  |  8.1 | Disabled
 >  Temp             | 0Bh | ns  |  8.1 | Disabled
 >  Temp             | 0Ch | ns  |  8.1 | Disabled
 >  
 >  Entities 3.1/3.2 are the CPUs
 >  Entities 10.1/10.2 are the power supplies
 >  Entity 7.1 is the system board
 >  Entity 8.1 is unspecified
 >  
 >  
 >  So it looks like our driver doesn't ignore the disabled state correctly.

 That's what I suspect too. I had a look at the IPMI spec, but I couldn't
 spot how a sensor would be marked disabled. A more detailed reading
 is needed, but this is a large document.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: bouyer@antioche.eu.org
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Thu, 31 Jan 2013 07:36:46 +0000 (UTC)

 hi,

 >>  So it looks like our driver doesn't ignore the disabled state correctly.
 > 
 > That's what I suspect too. I had a look at the IPMI spec, but I couldn't
 > spot how a sensor would be marked disabled. A more detailed reading
 > is needed, but this is a large document.

 how about disabling either ipmi driver in GENERIC or automatic shutdown
 script for now, so that it's at least usable on the affected systems?

 YAMAMOTO Takashi

From: yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi)
To: bouyer@antioche.eu.org
Cc: gnats-bugs@NetBSD.org, port-amd64-maintainer@NetBSD.org,
	gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833: NetBSD 6.0_BETA2 shutdowns under load
Date: Tue,  7 May 2013 11:02:41 +0000 (UTC)

 --Boundary-20130507195845-1067504
 Content-Type: Text/Plain; charset=us-ascii

 hi,

 does the attached patch make sense?

 totally untested because i have no access to ipmi-capable systems.

 YAMAMOTO Takashi

 --Boundary-20130507195845-1067504
 Content-Type: Text/Plain; charset=us-ascii
 Content-Disposition: attachment; filename="a.diff"

 Index: ipmi.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/x86/x86/ipmi.c,v
 retrieving revision 1.54
 diff -u -p -r1.54 ipmi.c
 --- ipmi.c	19 Mar 2013 06:34:28 -0000	1.54
 +++ ipmi.c	7 May 2013 10:58:48 -0000
 @@ -146,7 +146,11 @@ int	ipmi_enabled = 0;

  #define IPMI_ENTITY_PWRSUPPLY		0x0A

 -#define IPMI_INVALID_SENSOR		(1L << 5)
 +#define IPMI_SENSOR_SCANNING_ENABLED	(1L << 6)
 +#define IPMI_SENSOR_UNAVAILABLE		(1L << 5)
 +#define IPMI_INVALID_SENSOR_P(x) \
 +	(((x) & (IPMI_SENSOR_SCANNING_ENABLED|IPMI_SENSOR_UNAVAILABLE)) \
 +	!= IPMI_SENSOR_SCANNING_ENABLED)

  #define IPMI_SDR_TYPEFULL		1
  #define IPMI_SDR_TYPECOMPACT		2
 @@ -1716,7 +1720,7 @@ read_sensor(struct ipmi_softc *sc, struc
  	    s1->m, s1->m_tolerance, s1->b, s1->b_accuracy, s1->rbexp, s1->linear);
  	dbg_printf(10, "values=%.2x %.2x %.2x %.2x %s\n",
  	    data[0],data[1],data[2],data[3], edata->desc);
 -	if (data[1] & IPMI_INVALID_SENSOR) {
 +	if (IPMI_INVALID_SENSOR_P(data[1])) {
  		/* Check if sensor is valid */
  		edata->state = ENVSYS_SINVALID;
  	} else {

 --Boundary-20130507195845-1067504--

From: "YAMAMOTO Takashi" <yamt@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/46833 CVS commit: src/sys/arch/x86/x86
Date: Mon, 12 Aug 2013 15:40:34 +0000

 Module Name:	src
 Committed By:	yamt
 Date:		Mon Aug 12 15:40:34 UTC 2013

 Modified Files:
 	src/sys/arch/x86/x86: ipmi.c

 Log Message:
 fix validness check of sensor value

 this change is intended to mirror what ipmitool does.
 (their macros for these bits are IS_READING_UNAVAILABLE and
 IS_SCANNING_DISABLED.)

 see also:
     second-gen-interface-spec-v2-rev1-4
     Table 35-15, Get Sensor Reading Command

 might fix PR/46833 from Francois Tigeot

 reviewed by Masanobu SAITOH and Tom Ivar Helbekkmo
 tested by Tom Ivar Helbekkmo


 To generate a diff of this commit:
 cvs rdiff -u -r1.54 -r1.55 src/sys/arch/x86/x86/ipmi.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
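
 The validity rule described in the commit message above can be sketched
 as a standalone predicate. This is only an illustration, not the code
 in ipmi.c: the function and macro names below are hypothetical, and the
 bit positions (bit 6 = sensor scanning enabled, bit 5 = reading/state
 unavailable) are taken from the patch and from the Get Sensor Reading
 response it references. A reading is trusted only when scanning is
 enabled and the reading is not flagged unavailable.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bits of the status byte returned by Get Sensor Reading
 * (positions as used in the patch; names here are hypothetical). */
#define SENSOR_SCANNING_ENABLED	(1u << 6)	/* 0 = scanning disabled */
#define SENSOR_READING_UNAVAIL	(1u << 5)	/* 1 = reading unavailable */

/* Older check, as in revision 1.54: only the "unavailable" bit is tested. */
bool
old_reading_is_invalid(uint8_t status)
{
	return (status & SENSOR_READING_UNAVAIL) != 0;
}

/* Stricter check: a reading is valid only if scanning is enabled
 * AND the reading is not marked unavailable. Anything else is invalid. */
bool
reading_is_invalid(uint8_t status)
{
	return (status & (SENSOR_SCANNING_ENABLED | SENSOR_READING_UNAVAIL))
	    != SENSOR_SCANNING_ENABLED;
}
```

 With a status byte of 0x00 (scanning disabled, "unavailable" bit clear)
 the older check accepts the reading, while the stricter one rejects it.
 Whether the Dell BMC really returns that byte for the sensors ipmitool
 lists as "Disabled" is an assumption; nobody in this thread confirmed it.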

State-Changed-From-To: open->feedback
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 07 Oct 2013 06:30:55 +0000
State-Changed-Why:
Did the commit (back in August) improve things?


From: Francois Tigeot <ftigeot@wolfpond.org>
To: gnats-bugs@NetBSD.org
Cc: port-amd64-maintainer@NetBSD.org, netbsd-bugs@NetBSD.org,
        gnats-admin@NetBSD.org, dholland@NetBSD.org, ftigeot@wolfpond.org
Subject: Re: port-amd64/46833 (NetBSD 6.0_BETA2 shutdowns under load)
Date: Mon, 7 Oct 2013 09:26:20 +0200

 On Mon, Oct 07, 2013 at 06:30:55AM +0000, dholland@NetBSD.org wrote:
 > Synopsis: NetBSD 6.0_BETA2 shutdowns under load
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Mon, 07 Oct 2013 06:30:55 +0000
 > State-Changed-Why:
 > Did the commit (back in August) improve things?

 I can't confirm it, I don't have access to this kind of machine anymore.

 -- 
 Francois Tigeot

State-Changed-From-To: feedback->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 07 Oct 2013 07:49:24 +0000
State-Changed-Why:
Submitter can't test; let's assume it's fixed as we're unlikely to be
able to find other hardware with the same problem. (Or if we think we
do, it might not actually be the same, etc.)


>Unformatted:
