NetBSD Problem Report #50939

From kivinen@fireball.acr.fi  Fri Mar 11 14:17:49 2016
Return-Path: <kivinen@fireball.acr.fi>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id CEE9D7ABE5
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 11 Mar 2016 14:17:49 +0000 (UTC)
Message-Id: <201603111254.u2BCsClJ009422@fireball.acr.fi>
Date: Fri, 11 Mar 2016 14:54:12 +0200 (EET)
From: kivinen@iki.fi
Reply-To: kivinen@iki.fi
To: gnats-bugs@NetBSD.org
Subject: Bug in GCC optionization causing i386 net-snmpd to crash
X-Send-Pr-Version: 3.95

>Number:         50939
>Category:       pkg
>Synopsis:       snmpd crashes when compiled with gcc -O2
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    adam
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Mar 11 14:20:01 +0000 2016
>Closed-Date:    Fri Mar 05 04:53:32 +0000 2021
>Last-Modified:  Fri Mar 05 04:53:32 +0000 2021
>Originator:     Tero Kivinen
>Release:        NetBSD 7.0_STABLE
>Organization:
IKI ry
>Environment:
System: NetBSD seuraava.iki.fi 7.0_STABLE NetBSD 7.0_STABLE (GENERIC) #0: Sat Mar 5 20:05:29 EET 2016 kivinen@seuraava.iki.fi:/usr/obj/sys/arch/i386/compile/GENERIC i386
Architecture: i386
Machine: i386
>Description:

	I have a machine running net-snmpd. As soon as the monitoring
	script connects to snmpd and tries to read the CPU statistics,
	the code in
	net-snmp-5.7.3/agent/mibgroup/hardware/cpu/cpu_sysctl.c
	crashes. If net-snmpd is compiled without optimizations it
	does not crash. This only happens on the i386 architecture; it
	does not appear on the amd64 architecture.

	Before the crash the system prints an error message to
	syslog saying:

	sysctl vm.vm_meter failed (errno 0)

	Debugging with gdb shows that execution enters
	netsnmp_cpu_arch_load and makes the first few calls normally,
	i.e. the cpu_stats call (line 200) etc., and then makes the
	mem_mib call (line 218). But before storing the mem_stats
	output into the cpu->* structure (at line 220), it goes on to
	run the NetBSD-specific code reading kern.cp_time (line 233
	onward). After that is done it jumps back to check the error
	status of the mem_mib call (at line 219), and so prints the
	error message about sysctl vm.vm_meter failing (even though
	the call actually succeeded). It then tries to store the data
	into the cpu->* structure (at line 220), but by this point the
	cpu variable has been trashed: it has the value 0x77, and
	this causes the crash.
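
	The "(errno 0)" in the log line is itself a clue: errno is only
	meaningful after a call has reported failure, so a failure
	message carrying errno 0 suggests the error branch was reached
	without the call actually failing. A minimal sketch of that
	pattern follows. query_stats and format_failure are
	hypothetical stand-ins, not net-snmp or NetBSD functions, and
	the message format is copied from the log line above rather
	than from the net-snmp source.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical stand-in for a sysctl(3)-style call: returns 0 on
 * success, or -1 with errno set on failure. This is NOT the real
 * sysctl(); it only imitates the return convention.
 */
static int query_stats(int simulate_failure, long *out)
{
    assert(out != NULL);
    if (simulate_failure) {
        errno = ENOMEM;
        return -1;
    }
    *out = 42;
    return 0;
}

/*
 * Build a message shaped like the syslog line in this report.
 * The caller passes in the errno value it observed.
 */
static void format_failure(char *buf, size_t len, const char *name, int err)
{
    snprintf(buf, len, "sysctl %s failed (errno %d)", name, err);
}
```

	If the failure message is built after a successful call, with
	errno still 0, the result is exactly the line seen in syslog.
	A genuine failure would normally carry a non-zero errno.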

>How-To-Repeat:

	Install NetBSD 7.0 from CVS on an i386 machine. Install
	/usr/pkgsrc/net/net-snmp, and net-snmp will crash as soon
	as it calls netsnmp_cpu_arch_load.

	I.e. start snmpd

	/etc/rc.d/snmpd start

	On our system it crashed in less than a minute.

>Fix:

	cd /usr/pkgsrc/net/net-snmp
	make configure
	<edit all Makefiles and remove -O2 from the CFLAGS>
	make install
	/etc/rc.d/snmpd start

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: toolchain-manager->adam
Responsible-Changed-By: maya@NetBSD.org
Responsible-Changed-When: Thu, 29 Sep 2016 21:19:02 +0000
Responsible-Changed-Why:
Over to maintainer


From: David Holland <dholland-pbugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: toolchain/50939: Bug in GCC optionization causing i386 net-snmpd
 to crash
Date: Fri, 30 Sep 2016 06:41:16 +0000

 On Fri, Mar 11, 2016 at 02:20:01PM +0000, kivinen@iki.fi wrote:
  > 	Using gdb to debug the code it seems it starts executing
  > 	netsnmp_cpu_arch_load, and does the first few calls nomally,
  > 	i.e. the cpu_stats call (line 200) etc, and then does the
  > 	mem_mib call (line 218), but before actually storing the
  > 	mem_stats output to the cpu->* structure (at line 220) it goes
  > 	on and runs the NetBSD specific code reading kern.cp_time
  > 	(line 233 forward) and after that is done it jumps back to
  > 	check the error status of the mem_mib call (at line 219), thus
  > 	printing out error message about the sysctl vm.vm_meter
  > 	failing (even when it actually did succeed), and then it tries
  > 	to store the data to cpu->* structure (at line 220), but as
  > 	cpu variable has been trashed at this point, it has value of
  > 	0x77 and this will cause crash.

 This sounds like it is overwriting its stack, probably in the mem_mib
 call. Then when it returns from the mem_mib call it manages to go to
 the wrong place. Can you check in the debugger if this is the case?

 What gets trashed if you overwrite the stack can depend heavily on
 compiler optimizations, so it's not necessarily a gcc bug.

 I don't see anything obviously wrong with the code, but that isn't
 conclusive.

 Also, is this happening on real i386, or in a 32-bit chroot on an
 amd64? Might also be a problem with the compat32 sysctl().

 -- 
 David A. Holland
 dholland@netbsd.org

From: "Gavan Fantom" <gavan@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/50939 CVS commit: pkgsrc/net/net-snmp
Date: Fri, 6 Oct 2017 02:39:38 +0000

 Module Name:	pkgsrc
 Committed By:	gavan
 Date:		Fri Oct  6 02:39:38 UTC 2017

 Modified Files:
 	pkgsrc/net/net-snmp: Makefile distinfo
 	pkgsrc/net/net-snmp/patches:
 	    patch-agent_mibgroup_hardware_cpu_cpu__sysctl.c

 Log Message:
 net-snmp: Prevent crash on NetBSD/i386

 A compiler bug causes incorrect compilation of the NetBSD-specific
 code in cpu_sysctl.c. This results in a crash shortly after startup if
 the machine has 2 or more CPUs.

 Disable optimisation in netsnmp_cpu_arch_load() only.
 This works around the problem reported in PR pkg/50939.


 To generate a diff of this commit:
 cvs rdiff -u -r1.120 -r1.121 pkgsrc/net/net-snmp/Makefile
 cvs rdiff -u -r1.90 -r1.91 pkgsrc/net/net-snmp/distinfo
 cvs rdiff -u -r1.6 -r1.7 \
     pkgsrc/net/net-snmp/patches/patch-agent_mibgroup_hardware_cpu_cpu__sysctl.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
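
 The patch itself is not reproduced in this PR, so how it disables
 optimisation for netsnmp_cpu_arch_load() is not shown here. One way to
 do this with GCC is the optimize function attribute, sketched below as
 an assumption rather than as the content of the pkgsrc patch. The
 NO_OPTIMIZE macro and the sum_ticks function are hypothetical names
 used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/*
 * GCC accepts __attribute__((optimize("O0"))) on a single function.
 * Clang does not implement this attribute, so the macro expands to
 * nothing there. NO_OPTIMIZE is a hypothetical name for this sketch.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define NO_OPTIMIZE __attribute__((optimize("O0")))
#else
#define NO_OPTIMIZE
#endif

/*
 * Only this function is compiled without optimisation under GCC;
 * the rest of the file keeps whatever -O level is in CFLAGS.
 */
NO_OPTIMIZE
static unsigned long long sum_ticks(const unsigned long long *ticks, size_t n)
{
    unsigned long long total = 0;
    size_t i;

    assert(ticks != NULL || n == 0);
    for (i = 0; i < n; i++)
        total += ticks[i];
    return total;
}
```

 Limiting the change to one function keeps the rest of the package
 built with its usual CFLAGS, which is the scope the log message
 describes.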

From: Gavan Fantom <gavan@coolfactor.org>
To: gnats-bugs@netbsd.org, tech-toolchain@netbsd.org
Cc: dholland-pbugs@netbsd.org, maya@NetBSD.org, kivinen@iki.fi,
 adam@netbsd.org
Subject: Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to
 crash
Date: Fri, 6 Oct 2017 02:14:39 +0100

 Some time ago, David Holland wrote:
 >   This sounds like it is overwriting its stack, probably in the mem_mib
 >   call. Then when it returns form the mem_mib call it manages to go to
 >   the wrong place. Can you check in the debugger if this is the case?
 >
 >   What gets trashed if you overwrite the stack can depend heavily on
 >   compiler optimizations, so it's not necessarily a gcc bug.
 >
 >   I don't see anything obviously wrong with the code, but that isn't
 >   conclusive.
 >
 >   Also, is this happening on real i386, or in a 32-bit chroot on an
 >   amd64? Might also be a problem with the compat32 sysctl().

 I have reproduced this on NetBSD 7.1 on a real i386 machine.

 The problem appears to be a compiler bug. Consider the following code, 
 from the middle of netsnmp_cpu_arch_load:

          for (i = 0; i < cpu_num; i++) {
              netsnmp_cpu_info  *ncpu = netsnmp_cpu_get_byIdx( i, 1 );
              size_t j = i * CPUSTATES;
              ncpu->user_ticks = (unsigned long long)ncpu_stats[j + CP_USER];
              ncpu->nice_ticks = (unsigned long long)ncpu_stats[j + CP_NICE];
              ncpu->sys2_ticks = (unsigned long long)ncpu_stats[j + CP_SYS]+cpu_stats[j + CP_INTR];
              ncpu->kern_ticks = (unsigned long long)ncpu_stats[j + CP_SYS];
              ncpu->idle_ticks = (unsigned long long)ncpu_stats[j + CP_IDLE];
              ncpu->intrpt_ticks = (unsigned long long)ncpu_stats[j + CP_INTR];
          }

 This is translated into the following block of code (disassembled by 
 gdb). The block is entered via a conditional branch from elsewhere, if 
 cpu_num > 0.

     0xbba64c88 <+1039>:  movl   $0x1,0x4(%esp)
     0xbba64c90 <+1047>:  movl   $0x0,(%esp)
     0xbba64c97 <+1054>:  call   0xbba09460 <netsnmp_cpu_get_byIdx@plt>
     0xbba64c9c <+1059>:  mov    (%edi),%edx
     0xbba64c9e <+1061>:  mov    0x4(%edi),%ecx
     0xbba64ca1 <+1064>:  mov    %edx,0x2008(%eax)
     0xbba64ca7 <+1070>:  mov    %ecx,0x200c(%eax)
     0xbba64cad <+1076>:  mov    0x8(%edi),%edx
     0xbba64cb0 <+1079>:  mov    0xc(%edi),%ecx
     0xbba64cb3 <+1082>:  mov    %edx,0x2010(%eax)
     0xbba64cb9 <+1088>:  mov    %ecx,0x2014(%eax)
     0xbba64cbf <+1094>:  mov    0x10(%edi),%edx
     0xbba64cc2 <+1097>:  mov    0x14(%edi),%ecx
     0xbba64cc5 <+1100>:  add    0x54(%esp),%edx
     0xbba64cc9 <+1104>:  adc    0x58(%esp),%ecx
     0xbba64ccd <+1108>:  mov    %edx,0x2068(%eax)
     0xbba64cd3 <+1114>:  mov    %ecx,0x206c(%eax)
     0xbba64cd9 <+1120>:  mov    0x10(%edi),%edx
     0xbba64cdc <+1123>:  mov    0x14(%edi),%ecx
     0xbba64cdf <+1126>:  mov    %edx,0x2030(%eax)
     0xbba64ce5 <+1132>:  mov    %ecx,0x2034(%eax)
     0xbba64ceb <+1138>:  mov    0x20(%edi),%edx
     0xbba64cee <+1141>:  mov    0x24(%edi),%ecx
     0xbba64cf1 <+1144>:  mov    %edx,0x2020(%eax)
     0xbba64cf7 <+1150>:  mov    %ecx,0x2024(%eax)
     0xbba64cfd <+1156>:  mov    0x18(%edi),%edx
     0xbba64d00 <+1159>:  mov    0x1c(%edi),%ecx
     0xbba64d03 <+1162>:  mov    %edx,0x2038(%eax)
     0xbba64d09 <+1168>:  mov    %ecx,0x203c(%eax)
     0xbba64d0f <+1174>:  mov    -0x258(%ebx),%eax
     0xbba64d15 <+1180>:  mov    (%eax),%eax
     0xbba64d17 <+1182>:  cmp    $0x1,%eax
     0xbba64d1a <+1185>:  jle    0xbba64ace <netsnmp_cpu_arch_load+597>
     0xbba64d20 <+1191>:  movl   $0x1,0x4(%esp)
     0xbba64d28 <+1199>:  movl   $0x1,(%esp)
     0xbba64d2f <+1206>:  call   0xbba09460 <netsnmp_cpu_get_byIdx@plt>
     0xbba64d34 <+1211>:  mov    0x28(%edi),%edx
     0xbba64d37 <+1214>:  mov    0x2c(%edi),%ecx
     0xbba64d3a <+1217>:  mov    %edx,0x2008(%eax)
     0xbba64d40 <+1223>:  mov    %ecx,0x200c(%eax)
     0xbba64d46 <+1229>:  mov    0x30(%edi),%esi
     0xbba64d49 <+1232>:  mov    0x34(%edi),%edi
     0xbba64d4c <+1235>:  mov    %esi,0x2010(%eax)
     0xbba64d52 <+1241>:  mov    %edi,0x2014(%eax)

 The branch to 0xbba64ace is a branch back to continue the normal 
 execution of the code, where free(...) is called and life carries on.

 Note that the compiler appears to have partially unrolled the loop. But 
 this is the end of that block of code. The next block of code happens to 
 be the cleanup code for sysctl(mem_mib, ...) failing, which logs "sysctl 
 vm.vm_meter failed". This appears to be purely coincidental, and the 
 real failure here is that execution just falls off the end of this 
 half-finished loop unrolling.

     0xbba64d58 <+1247>:  call   0xbba0abf0 <__errno@plt>
     0xbba64d5d <+1252>:  mov    (%eax),%eax
     0xbba64d5f <+1254>:  mov    %eax,0x8(%esp)
     0xbba64d63 <+1258>:  lea    -0x41e78(%ebx),%eax
     0xbba64d69 <+1264>:  mov    %eax,0x4(%esp)
     0xbba64d6d <+1268>:  movl   $0x3,(%esp)
     0xbba64d74 <+1275>:  call   0xbba0af70 <snmp_log@plt>
     0xbba64d79 <+1280>:  jmp    0xbba649cd <netsnmp_cpu_arch_load+340>

 It does look like a machine with only one CPU would be spared this fate 
 as it would exit the loop after the first iteration and not try to 
 execute the second, incomplete, iteration. This problem should be 
 reproducible on any NetBSD/i386 machine with at least 2 CPUs.

 Obviously in the short term, the package will need to work around this 
 by disabling optimisation, but this is clearly something the compiler is 
 getting wrong.
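
 The indexing used by the loop quoted in this message can be exercised
 in isolation. The sketch below is not the net-snmp source: struct
 cpu_ticks and load_per_cpu are hypothetical stand-ins, and the CP_*
 values are local definitions chosen to match the offsets visible in
 the disassembly (0x0, 0x8, 0x10, 0x18 and 0x20, with a stride of 0x28
 between CPUs). Unlike the quoted line, which adds
 cpu_stats[j + CP_INTR], the sketch takes both terms of sys2_ticks from
 the per-CPU array so that every index stays inside the buffer it is
 given. That substitution is an assumption made for the sketch, not a
 statement about the upstream code.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel's CPU state indices (assumed values). */
enum { CP_USER, CP_NICE, CP_SYS, CP_INTR, CP_IDLE, CPUSTATES };

/* Hypothetical stand-in for the fields of netsnmp_cpu_info used here. */
struct cpu_ticks {
    unsigned long long user_ticks;
    unsigned long long nice_ticks;
    unsigned long long sys2_ticks;
    unsigned long long kern_ticks;
    unsigned long long idle_ticks;
    unsigned long long intrpt_ticks;
};

/*
 * Copy per-CPU counters out of a flat array laid out as
 * cpu_num consecutive groups of CPUSTATES entries.
 */
static void load_per_cpu(struct cpu_ticks *cpus, size_t cpu_num,
                         const unsigned long long *ncpu_stats,
                         size_t stats_len)
{
    size_t i;

    /* Every index used below is less than cpu_num * CPUSTATES. */
    assert(stats_len >= cpu_num * CPUSTATES);

    for (i = 0; i < cpu_num; i++) {
        struct cpu_ticks *ncpu = &cpus[i];
        size_t j = i * CPUSTATES;   /* start of this CPU's group */

        ncpu->user_ticks   = ncpu_stats[j + CP_USER];
        ncpu->nice_ticks   = ncpu_stats[j + CP_NICE];
        ncpu->sys2_ticks   = ncpu_stats[j + CP_SYS] + ncpu_stats[j + CP_INTR];
        ncpu->kern_ticks   = ncpu_stats[j + CP_SYS];
        ncpu->idle_ticks   = ncpu_stats[j + CP_IDLE];
        ncpu->intrpt_ticks = ncpu_stats[j + CP_INTR];
    }
}
```

 With two CPUs the second iteration reads entries 5 through 9 of the
 flat array, which corresponds to the 0x28 and later offsets from %edi
 in the second, truncated copy of the loop body above.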

From: David Holland <dholland-pbugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/50939: Bug in GCC optionization causing i386 net-snmpd, to
 crash
Date: Fri, 6 Oct 2017 11:02:49 +0000

 On Fri, Oct 06, 2017 at 04:45:00AM +0000, Gavan Fantom wrote:
  >  Note that the compiler appears to have partially unrolled the loop. But 
  >  this is the end of that block of code. The next block of code happens to 
  >  be the cleanup code sysctl(mem_mib, ...) failing, which logs "sysctl 
  >  vm.vm_meter failed". This appears to be purely coincidental, and the 
  >  real failure here is that execution just falls off the end of this 
  >  half-finished loop unrolling.

 That's... creative of it. :-/

 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Fri, 05 Mar 2021 04:53:32 +0000
State-Changed-Why:
compiler bug, we're not going to be fixing it and there's been a workaround
in pkgsrc since 2017.


>Unformatted:
