NetBSD Problem Report #53185

From bouyer@antioche.eu.org  Sun Apr 15 16:31:59 2018
Return-Path: <bouyer@antioche.eu.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B3BB77A18F
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 15 Apr 2018 16:31:59 +0000 (UTC)
Message-Id: <20180415163155.B670D27EE@rochebonne.antioche.eu.org>
Date: Sun, 15 Apr 2018 18:31:55 +0200 (CEST)
From: bouyer@antioche.eu.org
Reply-To: bouyer@antioche.eu.org
To: gnats-bugs@NetBSD.org
Subject: axe(4) on evbarm cause panic, possibly compiler-related
X-Send-Pr-Version: 3.95

>Number:         53185
>Category:       port-arm
>Synopsis:       axe(4) on evbarm cause panic, possibly compiler-related
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 15 16:35:00 +0000 2018
>Closed-Date:    Thu Aug 29 08:47:03 +0000 2019
>Last-Modified:  Thu Aug 29 08:47:03 +0000 2019
>Originator:     Manuel Bouyer
>Release:        NetBSD 8.99.14
>Organization:
>Environment:
System: NetBSD chartplotter 8.99.14 NetBSD 8.99.14 (CHARTPLOTTER) #43: Fri Apr 13 19:38:31 CEST 2018 bouyer@bip.soc.lip6.fr:/dsk/l1/misc/bouyer/tmp/evbarm-earmhf/obj/dsk/l1/misc/bouyer/HEAD/clean/src/sys/arch/evbarm/compile/CHARTPLOTTER evbarm
Architecture: earmv7hf
Machine: evbarm
>Description:
	I have an axe(4) device connected to an Allwinner A20-based board (Olimex
	Lime2). CHARTPLOTTER is derived from sunxi:
axe0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        ec_capabilities=1<VLAN_MTU>
        ec_enabled=0
        address: 38:c9:86:f1:6b:4d
        media: Ethernet autoselect (100baseTX full-duplex)
        status: active
        inet6 fe80::362e:ebb2:14b5:fb93%axe0/64 flags 0x0 scopeid 0x2
        inet6 2001:41d0:fe9d:1100:5545:9079:39fa:e695/64 flags 0x0
        inet 10.0.0.3/24 broadcast 255.255.255.0 flags 0x0

	Since upgrading to an 8.99.14 kernel (from 8.99.12), I get a kernel
	panic when starting large transfers from a remote host to the axe(4)
	(like an scp, or pkg_add):
[      256.958847916] data_abort_handler: data_aborts fsr=0x1 far=0x9cda25ee
[      256.958847916] Fatal kernel mode data abort: 'Alignment Fault 1'
[      256.958847916] trapframe: 0x99e37df0
[      256.958847916] FSR=00000001, FAR=9cda25ee, spsr=20070113
[      256.958847916] r0 =00000000, r1 =f0082000, r2 =00000004, r3 =00000004
[      256.958847916] r4 =915eff00, r5 =0000023e, r6 =00001002, r7 =910af808
[      256.958847916] r8 =9cda25ee, r9 =0000.824078999] uhid1 at uhidev2 reportid 3: input=2, output=0, feat6ec, ssp=99e37e40, slr=8000bf50, pc =80095134

	Light network operation (such as an interactive ssh session) doesn't
	cause this. I have never seen this with 8.99.12 or earlier, although
	I used it the same way (especially, copying kernels to the
	local SD card while testing new sunxi drivers).

	0x80095134 points to the memcpy() call in axe_rxeof() at line 1251.
	More specifically:
0x80095124 <+272>:   ldr     r5, [r11, #-52] ; 0xffffffcc
0x80095128 <+276>:   b       0x80095274 <axe_rxeof+608>
0x8009512c <+280>:   cmp     r5, #3
0x80095130 <+284>:   bls     0x80095350 <axe_rxeof+828>
0x80095134 <+288>:   ldr     r3, [r8], #4
0x80095138 <+292>:   movw    r0, #2047       ; 0x7ff
0x8009513c <+296>:   sub     r2, r5, #4
0x80095140 <+300>:   str     r3, [r11, #-48] ; 0xffffffd0

	That would be the ldr which causes the trap, so this would be
	c->axe_buf which is misaligned, confirmed by the r8 value
	(0x9cda25ee, equal to the FAR and only 2-byte aligned).

	I'm not sure how this could happen yet, but reading the sources,
	it looks like at line 1323:
                        buf += sizeof(csum_hdr);
	we're misaligning buf, as axe_csum_hdr is either 3 or 5 uint16_t
	(6 or 10 bytes, neither a multiple of 4).

	Another thing that changed is the compiler. I wonder if the compiler
	could be optimising the memcpy() call the wrong way here,
	assuming buf is always aligned.
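
	The alignment arithmetic above can be checked with a small userland
	sketch. The helper name word_misalignment() is hypothetical and not
	part of the driver; the address is the FAR from the trap, and the
	step sizes are the 3 and 5 uint16_t header sizes mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: byte offset of an address within its 32-bit
 * word. 0 means the address is word-aligned and safe for a plain ldr
 * when unaligned accesses fault.
 */
unsigned
word_misalignment(uintptr_t addr)
{
	return (unsigned)(addr & 3u);
}

/*
 * Misalignment of a pointer after advancing a word-aligned base by
 * `step` bytes, as "buf += sizeof(csum_hdr)" does in the report.
 */
unsigned
misalignment_after_step(size_t step)
{
	const uintptr_t aligned_base = 0x1000u;	/* arbitrary aligned base */

	return word_misalignment(aligned_base + step);
}
```

	With these, word_misalignment(0x9cda25ee) is 2, and stepping by
	3 * sizeof(uint16_t) or 5 * sizeof(uint16_t) also yields 2: the
	pointer stays halfword-aligned but is no longer word-aligned.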
>How-To-Repeat:
	Use an axe(4) on an ARM CPU?
>Fix:

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->toolchain-manager
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Mon, 16 Apr 2018 08:58:56 +0000
Responsible-Changed-Why:
Looks like a compiler issue


From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
 compiler-related
Date: Mon, 16 Apr 2018 18:12:18 +0200

 On Sun, Apr 15, 2018 at 04:35:01PM +0000, bouyer@antioche.eu.org wrote:
 > [trimmed]
 > 	another thing that changed is the compiler. I wonder if the compiler
 > 	could be optimising the memcpy() call the wrong way here,
 > 	assuming buf is always aligned.

 The compiler may be right after all.
 from http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
 "When compiling for a ARMv6 or ARMv7-A/R processor, the ARM Compiler will assume that it can use unaligned accesses"

 And
 "Further, unaligned accesses are only allowed to regions marked as Normal memory type, and unaligned access support must be enabled by setting the SCTLR.A bit in the system control coprocessor. Attempts to perform unaligned accesses when not allowed will cause an alignment fault (data abort)."

 Are we setting the SCTLR.A bit? Also in kernel mode?

 If not, should the kernel be compiled with -mno-unaligned-access?
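
 To compare what the compiler emits under each setting, a minimal
 test case along these lines could be used. The function name
 load32_any() is hypothetical. The idea is to build it at -O2 for
 ARMv7-A once with -munaligned-access and once with
 -mno-unaligned-access and inspect the disassembly; which instructions
 come out depends on the GCC version, so no particular output is
 claimed here.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical test function: load a 32-bit value from an address that
 * may not be word-aligned. memcpy() is the portable way to express
 * this in C. Whether the compiler lowers it to a single word load or
 * to narrower loads depends on whether it believes unaligned word
 * accesses are permitted on the target.
 */
uint32_t
load32_any(const void *p)
{
	uint32_t v;

	memcpy(&v, p, sizeof(v));
	return v;
}
```

 The C source is correct for any alignment either way; the flag only
 changes which machine instructions the compiler is allowed to pick.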

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
 compiler-related
Date: Mon, 16 Apr 2018 18:38:58 +0200

 On Mon, Apr 16, 2018 at 04:15:01PM +0000, Manuel Bouyer wrote:
 >  > 
 >  > 	0x80095134 points to the memcpy() call in axe_rxeof() at line 1251.
 >  > 	More specifically:
 >  > 0x80095124 <+272>:   ldr     r5, [r11, #-52] ; 0xffffffcc
 >  > 0x80095128 <+276>:   b       0x80095274 <axe_rxeof+608>
 >  > 0x8009512c <+280>:   cmp     r5, #3
 >  > 0x80095130 <+284>:   bls     0x80095350 <axe_rxeof+828>
 >  > 0x80095134 <+288>:   ldr     r3, [r8], #4
 >  > 0x80095138 <+292>:   movw    r0, #2047       ; 0x7ff
 >  > 0x8009513c <+296>:   sub     r2, r5, #4
 >  > 0x80095140 <+300>:   str     r3, [r11, #-48] ; 0xffffffd0

 GCC 5.5.0 in netbsd-8 would use two ldrh instead of one ldr here,
 which would have worked in my case.

 But, in the general case, the loop does
 		buf += rxlen
 and I don't think there's a requirement for rxlen to be a multiple of 2 here.
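
 If rxlen can be odd, even halfword loads are not safe, and the only
 access that works at every offset is a byte access. A sketch of a
 walk that assumes nothing about alignment follows. The record layout
 (a 2-byte little-endian length followed by the payload) and both
 function names are made up for illustration; this is not the axe(4)
 wire format or the committed workaround.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: read a little-endian 16-bit value bytewise. */
static uint16_t
rd_le16(const uint8_t *p)
{
	return (uint16_t)(p[0] | (p[1] << 8));
}

/*
 * Walk a buffer of records, each a 2-byte little-endian length followed
 * by that many payload bytes (an assumed framing). Only byte loads are
 * used, so the walk is valid for any, including odd, record length.
 * Returns the number of records, or -1 if the buffer is truncated.
 */
int
count_records(const uint8_t *buf, size_t total)
{
	int n = 0;

	while (total >= 2) {
		size_t len = rd_le16(buf);

		buf += 2;
		total -= 2;
		if (len > total)
			return -1;	/* payload runs past the end */
		buf += len;		/* may leave buf at an odd address */
		total -= len;
		n++;
	}
	return total == 0 ? n : -1;
}
```

 The cost is more instructions per header field, which is why letting
 the compiler or the hardware handle unaligned words is attractive
 when it is actually permitted.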

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference
 --

Responsible-Changed-From-To: toolchain-manager->port-arm-manager
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Mon, 16 Apr 2018 18:55:16 +0000
Responsible-Changed-Why:
Looks like the compiler is right, either the system control register is
not properly set up or we should compile with -mno-unaligned-access


Responsible-Changed-From-To: port-arm-manager->port-arm-maintainer
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Wed, 09 Jan 2019 19:10:04 +0000
Responsible-Changed-Why:
use right group


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
 compiler-related
Date: Wed, 9 Jan 2019 19:11:44 +0000

 Not sent to gnats.

    ------

 From: Manuel Bouyer <bouyer@antioche.eu.org>
 To: toolchain-manager@netbsd.org, gnats-admin@netbsd.org,
 	netbsd-bugs@netbsd.org
 Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
 	compiler-related
 Date: Mon, 16 Apr 2018 20:53:34 +0200

 On Mon, Apr 16, 2018 at 04:15:01PM +0000, Manuel Bouyer wrote:
 > [trimmed]
 >  If not, should the kernel be compiled with -mno-unaligned-access ?

 A kernel built with -mno-unaligned-access works fine.

 -- 
 Manuel Bouyer <bouyer@antioche.eu.org>
      NetBSD: 26 ans d'experience feront toujours la difference

State-Changed-From-To: open->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Thu, 29 Aug 2019 08:47:03 +0000
State-Changed-Why:
A workaround was committed a while ago; it is present in -9. A different
workaround has been committed to -current for other drivers. At
best, this could be a pullup, but it is not needed.


>Unformatted:
