NetBSD Problem Report #53185
From bouyer@antioche.eu.org Sun Apr 15 16:31:59 2018
Return-Path: <bouyer@antioche.eu.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id B3BB77A18F
for <gnats-bugs@gnats.NetBSD.org>; Sun, 15 Apr 2018 16:31:59 +0000 (UTC)
Message-Id: <20180415163155.B670D27EE@rochebonne.antioche.eu.org>
Date: Sun, 15 Apr 2018 18:31:55 +0200 (CEST)
From: bouyer@antioche.eu.org
Reply-To: bouyer@antioche.eu.org
To: gnats-bugs@NetBSD.org
Subject: axe(4) on evbarm cause panic, possibly compiler-related
X-Send-Pr-Version: 3.95
>Number: 53185
>Category: port-arm
>Synopsis: axe(4) on evbarm cause panic, possibly compiler-related
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-arm-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Apr 15 16:35:00 +0000 2018
>Closed-Date: Thu Aug 29 08:47:03 +0000 2019
>Last-Modified: Thu Aug 29 08:47:03 +0000 2019
>Originator: Manuel Bouyer
>Release: NetBSD 8.99.14
>Organization:
>Environment:
System: NetBSD chartplotter 8.99.14 NetBSD 8.99.14 (CHARTPLOTTER) #43: Fri Apr 13 19:38:31 CEST 2018 bouyer@bip.soc.lip6.fr:/dsk/l1/misc/bouyer/tmp/evbarm-earmhf/obj/dsk/l1/misc/bouyer/HEAD/clean/src/sys/arch/evbarm/compile/CHARTPLOTTER evbarm
Architecture: earmv7hf
Machine: evbarm
>Description:
I have an axe(4) device connected to an Allwinner A20-based board (Olimex
Lime2). CHARTPLOTTER is derived from sunxi:
axe0: flags=0x8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
ec_capabilities=1<VLAN_MTU>
ec_enabled=0
address: 38:c9:86:f1:6b:4d
media: Ethernet autoselect (100baseTX full-duplex)
status: active
inet6 fe80::362e:ebb2:14b5:fb93%axe0/64 flags 0x0 scopeid 0x2
inet6 2001:41d0:fe9d:1100:5545:9079:39fa:e695/64 flags 0x0
inet 10.0.0.3/24 broadcast 255.255.255.0 flags 0x0
Since upgrading to an 8.99.14 kernel (from 8.99.12), I get a kernel
panic when starting large transfers from a remote host to the axe(4)
(such as an scp or a pkg_add):
[ 256.958847916] data_abort_handler: data_aborts fsr=0x1 far=0x9cda25ee
[ 256.958847916] Fatal kernel mode data abort: 'Alignment Fault 1'
[ 256.958847916] trapframe: 0x99e37df0
[ 256.958847916] FSR=00000001, FAR=9cda25ee, spsr=20070113
[ 256.958847916] r0 =00000000, r1 =f0082000, r2 =00000004, r3 =00000004
[ 256.958847916] r4 =915eff00, r5 =0000023e, r6 =00001002, r7 =910af808
[ 256.958847916] r8 =9cda25ee, r9 =0000.824078999] uhid1 at uhidev2 reportid 3: input=2, output=0, feat6ec, ssp=99e37e40, slr=8000bf50, pc =80095134
Light network operation (such as an interactive ssh session) doesn't
cause this. I have never seen this with 8.99.12 or earlier, although
I used it the same way (in particular, copying kernels to the
local SD card while testing new sunxi drivers).
0x80095134 points to the memcpy() call in axe_rxeof() at line 1251.
More specifically:
0x80095124 <+272>: ldr r5, [r11, #-52] ; 0xffffffcc
0x80095128 <+276>: b 0x80095274 <axe_rxeof+608>
0x8009512c <+280>: cmp r5, #3
0x80095130 <+284>: bls 0x80095350 <axe_rxeof+828>
0x80095134 <+288>: ldr r3, [r8], #4
0x80095138 <+292>: movw r0, #2047 ; 0x7ff
0x8009513c <+296>: sub r2, r5, #4
0x80095140 <+300>: str r3, [r11, #-48] ; 0xffffffd0
That would be the ldr which causes the trap, so this would be
c->axe_buf which is misaligned, as confirmed by the r8 value.
I'm not sure how this could happen yet, but reading the sources,
it looks like at line 1323:
buf += sizeof(csum_hdr);
we're misaligning buf, as axe_csum_hdr is either 3 or 5 uint16_t (6 or 10 bytes).
Another thing that changed is the compiler. I wonder if the compiler
could be optimising the memcpy() call the wrong way here,
assuming buf is always aligned.
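Note: the arithmetic behind this suspicion can be checked with a small sketch. It assumes, as stated above, that the header is 3 or 5 uint16_t long and that the buffer starts on a 4-byte boundary. The helper names are hypothetical and are not taken from if_axe.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from if_axe.c): misalignment, in bytes (0..3),
 * of an address relative to a 4-byte boundary.
 */
static unsigned
misalign4(uintptr_t addr)
{
	return (unsigned)(addr & 3u);
}

/*
 * Misalignment of a pointer that started at 'base' and was then advanced
 * past a header made of 'nwords' uint16_t fields.
 */
static unsigned
misalign4_after_hdr(uintptr_t base, size_t nwords)
{
	assert(nwords > 0);
	return misalign4(base + nwords * sizeof(uint16_t));
}
```

With a 4-byte aligned base, both header sizes leave the pointer at offset 2 modulo 4, which is also the offset of the reported FAR (0x9cda25ee). This only shows that the numbers are consistent with the report; it does not prove that this is the path that was taken.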
>How-To-Repeat:
Use an axe(4) on an ARM CPU?
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->toolchain-manager
Responsible-Changed-By: martin@NetBSD.org
Responsible-Changed-When: Mon, 16 Apr 2018 08:58:56 +0000
Responsible-Changed-Why:
Looks like a compiler issue
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
compiler-related
Date: Mon, 16 Apr 2018 18:12:18 +0200
On Sun, Apr 15, 2018 at 04:35:01PM +0000, bouyer@antioche.eu.org wrote:
> another thing that changed is the compiler. I wonder if the compiler
> could be optimising the memcpy() call the wrong way here,
> assuming buf is always aligned.
The compiler may be right after all.
From http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.faqs/ka15414.html
"When compiling for a ARMv6 or ARMv7-A/R processor, the ARM Compiler will assume that it can use unaligned accesses"
And
"Further, unaligned accesses are only allowed to regions marked as Normal memory type, and unaligned access support must be enabled by setting the SCTLR.A bit in the system control coprocessor. Attempts to perform unaligned accesses when not allowed will cause an alignment fault (data abort)."
Are we setting the SCTLR.A bit? Also in kernel mode?
If not, should the kernel be compiled with -mno-unaligned-access?
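Note: when checking the register, it may help to keep in mind that in the ARMv7-A SCTLR the A bit is bit 1 and that, when it is set, alignment fault checking is enabled; single-word unaligned loads are only permitted while it is clear. The sketch below decodes that bit from a raw SCTLR value and models whether a single load would fault. The function names are hypothetical, the model assumes Normal memory and a plain LDR/LDRH/LDRB, and reading the register itself (an MRC from CP15 in privileged mode) is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SCTLR_A_BIT	(UINT32_C(1) << 1)	/* alignment check enable */

/* Hypothetical helper: is alignment fault checking on in this SCTLR value? */
static bool
sctlr_alignment_check_enabled(uint32_t sctlr)
{
	return (sctlr & SCTLR_A_BIT) != 0;
}

/*
 * Simplified model (assumption: Normal memory, single LDR/LDRH/LDRB):
 * a load of 'width' bytes faults only if the address is not naturally
 * aligned for that width and alignment checking is enabled.
 */
static bool
single_load_faults(uint32_t sctlr, uintptr_t addr, unsigned width)
{
	assert(width == 1 || width == 2 || width == 4);
	bool misaligned = (addr & (uintptr_t)(width - 1)) != 0;
	return misaligned && sctlr_alignment_check_enabled(sctlr);
}
```

Under this model, a 4-byte load from the reported address faults when the A bit is set and succeeds when it is clear, which matches the two options raised above: change the register setup, or stop the compiler from emitting such loads.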
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
compiler-related
Date: Mon, 16 Apr 2018 18:38:58 +0200
On Mon, Apr 16, 2018 at 04:15:01PM +0000, Manuel Bouyer wrote:
> >
> > 0x80095134 points to the memcpy() call in axe_rxeof() at line 1251.
> > More specifically:
> > 0x80095124 <+272>: ldr r5, [r11, #-52] ; 0xffffffcc
> > 0x80095128 <+276>: b 0x80095274 <axe_rxeof+608>
> > 0x8009512c <+280>: cmp r5, #3
> > 0x80095130 <+284>: bls 0x80095350 <axe_rxeof+828>
> > 0x80095134 <+288>: ldr r3, [r8], #4
> > 0x80095138 <+292>: movw r0, #2047 ; 0x7ff
> > 0x8009513c <+296>: sub r2, r5, #4
> > 0x80095140 <+300>: str r3, [r11, #-48] ; 0xffffffd0
gcc 5.5.0 in netbsd-8 would use two ldrh instructions instead of one ldr here,
which would have worked in my case.
But in the general case, the loop does
buf += rxlen
and I don't think there's a requirement for rxlen to be a multiple of 2 here.
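Note: the point about rxlen can be illustrated with a small sketch of the pointer arithmetic. It models only the step "buf += rxlen" and reports the widest single load that is still naturally aligned at the resulting address. The function names and the sample lengths are hypothetical and are not taken from the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Widest single load (4, 2 or 1 bytes) that is naturally aligned at addr. */
static unsigned
widest_aligned_load(uintptr_t addr)
{
	if ((addr & 3u) == 0)
		return 4;	/* a word load (ldr) is aligned */
	if ((addr & 1u) == 0)
		return 2;	/* only halfword loads (ldrh) are aligned */
	return 1;		/* only byte loads (ldrb) are aligned */
}

/*
 * Hypothetical model of the loop step "buf += rxlen": address of the
 * next header after a frame of rxlen bytes.
 */
static uintptr_t
advance(uintptr_t buf, size_t rxlen)
{
	assert(rxlen > 0);
	return buf + rxlen;
}
```

At the reported address the widest aligned load is a halfword, which is why a pair of ldrh would have worked there. After advancing by an odd rxlen, only byte loads remain aligned, so even the halfword sequence would no longer be safe with alignment checking enabled.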
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--
Responsible-Changed-From-To: toolchain-manager->port-arm-manager
Responsible-Changed-By: bouyer@NetBSD.org
Responsible-Changed-When: Mon, 16 Apr 2018 18:55:16 +0000
Responsible-Changed-Why:
Looks like the compiler is right, either the system control register is
not properly set up or we should compile with -mno-unaligned-access
Responsible-Changed-From-To: port-arm-manager->port-arm-maintainer
Responsible-Changed-By: dholland@NetBSD.org
Responsible-Changed-When: Wed, 09 Jan 2019 19:10:04 +0000
Responsible-Changed-Why:
use right group
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
compiler-related
Date: Wed, 9 Jan 2019 19:11:44 +0000
Not sent to gnats.
------
From: Manuel Bouyer <bouyer@antioche.eu.org>
To: toolchain-manager@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: toolchain/53185: axe(4) on evbarm cause panic, possibly
compiler-related
Date: Mon, 16 Apr 2018 20:53:34 +0200
On Mon, Apr 16, 2018 at 04:15:01PM +0000, Manuel Bouyer wrote:
> [trimmed]
> If not, should the kernel be compiled with -mno-unaligned-access ?
A kernel built with -mno-unaligned-access works fine.
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
State-Changed-From-To: open->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Thu, 29 Aug 2019 08:47:03 +0000
State-Changed-Why:
A workaround was committed a while ago and is present in -9. A different
workaround has been committed to -current for other drivers. At
best, this could be a pullup, but it is not needed.
>Unformatted: