NetBSD Problem Report #48954

From alnsn@NetBSD.org  Fri Jun 27 09:45:03 2014
Return-Path: <alnsn@NetBSD.org>
Received: by mollari.NetBSD.org (Postfix, from userid 1459)
	id 9D813A653D; Fri, 27 Jun 2014 09:45:03 +0000 (UTC)
Message-Id: <20140627094503.9D813A653D@mollari.NetBSD.org>
Date: Fri, 27 Jun 2014 09:45:03 +0000 (UTC)
From: alnsn@NetBSD.org
Reply-To: alnsn@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: USB diagnostic message: actlen (-15996) > len (4)
X-Send-Pr-Version: 3.95

>Number:         48954
>Category:       kern
>Synopsis:       USB diagnostic message: actlen (-15996) > len (4)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jun 27 09:50:00 +0000 2014
>Last-Modified:  Sun Jul 06 09:10:07 +0000 2014
>Originator:     Alexander Nasonov
>Release:        NetBSD 6.99.44
>Organization:
	TNF
>Environment:
	$NetBSD: usb.c,v 1.149 2014/03/16 05:20:29 dholland Exp $
	$NetBSD: usb_mem.c,v 1.64 2013/12/22 18:29:25 mlelstv Exp $
	$NetBSD: usb_pci.c,v 1.7 2008/04/28 20:23:55 martin Exp $
	$NetBSD: usb_quirks.c,v 1.80 2013/11/14 16:33:20 nonaka Exp $
	$NetBSD: usb_subr.c,v 1.196 2014/02/17 07:34:21 skrll Exp $
	$NetBSD: usbdi.c,v 1.160 2013/11/30 12:16:14 skrll Exp $
	$NetBSD: usbdi_util.c,v 1.62 2013/09/26 07:25:31 skrll Exp $
	$NetBSD: if_urtwn.c,v 1.30 2014/05/08 05:59:09 mrg Exp $
System: NetBSD neva 6.99.44 NetBSD 6.99.44 (GENERIC) #1: Thu Jun 26 11:53:57 BST 2014  alnsn@neva:/home/alnsn/netbsd-current/src/sys/arch/amd64/compile/obj/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	The urtwn(4) driver is a bit unstable for me. I suspect something
	is corrupting memory. This diagnostic message further supports
	that suspicion:

	urtwn0: link state UP (was UNKNOWN)
	urtwn1: link state UP (was UNKNOWN)
	urtwn0: link state DOWN (was UP)
	urtwn0: link state UP (was DOWN)
	urtwn0: link state DOWN (was UP)
	urtwn0: link state UP (was DOWN)
	urtwn0: link state DOWN (was UP)
	usb_transfer_complete: actlen (-15996) > len (4)

	The kernel is built with DIAGNOSTIC, DEBUG, LOCKDEBUG, USB_DEBUG
	and URTWN_DEBUG options.
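
A negative actlen that still compares greater than 4 is what one would expect if the two lengths are stored as unsigned 32-bit values and only printed with a signed conversion. The sketch below illustrates that reading; `struct xfer_lengths` and both helpers are hypothetical names for illustration, not the kernel's own definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the two length fields of a USB transfer. */
struct xfer_lengths {
	uint32_t length;	/* bytes requested */
	uint32_t actlen;	/* bytes reported as transferred */
};

/* Returns 1 when the unsigned check "actlen > length" fires. */
static int
actlen_exceeds_len(const struct xfer_lengths *x)
{
	assert(x != NULL);
	return x->actlen > x->length;
}

/*
 * Formats the values the way a signed %d conversion would show them.
 * The cast to int assumes the usual two's complement wraparound.
 */
static int
format_diag(char *buf, size_t len, const struct xfer_lengths *x)
{
	return snprintf(buf, len, "actlen (%d) > len (%d)",
	    (int)x->actlen, (int)x->length);
}
```

Under that assumption the stored actlen would be 4294951300 (2^32 - 15996), a garbage length rather than a genuinely negative one.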
>How-To-Repeat:
	Not easily reproducible.
>Fix:
	Not known.

>Audit-Trail:
From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/48954: USB diagnostic message: actlen (-15996) > len (4)
Date: Fri, 27 Jun 2014 21:21:10 +1000

 i've recently been using urtwn(4) as well (as you can see from my
 1.30 revision to if_urtwn.c :-).

 i've not seen anything that suggested corrupted memory, though it
 does seem possible.  i have seen it lock up twice, unable to talk
 to the network at all, requiring being unplugged and reinserted
 to work again.

 so there is certainly something problematic, if not multiple things.

 in dmesg:

 urtwn0: link state UP (was UNKNOWN)
 urtwn0: link state DOWN (was UP)
 urtwn0: link state UP (was DOWN)

 and

 urtwn0: could not load firmware page 3

 and

 urtwn0: timeout waiting for MAC auto ON

 and

 urtwn0: device timeout

 none of which seem to say much useful.


 .mrg.

From: Alexander Nasonov <alnsn@yandex.ru>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, alnsn@NetBSD.org
Subject: Re: kern/48954: USB diagnostic message: actlen (-15996) > len (4)
Date: Fri, 27 Jun 2014 15:08:28 +0100

 matthew green wrote:
 >  i've not seen anything that suggested corrupted memory, though it
 >  does seem possible.  i have seen it lock up twice, unable to talk
 >  to the network at all, requiring being unplugged and reinserted
 >  to work again.

 Replugging my card almost surely leads to a crash. The location of the
 crash is quite predictable, but it depends on compilation flags and the
 verbosity of debugging messages.

 I picked one crash between usbd_setup_xfer and usbd_transfer
 calls:

 ffffffff8044b34c:       48 8b bb f8 32 00 00    mov    0x32f8(%rbx),%rdi
 ffffffff8044b353:       48 c7 44 24 08 4d 75    movq $0xffffffff8044754d,0x8(%rsp)
 ffffffff8044b35a:       44 80
 ffffffff8044b35c:       c7 04 24 00 00 00 00    movl   $0x0,(%rsp)
 ffffffff8044b363:       41 b9 05 00 00 00       mov    $0x5,%r9d
 ffffffff8044b369:       41 b8 00 40 00 00       mov    $0x4000,%r8d
 ffffffff8044b36f:       4c 89 e2                mov    %r12,%rdx
 ffffffff8044b372:       e8 e7 17 41 00          callq  ffffffff8085cb5e <usbd_setup_xfer>
 ffffffff8044b377:       48 8b bb f8 32 00 00    mov    0x32f8(%rbx),%rdi

                                                        ^^^^^^^^^^^^
                                                        IT CRASHES HERE

 ffffffff8044b37e:       e8 78 11 41 00          callq  ffffffff8085c4fb <usbd_transfer>

 Note that it's reading the same memory location 0x32f8(%rbx) twice but
 the second read crashes the kernel.

 Alex

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48954: USB diagnostic message: actlen (-15996) > len (4)
Date: Fri, 27 Jun 2014 14:13:41 +0000

 On Fri, Jun 27, 2014 at 02:10:14PM +0000, Alexander Nasonov wrote:
  >  ffffffff8044b34c:       48 8b bb f8 32 00 00    mov    0x32f8(%rbx),%rdi
  >  ffffffff8044b353:       48 c7 44 24 08 4d 75    movq $0xffffffff8044754d,0x8(%rsp)
  >  ffffffff8044b35a:       44 80
  >  ffffffff8044b35c:       c7 04 24 00 00 00 00    movl   $0x0,(%rsp)
  >  ffffffff8044b363:       41 b9 05 00 00 00       mov    $0x5,%r9d
  >  ffffffff8044b369:       41 b8 00 40 00 00       mov    $0x4000,%r8d
  >  ffffffff8044b36f:       4c 89 e2                mov    %r12,%rdx
  >  ffffffff8044b372:       e8 e7 17 41 00          callq  ffffffff8085cb5e <usbd_setup_xfer>
  >  ffffffff8044b377:       48 8b bb f8 32 00 00    mov    0x32f8(%rbx),%rdi
  >  
  >                                                         ^^^^^^^^^^^^
  >                                                         IT CRASHES HERE
  >  
  >  ffffffff8044b37e:       e8 78 11 41 00          callq  ffffffff8085c4fb <usbd_transfer>
  >  
  >  Note that it's reading the same memory location 0x32f8(%rbx) twice but
  >  the second read crashes the kernel.

 That means either compiled code isn't preserving %rbx according to the
 function call ABI (unlikely) or the stack's being overwritten.
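
 To make the second possibility concrete: %rbx is callee-saved, so a callee that uses it typically spills the caller's value to its own stack frame and reloads it before returning. If something overwrites that slot in between, the caller gets a bogus %rbx back. The following is only a user-space model of that mechanism; `struct frame_model` and `call_with_copy` are hypothetical and not taken from the USB code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical model of a callee's stack frame: a local buffer followed
 * by the slot where the callee spilled the caller's %rbx.
 */
struct frame_model {
	unsigned char local_buf[8];
	uint64_t saved_rbx;
};

/*
 * Simulates a callee that saves rbx on entry, copies n bytes into its
 * frame starting at the local buffer, and restores rbx on return.
 * Copying more than sizeof(local_buf) bytes runs into the saved slot.
 * Returns the value the caller would see in rbx afterwards.
 */
static uint64_t
call_with_copy(uint64_t rbx, const unsigned char *src, size_t n)
{
	struct frame_model frame;

	assert(n <= sizeof(frame));	/* keep the model itself in bounds */
	memset(&frame, 0, sizeof(frame));
	frame.saved_rbx = rbx;				/* "push %rbx" */
	memcpy((unsigned char *)&frame, src, n);	/* possibly overlong copy */
	return frame.saved_rbx;				/* "pop %rbx" */
}
```

 With an 8-byte copy the caller's value survives; with a 16-byte copy it comes back as whatever the source bytes were, and a later access such as 0x32f8(%rbx) would then use a garbage base address.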

 -- 
 David A. Holland
 dholland@netbsd.org

From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: alnsn@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
 netbsd-bugs@netbsd.org
Subject: Re: kern/48954: USB diagnostic message: actlen (-15996) > len (4)
Date: Fri, 27 Jun 2014 15:34:43 +0100

 Have you tried setting kmem_guard_depth as described in kmem(9)?

 Nick

From: Alexander Nasonov <alnsn@yandex.ru>
To: Nick Hudson <skrll@netbsd.org>
Cc: gnats-bugs@NetBSD.org, alnsn@NetBSD.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48954: USB diagnostic message: actlen (-15996) > len (4)
Date: Fri, 27 Jun 2014 19:48:30 +0100

 Nick Hudson wrote:
 > Have you tried setting kmem_guard_depth as described in kmem(9)?

 kmem_guard_depth=50000 made no difference.

 usbd_get_string: getting lang failed, using 0
 urtwn0 at uhub1 port 1
 urtwn0: Realtek 802.11n WLAN Adapter, rev 2.00/2.00, addr 3
 urtwn0: MAC/BB RTL8188CUS, RF 6052 1T1R, address 80:1f:02:84:fb:fe
 urtwn0: 1 rx pipe, 2 tx pipes
 urtwn0: 11b rates: 1Mbps 2Mbps 5.5Mbps 11Mbps
 urtwn0: 11g rates: 1Mbps 2Mbps 5.5Mbps 11Mbps 6Mbps 9Mbps 12Mbps 18Mbps 24Mbps 36Mbps 48Mbps 54Mbps
 uvm_fault(0xfffffe811cc122e8, 0x0, 2) -> e
 fatal page fault in supervisor mode
 trap type 6 code 2 rip ffffffff80899582 cs 8 rflags 10286 cr2 0 ilevel 6
 rsp fffffe80ca6fc6d0
 curlwp 0xfffffe811c501b40 pid 155.1 lowest kstack 0xfffffe80ca6f92c0
 panic: trap
 cpu3: Begin traceback...
 vpanic() at netbsd:vpanic+0x13c
 snprintf() at netbsd:snprintf
 startlwp() at netbsd:startlwp
 alltraps() at netbsd:alltraps+0x9e
 urtwn_init() at netbsd:urtwn_init+0x182c
 ether_ioctl() at netbsd:ether_ioctl+0xc8
 ieee80211_ioctl() at netbsd:ieee80211_ioctl+0x9a4
 urtwn_ioctl() at netbsd:urtwn_ioctl+0x91
 in6_update_ifa1() at netbsd:in6_update_ifa1+0x6fd
 in6_update_ifa() at netbsd:in6_update_ifa+0x36
 in6_control1() at netbsd:in6_control1+0x3c3
 in6_control() at netbsd:in6_control+0x78
 udp6_ioctl_wrapper() at netbsd:udp6_ioctl_wrapper+0x3e
 compat_ifioctl() at netbsd:compat_ifioctl+0x127
 doifioctl() at netbsd:doifioctl+0x43b
 soo_ioctl() at netbsd:soo_ioctl+0x2b8
 sys_ioctl() at netbsd:sys_ioctl+0x17e
 syscall() at netbsd:syscall+0x9a
 --- syscall (number 54) ---
 7f7ff74d088a:
 cpu3: End traceback...


 ffffffff804536a1 <urtwn_init>:
 ffffffff804536a1:       55                      push   %rbp
 ffffffff804536a2:       48 89 e5                mov    %rsp,%rbp
 ffffffff804536a5:       41 57                   push   %r15
 ffffffff804536a7:       41 56                   push   %r14
 ffffffff804536a9:       41 55                   push   %r13
 ffffffff804536ab:       41 54                   push   %r12
 ffffffff804536ad:       53                      push   %rbx
 ffffffff804536ae:       48 83 ec 58             sub    $0x58,%rsp
 ffffffff804536b2:       48 89 7d c8             mov    %rdi,-0x38(%rbp)
 ffffffff804536b6:       48 8b 1f                mov    (%rdi),%rbx
 ffffffff804536b9:       f6 05 f8 46 b0 00 02    testb  $0x2,0xb046f8(%rip)        # ffffffff80f57db8 <urtwn_debug>
 ffffffff804536c0:       0f 85 fa 03 00 00       jne    ffffffff80453ac0 <urtwn_init+0x41f>

 ...

 ffffffff80454e8b:       e8 68 e0 ff ff          callq  ffffffff80452ef8 <urtwn_set_chan.constprop.7>
 ffffffff80454e90:       48 8b 8b 00 33 00 00    mov    0x3300(%rbx),%rcx
 ffffffff80454e97:       48 8d 93 f0 32 00 00    lea    0x32f0(%rbx),%rdx
 ffffffff80454e9e:       48 8b b3 68 11 00 00    mov    0x1168(%rbx),%rsi
 ffffffff80454ea5:       48 8b bb f8 32 00 00    mov    0x32f8(%rbx),%rdi
 ffffffff80454eac:       48 c7 44 24 08 1d 0f    movq   $0xffffffff80450f1d,0x8(%rsp)
 ffffffff80454eb3:       45 80
 ffffffff80454eb5:       c7 04 24 00 00 00 00    movl   $0x0,(%rsp)
 ffffffff80454ebc:       41 b9 05 00 00 00       mov    $0x5,%r9d
 ffffffff80454ec2:       41 b8 00 40 00 00       mov    $0x4000,%r8d
 ffffffff80454ec8:       e8 b1 46 44 00          callq  ffffffff8089957e <usbd_setup_xfer>
 ffffffff80454ecd:       48 8b bb f8 32 00 00    mov    0x32f8(%rbx),%rdi
 ffffffff80454ed4:       e8 42 40 44 00          callq  ffffffff80898f1b <usbd_transfer>
 ffffffff80454ed9:       83 f8 01                cmp    $0x1,%eax
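
 The addresses in the panic output and in this disassembly can be cross-checked with simple arithmetic. The sketch below uses only values quoted above; the helper names are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Addresses copied from the panic output and the disassembly above. */
#define URTWN_INIT_START	UINT64_C(0xffffffff804536a1)
#define USBD_SETUP_XFER_START	UINT64_C(0xffffffff8089957e)
#define FAULT_RIP		UINT64_C(0xffffffff80899582)

/* Resolves "symbol+offset" from a traceback into an absolute address. */
static uint64_t
symbol_plus_offset(uint64_t start, uint64_t offset)
{
	assert(offset <= UINT64_MAX - start);
	return start + offset;
}

/* Offset of an address within the function that starts at 'start'. */
static uint64_t
offset_in_function(uint64_t start, uint64_t addr)
{
	assert(addr >= start);
	return addr - start;
}
```

 urtwn_init+0x182c resolves to ffffffff80454ecd, the instruction right after the call to usbd_setup_xfer, so that traceback entry looks like a return address. The faulting rip, ffffffff80899582, lies 4 bytes into usbd_setup_xfer, which suggests the fault happened inside the callee rather than in urtwn_init itself.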

 Alex

From: Nick Hudson <skrll@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: Alexander Nasonov <alnsn@yandex.ru>, kern-bug-people@netbsd.org, 
 gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, alnsn@NetBSD.org
Subject: Re: kern/48954: USB diagnostic message: actlen (-15996) > len (4)
Date: Mon, 30 Jun 2014 08:07:25 +0100

 Does disabling IPv6 on your machine help?

 Nick

From: Alexander Nasonov <alnsn@yandex.ru>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, alnsn@NetBSD.org
Subject: Re: kern/48954: USB diagnostic message: actlen (-15996) > len (4)
Date: Sun, 6 Jul 2014 10:09:34 +0100

 Nick Hudson wrote:
 >  Does disabling IPv6 on your machine help?

 It makes no difference.

 After spending a bit more time looking at the bug, I found out that
 dmesg doesn't report the stack trace correctly. The kernel crashed in
 usbd_setup_xfer at the instruction that corresponds to the first line
 of the function:

         xfer->pipe = pipe;
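
 The earlier panic reported a write fault at address 0x0 (cr2 0). That would match a NULL xfer if pipe is the first member of the transfer structure, since the store then targets offset 0. The sketch below models this with a hypothetical, cut-down structure; the real layout and any proper fix in the driver are not shown here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down transfer descriptor; only the member order matters. */
struct xfer_model {
	void *pipe;		/* first member, so offset 0 */
	void *priv;
	void *buffer;
	uint32_t length;
};

/* Assumption of this sketch: pipe sits at the very start of the structure. */
static_assert(offsetof(struct xfer_model, pipe) == 0,
    "pipe must be the first member");

/* Address that a store to xfer->pipe would touch for a given xfer pointer. */
static uintptr_t
store_address_of_pipe(uintptr_t xfer_addr)
{
	return xfer_addr + offsetof(struct xfer_model, pipe);
}

/*
 * Defensive variant of the assignment: refuse a NULL descriptor instead
 * of faulting. Returns 0 on success, -1 when xfer is NULL.
 */
static int
setup_xfer_checked(struct xfer_model *xfer, void *pipe)
{
	if (xfer == NULL)
		return -1;
	xfer->pipe = pipe;
	return 0;
}
```

 A check like this would only turn the panic into an error return; it would not explain why the pointer loaded from 0x32f8(%rbx) ended up NULL.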

 Also, I got three more crashes from ehci_softintr. Two crashes were at
 memcpy+0x14 and the third was at

 crash> x/i 0xffffffff802a0582
 ehci_idone+0xa9:        movslq  58 (%r14),%rsi

 The new kernel doesn't have DIAGNOSTIC so I don't know whether the
 crashes in memcpy were preceded by any diagnostic messages.

 Alex

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.