NetBSD Problem Report #49603

From gson@gson.org  Sun Jan 25 13:51:33 2015
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 7270BA654B
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 25 Jan 2015 13:51:33 +0000 (UTC)
Message-Id: <20150125135123.D2AD274419F@guava.gson.org>
Date: Sun, 25 Jan 2015 15:51:23 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.NetBSD.org
Subject: Single-stepping into syscall reboots -current/amd64 under qemu
X-Send-Pr-Version: 3.95

>Number:         49603
>Category:       kern
>Synopsis:       Single-stepping into syscall reboots -current/amd64 under qemu
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 25 13:55:01 +0000 2015
>Closed-Date:    Thu Mar 09 19:06:04 +0000 2017
>Last-Modified:  Thu Mar 09 19:06:04 +0000 2017
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2014.12.14.18.14.15
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

When debugging a userland process using gdb under NetBSD-current/amd64
running in a qemu virtual machine, single-stepping into a system call
causes an instant reboot.  Root privileges are not required.

I ran an automated binary search to find when the problem first
appeared, and it pointed at src/sys/sys/ksyms.h 1.30, committed by
christos on CVS date 2014.12.14.18.14.15.
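
The bisection script itself is not included in this report.  As a rough
illustration only, a binary search over source dates might look like the
sketch below.  The names first_bad and is_bad are hypothetical, and the
sketch assumes the dates are sorted in ascending order, that the first
date is known good, that the last is known bad, and that every date
after the first bad one is also bad.

```python
from typing import Callable, Sequence


def first_bad(dates: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest source date for which is_bad() is true.

    Assumptions (not taken from the report): dates is sorted ascending,
    dates[0] is good, dates[-1] is bad, and badness is monotonic.
    """
    lo, hi = 0, len(dates) - 1  # invariant: dates[lo] is good, dates[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(dates[mid]):
            hi = mid  # the first bad date is at mid or earlier
        else:
            lo = mid  # the first bad date is after mid
    return dates[hi]
```

The result is only as reliable as is_bad: a predicate that recognizes
just one failure mode will treat every other failure as "good".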

The reboot is consistently reproducible under qemu, but I have not
been able to reproduce it on physical hardware (tested on an AMD
Athlon64).  It also does not happen with the i386 port, only amd64.

>How-To-Repeat:

pkg_add py-anita
anita interact http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201501250540Z/amd64/
(log in as root)
gdb /bin/sync
break sync
run
stepi
stepi
stepi

>Fix:

>Release-Note:

>Audit-Trail:
From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49603: Single-stepping into syscall reboots -current/amd64 under qemu
Date: Mon, 26 Jan 2015 08:25:48 +0000

 On Sun, Jan 25, 2015 at 01:55:01PM +0000, Andreas Gustafsson wrote:
 > >Number:         49603
 > >Category:       kern
 > >Synopsis:       Single-stepping into syscall reboots -current/amd64 under qemu
 ...
 > When debugging a userland process using gdb under NetBSD-current/amd64
 > running in qemu virtual machine, single stepping into a system call
 > causes an instant reboot.  Root privileges are not required.
 > 
 > I ran an automated binary search to find when the problem first
 > appeared, and it pointed at src/sys/sys/ksyms.h 1.30, committed by
 > christos on CVS date 2014.12.14.18.14.15.

 That file looks unlikely.
 Sounds more like a qemu bug to do with faulting on the system
 call entry/exit instruction when 'single step' is enabled.

 > The reboot is consistently reproducible under qemu, but I have not
 > been able to reproduce it on physical hardware (tested on an AMD
 > Athlon64).  It also does not happen with the i386 port, only amd64.
 > 
 > >How-To-Repeat:
 > 
 > pkg_add py-anita
 > anita interact http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201501250540Z/amd64/
 > (log in as root)
 > gdb /bin/sync
 > break sync
 > run
 > stepi
 > stepi
 > stepi

 Which instructions are being stepped over?
 Do you know if the syscall happens, i.e., is the
 error on the syscall entry, the syscall exit, or the
 following instruction?

 	David

 -- 
 David Laight: david@l8s.co.uk

From: Andreas Gustafsson <gson@gson.org>
To: David Laight <david@l8s.co.uk>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/49603: Single-stepping into syscall reboots -current/amd64 under qemu
Date: Mon, 26 Jan 2015 10:54:58 +0200

 David Laight wrote:
 >  That file looks unlikely.
 >  Sounds more like a qemu bug to do with faulting on the system
 >  call entry/exit instruction when 'single step' is enabled.

 I agree that it looks like a qemu bug, but I'm still wondering if
 Christos' commit might have caused some subtle change to the behavior
 of either gdb or the kernel to make it trigger the bug when it didn't
 before.

 >  Which instructions are being stepped over?

   (gdb) break sync
   Breakpoint 1 at 0x4007b0
   (gdb) run
   Starting program: /bin/sync 
   Breakpoint 1, 0x00007f7ff743c360 in sync () from /lib/libc.so.12
   (gdb) x/4i $pc
   => 0x7f7ff743c360 <sync>:       mov    $0x24,%eax
      0x7f7ff743c365 <sync+5>:     mov    %rcx,%r10
      0x7f7ff743c368 <sync+8>:     syscall 
      0x7f7ff743c36a <sync+10>:    retq   
   (gdb) stepi
   0x00007f7ff743c365 in sync () from /lib/libc.so.12
   (gdb) stepi
   0x00007f7ff743c368 in sync () from /lib/libc.so.12
   (gdb) stepi

   >> NetBSD/x86 BIOS Boot, Revision 5.10 (from NetBSD 7.99.4)
   >> Memory: 639/129920 k

 >  Do you know if the syscall happens - ie is the
 >  error on the syscall entry, syscall exit or the
 >  following instruction.

 On entry.  I just confirmed this by running "gdb /bin/cat" and setting
 a breakpoint in read().  When I executed the syscall instruction with
 "stepi", the VM rebooted immediately rather than waiting for a line of
 input from stdin first.
 --
 Andreas Gustafsson, gson@gson.org

From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/49603: Single-stepping into syscall reboots -current/amd64 under qemu
Date: Thu, 8 Oct 2015 12:11:20 +0300

 I have now re-run my automated test for this bug against a number
 of additional NetBSD source dates, using qemu 2.4.0 from pkgsrc on a
 NetBSD 6.1.4 host, and I'm finding that the qemu guest will either
 reboot as reported in the original PR, or hang, depending on the
 source date being tested.

 My bisection script was only looking for a reboot, so my initial
 conclusion that stepping into a syscall was working before source date
 2014.12.14.18.14.15 was incorrect - the bisection runs for versions
 older than that date just happened to trigger a hang rather than a
 reboot.
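
To illustrate the pitfall described above, here is a hedged sketch of a
test predicate that tells the failure modes seen in this PR apart
instead of looking only for a reboot.  It is not the script that was
actually used.  The function name classify and its parameters are
hypothetical; the boot banner and gdb prompt strings are taken from the
console output quoted earlier in this PR, and the segfault check assumes
qemu was started through Python's subprocess module on a POSIX host,
where a negative return code means death by signal.

```python
import signal

BOOT_BANNER = ">> NetBSD/x86 BIOS Boot"  # printed when the guest reboots
GDB_PROMPT = "(gdb)"                     # printed when stepi returns normally


def classify(console_after_stepi, qemu_returncode, timed_out):
    """Classify what happened after the final stepi.

    console_after_stepi: guest console text captured after the last stepi
    qemu_returncode: subprocess return code, or None if qemu is still running
    timed_out: True if no further output arrived within the test's timeout
    """
    if qemu_returncode is not None and qemu_returncode == -signal.SIGSEGV:
        return "qemu-segfault"
    if BOOT_BANNER in console_after_stepi:
        return "reboot"
    if timed_out:
        return "hang"
    if GDB_PROMPT in console_after_stepi:
        return "ok"
    return "unknown"
```

A bisection predicate built on this would count every result other than
"ok" as a failure, so a hang on older source dates would not be mistaken
for a pass.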

 I also ran the test using qemu 1.1.2 on a Debian 7 host, and there it
 resulted in neither a reboot nor a lockup of the guest, but in qemu
 itself segfaulting.  That, at least, is definitely a qemu bug.

 Here is an updated recipe for reproducing the bug that requires
 neither anita nor pkgsrc, and should work on any host that supports
 qemu (e.g., Linux).  You will need a couple of gigabytes of free disk
 space for the uncompressed disk image.

    wget http://www.gson.org/bugs/qemu/NetBSD-amd64-2015.08.01.16.18.47-com0.img.gz
    gunzip NetBSD-amd64-2015.08.01.16.18.47-com0.img.gz
    qemu-system-x86_64 -nographic -snapshot NetBSD-amd64-2015.08.01.16.18.47-com0.img
    (wait for the qemu guest to boot to a login prompt)
    (log in as root; there is no password)
    gdb /bin/sync
    break sync
    run
    stepi
    stepi
    stepi
    (The qemu guest will either instantly reboot or hang, or qemu will segfault)
    (On real hardware, you just get another gdb prompt, and gdb is still responding)

 -- 
 Andreas Gustafsson, gson@gson.org

State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Thu, 09 Mar 2017 19:06:04 +0000
State-Changed-Why:
I can no longer reproduce the bug with qemu 2.8.0nb3, which contains
the fix for PR 51934 back-ported from the qemu git mainline.
Presumably this PR and 51934 were caused by the same issue.


>Unformatted:

Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.