NetBSD Problem Report #49603
From gson@gson.org Sun Jan 25 13:51:33 2015
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id 7270BA654B
for <gnats-bugs@gnats.NetBSD.org>; Sun, 25 Jan 2015 13:51:33 +0000 (UTC)
Message-Id: <20150125135123.D2AD274419F@guava.gson.org>
Date: Sun, 25 Jan 2015 15:51:23 +0200 (EET)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@gnats.NetBSD.org
Subject: Single-stepping into syscall reboots -current/amd64 under qemu
X-Send-Pr-Version: 3.95
>Number: 49603
>Category: kern
>Synopsis: Single-stepping into syscall reboots -current/amd64 under qemu
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jan 25 13:55:01 +0000 2015
>Closed-Date: Thu Mar 09 19:06:04 +0000 2017
>Last-Modified: Thu Mar 09 19:06:04 +0000 2017
>Originator: Andreas Gustafsson
>Release: NetBSD-current, source date >= 2014.12.14.18.14.15
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:
When debugging a userland process using gdb under NetBSD-current/amd64
running in a qemu virtual machine, single-stepping into a system call
causes an instant reboot. Root privileges are not required.
I ran an automated binary search to find when the problem first
appeared, and it pointed at src/sys/sys/ksyms.h 1.30, committed by
christos on CVS date 2014.12.14.18.14.15.
The reboot is consistently reproducible under qemu, but I have not
been able to reproduce it on physical hardware (tested on an AMD
Athlon64). It also does not happen with the i386 port, only amd64.
>How-To-Repeat:
pkg_add py-anita
anita interact http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201501250540Z/amd64/
(log in as root)
gdb /bin/sync
break sync
run
stepi
stepi
stepi
>Fix:
>Release-Note:
>Audit-Trail:
From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49603: Single-stepping into syscall reboots -current/amd64 under qemu
Date: Mon, 26 Jan 2015 08:25:48 +0000
On Sun, Jan 25, 2015 at 01:55:01PM +0000, Andreas Gustafsson wrote:
> >Number: 49603
> >Category: kern
> >Synopsis: Single-stepping into syscall reboots -current/amd64 under qemu
...
> When debugging a userland process using gdb under NetBSD-current/amd64
> running in a qemu virtual machine, single-stepping into a system call
> causes an instant reboot. Root privileges are not required.
>
> I ran an automated binary search to find when the problem first
> appeared, and it pointed at src/sys/sys/ksyms.h 1.30, committed by
> christos on CVS date 2014.12.14.18.14.15.
That file looks unlikely.
Sounds more like a qemu bug to do with faulting on the system
call entry/exit instruction when 'single step' is enabled.
> The reboot is consistently reproducible under qemu, but I have not
> been able to reproduce it on physical hardware (tested on an AMD
> Athlon64). It also does not happen with the i386 port, only amd64.
>
> >How-To-Repeat:
>
> pkg_add py-anita
> anita interact http://nyftp.netbsd.org/pub/NetBSD-daily/HEAD/201501250540Z/amd64/
> (log in as root)
> gdb /bin/sync
> break sync
> run
> stepi
> stepi
> stepi
Which instructions are being stepped over?
Do you know if the syscall happens, i.e. is the
error on the syscall entry, the syscall exit or the
following instruction?
David
--
David Laight: david@l8s.co.uk
From: Andreas Gustafsson <gson@gson.org>
To: David Laight <david@l8s.co.uk>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/49603: Single-stepping into syscall reboots -current/amd64 under qemu
Date: Mon, 26 Jan 2015 10:54:58 +0200
David Laight wrote:
> That file looks unlikely.
> Sounds more like a qemu bug to do with faulting on the system
> call entry/exit instruction when 'single step' is enabled.
I agree that it looks like a qemu bug, but I'm still wondering if
Christos' commit might have caused some subtle change to the behavior
of either gdb or the kernel to make it trigger the bug when it didn't
before.
> Which instructions are being stepped over?
(gdb) break sync
Breakpoint 1 at 0x4007b0
(gdb) run
Starting program: /bin/sync
Breakpoint 1, 0x00007f7ff743c360 in sync () from /lib/libc.so.12
(gdb) x/4i $pc
=> 0x7f7ff743c360 <sync>: mov $0x24,%eax
0x7f7ff743c365 <sync+5>: mov %rcx,%r10
0x7f7ff743c368 <sync+8>: syscall
0x7f7ff743c36a <sync+10>: retq
(gdb) stepi
0x00007f7ff743c365 in sync () from /lib/libc.so.12
(gdb) stepi
0x00007f7ff743c368 in sync () from /lib/libc.so.12
(gdb) stepi
>> NetBSD/x86 BIOS Boot, Revision 5.10 (from NetBSD 7.99.4)
>> Memory: 639/129920 k
> Do you know if the syscall happens, i.e. is the
> error on the syscall entry, the syscall exit or the
> following instruction?
On entry. I just confirmed this by running "gdb /bin/cat" and setting
a breakpoint in read(). When I executed the syscall instruction with
"stepi", the VM rebooted immediately rather than waiting for a line of
input from stdin first.
--
Andreas Gustafsson, gson@gson.org
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/49603: Single-stepping into syscall reboots -current/amd64 under qemu
Date: Thu, 8 Oct 2015 12:11:20 +0300
I have now re-run my automated test for this bug against a number of
additional NetBSD source dates, using qemu 2.4.0 from pkgsrc on a
NetBSD 6.1.4 host, and I'm finding that the qemu guest will either
reboot as reported in the original PR, or hang, depending on the
source date being tested.
My bisection script was only looking for a reboot, so my initial
conclusion that stepping into a syscall was working before source date
2014.12.14.18.14.15 was incorrect - the bisection runs for versions
older than that date just happened to trigger a hang rather than a
reboot.
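The effect of a too-narrow failure test on a bisection can be illustrated
with a minimal sketch. This is not the actual bisection script; the
Outcome values, the first_bad() helper and the eight-entry outcome table
are hypothetical, chosen only to show how a predicate that looks for a
reboot alone reports a later "first bad" version than one that treats any
failure as bad.

```python
from enum import Enum


class Outcome(Enum):
    """Possible results of one test run (hypothetical labels)."""
    OK = "ok"
    HANG = "hang"
    REBOOT = "reboot"


def first_bad(versions, run_test, is_bad):
    """Binary search for the first version where is_bad(run_test(v)) holds.

    Assumes that is_bad is False for every version before some point and
    True from that point on. Returns None if no version is bad.
    """
    lo, hi = 0, len(versions)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(run_test(versions[mid])):
            hi = mid          # first bad version is at mid or earlier
        else:
            lo = mid + 1      # first bad version is after mid
    return versions[lo] if lo < len(versions) else None


# Made-up outcome table: versions 0-1 work, 2-4 hang, 5-7 reboot.
OUTCOMES = [Outcome.OK] * 2 + [Outcome.HANG] * 3 + [Outcome.REBOOT] * 3
VERSIONS = list(range(len(OUTCOMES)))


def run_test(version):
    """Stand-in for booting a build under qemu and running the gdb steps."""
    return OUTCOMES[version]


# Predicate that only recognizes a reboot as a failure.
reboot_only = first_bad(VERSIONS, run_test, lambda o: o is Outcome.REBOOT)
# Predicate that treats anything other than OK as a failure.
any_failure = first_bad(VERSIONS, run_test, lambda o: o is not Outcome.OK)

print(reboot_only, any_failure)
```

With this made-up table, the reboot-only predicate points at version 5,
where the symptom merely changes from a hang to a reboot, while the
any-failure predicate points at version 2.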
I also ran the test using qemu 1.1.2 on a Debian 7 host, and there it
resulted in neither a reboot nor a lockup of the guest, but in qemu
itself segfaulting. That, at least, is definitely a qemu bug.
Here is an updated recipe for reproducing the bug that requires neither
anita nor pkgsrc, and should work on any host that supports qemu (e.g.,
Linux). You will need a couple of gigabytes of free disk space for the
uncompressed disk image.
wget http://www.gson.org/bugs/qemu/NetBSD-amd64-2015.08.01.16.18.47-com0.img.gz
gunzip NetBSD-amd64-2015.08.01.16.18.47-com0.img.gz
qemu-system-x86_64 -nographic -snapshot NetBSD-amd64-2015.08.01.16.18.47-com0.img
(wait for the qemu guest to boot to a login prompt)
(log in as root; there is no password)
gdb /bin/sync
break sync
run
stepi
stepi
stepi
(The qemu guest will either instantly reboot or hang, or qemu will segfault)
(On real hardware, you just get another gdb prompt, and gdb is still responding)
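The four outcomes listed above could be told apart automatically along
the following lines. This is only a sketch under stated assumptions: the
classify() function, its parameters and its return labels are
hypothetical, and it assumes the caller has already captured the serial
console text that follows the last "stepi", knows whether a timeout
expired, and knows the qemu process exit status (None while qemu is
still running). The boot banner prefix is taken from the transcript
earlier in this PR.

```python
BOOT_BANNER = ">> NetBSD/x86 BIOS Boot"   # printed by the boot loader
GDB_PROMPT = "(gdb)"                      # gdb is ready for another command


def classify(console_text, timed_out, qemu_status=None):
    """Classify what happened after the final stepi.

    console_text: console output captured after the last stepi was sent
    timed_out:    True if no further output arrived before a deadline
    qemu_status:  qemu exit status, or None if qemu is still running
    """
    if qemu_status is not None and qemu_status != 0:
        return "qemu-crash"    # e.g. qemu itself was killed by a signal
    if BOOT_BANNER in console_text:
        return "reboot"        # guest went back through the boot loader
    if GDB_PROMPT in console_text:
        return "ok"            # gdb came back, as on real hardware
    if timed_out:
        return "hang"          # no banner, no prompt, no output
    return "unknown"


if __name__ == "__main__":
    # Illustrative console fragments, not captured logs.
    print(classify(">> NetBSD/x86 BIOS Boot, Revision 5.10\n", False))
    print(classify("in sync () from /lib/libc.so.12\n(gdb) ", False))
    print(classify("", True))
    print(classify("", False, qemu_status=-11))
```

Checking for the boot banner before the gdb prompt matters because a
rebooted guest never prints another prompt, and checking the qemu status
first keeps a crash of the emulator from being mistaken for a guest hang.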
--
Andreas Gustafsson, gson@gson.org
State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Thu, 09 Mar 2017 19:06:04 +0000
State-Changed-Why:
I can no longer reproduce the bug with qemu 2.8.0nb3, which contains
the fix for PR 51934 back-ported from the qemu git mainline.
Presumably this PR and 51934 were caused by the same issue.
>Unformatted: