NetBSD Problem Report #7122
Received: (qmail 24369 invoked from network); 10 Mar 1999 17:31:27 -0000
Message-Id: <199903101731.RAA00381@shark1.cambridge.arm.com>
Date: Wed, 10 Mar 1999 17:31:13 GMT
From: Richard Earnshaw <rearnsha@cambridge.arm.com>
Reply-To: rearnsha@cambridge.arm.com
To: gnats-bugs@gnats.netbsd.org
Subject: Breakpoints lost under heavy swapping
X-Send-Pr-Version: 3.95
>Number: 7122
>Category: port-arm
>Synopsis: Breakpoints lost under heavy swapping
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: port-arm-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Mar 10 09:35:00 +0000 1999
>Closed-Date: Wed Apr 09 15:46:50 +0000 2003
>Last-Modified: Wed Apr 09 15:46:50 +0000 2003
>Originator: Richard Earnshaw
>Release: NetBSD-current 19990305
>Organization:
ARM
--
>Environment:
System: NetBSD shark1 1.3K NetBSD 1.3K (SHARK) #15: Sat Mar 6 15:01:12 GMT 1999 rearnsha@shark1:/work/rearnsha/netbsd/sys/arch/arm32/compile/SHARK arm32
>Description:
I've been trying to track down a bug which is causing random
seg-faults on my shark when it is swapping heavily and noticed that
breakpoints set by gdb are not always being honoured. It would appear
that a page that has a breakpoint set loses it if the page gets
swapped out.
>How-To-Repeat:
Attach to a running program with gdb, set a breakpoint and continue
execution; exercise the system heavily so that the page containing
the breakpoint gets swapped out and then arrange for the process
under debug to execute the instruction containing the breakpoint.
Groan with frustration as the system continues to execute through
the breakpoint and promptly crashes before you have a chance to stop
it.
The specific example where I'm seeing this is /bin/sh when it is
blocked in wait4(); the breakpoint is on the instruction after the
SWI and the shell is waiting for a process that is causing a lot
of swapping. Eventually the sub-process terminates and the breakpoint
should be hit. At the time of termination, TOP is showing that the
process has a RES value of 0 and that it has been swapped
(name in angle-brackets).
>Fix:
The only work-around I have found so far is to modify the executable
under test to mlock the page where the breakpoint is needed into
physical memory. The breakpoint can then be set and the system is
unable to swap the page out.
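
A minimal sketch of that work-around, assuming a POSIX mlock(2) and that
the program under test can be rebuilt to call a helper at start-up. The
helper names page_base() and pin_page_containing() are hypothetical and
do not come from the report; the address to pin is whatever location the
breakpoint will be placed on.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round an address down to the start of the page that contains it. */
void *page_base(const void *addr)
{
    uintptr_t pagesz = (uintptr_t)sysconf(_SC_PAGESIZE);

    return (void *)((uintptr_t)addr & ~(pagesz - 1));
}

/*
 * Ask the kernel to keep the page containing `addr` resident.
 * Returns 0 on success, -1 on failure (errno is set by mlock).
 * Hypothetical helper: call it before attaching the debugger.
 */
int pin_page_containing(const void *addr)
{
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);

    return mlock(page_base(addr), pagesz);
}
```

mlock may fail if the process lacks the privilege or exceeds its locked
memory limit, so the return value should be checked before relying on
the breakpoint surviving.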
>Release-Note:
>Audit-Trail:
From: Jason Thorpe <thorpej@nas.nasa.gov>
To: rearnsha@cambridge.arm.com
Cc: gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Wed, 10 Mar 1999 10:27:07 -0800
On Wed, 10 Mar 1999 17:31:13 GMT
Richard Earnshaw <rearnsha@cambridge.arm.com> wrote:
> >Synopsis: Breakpoints lost under heavy swapping
> >Description:
> I've been trying to track down a bug which is causing random
> seg-faults on my shark when it is swapping heavily and noticed that
> breakpoints set by gdb are not always being honoured. It would appear
> that a page that has a breakpoint set loses it if the page gets
> swapped out.
Hm.
A page of text is backed by a vnode, and thus, the vnode pager. So,
when a fault happens on a text page, the vnode pager pulls the page
from the backing vnode. Simple enough; that's how it's always worked
in NetBSD :-)
When you set a breakpoint, you're modifying the page. Since text pages
are read-only, we must map the page read-write, modify the page, and
set it back to read-only.
Since the text page may be shared, we must copy it before we modify it,
and turn it into an anonymous memory page, backed by swap. Make sure
it's marked dirty so that it's cleaned to swap so that faulting it back
in will work. This is just basic copy-on-write.
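
The same copy-on-write behaviour can be observed from user space. The
following is only an illustration under stated assumptions, not the
kernel's ptrace path: it maps a scratch file read-only and private,
briefly makes the mapping writable, plants a marker byte, restores the
protection, and checks that the file itself is unchanged. The function
name, the /tmp path template and both byte values are hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define ORIG_BYTE  0x11  /* stand-in for the original instruction byte */
#define BREAK_BYTE 0xBB  /* stand-in for a breakpoint pattern (hypothetical) */

/*
 * Returns 1 if writing through a private mapping changed the mapping
 * but left the backing file intact, 0 if not, -1 on a setup error.
 */
int private_write_leaves_file_intact(void)
{
    char path[] = "/tmp/cowdemoXXXXXX";
    size_t pagesz = (size_t)sysconf(_SC_PAGESIZE);
    unsigned char *page, *text;
    unsigned char on_disk = 0;
    int fd, result = -1;

    fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                 /* the file lives until fd is closed */

    /* Write one page of recognisable "text" to the file. */
    page = malloc(pagesz);
    if (page == NULL)
        goto out_close;
    memset(page, ORIG_BYTE, pagesz);
    if (write(fd, page, pagesz) != (ssize_t)pagesz)
        goto out_free;

    /* Map it read-only and private, much as program text is mapped. */
    text = mmap(NULL, pagesz, PROT_READ, MAP_PRIVATE, fd, 0);
    if (text == MAP_FAILED)
        goto out_free;

    /* Allow writes, plant the marker, then restore read-only. */
    if (mprotect(text, pagesz, PROT_READ | PROT_WRITE) != 0)
        goto out_unmap;
    text[0] = BREAK_BYTE;         /* this store triggers the private copy */
    if (mprotect(text, pagesz, PROT_READ) != 0)
        goto out_unmap;

    /* The file should still hold the original byte. */
    if (pread(fd, &on_disk, 1, 0) != 1)
        goto out_unmap;
    result = (text[0] == BREAK_BYTE && on_disk == ORIG_BYTE);

out_unmap:
    munmap(text, pagesz);
out_free:
    free(page);
out_close:
    close(fd);
    return result;
}
```

The modified byte exists only in the process's private copy, which is
why that copy has to be written to swap, rather than discarded, when the
page is evicted.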
Here is my guess:
I suspect that somewhere along the line, in the new path that UVM takes
to do this sort of thing, the page is becoming reassociated with the
vnode from which it originally came, so that when the fault happens when
the process runs again, the text page is pulled from the vnode, thus
losing your breakpoint (and maybe `leaking' a page of swap, if the COW'd
text page was cleaned!)
Note that in UVM, we don't create a new VM object for the COW'd text
page (like we would have in Mach VM; COW was the main purpose of
object chains in Mach VM). Instead, an `anon' entry is created for
the page... if an `anon' exists for a page, it is used. If not, the
lookup just falls through to the original VM object.
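
As a toy model of that two-level lookup (the structure and function
names below are hypothetical stand-ins, not the real UVM data
structures), the anon slot is consulted first and the backing object is
used only when the slot is empty. It also shows why a lost anon entry
would silently bring back the unmodified text page:

```c
#include <assert.h>
#include <stddef.h>

#define NPAGES 4   /* size of the toy mapping; arbitrary */

/* Hypothetical stand-in for a physical page. */
struct toy_page {
    int data;
};

/* Hypothetical stand-in for one mapping with an anon layer on top. */
struct toy_mapping {
    struct toy_page *anon[NPAGES];    /* private COW copies, NULL if none */
    struct toy_page *object[NPAGES];  /* pages of the backing object */
};

/*
 * Resolve a page index: prefer the anon copy if one exists,
 * otherwise fall through to the backing object's page.
 */
struct toy_page *toy_lookup(const struct toy_mapping *m, size_t idx)
{
    if (idx >= NPAGES)
        return NULL;
    if (m->anon[idx] != NULL)
        return m->anon[idx];
    return m->object[idx];
}
```

If the anon pointer for a page is dropped, toy_lookup() returns the
object's page again, which in the real system would correspond to
re-reading the original text from the vnode without the breakpoint.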
Are we losing `anon' entries?
Chuck? Insights on this?
-- Jason R. Thorpe <thorpej@nas.nasa.gov>
From: Mark Brinicombe <mark@causality.com>
To: Jason Thorpe <thorpej@nas.nasa.gov>
Cc: rearnsha@cambridge.arm.com, gnats-bugs@gnats.netbsd.org,
tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Wed, 10 Mar 1999 11:14:24 -0800 (PST)
Hi,
I have also started seeing problems with swapping but only recently.
Paging in and out appears to be fine and it is only swapping in and out
that is actually the problem. Typically I have been seeing it with tcsh
during concurrent builds where either tcsh SEGVs or gives something like
error in wait -1 when it gets swapped back in.
My temporary workaround atm while I look into it from the MD end is to
define __SWAP_BROKEN. What problem does the mips have that it needs this?
Cheers,
Mark
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
To: Jason Thorpe <thorpej@nas.nasa.gov>
Cc: rearnsha@cambridge.arm.com, gnats-bugs@gnats.netbsd.org,
tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Wed, 10 Mar 1999 14:31:44 -0500
Hmm. Thinking about problems with anon pages on arm..
I have a Shark with local swap disk and with /tmp in an MFS, and I
found that "make build" was rather flaky, blowing up maybe every 10
minutes or so with spurious syntax errors, until I switched to using
cc -pipe for everything. I assume that the contents of the
intermediate preprocessor output files in /tmp were getting corrupted.
I should have reported this a while back.
I found it very easy to reproduce.. put /tmp in MFS, try to build a
kernel from scratch on a 32meg shark while trying to get other work
done.
Looking at what's in common between the lost breakpoints, and the mfs
problems, and assuming there's one bug underlying both problems.. MFS
stores the bits of the filesystem in the memory of the mount_mfs
process. I would guess that whatever bug is in there involves
something having to do with the kernel touching anonymous memory
regions out in userland..
- Bill
From: Richard Earnshaw <rearnsha@arm.com>
To: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
Cc: Jason Thorpe <thorpej@nas.nasa.gov>, gnats-bugs@gnats.netbsd.org,
tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Wed, 10 Mar 1999 19:41:07 +0000
> Hmm. Thinking about problems with anon pages on arm..
>
> I have a Shark with local swap disk and with /tmp in an MFS, and I
> found that "make build" was rather flaky, blowing up maybe every 10
> minutes or so with spurious syntax errors, until I switched to using
> cc -pipe for everything. I assume that the contents of the
> intermediate preprocessor output files in /tmp were getting corrupted.
>
Hmmm... A while back I also tried an MFS /tmp, but found it unreliable in
a different manner. Basically, after a few hours some temporary files
would appear not to have been deleted (though I think they were zero
size). Attempting to rm the files was a sure-fire way of making the
machine panic. It got so bad that I just disabled the MFS: I've been
intending to give it another go to see if it was fixed...
> I should have reported this a while back.
Ditto.
From: Ian Dall <Ian.Dall@dsto.defence.gov.au>
To: richard.earnshaw@arm.com
Cc: sommerfeld@orchard.arlington.ma.us, thorpej@nas.nasa.gov,
gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Thu, 11 Mar 1999 10:27:49 +1030 (CST)
Richard Earnshaw <rearnsha@arm.com> writes:
>> Hmm. Thinking about problems with anon pages on arm..
>>
>> I have a Shark with local swap disk and with /tmp in an MFS, and I
>> found that "make build" was rather flaky, blowing up maybe every 10
>> minutes or so with spurious syntax errors, until I switched to using
>> cc -pipe for everything. I assume that the contents of the
>> intermediate preprocessor output files in /tmp were getting corrupted.
>>
> Hmmm... A while back I also tried an MFS /tmp, but found it unreliable in
> a different manner. Basically, after a few hours some temporary files
> would appear not to have been deleted (though I think they were zero
> size). Attempting to rm the files was a sure-fire way of making the
> machine panic. It got so bad that I just disabled the MFS: I've been
> intending to give it another go to see if it was fixed...
>> I should have reported this a while back.
> Ditto.
Hmm. I have been using an MFS /tmp for a long time now and have never
seen any problem. At one point at least, gcc was configured to use /usr/tmp
rather than /tmp so are you sure you are actually using your MFS anyway?
(I have TMPDIR set to /tmp).
Ian
From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
To: Ian Dall <Ian.Dall@dsto.defence.gov.au>
Cc: richard.earnshaw@arm.com, thorpej@nas.nasa.gov,
gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Wed, 10 Mar 1999 19:01:43 -0500
> Hmm. I have been using an MFS /tmp for a long time now and have never
> seen any problem.
I've used it for a long time on x86 without any problems. It appears
to be flaky for me on arm32.
> Are you sure you are actually using your MFS anyway?
> (I have TMPDIR set to /tmp).
Yes.
I have TMPDIR set to /tmp as well and gcc -v clearly shows it putting
intermediate files in /tmp.
From: Ian Dall <Ian.Dall@dsto.defence.gov.au>
To: sommerfeld@orchard.arlington.ma.us
Cc: Ian.Dall@dsto.defence.gov.au, richard.earnshaw@arm.com,
thorpej@nas.nasa.gov, gnats-bugs@gnats.netbsd.org,
tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Fri, 12 Mar 1999 10:46:03 +1030 (CST)
Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us> writes:
>> Hmm. I have been using an MFS /tmp for a long time now and have never
>> seen any problem.
> I've used it for a long time on x86 without any problems. It appears
> to be flaky for me on arm32.
Ah, since this was being reported on tech-kern, I had assumed that
there was a claim that this was an MI problem. Looks like it must be
arm32 specific.
Ian
From: Chuck Cranor <chuck@research.att.com>
To: Jason Thorpe <thorpej@nas.nasa.gov>, rearnsha@cambridge.arm.com
Cc: gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Sun, 21 Mar 1999 20:32:57 -0500
On Wed, Mar 10, 1999 at 10:27:07AM -0800, Jason Thorpe wrote:
> When you set a breakpoint, you're modifying the page. Since text pages
> are read-only, we must map the page read-write, modify the page, and
> set it back to read-only.
> Since the text page may be shared, we must copy it before we modify it,
> and turn it into an anonymous memory page, backed by swap. Make sure
> it's marked dirty so that it's cleaned to swap so that faulting it back
> in will work. This is just basic copy-on-write.
yes, that's the way it works. program text is mapped read-only/copy-on-write.
the ptrace code has to change the protection (temporarily) to
read-write/copy-on-write in order to write the debug info. then it
changes it back.
> Here is my guess:
> I suspect that somewhere along the line, in the new path that UVM takes
> to do this sort of thing, the page is becoming reassociated with the
> vnode from which it originally came, so that when the fault happens when
> the process runs again, the text page is pulled from the vnode, thus
> losing your breakpoint (and maybe `leaking' a page of swap, if the COW'd
> text page was cleaned!)
> Are we losing `anon' entries?
> Chuck? Insights on this?
the only way to know for sure is to check. the thing to do when it fails
is to look at the map and see if there is an anon mapped in at the breakpoint
address or not. if there is, then i'd check the page for modified data.
given that the arm32 has separate I/D VAC caches it is possible that the
data could have gotten lost in the cache handling code. if it is not there,
then it is possible that we are losing anon entries.
ddb "show map/f" should help some, but i don't think there is a call
to examine an amap (?).
chuck
State-Changed-From-To: open->feedback
State-Changed-By: mycroft
State-Changed-When: Tue Mar 23 10:46:57 PST 1999
State-Changed-Why:
Believed to be fixed as of today.
From: Richard Earnshaw <rearnsha@arm.com>
To: mycroft@netbsd.org
Cc: port-arm32-maintainer@netbsd.org
Subject: Re: port-arm32/7122
Date: Wed, 24 Mar 1999 19:53:29 +0000
> Synopsis: Breakpoints lost under heavy swapping
>
> State-Changed-From-To: open->feedback
> State-Changed-By: mycroft
> State-Changed-When: Tue Mar 23 10:46:57 PST 1999
> State-Changed-Why:
> Believed to be fixed as of today.
This doesn't seem to have been fixed either.
State-Changed-From-To: feedback->open
State-Changed-By: mycroft
State-Changed-When: Thu May 6 05:06:59 PDT 1999
State-Changed-Why:
Not fixed.
State-Changed-From-To: open->closed
State-Changed-By: sommerfeld
State-Changed-When: Tue May 18 12:38:41 PDT 1999
State-Changed-Why:
this was fixed by Charles around the time of the 1.4 branch cut and the fix is
in 1.4...
State-Changed-From-To: closed->open
State-Changed-By: sommerfeld
State-Changed-When: Thu May 20 20:46:36 PDT 1999
State-Changed-Why:
closed mistakenly.
Responsible-Changed-From-To: port-arm32-maintainer->port-arm-maintainer
Responsible-Changed-By: bjh21
Responsible-Changed-When: Mon Aug 5 17:46:52 PDT 2002
Responsible-Changed-Why:
The arm32 port has disappeared.
State-Changed-From-To: open->closed
State-Changed-By: rearnsha
State-Changed-When: Wed Apr 9 08:46:19 PDT 2003
State-Changed-Why:
Closing my own PR. I suspect this was one of the many pmap-related problems
that will most likely have been fixed by the substantial restructuring of
the arm32 pmap code for 1.6. I'll file another report if I ever come across
it again.
>Unformatted: