NetBSD Problem Report #7122

Received: (qmail 24369 invoked from network); 10 Mar 1999 17:31:27 -0000
Message-Id: <199903101731.RAA00381@shark1.cambridge.arm.com>
Date: Wed, 10 Mar 1999 17:31:13 GMT
From: Richard Earnshaw <rearnsha@cambridge.arm.com>
Reply-To: rearnsha@cambridge.arm.com
To: gnats-bugs@gnats.netbsd.org
Subject: Breakpoints lost under heavy swapping
X-Send-Pr-Version: 3.95

>Number:         7122
>Category:       port-arm
>Synopsis:       Breakpoints lost under heavy swapping
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    port-arm-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Mar 10 09:35:00 +0000 1999
>Closed-Date:    Wed Apr 09 15:46:50 +0000 2003
>Last-Modified:  Wed Apr 09 15:46:50 +0000 2003
>Originator:     Richard Earnshaw
>Release:        NetBSD-current 19990305
>Organization:
ARM
-- 
>Environment:

System: NetBSD shark1 1.3K NetBSD 1.3K (SHARK) #15: Sat Mar 6 15:01:12 GMT 1999 rearnsha@shark1:/work/rearnsha/netbsd/sys/arch/arm32/compile/SHARK arm32


>Description:
	I've been trying to track down a bug which is causing random 
	seg-faults on my shark when it is swapping heavily and noticed that
	breakpoints set by gdb are not always being honoured.  It would appear
	that a page that has a breakpoint set loses this if the page gets
	swapped out.

>How-To-Repeat:
	Attach to a running program with gdb, set a breakpoint and continue
	execution; exercise the system heavily so that the page containing 
	the breakpoint gets swapped out and then arrange for the process 
	under debug to execute the instruction containing the breakpoint.

	Groan with frustration as the system continues to execute through
	the breakpoint and promptly crashes before you have a chance to stop
	it.

	The specific example where I'm seeing this is /bin/sh when it is
	blocked in wait4(); the breakpoint is on the instruction after the
	SWI and the shell is waiting for a process that is causing a lot
	of swapping.  Eventually the sub-process terminates and the breakpoint
	should be hit.  At the time of termination, TOP is showing that the
	process has a RES value of 0 and that it has been swapped
	(name in angle-brackets).

>Fix:
	The only work-around I have found so far is to modify the executable
	under test to mlock the page where the breakpoint is needed into 
	physical memory.  The breakpoint can then be set and the system is
	unable to swap the page out.
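
A minimal sketch of that work-around in C, assuming a POSIX system with
mlock(2). The helper names page_base() and pin_page() are hypothetical and
are not taken from the report; the address passed in would be that of the
instruction where the breakpoint is wanted.

```c
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Round addr down to the start of the page that contains it.
 * pagesize is assumed to be a power of two. */
void *page_base(const void *addr, long pagesize)
{
    return (void *)((uintptr_t)addr & ~((uintptr_t)pagesize - 1));
}

/* Wire the page containing addr into physical memory so that it
 * cannot be paged or swapped out. Returns 0 on success, -1 on failure
 * (for example when the locked-memory resource limit is too low). */
int pin_page(const void *addr)
{
    long pagesize = sysconf(_SC_PAGESIZE);

    if (pagesize <= 0)
        return -1;
    return mlock(page_base(addr, pagesize), (size_t)pagesize);
}
```

The program under test would call pin_page() early in main() with the
address of the code of interest, before the debugger sets its breakpoint.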

>Release-Note:
>Audit-Trail:

From: Jason Thorpe <thorpej@nas.nasa.gov>
To: rearnsha@cambridge.arm.com
Cc: gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping 
Date: Wed, 10 Mar 1999 10:27:07 -0800

 On Wed, 10 Mar 1999 17:31:13 GMT 
  Richard Earnshaw <rearnsha@cambridge.arm.com> wrote:

  > >Synopsis:       Breakpoints lost under heavy swapping

  > >Description:
  > 	I've been trying to track down a bug which is causing random 
  > 	seg-faults on my shark when it is swapping heavily and noticed that
  > 	breakpoints set by gdb are not always being honoured.  It would appear
  > 	that a page that has a breakpoint set loses this if the page gets
  > 	swapped out.

 Hm.

 A page of text is backed by a vnode, and thus, the vnode pager.  So,
 when a fault happens on a text page, the vnode pager pulls the page
 from the backing vnode.  Simple enough; that's how it's always worked
 in NetBSD :-)

 When you set a breakpoint, you're modifying the page.  Since text pages
 are read-only, we must map the page read-write, modify the page, and
 set it back to read-only.

 Since the text page may be shared, we must copy it before we modify it,
 and turn it into an anonymous memory page, backed by swap.  Make sure
 it's marked dirty so that it's cleaned to swap so that faulting it back
 in will work.  This is just basic copy-on-write.
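
The copy-on-write behaviour described here can be observed from userland.
The sketch below is only an analogy, not the kernel's ptrace path: it maps
a scratch file MAP_PRIVATE and read-only, briefly makes the mapping
writable, changes one byte, restores the protection, and then checks that
the backing file is unchanged. The function name cow_demo() and the
scratch-file location under /tmp are assumptions made for illustration.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Returns 1 if the write landed in a private copy and the backing file
 * kept its original contents, 0 if not, -1 on any setup error. */
int cow_demo(void)
{
    char path[] = "/tmp/cowdemoXXXXXX";
    const char orig[] = "original";
    char buf[sizeof orig];
    char *p;
    int result = -1;

    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);               /* the file lives on until fd is closed */

    if (write(fd, orig, sizeof orig) != (ssize_t)sizeof orig)
        goto out;

    /* Read-only private mapping, like program text. */
    p = mmap(NULL, sizeof orig, PROT_READ, MAP_PRIVATE, fd, 0);
    if (p == MAP_FAILED)
        goto out;

    /* Read-only -> read-write, modify, back to read-only. */
    if (mprotect(p, sizeof orig, PROT_READ | PROT_WRITE) == 0) {
        p[0] = 'X';             /* first write triggers the private copy */
        mprotect(p, sizeof orig, PROT_READ);

        if (lseek(fd, 0, SEEK_SET) == 0 &&
            read(fd, buf, sizeof buf) == (ssize_t)sizeof buf)
            result = (p[0] == 'X' &&
                      memcmp(buf, orig, sizeof orig) == 0);
    }
    munmap(p, sizeof orig);
out:
    close(fd);
    return result;
}
```

The modified byte exists only in the process's private copy of the page,
so that copy has to be preserved somewhere other than the file when it is
evicted.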

 Here is my guess:

 I suspect that somewhere along the line, in the new path that UVM takes
 to do this sort of thing, the page is becoming reassociated with the
 vnode from which it originally came, so that when the fault happens when
 the process runs again, the text page is pulled from the vnode, thus
 losing your breakpoint (and maybe `leaking' a page of swap, if the COW'd
 text page was cleaned!)

 Note that in UVM, we don't create a new VM object for the COW'd text
 page (like we would have in Mach VM; COW was the main purpose of
 object chains in Mach VM).  Instead, an `anon' entry is created for
 the page... if an `anon' exists for a page, it is used.  If not, the
 lookup just falls through to the original VM object.
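
A toy model of that two-level lookup, for illustration only. The types
toy_object and toy_amap and the function toy_lookup() are hypothetical
stand-ins and are not the real UVM structures or interfaces.

```c
#include <stddef.h>

#define NPAGES 4

/* Pages as supplied by the backing (vnode) object. */
struct toy_object {
    const char *pages[NPAGES];
};

/* Per-mapping anonymous overlay; NULL means "no anon for this slot". */
struct toy_amap {
    const char *anons[NPAGES];
};

/* Prefer the anon if one exists for the slot; otherwise fall through
 * to the original object. Returns NULL for an out-of-range index. */
const char *toy_lookup(const struct toy_amap *am,
                       const struct toy_object *obj, size_t idx)
{
    if (idx >= NPAGES)
        return NULL;
    if (am != NULL && am->anons[idx] != NULL)
        return am->anons[idx];
    return obj->pages[idx];
}
```

In this model, clearing a slot in anons[] makes the lookup silently return
the unmodified page from the object, which is the symptom a lost anon
entry would produce.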

 Are we losing `anon' entries?

 Chuck?  Insights on this?

         -- Jason R. Thorpe <thorpej@nas.nasa.gov>


From: Mark Brinicombe <mark@causality.com>
To: Jason Thorpe <thorpej@nas.nasa.gov>
Cc: rearnsha@cambridge.arm.com, gnats-bugs@gnats.netbsd.org,
        tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping 
Date: Wed, 10 Mar 1999 11:14:24 -0800 (PST)

 Hi,
   I have also started seeing problems with swapping but only recently.
 Paging in and out appears to be fine and it is only swapping in and out
 that is actually the problem. Typically I have been seeing it with tcsh
 during concurrent builds where either tcsh SEGVs or gives something like
 error in wait -1 when it gets swapped back in.

 My temporary workaround atm while I look into it from the MD end is to
 define __SWAP_BROKEN. What problem does the mips have that it needs this?

 Cheers,
 				Mark



From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
To: Jason Thorpe <thorpej@nas.nasa.gov>
Cc: rearnsha@cambridge.arm.com, gnats-bugs@gnats.netbsd.org,
        tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping 
Date: Wed, 10 Mar 1999 14:31:44 -0500

 Hmm.  Thinking about problems with anon pages on arm..

 I have a Shark with local swap disk and with /tmp in an MFS, and I
 found that "make build" was rather flaky, blowing up maybe every 10
 minutes or so with spurious syntax errors, until I switched to using
 cc -pipe for everything.  I assume that the contents of the
 intermediate preprocessor output files in /tmp were getting corrupted.

 I should have reported this a while back. 

 I found it very easy to reproduce.. put /tmp in MFS, try to build a
 kernel from scratch on a 32meg shark while trying to get other work
 done.

 Looking at what's in common between the lost breakpoints, and the mfs
 problems, and assuming there's one bug underlying both problems.. MFS
 stores the bits of the filesystem in the memory of the mount_mfs
 process. I would guess that whatever bug is in there involves
 something having to do with the kernel touching anonymous memory
 regions out in userland..

 				- Bill

From: Richard Earnshaw <rearnsha@arm.com>
To: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
Cc: Jason Thorpe <thorpej@nas.nasa.gov>, gnats-bugs@gnats.netbsd.org,
        tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping 
Date: Wed, 10 Mar 1999 19:41:07 +0000

 > Hmm.  Thinking about problems with anon pages on arm..
 > 
 > I have a Shark with local swap disk and with /tmp in an MFS, and I
 > found that "make build" was rather flaky, blowing up maybe every 10
 > minutes or so with spurious syntax errors, until I switched to using
 > cc -pipe for everything.  I assume that the contents of the
 > intermediate preprocessor output files in /tmp were getting corrupted.
 > 
 Hmmm...  A while back I also tried an MFS /tmp, but found it unreliable in 
 a different manner.  Basically, after a few hours some temporary files 
 would appear not to have been deleted (though I think they were zero 
 size).  Attempting to rm the files was a sure-fire way of making the 
 machine panic.  It got so bad that I just disabled the MFS: I've been 
 intending to give it another go to see if it was fixed...

 > I should have reported this a while back. 

 Ditto.


From: Ian Dall <Ian.Dall@dsto.defence.gov.au>
To: richard.earnshaw@arm.com
Cc: sommerfeld@orchard.arlington.ma.us, thorpej@nas.nasa.gov,
        gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping 
Date: Thu, 11 Mar 1999 10:27:49 +1030 (CST)

 Richard Earnshaw <rearnsha@arm.com> writes:

   >> Hmm.  Thinking about problems with anon pages on arm..
   >> 
   >> I have a Shark with local swap disk and with /tmp in an MFS, and I
   >> found that "make build" was rather flaky, blowing up maybe every 10
   >> minutes or so with spurious syntax errors, until I switched to using
   >> cc -pipe for everything.  I assume that the contents of the
   >> intermediate preprocessor output files in /tmp were getting corrupted.
   >> 
   > Hmmm...  A while back I also tried an MFS /tmp, but found it unreliable in 
   > a different manner.  Basically, after a few hours some temporary files 
   > would appear not to have been deleted (though I think they were zero 
   > size).  Attempting to rm the files was a sure-fire way of making the 
   > machine panic.  It got so bad that I just disabled the MFS: I've been 
   > intending to give it another go to see if it was fixed...

   >> I should have reported this a while back. 

   > Ditto.

 Hmm. I have been using an MFS /tmp for a long time now and have never
 seen any problem. At one point at least, gcc was configured to use /usr/tmp
 rather than /tmp so are you sure you are actually using your MFS anyway?
 (I have TMPDIR set to /tmp).

 Ian


From: Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us>
To: Ian Dall <Ian.Dall@dsto.defence.gov.au>
Cc: richard.earnshaw@arm.com, thorpej@nas.nasa.gov,
        gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping 
Date: Wed, 10 Mar 1999 19:01:43 -0500

 > Hmm. I have been using an MFS /tmp for a long time now and have never
 > seen any problem. 

 I've used it for a long time on x86 without any problems.  It appears
 to be flaky for me on arm32.

 > Are you sure you are actually using your MFS anyway?
 > (I have TMPDIR set to /tmp).

 Yes.

 I have TMPDIR set to /tmp as well and gcc -v clearly shows it putting
 intermediate files in /tmp.

From: Ian Dall <Ian.Dall@dsto.defence.gov.au>
To: sommerfeld@orchard.arlington.ma.us
Cc: Ian.Dall@dsto.defence.gov.au, richard.earnshaw@arm.com,
        thorpej@nas.nasa.gov, gnats-bugs@gnats.netbsd.org,
        tech-kern@netbsd.org, chuck@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping 
Date: Fri, 12 Mar 1999 10:46:03 +1030 (CST)

 Bill Sommerfeld <sommerfeld@orchard.arlington.ma.us> writes:

   >> Hmm. I have been using an MFS /tmp for a long time now and have never
   >> seen any problem. 

   > I've used it for a long time on x86 without any problems.  It appears
   > to be flaky for me on arm32.

 Ah, since this was being reported on tech-kern, I had assumed that
 there was a claim that this was an MI problem. Looks like it must be
 arm32 specific.

 Ian

From: Chuck Cranor <chuck@research.att.com>
To: Jason Thorpe <thorpej@nas.nasa.gov>, rearnsha@cambridge.arm.com
Cc: gnats-bugs@gnats.netbsd.org, tech-kern@netbsd.org
Subject: Re: port-arm32/7122: Breakpoints lost under heavy swapping
Date: Sun, 21 Mar 1999 20:32:57 -0500

 On Wed, Mar 10, 1999 at 10:27:07AM -0800, Jason Thorpe wrote:
 > When you set a breakpoint, you're modifying the page.  Since text pages
 > are read-only, we must map the page read-write, modify the page, and
 > set it back to read-only.
 > Since the text page may be shared, we must copy it before we modify it,
 > and turn it into an anonymous memory page, backed by swap.  Make sure
 > it's marked dirty so that it's cleaned to swap so that faulting it back
 > in will work.  This is just basic copy-on-write.

 yes, that's the way it works.  program text is mapped read-only/copy-on-write.
 the ptrace code has to change the protection (temporarily) to 
 read-write/copy-on-write in order to write the debug info.  then it
 changes it back.


 > Here is my guess:
 > I suspect that somewhere along the line, in the new path that UVM takes
 > to do this sort of thing, the page is becoming reassociated with the
 > vnode from which it originally came, so that when the fault happens when
 > the process runs again, the text page is pulled from the vnode, thus
 > losing your breakpoint (and maybe `leaking' a page of swap, if the COW'd
 > text page was cleaned!)
 > Are we losing `anon' entries?
 > Chuck?  Insights on this?

 the only way to know for sure is to check.   the thing to do when it fails
 is to look at the map and see if there is an anon mapped in at the breakpoint
 address or not.  if there is, then i'd check the page for modified data.
 given that the arm32 has separate I/D VAC caches it is possible that the 
 data could have gotten lost in the cache handling code.   if it is not there,
 then it is possible that we are losing anon entries.

 ddb "show map/f" should help some, but i don't think there is a call
 to examine an amap (?).


 chuck

State-Changed-From-To: open->feedback 
State-Changed-By: mycroft 
State-Changed-When: Tue Mar 23 10:46:57 PST 1999 
State-Changed-Why:  
Believed to be fixed as of today. 

From: Richard Earnshaw <rearnsha@arm.com>
To: mycroft@netbsd.org
Cc: port-arm32-maintainer@netbsd.org
Subject: Re: port-arm32/7122 
Date: Wed, 24 Mar 1999 19:53:29 +0000

 > Synopsis: Breakpoints lost under heavy swapping
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: mycroft
 > State-Changed-When: Tue Mar 23 10:46:57 PST 1999
 > State-Changed-Why: 
 > Believed to be fixed as of today.

 This doesn't seem to have been fixed either.

State-Changed-From-To: feedback->open 
State-Changed-By: mycroft 
State-Changed-When: Thu May 6 05:06:59 PDT 1999 
State-Changed-Why:  
Not fixed. 
State-Changed-From-To: open->closed 
State-Changed-By: sommerfeld 
State-Changed-When: Tue May 18 12:38:41 PDT 1999 
State-Changed-Why:  
this was fixed by Charles around the time of the 1.4 branch cut and the fix is  
in 1.4... 
State-Changed-From-To: closed->open 
State-Changed-By: sommerfeld 
State-Changed-When: Thu May 20 20:46:36 PDT 1999 
State-Changed-Why:  
closed mistakenly. 
Responsible-Changed-From-To: port-arm32-maintainer->port-arm-maintainer 
Responsible-Changed-By: bjh21 
Responsible-Changed-When: Mon Aug 5 17:46:52 PDT 2002 
Responsible-Changed-Why:  
The arm32 port has disappeared. 
State-Changed-From-To: open->closed 
State-Changed-By: rearnsha 
State-Changed-When: Wed Apr 9 08:46:19 PDT 2003 
State-Changed-Why:  
Closing my own PR.  I suspect this was one of the many pmap-related problems 
that will most likely have been fixed by the substantial restructuring of 
the arm32 pmap code for 1.6.  I'll file another report if I ever come across 
it again. 
>Unformatted:
