NetBSD Problem Report #44402

From www@NetBSD.org  Sun Jan 16 18:29:43 2011
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 3405263B966
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 16 Jan 2011 18:29:43 +0000 (UTC)
Message-Id: <20110116182942.02E1263B883@www.NetBSD.org>
Date: Sun, 16 Jan 2011 18:29:41 +0000 (UTC)
From: kristerw@netbsd.org
Reply-To: kristerw@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: page fault in supervisor mode in netbsd:filt_piperead
X-Send-Pr-Version: www-1.0

>Number:         44402
>Category:       kern
>Synopsis:       kernel crash: page fault in supervisor mode in netbsd:filt_piperead
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jan 16 18:30:01 +0000 2011
>Last-Modified:  Sun Mar 18 05:05:50 +0000 2012
>Originator:     Krister Walfridsson
>Release:        current as of Jan 15
>Organization:
>Environment:
NetBSD scratch 5.99.43 NetBSD 5.99.43 (GENERIC) #0: Sat Jan 15 02:02:22 CET 2011  cato@pc5.kwa:/usr/local/tmp/nbsd110114/obj.i386/sys/arch/i386/compile/GENERIC i386
>Description:
Running the current ghc testsuite crashes my kernel with:

uvm_fault(0xce3c6774, 0, 1) -> 0xe
fatal page fault in supervisor mode
trap type 6 code 0 eip c0638803 cs 8 eflags 10296 cr2 0 ilevel 0
kernel: supervisor trap page fault, code=0
Stopped in pid 924.2 (ghc-stage2) at   netbsd:filt_piperead+0xf3: movl 0 (%ebx),%eax

This is on a rather new PC without a serial port, and the USB keyboard does not work at the DDB prompt, so it is difficult to get more information.
>How-To-Repeat:
I can reproduce the error within a minute by running the following script using a -current ghc:

#!/bin/sh
while :
do
        echo -n .
        rm -f cgrun037.comp.stderr
        /usr/local/tmp/ghc20110114/ghc/inplace/bin/ghc-stage2 -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-conf -rtsopts  -o cgrun037 cgrun037.hs >cgrun037.comp.stderr 2>&1
done

>Fix:

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44402: page fault in supervisor mode in netbsd:filt_piperead
Date: Mon, 17 Jan 2011 05:56:33 +0000

 On Sun, Jan 16, 2011 at 06:30:01PM +0000, kristerw@netbsd.org wrote:
  > Running the current ghc testsuite crash my kernel with
  > 
  > uvm_fault(0xce3c6774, 0, 1) -> 0xe
  > fatal page fault in supervisor mode
  > trap type 6 code 0 eip c0638803 cs 8 eflags 10296 cr2 0 ilevel 0
  > kernel: supervisor trap page fault, code=0
  > Stopped in pid 924.2 (ghc-stage2) at   netbsd:filt_piperead+0xf3: movl 0 (%ebx),%eax

 Can you use gdb or objdump or whatever to figure out which line of
 filt_piperead this is?

 Also, try setting ddb.commandonenter to "trace".

 -- 
 David A. Holland
 dholland@netbsd.org
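
Both suggestions above can be sketched as shell helpers. This is a minimal sketch, not a tested procedure: the helper names and the KERNEL_GDB path are hypothetical, it assumes a kernel image with debug symbols is available, and it uses addr2line as one option in place of the gdb or objdump mentioned in the message.

```sh
#!/bin/sh
# Sketch only: KERNEL_GDB is an assumed path to a kernel image built with
# debug symbols; adjust it to the local kernel build directory.
KERNEL_GDB=${KERNEL_GDB:-./netbsd.gdb}

# Run "trace" automatically each time the kernel enters DDB (needs root).
enable_ddb_trace() {
        sysctl -w ddb.commandonenter=trace
}

# Map a kernel text address, such as the eip value from the trap message,
# to a function name and a source file:line pair.
resolve_addr() {
        addr2line -f -e "$KERNEL_GDB" "$1"
}

# Example (not run here):
#   enable_ddb_trace
#   resolve_addr 0xc0638803
```

Inside an interactive gdb session on the same image, `list *(filt_piperead+0xf3)` should give the equivalent source location.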

From: Krister Walfridsson <krister.walfridsson@gmail.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44402: page fault in supervisor mode in netbsd:filt_piperead
Date: Mon, 17 Jan 2011 23:11:56 +0100 (CET)

 The faulting address netbsd:filt_piperead+0xf3 corresponds to 
 kern/sys_pipe.c:1404 which is the line "mutex_exit(rpipe->pipe_lock);" in

          if ((hint & NOTE_SUBMIT) == 0) {
                  mutex_exit(rpipe->pipe_lock);
          }


 Setting ddb.commandonenter to "trace" just loops the same info:

 filt_piperead(cef5a600,0,cef59cac,c04854b6,cef59c88,cef5a9b4,2,2,0,2) at netbsd:filt_piperead+0xf3
 uvm_fault(0xceb59ba0, 0, 1) -> 0xe
 fatal page fault in supervisor mode
 trap type 6 code 0 eip c024e2a0 cs 8 eflags 10246 cr2 249 ilevel 8
 kernel: supervisor trap page fault, code=0
 Faulted in DDB; continuing...
 filt_piperead(cef5a600,0,cef59cac,c04854b6,cef59c88,cef5a9b4,2,2,0,2) at netbsd:filt_piperead+0xf3
 uvm_fault(0xceb59ba0, 0, 1) -> 0xe
 fatal page fault in supervisor mode
 trap type 6 code 0 eip c024e2a0 cs 8 eflags 10246 cr2 249 ilevel 8
 kernel: supervisor trap page fault, code=0
 Faulted in DDB; continuing...
 [...]

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/44402: page fault in supervisor mode in netbsd:filt_piperead
Date: Mon, 31 Jan 2011 05:21:02 +0000

 On Mon, Jan 17, 2011 at 10:15:06PM +0000, Krister Walfridsson wrote:
  >  The faulting address netbsd:filt_piperead+0xf3 correspond to 
  >  kern/sys_pipe.c:1404 which is the line "mutex_exit(rpipe->pipe_lock);" in
  >  
  >           if ((hint & NOTE_SUBMIT) == 0) {
  >                   mutex_exit(rpipe->pipe_lock);
  >           }

 That doesn't make any sense... :(

  >  Setting ddb.commandonenter to "trace" just loops the same info:
  >  
  >  filt_piperead(cef5a600,0,cef59cac,c04854b6,cef59c88,cef5a9b4,2,2,0,2) at netbsd:filt_piperead+0xf3
  >  uvm_fault(0xceb59ba0, 0, 1) -> 0xe
  >  fatal page fault in supervisor mode
  >  trap type 6 code 0 eip c024e2a0 cs 8 eflags 10246 cr2 249 ilevel 8
  >  kernel: supervisor trap page fault, code=0
  >  Faulted in DDB; continuing...

 Blah. I guess the stack's been garbaged and it's dying on rpipe
 because rpipe is a local... 

 How repeatable is it?

 -- 
 David A. Holland
 dholland@netbsd.org

From: Krister Walfridsson <kristerw@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/44402: page fault in supervisor mode in netbsd:filt_piperead
Date: Mon, 31 Jan 2011 22:14:25 +0100

 >  Blah. I guess the stack's been garbaged and it's dying on rpipe
 >  because rpipe is a local...
 >
 >  How repeatable is it?

 Running the script

   #!/bin/sh
   while :
   do
         echo -n .
         rm -f cgrun037.comp.stderr
         /usr/local/tmp/ghc20110114/ghc/inplace/bin/ghc-stage2 -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-conf -rtsopts  -o cgrun037 cgrun037.hs >cgrun037.comp.stderr 2>&1
   done

 typically fails before it loops 100 times.

    /Krister
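
One way to put a number on "fails before it loops 100 times" is to record the iteration count on disk before each run, so the last value survives the crash. A minimal sketch under that assumption; the function name run_counted and the file name loops.count are hypothetical and not part of the original script.

```sh
#!/bin/sh
# Sketch only: run_counted and loops.count are illustrative names.
# Usage: run_counted MAX COMMAND [ARGS...]
run_counted() {
        max=$1
        shift
        n=0
        while [ "$n" -lt "$max" ]
        do
                n=$((n + 1))
                # Write the count before each run and flush it to disk,
                # so the last recorded value survives a kernel crash.
                echo "$n" > loops.count
                sync
                # Same output redirection as the original script.
                "$@" > cgrun037.comp.stderr 2>&1
        done
        # Reached only if the machine survived all MAX iterations.
        echo "$n"
}
```

For example, `run_counted 1000 /usr/local/tmp/ghc20110114/ghc/inplace/bin/ghc-stage2 -fforce-recomp -dcore-lint -dcmm-lint -dno-debug-output -no-user-package-conf -rtsopts -o cgrun037 cgrun037.hs` would leave the number of the crashing iteration in loops.count after reboot.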

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, kristerw@netbsd.org
Subject: Re: kern/44402: page fault in supervisor mode in netbsd:filt_piperead
Date: Sat, 12 Feb 2011 20:30:56 +0000

 On Mon, Jan 31, 2011 at 09:15:06PM +0000, Krister Walfridsson wrote:
  >> How repeatable is it?
  >  
  > [...]
  >  typically fails before it loops 100 times.

 Hrm. so it's not deterministic...

 -- 
 David A. Holland
 dholland@netbsd.org

From: Krister Walfridsson <kristerw@netbsd.org>
To: David Holland <dholland-bugs@netbsd.org>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
	netbsd-bugs@netbsd.org
Subject: Re: kern/44402: page fault in supervisor mode in netbsd:filt_piperead
Date: Mon, 18 Jul 2011 16:55:46 +0200

 I can still reproduce this with current NetBSD.

 I have also reproduced this on another machine (i386 current running
 in VMware Fusion configured with two CPUs), although it takes much
 longer for that one to panic ("a few hours" compared to "a few
 minutes" on my development machine).

 I tried to reproduce this on a single-CPU machine without success.
 But I only let the script run for ~1 hour, so I do not know whether I
 was just unlucky or whether a multiprocessor machine is needed to
 trigger this bug.

 I have not managed to create a small, simple test program that
 triggers this bug.  But I have placed a tarball (PR44402.tgz) in
 ~kristerw at ftp.netbsd.org that contains everything needed to
 reproduce this.

   /Krister

>Unformatted: