NetBSD Problem Report #48027

From paul@whooppee.com  Sun Jul  7 20:32:07 2013
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id D18087182F
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  7 Jul 2013 20:32:06 +0000 (UTC)
Message-Id: <20130707203205.1308024797C@screamer.whooppee.com>
Date: Sun,  7 Jul 2013 13:32:05 -0700 (PDT)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: nfsserver module doesn't work
X-Send-Pr-Version: 3.95

>Number:         48027
>Category:       kern
>Synopsis:       nfsserver module doesn't work
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    pgoyette
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Jul 07 20:35:00 +0000 2013
>Closed-Date:    Sat Dec 14 06:33:04 +0000 2013
>Last-Modified:  Sat Dec 14 06:33:04 +0000 2013
>Originator:     Paul Goyette
>Release:        NetBSD 6.99.23
>Organization:
-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------
>Environment:


System: NetBSD screamer.whooppee.com 6.99.23 NetBSD 6.99.23 (GENERIC) #17: Thu Jul 4 07:18:10 PDT 2013 paul@screamer.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
	For some unknown reason, a custom kernel config which contains both

		options NFSSERVER 
	and	filesystem NFS

	still cannot successfully start the nfsd process and export its
	filesystems.  nfsd reports "NFS not available" due to a SIGSYS
	signal.

	The same error occurs when the above options/filesystem are not 
	included in the kernel, whether or not the nfs and nfsserver 
	modules are manually pre-loaded or allowed to auto-load.


>How-To-Repeat:
	Build a kernel using the config file at 

		http://www.whooppee.com/~paul/WHOOPPEE-NFS

	and try to start nfsd
>Fix:
	Unknown

>Release-Note:

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 02:21:43 +0000

 On Sun, Jul 07, 2013 at 08:35:00PM +0000, paul@whooppee.com wrote:
  > 	For some unknown reason, a custom kernel config which contains both
  > 
  > 		options NFSSERVER 
  > 	and	filesystem NFS
  > 
  > 	still cannot successfully start the nfsd process and export its
  > 	filesystems.  nfsd reports "NFS not available" due to a SIGSYS
  > 	signal.

 ...so this has nothing to do with modules?

 -- 
 David A. Holland
 dholland@netbsd.org

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Sun, 7 Jul 2013 19:31:34 -0700 (PDT)

 On Mon, 8 Jul 2013, David Holland wrote:

 >  > 	still cannot successfully start the nfsd process and export its
 >  > 	filesystems.  nfsd reports "NFS not available" due to a SIGSYS
 >  > 	signal.
 >
 > ...so this has nothing to do with modules?

 I suspect that something isn't getting initialized correctly when 
 the module is loaded.  I've been trying to track it down, but no luck so 
 far.

 But I don't think this is an issue with the MODULAR infrastructure, but 
 rather an issue with the specific implementation of the nfsserver module.  It 
 "just works" in a monolithic kernel, but fails with loaded module in a 
 minimalist kernel.

 One thing I have not tried yet is to build a GENERIC-MINUS-NFS kernel 
 and see if the problem persists there.




From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 03:42:23 +0000

 On Mon, Jul 08, 2013 at 02:35:01AM +0000, Paul Goyette wrote:
  >  But I don't think this is an issue with the MODULAR infrastructure, but 
  >  rather an issue with specific implementation of nfsserver module.  It 
  >  "just works" in a monolithic kernel, but fails with loaded module in a 
  >  minimalist kernel.

 In the original report you said it failed even when compiled in...

 -- 
 David A. Holland
 dholland@netbsd.org

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Sun, 7 Jul 2013 21:22:59 -0700 (PDT)

 On Mon, 8 Jul 2013, David Holland wrote:

 >  >  But I don't think this is an issue with the MODULAR infrastructure, but
 >  >  rather an issue with specific implementation of nfsserver module.  It
 >  >  "just works" in a monolithic kernel, but fails with loaded module in a
 >  >  minimalist kernel.
 >
 > In the original report you said it failed even when compiled in...

 Yep.  My minimalist kernel is obviously not including something else 
 that is required for nfsserver.  I simply don't know what is missing.

 So, to summarize:

  	Full GENERIC monolithic kernel, with everything built-in, works

  	My stripped-down, minimalist kernel, with almost everything
  	removed, does not work:

  	- when nfsserver module is manually loaded by /etc/boot.cfg
  	- when module is manually loaded via modload
  	- when module is allowed to autoload

  	The same stripped-down kernel _still_ fails to work, even when
  	'filesystem nfs' and 'options NFSSERVER' are added.

 Note that everything "used to work" just fine on 6.99.17.  It only broke 
 after I updated to 6.99.23.





From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 06:14:04 -0700 (PDT)

 An additional data-point:

 At one point, I tried to modunload(8) the nfsserver module.  A couple of 
 seconds later, the machine panicked at mutex_vector_enter() (sorry, I 
 forgot to copy the offset).

 Fortunately I had a PS/2 keyboard attached, rather than relying on the 
 USB keyboard.  Unfortunately, it didn't help, since a "bt" command went 
 into an infinite loop of

  	?
  	kernel: page fault trap, code=0
  	Faulted in DDB; continuing...





From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 18:13:38 +0000

 On Mon, Jul 08, 2013 at 04:25:01AM +0000, Paul Goyette wrote:
  >  > In the original report you said it failed even when compiled in...
  >  
  >  Yep.  My minimalist kernel is obviously not including something else 
  >  that is required for nfsserver.  I simply don't know what is missing.
  > [...]

 Ah.

 That is odd - given that the only way you should be getting SIGSYS is
 if the nfssvc() syscall doesn't get installed, and that happens as
 basically the first step in the module initialization. (Which is
 supposed to be called whether or not the code is compiled in or loaded
 on the fly.)
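To make this reasoning concrete, here is a toy model of a syscall dispatch table. It assumes that any slot nobody has established falls through to a stub that fails with ENOSYS, which is the point where the real kernel would deliver SIGSYS. All names and the slot number are hypothetical and do not mirror NetBSD's actual sysent layout.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical model: slot count and the nfssvc slot number are made up. */
#define MODEL_NSYSENT		8
#define MODEL_SYS_NFSSVC	3

typedef int (*model_call_t)(void);

static int model_nosys(void)  { return ENOSYS; }	/* "no such syscall" */
static int model_nfssvc(void) { return 0; }		/* pretend success */

static model_call_t model_sysent[MODEL_NSYSENT];

/* Every slot starts out unimplemented. */
static void model_table_init(void) {
	for (size_t i = 0; i < MODEL_NSYSENT; i++)
		model_sysent[i] = model_nosys;
}

/* Install a handler only into an empty slot, as a module's init would. */
static int model_syscall_establish(unsigned code, model_call_t fn) {
	if (code >= MODEL_NSYSENT)
		return EINVAL;
	if (model_sysent[code] != model_nosys)
		return EBUSY;
	model_sysent[code] = fn;
	return 0;
}

/* Dispatch: an unestablished slot yields ENOSYS. */
static int model_syscall(unsigned code) {
	if (code >= MODEL_NSYSENT)
		return ENOSYS;
	return model_sysent[code]();
}
```

Until model_syscall_establish() installs the nfssvc slot, every call to it returns ENOSYS. That matches the argument that SIGSYS means the establish step never ran or was undone.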

 And if there's something it depends on that's missing, in the builtin
 case that ought to just result in link failure.

 Are you sure it loads/attaches successfully? If it starts to load and
 then unloads itself for some kind of error, it will uninstall the
 syscall and then nfsd won't go.
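A toy model of the failure mode described above. Init installs the syscall first, a later step fails, and the rollback removes the syscall again, so nfsd would still find it missing. The function names and the injected error are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model state: is the nfssvc() entry currently installed? */
static bool nfssvc_installed;

static int model_establish_nfssvc(void)     { nfssvc_installed = true; return 0; }
static void model_disestablish_nfssvc(void) { nfssvc_installed = false; }

/* later_step_error stands in for any init step after syscall setup. */
static int model_nfsserver_init(int later_step_error) {
	int error = model_establish_nfssvc();

	if (error != 0)
		return error;
	if (later_step_error != 0) {
		/* Roll back: nfssvc() disappears again, so nfsd gets SIGSYS. */
		model_disestablish_nfssvc();
		return later_step_error;
	}
	return 0;
}
```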

 My inclination would be to add some printfs in that initialization
 code to see where it does and doesn't get.
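One way to do this kind of printf bisection is a checkpoint macro that both prints and records the last point reached. This is a userland sketch that uses stdio printf rather than the kernel's printf(9). The macro, the function, and its fail_at parameter are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Last checkpoint reached, so it can also be inspected after the fact. */
static int last_checkpoint;

#define CHECKPOINT(n) do {						\
	last_checkpoint = (n);						\
	printf("%s:%d checkpoint %d\n", __func__, __LINE__, (n));	\
} while (0)

/* fail_at selects which (hypothetical) step reports an error. */
static int traced_init(int fail_at) {
	CHECKPOINT(1);		/* entered init */
	if (fail_at == 1)
		return 1;
	CHECKPOINT(2);		/* syscalls established */
	if (fail_at == 2)
		return 2;
	CHECKPOINT(3);		/* remaining setup done */
	return 0;
}
```

Later in this thread, adding printf() calls changed the behavior, so output-based tracing can itself perturb a timing-sensitive bug.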

  >  Note that everything "used to work" just fine on 6.99.17.  It has only 
  >  broken since I updated to 6.99.23

 Maybe someone broke module initialization.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 11:24:35 -0700 (PDT)

 On Mon, 8 Jul 2013, David Holland wrote:

 > Ah.
 >
 > That is odd - given that the only way you should be getting SIGSYS is
 > if the nfssvc() syscall doesn't get installed, and that happens as
 > basically the first step in the module initialization. (Which is
 > supposed to be called whether or not the code is compiled in or loaded
 > on the fly.)
 >
 > And if there's something it depends on that's missing, in the builtin
 > case that ought to just result in link failure.

 Yes, I agree.

 > Are you sure it loads/attaches successfully? If it starts to load and
 > then unloads itself for some kind of error, it will uninstall the
 > syscall and then nfsd won't go.

 I noticed that the module initialization code doesn't check for an error 
 return from syscall_establish().  Also the termination code doesn't look 
 at the error return from syscall_disestablish().  Given that I've seen 
 at least one kernel crash triggered directly from modunload(8), I do 
 suspect this code.

 > My inclination would be to add some printfs in that initialization
 > code to see where it does and doesn't get.

 Yes, that's the next step.  I've been trying to monitor progress via 
 syslog() calls in userland nfsd, but now need more granularity/detail.

 >  >  Note that everything "used to work" just fine on 6.99.17.  It has
 >  >  only broken since I updated to 6.99.23
 >
 > Maybe someone broke module initialization.

 Well, I have lots of other modules loaded, without any issues, so I 
 don't think that the module infrastructure is broken.  Just something 
 wrong with this one.




From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 12:32:48 -0700 (PDT)

 On Mon, 8 Jul 2013, Paul Goyette wrote:

 > > My inclination would be to add some printfs in that initialization
 > > code to see where it does and doesn't get.
 >
 > Yes, that's the next step.  I've been trying to monitor progress via
 > syslog() calls in userland nfsd, but now need more granularity/detail.

 Hmmm, it seems that adding a printf() in the module initialization code 
 "fixes" the problem.  Some sort of timing issue, maybe?

 In any case, modunload(8) of the nfsserver module repeatably causes a 
 kernel panic as previously described. It doesn't panic immediately, and 
 in fact I've had enough time to actually confirm (with modstat(8)) that 
 the module has been unloaded.  But then it panics.  I suspect that some 
 timer has not been stopped and still fires even after the module is 
 unloaded.
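The timer hypothesis above can be illustrated with a toy model. It assumes the module owns a periodic timer: if the fini path does not stop it, the next clock tick calls into code that is no longer loaded. The struct and functions are hypothetical stand-ins. In a real NetBSD module, the callout(9) API provides callout_halt() for stopping a timer before unload.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a timer owned by a module. */
struct model_callout {
	void	(*func)(void *);
	void	*arg;
	bool	armed;
};

static int ticks_handled;

static void model_tick(void *arg) {
	(void)arg;
	ticks_handled++;
}

static void model_callout_schedule(struct model_callout *c,
    void (*func)(void *), void *arg) {
	c->func = func;
	c->arg = arg;
	c->armed = true;
}

/* Stop the timer; afterwards it never calls func again. */
static void model_callout_halt(struct model_callout *c) {
	c->armed = false;
	c->func = NULL;
}

/* Simulated clock interrupt: run the handler only if still armed. */
static void model_clock_fire(struct model_callout *c) {
	if (c->armed && c->func != NULL)
		c->func(c->arg);
}

/* Fini must stop the timer before the module's code goes away. */
static void model_module_fini(struct model_callout *c) {
	model_callout_halt(c);
}
```

If model_module_fini() skipped the halt, a later model_clock_fire() would still call model_tick. In a kernel, that would be a jump into unloaded text.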

 Oh, I previously noted that the module init/fini code wasn't checking 
 the error status from syscall_{,dis}establish() calls.  That was not 
 correct - the status is being checked correctly.



From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 18:48:04 -0700 (PDT)

 OK, some more experimental results...

 I have a total of six machines, all running the _identical_ kernel, 
 identical modules, and identical userland.  Hardware configurations are 
 all similar, but NOT identical, and there are five different ASUS 
 motherboards in use.

 Test #1 simply involves starting nfsd manually, and then stopping nfsd. 
 This results in an auto-load of the nfsserver module.

 Test #2 is a manual modunload(8) of the nfsserver module.


  	Machine    Motherboard   Test #1    Test #2
  	   1       M4A88T-M      OK         Panic-A
  	   2       M4A88TD-V EVO OK         Panic-A
  	   3       M5A99X EVO    Panic-B
  	   4       M4A88TD-M     OK         OK
  	   5       M4A88TD-M     OK         Panic-A
  	   6       KGPE-D16      Fails to   Panic-A
  	                         initialize

  	   6-GEN   KGPE-D16      OK         Not tried

  	   3A      M5A99X EVO    OK         Panic-A

 Panic-A is reported at mutex_vector_enter

 Panic-B is reported at nfssvc_addsock+1e5

 Note that while machines 4 and 5 are identical (except for network 
 addressing and hard drive details), the results are different!

 The entry for Machine 3A involves a modified nfsserver module, with a 
 couple of extra printf()s.  This changes the behavior.

 Something weird is happening, for sure, but I don't have a clue.

 For now, my production nfs server will continue to run the GENERIC 
 kernel (line 6-GEN above).....





From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Tue, 9 Jul 2013 18:01:25 +0000

 On Mon, Jul 08, 2013 at 11:24:35AM -0700, Paul Goyette wrote:
  > >Are you sure it loads/attaches successfully? If it starts to load and
  > >then unloads itself for some kind of error, it will uninstall the
  > >syscall and then nfsd won't go.
  > 
  > I noticed that the module initialization code doesn't check for an
  > error return from syscall_establish().

 Er...?

    switch (cmd) {
    case MODULE_CMD_INIT:
 	error = syscall_establish(NULL, nfsserver_syscalls);
 	if (error != 0) {
 		return error;
 	}
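For comparison, here is a userland mock of the whole INIT/FINI switch with both the establish and the disestablish results checked. Paul's earlier follow-up also notes that both results are checked in the real code. The mock_* names and the forced-error hook are hypothetical. Only the overall shape follows the fragment quoted above.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's module command codes. */
typedef enum { MOCK_CMD_INIT, MOCK_CMD_FINI, MOCK_CMD_STAT } mock_cmd_t;

static bool mock_installed;
static int  mock_forced_error;	/* nonzero: next (dis)establish fails */

static int mock_syscall_establish(void) {
	if (mock_forced_error != 0)
		return mock_forced_error;
	mock_installed = true;
	return 0;
}

static int mock_syscall_disestablish(void) {
	if (mock_forced_error != 0)
		return mock_forced_error;
	mock_installed = false;
	return 0;
}

/* Same shape as the quoted fragment: propagate any error to the caller. */
static int mock_nfsserver_modcmd(mock_cmd_t cmd) {
	int error;

	switch (cmd) {
	case MOCK_CMD_INIT:
		error = mock_syscall_establish();
		if (error != 0)
			return error;
		/* ...remaining initialization would follow here... */
		return 0;
	case MOCK_CMD_FINI:
		error = mock_syscall_disestablish();
		if (error != 0)
			return error;	/* refuse to unload; syscall still live */
		return 0;
	default:
		return ENOTTY;
	}
}
```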

 -- 
 David A. Holland
 dholland@netbsd.org

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, paul@whooppee.com
Cc: 
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Tue, 9 Jul 2013 14:05:01 -0400

 On Jul 9,  1:50am, paul@whooppee.com (Paul Goyette) wrote:
 -- Subject: Re: kern/48027: nfsserver module doesn't work

 |  Panic-A is reported at mutex-vector_enter
 |  
 |  Panic-B is reported at nfssvc_addsock+1e5
 |  
 |  Note that while machines 4 and 5 are identical (except for network 
 |  addressing and hard drive details), the results are different!
 |  
 |  The entry for Machine 3A involves a modified nfsserver module, with a 
 |  couple of extra printf()s.  This changes the behavior.
 |  
 |  Something wierd is happening, for sure, but I don't have clue.
 |  
 |  For now, my production nfs server will continue to run the GENERIC 
 |  kernel (line 6-GEN above).....

 Stack traces.

 christos

From: Paul Goyette <paul@whooppee.com>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org, 
    netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Tue, 9 Jul 2013 11:08:53 -0700 (PDT)

 Stack traces not possible - infinite trap loop inside DDB


 On Tue, 9 Jul 2013, Christos Zoulas wrote:

 > On Jul 9,  1:50am, paul@whooppee.com (Paul Goyette) wrote:
 > -- Subject: Re: kern/48027: nfsserver module doesn't work
 >
 > |  Panic-A is reported at mutex-vector_enter
 > |
 > |  Panic-B is reported at nfssvc_addsock+1e5
 > |
 > |  Note that while machines 4 and 5 are identical (except for network
 > |  addressing and hard drive details), the results are different!
 > |
 > |  The entry for Machine 3A involves a modified nfsserver module, with a
 > |  couple of extra printf()s.  This changes the behavior.
 > |
 > |  Something wierd is happening, for sure, but I don't have clue.
 > |
 > |  For now, my production nfs server will continue to run the GENERIC
 > |  kernel (line 6-GEN above).....
 >
 > Stack traces.
 >
 > christos

From: "Paul Goyette" <pgoyette@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/48027 CVS commit: src/sys/kern
Date: Sat, 14 Dec 2013 06:27:57 +0000

 Module Name:	src
 Committed By:	pgoyette
 Date:		Sat Dec 14 06:27:57 UTC 2013

 Modified Files:
 	src/sys/kern: kern_syscall.c

 Log Message:
 Add SYS_compat_60__lwp_park to the list of syscalls that can be resolved by loading kernel modules.

 This seems to address my PR kern/48027


 To generate a diff of this commit:
 cvs rdiff -u -r1.8 -r1.9 src/sys/kern/kern_syscall.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
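The commit above extends the kernel's list of syscalls that can be resolved by loading a module. Below is a toy model of such a list. It assumes a simple table from syscall number to module name. The struct, the numbers, and the module names are hypothetical placeholders, not the contents of kern_syscall.c. An unlisted syscall falls through to ENOSYS, which is where a user process would see SIGSYS.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical autoload table: syscall number -> providing module. */
struct autoload_entry {
	unsigned	code;
	const char	*module;
};

static const struct autoload_entry autoload_table[] = {
	{ 100, "nfsserver" },	/* placeholder number */
	{ 200, "compat" },	/* placeholder for an entry like the commit's */
};

/* Return the module that should provide `code`, or NULL if none. */
static const char *autoload_lookup(unsigned code) {
	size_t n = sizeof(autoload_table) / sizeof(autoload_table[0]);

	for (size_t i = 0; i < n; i++)
		if (autoload_table[i].code == code)
			return autoload_table[i].module;
	return NULL;
}

/* Unimplemented syscall: find a module to load, else fail with ENOSYS. */
static int resolve_missing_syscall(unsigned code, const char **loaded) {
	const char *mod = autoload_lookup(code);

	if (mod == NULL)
		return ENOSYS;	/* caller would get SIGSYS */
	*loaded = mod;		/* a real kernel would try to load it here */
	return 0;
}
```

With the second table entry removed, resolve_missing_syscall(200, &m) returns ENOSYS. This mirrors how a missing list entry leaves the syscall unresolved in a kernel that does not have that code built in.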

Responsible-Changed-From-To: kern-bug-people->pgoyette
Responsible-Changed-By: pgoyette@NetBSD.org
Responsible-Changed-When: Sat, 14 Dec 2013 06:33:04 +0000
Responsible-Changed-Why:
I fixed it


State-Changed-From-To: open->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Sat, 14 Dec 2013 06:33:04 +0000
State-Changed-Why:
It's fixed.


>Unformatted:
