NetBSD Problem Report #48027
From paul@whooppee.com Sun Jul 7 20:32:07 2013
Return-Path: <paul@whooppee.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "Postmaster NetBSD.org" (verified OK))
by mollari.NetBSD.org (Postfix) with ESMTPS id D18087182F
for <gnats-bugs@gnats.NetBSD.org>; Sun, 7 Jul 2013 20:32:06 +0000 (UTC)
Message-Id: <20130707203205.1308024797C@screamer.whooppee.com>
Date: Sun, 7 Jul 2013 13:32:05 -0700 (PDT)
From: paul@whooppee.com
Reply-To: paul@whooppee.com
To: gnats-bugs@NetBSD.org
Subject: nfsserver module doesn't work
X-Send-Pr-Version: 3.95
>Number: 48027
>Category: kern
>Synopsis: nfsserver module doesn't work
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: pgoyette
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sun Jul 07 20:35:00 +0000 2013
>Closed-Date: Sat Dec 14 06:33:04 +0000 2013
>Last-Modified: Sat Dec 14 06:33:04 +0000 2013
>Originator: Paul Goyette
>Release: NetBSD 6.99.23
>Organization:
-------------------------------------------------------------------------
| Paul Goyette | PGP Key fingerprint: | E-mail addresses: |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer | | pgoyette at netbsd.org |
-------------------------------------------------------------------------
>Environment:
System: NetBSD screamer.whooppee.com 6.99.23 NetBSD 6.99.23 (GENERIC) #17: Thu Jul 4 07:18:10 PDT 2013 paul@screamer.whooppee.com:/build/netbsd-local/obj/amd64/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
For some unknown reason, a custom kernel config which contains both
	options NFSSERVER
and
	file-system NFS
still cannot start the nfsd process and export its filesystems.
nfsd reports "NFS not available" after receiving a SIGSYS signal.
The same error occurs when the above option and file system are not
included in the kernel, whether the nfs and nfsserver modules are
manually pre-loaded or allowed to auto-load.
>How-To-Repeat:
Build a kernel using the config file at
http://www.whooppee.com/~paul/WHOOPPEE-NFS
and try to start nfsd
>Fix:
Unknown
>Release-Note:
>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 02:21:43 +0000
On Sun, Jul 07, 2013 at 08:35:00PM +0000, paul@whooppee.com wrote:
> For some unknown reason, a custom kernel config which contains both
>
> options NFSSERVER
> and filesystem NFS
>
> still cannot successfully start the nfsd process and export its
> filesystems. nfsd reports "NFS not available" due to a SIGSYS
> signal.
...so this has nothing to do with modules?
--
David A. Holland
dholland@netbsd.org
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Sun, 7 Jul 2013 19:31:34 -0700 (PDT)
On Mon, 8 Jul 2013, David Holland wrote:
> > still cannot successfully start the nfsd process and export its
> > filesystems. nfsd reports "NFS not available" due to a SIGSYS
> > signal.
>
> ...so this has nothing to do with modules?
I suspect that something isn't getting initialized correctly when
the module is loaded. I've been trying to track it down, but no luck so
far.
I don't think this is an issue with the MODULAR infrastructure, but
rather an issue with the specific implementation of the nfsserver
module. It "just works" in a monolithic kernel, but fails with a loaded
module in a minimalist kernel.
One thing I have not tried yet is to build a GENERIC-MINUS-NFS kernel
and see if the problem persists there.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 03:42:23 +0000
On Mon, Jul 08, 2013 at 02:35:01AM +0000, Paul Goyette wrote:
> But I don't think this is an issue with the MODULAR infrastructure, but
> rather an issue with specific implementation of nfsserver module. It
> "just works" in a monolithic kernel, but fails with loaded module in a
> minimalist kernel.
In the original report you said it failed even when compiled in...
--
David A. Holland
dholland@netbsd.org
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Sun, 7 Jul 2013 21:22:59 -0700 (PDT)
On Mon, 8 Jul 2013, David Holland wrote:
> > But I don't think this is an issue with the MODULAR infrastructure, but
> > rather an issue with specific implementation of nfsserver module. It
> > "just works" in a monolithic kernel, but fails with loaded module in a
> > minimalist kernel.
>
> In the original report you said it failed even when compiled in...
Yep. My minimalist kernel is obviously not including something else
that is required for nfsserver. I simply don't know what is missing.
So, to summarize:
  Full GENERIC monolithic kernel, with everything built-in, works.
  My stripped-down, minimalist kernel, with almost everything
  removed, does not work,
    - when the nfsserver module is manually loaded by /etc/boot.cfg
    - when the module is manually loaded via modload
    - when the module is allowed to autoload
  The same stripped-down kernel _still_ fails to work, even when
  'file-system NFS' and 'options NFSSERVER' are added.
Note that everything "used to work" just fine on 6.99.17. It has only
been broken since I updated to 6.99.23.
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 06:14:04 -0700 (PDT)
An additional data-point:
At one point, I tried to modunload(8) the nfsserver module. A couple of
seconds later, the machine panicked at mutex_vector_enter() (sorry, I
forgot to copy the offset).
Fortunately I had a PS/2 keyboard attached, rather than relying on the
USB keyboard. Unfortunately, it didn't help, since a "bt" command went
into an infinite loop of
	?
	kernel: page fault trap, code=0
	Faulted in DDB; continuing...
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 18:13:38 +0000
On Mon, Jul 08, 2013 at 04:25:01AM +0000, Paul Goyette wrote:
> > In the original report you said it failed even when compiled in...
>
> Yep. My minimalist kernel is obviously not including something else
> that is required for nfsserver. I simply don't know what is missing.
> [...]
Ah.
That is odd - given that the only way you should be getting SIGSYS is
if the nfssvc() syscall doesn't get installed, and that happens as
basically the first step in the module initialization. (Which is
supposed to be called whether or not the code is compiled in or loaded
on the fly.)
And if there's something it depends on that's missing, in the builtin
case that ought to just result in link failure.
Are you sure it loads/attaches successfully? If it starts to load and
then unloads itself for some kind of error, it will uninstall the
syscall and then nfsd won't go.
My inclination would be to add some printfs in that initialization
code to see where it does and doesn't get.
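A minimal userland sketch of the failure mode described above, assuming only what this thread states: a call with no installed handler is reported to the process as SIGSYS, and the caller turns that into a "not available" result. The helper name syscall_available() and the two stand-in functions are hypothetical; raise(SIGSYS) merely simulates what the kernel would deliver, and none of this is taken from nfsd's source.

```c
#define _POSIX_C_SOURCE 200809L	/* for sigaction() and sigsetjmp() */
#include <setjmp.h>
#include <signal.h>
#include <stddef.h>

/* Jump target used to escape from the SIGSYS handler. */
static sigjmp_buf probe_env;

static void
on_sigsys(int signo)
{
	(void)signo;
	siglongjmp(probe_env, 1);
}

/*
 * Hypothetical helper: run `attempt` with SIGSYS caught.
 * Returns 1 if the call completed, 0 if SIGSYS was delivered.
 */
static int
syscall_available(void (*attempt)(void))
{
	struct sigaction sa, old;
	int ok;

	sa.sa_handler = on_sigsys;
	sigemptyset(&sa.sa_mask);
	sa.sa_flags = 0;
	if (sigaction(SIGSYS, &sa, &old) == -1)
		return 0;

	if (sigsetjmp(probe_env, 1) == 0) {
		attempt();
		ok = 1;
	} else {
		ok = 0;		/* SIGSYS arrived: no handler for the call */
	}
	(void)sigaction(SIGSYS, &old, NULL);	/* restore previous disposition */
	return ok;
}

/* Stand-ins for a real system call: one "missing", one "present". */
static void missing_call(void) { raise(SIGSYS); }
static void present_call(void) { }
```

Calling syscall_available(missing_call) takes the SIGSYS path and returns 0, which is the point at which a daemon could print a message such as "NFS not available"; syscall_available(present_call) returns 1.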
> Note that everything "used to work" just fine on 6.99.17. It has only
> broken since I updated to 6.99.23
Maybe someone broke module initialization.
--
David A. Holland
dholland@netbsd.org
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 11:24:35 -0700 (PDT)
On Mon, 8 Jul 2013, David Holland wrote:
> Ah.
>
> That is odd - given that the only way you should be getting SIGSYS is
> if the nfssvc() syscall doesn't get installed, and that happens as
> basically the first step in the module initialization. (Which is
> supposed to be called whether or not the code is compiled in or loaded
> on the fly.)
>
> And if there's something it depends on that's missing, in the builtin
> case that ought to just result in link failure.
Yes, I agree.
> Are you sure it loads/attaches successfully? If it starts to load and
> then unloads itself for some kind of error, it will uninstall the
> syscall and then nfsd won't go.
I noticed that the module initialization code doesn't check for an error
return from syscall_establish(). Also the termination code doesn't look
at the error return from syscall_disestablish(). Given that I've seen
at least one kernel crash triggered directly from modunload(8), I do
suspect this code.
> My inclination would be to add some printfs in that initialization
> code to see where it does and doesn't get.
Yes, that's the next step. I've been trying to monitor progress via
syslog() calls in userland nfsd, but now need more granularity/detail.
> > Note that everything "used to work" just fine on 6.99.17. It has
> > only broken since I updated to 6.99.23
>
> Maybe someone broke module initialization.
Well, I have lots of other modules loaded, without any issues, so I
don't think that the module infrastructure is broken. Just something
wrong with this one.
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 12:32:48 -0700 (PDT)
On Mon, 8 Jul 2013, Paul Goyette wrote:
> > My inclination would be to add some printfs in that initialization
> > code to see where it does and doesn't get.
>
> Yes, that's the next step. I've been trying to monitor progress via
> syslog() calls in userland nfsd, but now need more granularity/detail.
Hmmm, it seems that adding a printf() in the module initialization code
"fixes" the problem. Some sort of timing issue, maybe?
In any case, modunload(8) of the nfsserver module repeatably causes a
kernel panic as previously described. It doesn't panic immediately, and
in fact I've had enough time to confirm (with modstat(8)) that the
module has been unloaded. But then it panics. Perhaps some timer code
has not been stopped, and still fires even after the module is
unloaded?
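To illustrate the suspicion only (nothing in this report confirms it was the cause), here is a small userland sketch of why a timer must be stopped before a module's state goes away. Every name in it (timer_arm, timer_halt, timer_fire, mod_init, mod_fini) is hypothetical and stands in for whatever timer facility the real code uses.

```c
#include <stddef.h>
#include <stdlib.h>

typedef void (*tick_fn)(void *);

/* A one-slot stand-in for a kernel timer facility (hypothetical). */
static tick_fn pending_fn;
static void *pending_arg;

static void
timer_arm(tick_fn fn, void *arg)
{
	pending_fn = fn;
	pending_arg = arg;
}

static void
timer_halt(void)
{
	pending_fn = NULL;
	pending_arg = NULL;
}

/* Fire the pending timer, if any. Returns 1 if a callback ran. */
static int
timer_fire(void)
{
	if (pending_fn == NULL)
		return 0;
	pending_fn(pending_arg);
	return 1;
}

/* Per-module state that the timer callback touches. */
struct mod_state {
	int ticks;
};
static struct mod_state *state;

static void
mod_tick(void *arg)
{
	((struct mod_state *)arg)->ticks++;
}

static int
mod_init(void)
{
	state = calloc(1, sizeof(*state));
	if (state == NULL)
		return -1;
	timer_arm(mod_tick, state);
	return 0;
}

static void
mod_fini(void)
{
	timer_halt();	/* stop the timer first ... */
	free(state);	/* ... then release what its callback touches */
	state = NULL;
}
```

If mod_fini() skipped timer_halt(), a later timer_fire() would call mod_tick() on freed memory, which is the kind of delayed crash described above.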
Oh, I previously noted that the module init/fini code wasn't checking
the error status from syscall_{,dis}establish() calls. That was not
correct - the status is being checked correctly.
From: Paul Goyette <paul@whooppee.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Mon, 8 Jul 2013 18:48:04 -0700 (PDT)
OK, some more experimental results...
I have a total of six machines, all running the _identical_ kernel,
identical modules, and identical userland. Hardware configurations are
all similar, but NOT identical, and there are five different ASUS
motherboards in use.
Test #1 simply involves starting nfsd manually, and then stopping nfsd.
This results in an auto-load of the nfsserver module.
Test #2 is a manual modunload(8) of the nfsserver module.
Machine  Motherboard      Test #1              Test #2
1        M4A88T-M         OK                   Panic-A
2        M4A88TD-V EVO    OK                   Panic-A
3        M5A99X EVO       Panic-B
4        M4A88TD-M        OK                   OK
5        M4A88TD-M        OK                   Panic-A
6        KGPE-D16         Fails to initialize  Panic-A
6-GEN    KGPE-D16         OK                   Not tried
3A       M5A99X EVO       OK                   Panic-A
Panic-A is reported at mutex_vector_enter
Panic-B is reported at nfssvc_addsock+1e5
Note that while machines 4 and 5 are identical (except for network
addressing and hard drive details), the results are different!
The entry for Machine 3A involves a modified nfsserver module, with a
couple of extra printf()s. This changes the behavior.
Something weird is happening, for sure, but I don't have a clue.
For now, my production nfs server will continue to run the GENERIC
kernel (line 6-GEN above).....
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Tue, 9 Jul 2013 18:01:25 +0000
On Mon, Jul 08, 2013 at 11:24:35AM -0700, Paul Goyette wrote:
> >Are you sure it loads/attaches successfully? If it starts to load and
> >then unloads itself for some kind of error, it will uninstall the
> >syscall and then nfsd won't go.
>
> I noticed that the module initialization code doesn't check for an
> error return from syscall_establish().
Er...?
	switch (cmd) {
	case MODULE_CMD_INIT:
		error = syscall_establish(NULL, nfsserver_syscalls);
		if (error != 0) {
			return error;
		}
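For readers following along, a self-contained userland sketch of the same pattern, extended with the rollback discussed earlier in this thread (a module that fails partway through loading removes its syscall again). The fake_* functions and variables are stand-ins invented for this example and are not the kernel's syscall_establish()/syscall_disestablish(); only the control flow is the point.

```c
#include <errno.h>
#include <stddef.h>

/* Userland stand-ins for the kernel interfaces; NOT the real kernel API. */
enum modcmd { MODULE_CMD_INIT, MODULE_CMD_FINI };

static int fake_establish_error;	/* error to inject on establish */
static int fake_subsystem_error;	/* error to inject on later setup */
static int syscalls_installed;		/* 1 while the syscall is "installed" */

static int
fake_syscall_establish(void)
{
	if (fake_establish_error != 0)
		return fake_establish_error;
	syscalls_installed = 1;
	return 0;
}

static int
fake_syscall_disestablish(void)
{
	if (!syscalls_installed)
		return ENOENT;
	syscalls_installed = 0;
	return 0;
}

/* Hypothetical setup step that may fail after the syscall is installed. */
static int
fake_subsystem_init(void)
{
	return fake_subsystem_error;
}

static int
example_modcmd(enum modcmd cmd)
{
	int error;

	switch (cmd) {
	case MODULE_CMD_INIT:
		error = fake_syscall_establish();
		if (error != 0)
			return error;
		error = fake_subsystem_init();
		if (error != 0) {
			/* Roll back so a failed load leaves nothing behind. */
			(void)fake_syscall_disestablish();
			return error;
		}
		return 0;
	case MODULE_CMD_FINI:
		return fake_syscall_disestablish();
	default:
		return ENOTTY;
	}
}
```

With fake_subsystem_error set to a non-zero value, example_modcmd(MODULE_CMD_INIT) returns that error and leaves syscalls_installed at 0, so a later caller would find the syscall missing.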
--
David A. Holland
dholland@netbsd.org
From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, paul@whooppee.com
Cc:
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Tue, 9 Jul 2013 14:05:01 -0400
On Jul 9, 1:50am, paul@whooppee.com (Paul Goyette) wrote:
-- Subject: Re: kern/48027: nfsserver module doesn't work
| Panic-A is reported at mutex_vector_enter
|
| Panic-B is reported at nfssvc_addsock+1e5
|
| Note that while machines 4 and 5 are identical (except for network
| addressing and hard drive details), the results are different!
|
| The entry for Machine 3A involves a modified nfsserver module, with a
| couple of extra printf()s. This changes the behavior.
|
| Something weird is happening, for sure, but I don't have a clue.
|
| For now, my production nfs server will continue to run the GENERIC
| kernel (line 6-GEN above).....
Stack traces.
christos
From: Paul Goyette <paul@whooppee.com>
To: Christos Zoulas <christos@zoulas.com>
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: kern/48027: nfsserver module doesn't work
Date: Tue, 9 Jul 2013 11:08:53 -0700 (PDT)
Stack traces are not possible - infinite trap loop inside DDB
On Tue, 9 Jul 2013, Christos Zoulas wrote:
> On Jul 9, 1:50am, paul@whooppee.com (Paul Goyette) wrote:
> -- Subject: Re: kern/48027: nfsserver module doesn't work
>
> | Panic-A is reported at mutex_vector_enter
> |
> | Panic-B is reported at nfssvc_addsock+1e5
> |
> | Note that while machines 4 and 5 are identical (except for network
> | addressing and hard drive details), the results are different!
> |
> | The entry for Machine 3A involves a modified nfsserver module, with a
> | couple of extra printf()s. This changes the behavior.
> |
> | Something weird is happening, for sure, but I don't have a clue.
> |
> | For now, my production nfs server will continue to run the GENERIC
> | kernel (line 6-GEN above).....
>
> Stack traces.
>
> christos
>
From: "Paul Goyette" <pgoyette@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/48027 CVS commit: src/sys/kern
Date: Sat, 14 Dec 2013 06:27:57 +0000
Module Name: src
Committed By: pgoyette
Date: Sat Dec 14 06:27:57 UTC 2013
Modified Files:
src/sys/kern: kern_syscall.c
Log Message:
Add SYS_compat_60__lwp_park to the list of syscalls that can be resolved by loading kernel modules.
This seems to address my PR kern/48027
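As a rough illustration of what "a list of syscalls that can be resolved by loading kernel modules" might look like, here is a userland sketch of a lookup table from syscall code to module name. It is not the contents of kern_syscall.c: the EX_* codes are placeholder values rather than real syscall numbers, and the "compat" module name is an assumption made for the example.

```c
#include <stddef.h>

/* Placeholder syscall codes for illustration; NOT the real numbers. */
enum {
	EX_SYS_nfssvc = 1,
	EX_SYS_compat_60__lwp_park = 2,
	EX_SYS_unlisted = 3
};

struct autoload_entry {
	unsigned int	 al_code;	/* syscall code */
	const char	*al_module;	/* module expected to provide it */
};

/* Hypothetical table; the module names are assumptions. */
static const struct autoload_entry example_autoload[] = {
	{ EX_SYS_nfssvc, "nfsserver" },
	{ EX_SYS_compat_60__lwp_park, "compat" },
};

/*
 * Return the module to load for an unimplemented syscall,
 * or NULL if the code is not listed.
 */
static const char *
autoload_module_for(unsigned int code)
{
	size_t i;
	const size_t n = sizeof(example_autoload) / sizeof(example_autoload[0]);

	for (i = 0; i < n; i++) {
		if (example_autoload[i].al_code == code)
			return example_autoload[i].al_module;
	}
	return NULL;
}
```

In this sketch a code that is missing from the table yields NULL, so nothing is loaded on its behalf; the log message above describes adding one more entry to the real list.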
To generate a diff of this commit:
cvs rdiff -u -r1.8 -r1.9 src/sys/kern/kern_syscall.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
Responsible-Changed-From-To: kern-bug-people->pgoyette
Responsible-Changed-By: pgoyette@NetBSD.org
Responsible-Changed-When: Sat, 14 Dec 2013 06:33:04 +0000
Responsible-Changed-Why:
I fixed it
State-Changed-From-To: open->closed
State-Changed-By: pgoyette@NetBSD.org
State-Changed-When: Sat, 14 Dec 2013 06:33:04 +0000
State-Changed-Why:
It's fixed.
>Unformatted:
Copyright © 1994-2007
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.