NetBSD Problem Report #53072

From www@NetBSD.org  Sun Mar  4 14:39:14 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 4F35D7A169
	for <gnats-bugs@gnats.NetBSD.org>; Sun,  4 Mar 2018 14:39:14 +0000 (UTC)
Message-Id: <20180304143913.472EE7A266@mollari.NetBSD.org>
Date: Sun,  4 Mar 2018 14:39:13 +0000 (UTC)
From: rcbixler@nyx.net
Reply-To: rcbixler@nyx.net
To: gnats-bugs@NetBSD.org
Subject: netbsd-8 regression: startx (nv driver) crashes system
X-Send-Pr-Version: www-1.0

>Number:         53072
>Notify-List:    bsiegert@NetBSD.org
>Category:       kern
>Synopsis:       netbsd-8 regression: startx (nv driver) crashes system
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    mrg
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Mar 04 14:40:01 +0000 2018
>Closed-Date:    Tue Mar 06 09:40:18 +0000 2018
>Last-Modified:  Sun Mar 11 18:10:00 +0000 2018
>Originator:     Roy Bixler
>Release:        netbsd-8
>Organization:
>Environment:
NetBSD laptop.bix.org 8.0_BETA NetBSD 8.0_BETA (GENERIC.201802271630Z) i386
>Description:
System is an old Dell laptop (Precision M70) with an NVidia graphics card.  It doesn't work with the Nouveau DRM (problem reported in bug #50804), but it works with the old X server nv driver.  When I upgraded to the kernel above, I found that "startx" crashes the system.  It reboots and leaves a crash dump:

_KERNEL_OPT_NARCNET(0,104,c011e2a5,8,c0fff385,0,104,c0f73de5,dabefc5c,dabefc40) at 0
__kernel_end(104,0,c0f73de5,dabefc5c,c2b6ed40,6,dabefce4,dabefc50,c0947c9a,c0f73de5) at dabefc5c
vpanic(c0f73de5,dabefc5c,dabefcd8,c0120935,c0f73de5,dabefce4,dabefce4,1,dabed2c0,13246) at vpanic+0x131
snprintf(c0f73de5,dabefce4,dabefce4,1,dabed2c0,13246,8,0,0,0) at snprintf
trap_tss() at trap_tss
--- trap via task gate ---
_KERNEL_OPT_BEEP_ONHALT_COUNT+0x2:

The Xorg.0.log file from the crashed "startx" is:

[   123.004] (**) |-->Input Device "Keyboard0"
[   123.004] (==) Not automatically adding devices
[   123.004] (==) Not automatically enabling devices
[   123.004] (==) Not automatically adding GPU devices
[   123.004] (==) Max clients allowed: 256, resource mask: 0x1fffff
[   123.005] (**) FontPath set to:
        /usr/X11R7/lib/X11/fonts/misc/,
        /usr/X11R7/lib/X11/fonts/TTF/,
        /usr/X11R7/lib/X11/fonts/Type1/,
        /usr/X11R7/lib/X11/fonts/75dpi/,
        /usr/X11R7/lib/X11/fonts/100dpi/,
        /usr/X11R7/lib/X11/fonts/misc/,
        /usr/X11R7/lib/X11/fonts/TTF/,
        /usr/X11R7/lib/X11/fonts/Type1/,
        /usr/X11R7/lib/X11/fonts/75dpi/,
        /usr/X11R7/lib/X11/fonts/100dpi/
[   123.005] (**) ModulePath set to "/usr/X11R7/lib/modules"
[   123.005] Number of created screens does not match number of detected devices.
  Configuration failed.
[   123.006] (EE) Server terminated with error (2). Closing log fil

When I revert to the previous build:

NetBSD laptop.bix.org 8.0_BETA NetBSD 8.0_BETA (GENERIC.201802262040Z) i386

X works and the system is usable again.  I suspect that this change "[pullup-8 #593] please pullup pmap & pool(9) fixes for netbsd-8" is the cause.
>How-To-Repeat:
Boot up a NetBSD-8 build on or after 201802271630Z with nouveau driver disabled and run "startx" as a normal user.
>Fix:
Revert to a prior NetBSD-8 build, such as the previous one on 201802262040Z.

>Release-Note:

>Audit-Trail:
From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 system
Date: Sun, 4 Mar 2018 17:31:03 +0100

 On Sun, Mar 04, 2018 at 02:40:01PM +0000, rcbixler@nyx.net wrote:
 > the X works and the system is usable again.  I suspect that this
 > change "[pullup-8 #593] please pullup pmap & pool(9) fixes for
 > netbsd-8" is the issue.

 This sounds a bit unlikely - would you be able to test a -8 kernel
 just before that pullup happened?

 Martin

From: rcbixler@nyx.net
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 system
Date: Sun, 4 Mar 2018 10:29:32 -0700

 > The following reply was made to PR kern/53072; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 > 	netbsd-bugs@netbsd.org
 > Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 >  system
 > Date: Sun, 4 Mar 2018 17:31:03 +0100
 >
 >  On Sun, Mar 04, 2018 at 02:40:01PM +0000, rcbixler@nyx.net wrote:
 >  > the X works and the system is usable again.  I suspect that this
 >  > change "[pullup-8 #593] please pullup pmap & pool(9) fixes for
 >  > netbsd-8" is the issue.
 >
 >  This sounds a bit unlikely - would you be able to test a -8 kernel
 >  just before that pullup happened?
 >
 >  Martin

 I tried the netbsd-8 build from 201802262040Z and didn't have the
 problem.  I first encountered the problem with the netbsd-8 build
 from 201802271630Z.  I see only 2 commits between those 2 times,
 the one I suspected:

     To: source-changes%NetBSD.org@localhost
     Subject: CVS commit: [netbsd-8] src/sys
     From: "Martin Husemann" <martin%netbsd.org@localhost>
     Date: Tue, 27 Feb 2018 09:07:33 +0000

 and this one:

     To: source-changes%NetBSD.org@localhost
     Subject: CVS commit: [netbsd-8] src/doc
     From: "Martin Husemann" <martin%netbsd.org@localhost>
     Date: Tue, 27 Feb 2018 06:07:28 +0000

     Module Name:    src
     Committed By:   martin
     Date:           Tue Feb 27 06:07:28 UTC 2018

     Modified Files:
         src/doc [netbsd-8]: CHANGES-8.0

     Log Message:
     Amend ticket #587: additionally xform_esp.c r1.77 has been pulled up.


     To generate a diff of this commit:
     cvs rdiff -u -r1.1.2.132 -r1.1.2.133 src/doc/CHANGES-8.0

 I considered the former commit more likely, as it looks pretty involved
 and it affects memory allocation.  Are you suggesting that I try
 to build a netbsd-8 from the current tree with the suspect change reverted?
 If not, what are you suggesting?

 -- 
 Roy Bixler <rcbixler@nyx.net>

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, rcbixler@nyx.net
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 system
Date: Mon, 5 Mar 2018 08:47:21 +0100

 On Sun, Mar 04, 2018 at 06:50:01PM +0000, rcbixler@nyx.net wrote:
 >  I considered the former commit more likely, as it looks pretty involved
 >  and it affects memory allocation.  Are you suggesting that I try
 >  to build a netbsd-8 from the current tree with the suspect change reverted?
 >  If not, what are you suggesting?

 Ok, that is strong evidence ;-)

 Any chance you could try a -current kernel (just kernel, just for testing)?

 Martin

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org, rcbixler@nyx.net
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/53072: netbsd-8 regression: startx (nv driver) crashes system
Date: Mon, 05 Mar 2018 18:57:19 +1100

 > _KERNEL_OPT_NARCNET(0,104,c011e2a5,8,c0fff385,0,104,c0f73de5,dabefc5c,dabefc40) at 0
 > __kernel_end(104,0,c0f73de5,dabefc5c,c2b6ed40,6,dabefce4,dabefc50,c0947c9a,c0f73de5) at dabefc5c
 > vpanic(c0f73de5,dabefc5c,dabefcd8,c0120935,c0f73de5,dabefce4,dabefce4,1,dabed2c0,13246) at vpanic+0x131
 > snprintf(c0f73de5,dabefce4,dabefce4,1,dabed2c0,13246,8,0,0,0) at snprintf
 > trap_tss() at trap_tss
 > --- trap via task gate ---
 > _KERNEL_OPT_BEEP_ONHALT_COUNT+0x2:

 this is from crash(8)?  can you try gdb, see if it can trace
 through the trap and where it really is happening?

 this probably is some teardown issue, as it seems that X tries
 and then fails, and then we crash.  can you try the vesa driver
 for now, it should have reasonable performance until we
 figure this problem out.


 .mrg.

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org, rcbixler@nyx.net
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/53072: netbsd-8 regression: startx (nv driver) crashes system
Date: Mon, 05 Mar 2018 18:59:15 +1100

 other things you can try are to enable drm debug before
 starting X, and seeing what it logs around the failure.
 there are two sysctl's:

 	hw.drm2.drm_debug
 	hw.drm2.nouveau_debug

 not sure about the latter, but i guess both are useful.
 (the former is generic debug that i've used.)

From: Martin Husemann <martin@duskware.de>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, rcbixler@nyx.net
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 system
Date: Mon, 5 Mar 2018 09:06:54 +0100

 FWIW: I tested a nouveau machine running -current and didn't run into any
 issues there.

 Martin

From: rcbixler@nyx.net
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 system
Date: Mon, 5 Mar 2018 05:36:06 -0700

 > The following reply was made to PR kern/53072; it has been noted by GNATS.
 >
 > From: Martin Husemann <martin@duskware.de>
 > To: gnats-bugs@NetBSD.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 > 	netbsd-bugs@netbsd.org, rcbixler@nyx.net
 > Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 >  system
 > Date: Mon, 5 Mar 2018 08:47:21 +0100
 >
 >  On Sun, Mar 04, 2018 at 06:50:01PM +0000, rcbixler@nyx.net wrote:
 >  >  I considered the former commit more likely, as it looks pretty involved
 >  >  and it affects memory allocation.  Are you suggesting that I try
 >  >  to build a netbsd-8 from the current tree with the suspect change reverted?
 >  >  If not, what are you suggesting?
 >
 >  Ok, that is strong evidence ;-)
 >
 >  Any chance you could try a -current kernel (just kernel, just for testing)?

 I have a -current installation on that machine, updated on 1 Mar, and
 X works on it.

 -- 
 Roy Bixler <rcbixler@nyx.net>

From: Roy Bixler <rcbixler@nyx.net>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 system
Date: Mon, 5 Mar 2018 07:55:19 -0700

 On Mon, Mar 05, 2018 at 08:00:02AM +0000, matthew green wrote:
 > The following reply was made to PR kern/53072; it has been noted by GNATS.
 > 
 > From: matthew green <mrg@eterna.com.au>
 > To: gnats-bugs@NetBSD.org, rcbixler@nyx.net
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 >     netbsd-bugs@netbsd.org
 > Subject: re: kern/53072: netbsd-8 regression: startx (nv driver) crashes system
 > Date: Mon, 05 Mar 2018 18:57:19 +1100
 > 
 >  > _KERNEL_OPT_NARCNET(0,104,c011e2a5,8,c0fff385,0,104,c0f73de5,dabefc5c,dabefc40) at 0
 >  > __kernel_end(104,0,c0f73de5,dabefc5c,c2b6ed40,6,dabefce4,dabefc50,c0947c9a,c0f73de5) at dabefc5c
 >  > vpanic(c0f73de5,dabefc5c,dabefcd8,c0120935,c0f73de5,dabefce4,dabefce4,1,dabed2c0,13246) at vpanic+0x131
 >  > snprintf(c0f73de5,dabefce4,dabefce4,1,dabed2c0,13246,8,0,0,0) at snprintf
 >  > trap_tss() at trap_tss
 >  > --- trap via task gate ---
 >  > _KERNEL_OPT_BEEP_ONHALT_COUNT+0x2:
 >  
 >  this is from crash(8)?  can you try gdb, see if it can trace
 >  through the trap and where it really is happening?
 >  
 >  this probably is some teardown issue, as it seems that X tries
 >  and then fails, and then we crash.  can you try the vesa driver
 >  for now, it should have reasonable performance until we
 >  figure this problem out.
 >  
 >  
 >  .mrg.
 >  

 I've run out of time for now, but I was able to confirm that a kernel
 built from CVS as of 27 Feb. 2018 0900Z doesn't have the problem, but
 a kernel built from CVS as of 27 Feb. 2018 1000Z does have the
 problem.  I also confirmed that I am unable to use X at all with the
 latter.  I tried the vesa driver and the wsfb driver with vesa mode
 and those crash as well.  I guess, if I want to use X, I'll be
 restricted to using the 27 Feb. 2018 0900Z build or -current.

 -- 
 Roy Bixler <rcbixler@nyx.net>
 "The fundamental principle of science, the definition almost, is this: the
 sole test of the validity of any idea is experiment."
 -- Richard P. Feynman

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org, rcbixler@nyx.net
Subject: re: kern/53072: netbsd-8 regression: startx (nv driver) crashes system
Date: Tue, 06 Mar 2018 05:17:28 +1100

 > I have a -current installation on that machine, updated on 1 Mar, and
 > X works on it.

 OK, so it's probably the case that i missed a particular change
 in the pool/pmap pullup but i am having a lot of trouble figuring
 out what it could be.  i've made 2 more passes over the list of
 updated pool callers and found nothing more.

 did you get anywhere with gdb?  it might be helpful to track
 this down if we know what code path is failing.

 thanks.



 .mrg.

From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org, rcbixler@nyx.net, bsiegert@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
    netbsd-bugs@netbsd.org
Subject: re: kern/53072: netbsd-8 regression: startx (nv driver) crashes system
Date: Tue, 06 Mar 2018 06:06:41 +1100

 can you try this patch?  it's x86/pmap.c 1.267 which was
 missed in the pullup.

 Benny, this might fix your new problem too.  can you also
 test it?

 thanks.


 .mrg.

 Index: pmap.c
 ===================================================================
 RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v
 retrieving revision 1.245.6.2
 diff -p -u -u -r1.245.6.2 pmap.c
 --- pmap.c	27 Feb 2018 09:07:33 -0000	1.245.6.2
 +++ pmap.c	5 Mar 2018 19:02:45 -0000
 @@ -1737,8 +1737,8 @@ pmap_pp_needs_pve(struct pmap_page *pp)
  	 * since the first pv entry is stored in the pmap_page.
  	 */

 -	return (pp->pp_flags & PP_EMBEDDED) != 0 ||
 -		!LIST_EMPTY(&pp->pp_head.pvh_list);
 +	return pp && ((pp->pp_flags & PP_EMBEDDED) != 0 ||
 +	    !LIST_EMPTY(&pp->pp_head.pvh_list));
  }

  /*
 @@ -4123,7 +4123,7 @@ pmap_enter_ma(struct pmap *pmap, vaddr_t
  	 */

  	bool needpves = pmap_pp_needs_pve(new_pp);
 -	if (new_pp && needpves) {
 +	if (needpves) {
  		new_pve = pool_cache_get(&pmap_pv_cache, PR_NOWAIT);
  		new_sparepve = pool_cache_get(&pmap_pv_cache, PR_NOWAIT);
  	} else {
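
The patch above moves the NULL test from the caller into pmap_pp_needs_pve(). Before it, pmap_enter_ma() called the predicate first and only then tested new_pp, so a NULL new_pp was dereferenced inside the predicate. A minimal user-space sketch of the two shapes follows. It is not the kernel code: the struct layout, the PP_EMBEDDED value, the pp_npv counter (standing in for pvh_list) and both function names are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative flag value only; the kernel's definition may differ. */
#define PP_EMBEDDED 0x1u

/* Hypothetical stand-in for struct pmap_page: just the state the
 * predicate reads. pp_npv replaces the real pvh_list with a count. */
struct pmap_page {
	unsigned pp_flags; /* PP_EMBEDDED: first pv entry stored in the page */
	int      pp_npv;   /* number of additional pv entries (list stand-in) */
};

/* Pre-patch shape: dereferences pp unconditionally, so every caller
 * must guarantee pp != NULL before calling. */
static bool
needs_pve_unguarded(const struct pmap_page *pp)
{
	return (pp->pp_flags & PP_EMBEDDED) != 0 || pp->pp_npv != 0;
}

/* Post-patch shape: the NULL test lives in the predicate, and the
 * short-circuit && keeps pp from being dereferenced when it is NULL. */
static bool
needs_pve_guarded(const struct pmap_page *pp)
{
	return pp != NULL &&
	    ((pp->pp_flags & PP_EMBEDDED) != 0 || pp->pp_npv != 0);
}
```

With the guarded form, needs_pve_guarded(NULL) simply returns false, which is why the caller's separate `new_pp &&` test can be dropped. Passing NULL to the unguarded form is undefined behavior in user space and would be expected to fault in the kernel.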

From: Roy Bixler <rcbixler@nyx.net>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes
 system
Date: Mon, 5 Mar 2018 17:23:19 -0700

 On Mon, Mar 05, 2018 at 07:10:00PM +0000, matthew green wrote:
 > The following reply was made to PR kern/53072; it has been noted by GNATS.
 > 
 > From: matthew green <mrg@eterna.com.au>
 > To: gnats-bugs@NetBSD.org, rcbixler@nyx.net, bsiegert@netbsd.org
 > Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 >     netbsd-bugs@netbsd.org
 > Subject: re: kern/53072: netbsd-8 regression: startx (nv driver) crashes system
 > Date: Tue, 06 Mar 2018 06:06:41 +1100
 > 
 >  can you try this patch?  it's x86/pmap.c 1.267 which was
 >  missed in the pullup.

 I applied the patch to the source tree pulled from netbsd-8 as of
 2018-02-27 1000Z and X with the nv driver works again.

 >  Index: pmap.c
 >  ===================================================================
 >  RCS file: /cvsroot/src/sys/arch/x86/x86/pmap.c,v
 >  retrieving revision 1.245.6.2
 >  diff -p -u -u -r1.245.6.2 pmap.c
 >  --- pmap.c	27 Feb 2018 09:07:33 -0000	1.245.6.2
 >  +++ pmap.c	5 Mar 2018 19:02:45 -0000
 >  @@ -1737,8 +1737,8 @@ pmap_pp_needs_pve(struct pmap_page *pp)
 >   	 * since the first pv entry is stored in the pmap_page.
 >   	 */
 >   
 >  -	return (pp->pp_flags & PP_EMBEDDED) != 0 ||
 >  -		!LIST_EMPTY(&pp->pp_head.pvh_list);
 >  +	return pp && ((pp->pp_flags & PP_EMBEDDED) != 0 ||
 >  +	    !LIST_EMPTY(&pp->pp_head.pvh_list));
 >   }
 >   
 >   /*
 >  @@ -4123,7 +4123,7 @@ pmap_enter_ma(struct pmap *pmap, vaddr_t
 >   	 */
 >   
 >   	bool needpves = pmap_pp_needs_pve(new_pp);
 >  -	if (new_pp && needpves) {
 >  +	if (needpves) {
 >   		new_pve = pool_cache_get(&pmap_pv_cache, PR_NOWAIT);
 >   		new_sparepve = pool_cache_get(&pmap_pv_cache, PR_NOWAIT);
 >   	} else {
 >  

 -- 
 Roy Bixler <rcbixler@nyx.net>
 "The fundamental principle of science, the definition almost, is this: the
 sole test of the validity of any idea is experiment."
 -- Richard P. Feynman

Responsible-Changed-From-To: kern-bug-people->mrg
Responsible-Changed-By: mrg@NetBSD.org
Responsible-Changed-When: Tue, 06 Mar 2018 09:40:18 +0000
Responsible-Changed-Why:
my pullup caused the problem.


State-Changed-From-To: open->closed
State-Changed-By: mrg@NetBSD.org
State-Changed-When: Tue, 06 Mar 2018 09:40:18 +0000
State-Changed-Why:
fix has been pulled up.  thanks!


From: Benny Siegert <bsiegert@netbsd.org>
To: mrg@eterna.com.au
Cc: gnats-bugs@netbsd.org, rcbixler@nyx.net, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes system
Date: Sun, 11 Mar 2018 17:32:51 +0000

 On Sun, Mar 11, 2018 at 5:33 PM Benny Siegert <bsiegert@netbsd.org> wrote:

 > On Mon, Mar 5, 2018 at 8:06 PM matthew green <mrg@eterna.com.au> wrote:
 > > Benny, this might fix your new problem too.  can you also
 > > test it?

 > I downloaded a NetBSD-8 GENERIC kernel built on March 11, so that should
 > include the pulled-up patch. I still get runaway memory allocation on login
 > (from the login process). Because of that, I could not test whether startx
 > succeeds.

 I managed to gdb into the login process, and it was stuck below
 __getlastlogx50. Removing /var/log/lastlogx fixed this problem!

 Now I could actually run startx, which was crashing before on this
 installation. It is now working perfectly. Thanks for fixing the problem.

 -- 
 Benny

From: Benny Siegert <bsiegert@netbsd.org>
To: mrg@eterna.com.au
Cc: gnats-bugs@netbsd.org, rcbixler@nyx.net, kern-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org
Subject: Re: kern/53072: netbsd-8 regression: startx (nv driver) crashes system
Date: Sun, 11 Mar 2018 16:33:18 +0000

 On Mon, Mar 5, 2018 at 8:06 PM matthew green <mrg@eterna.com.au> wrote:
 > Benny, this might fix your new problem too.  can you also
 > test it?

 I downloaded a NetBSD-8 GENERIC kernel built on March 11, so that should
 include the pulled-up patch. I still get runaway memory allocation on login
 (from the login process). Because of that, I could not test whether startx
 succeeds.

>Unformatted:
