NetBSD Problem Report #53441

From oster@fween.ca  Tue Jul 10 16:11:38 2018
Return-Path: <oster@fween.ca>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 2A2F67A159
	for <gnats-bugs@gnats.NetBSD.org>; Tue, 10 Jul 2018 16:11:38 +0000 (UTC)
Message-Id: <20180710145412.0934952D379@thog.fween.ca>
Date: Tue, 10 Jul 2018 08:54:12 -0600 (CST)
From: oster@netbsd.org
Reply-To: oster@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: nouveau panic in 8.0_RC2 amd64
X-Send-Pr-Version: 3.95

>Number:         53441
>Category:       kern
>Synopsis:       nouveau panic in 8.0_RC2 amd64
>Confidential:   no
>Severity:       critical
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jul 10 16:15:00 +0000 2018
>Closed-Date:    Sun Sep 23 19:44:49 +0000 2018
>Last-Modified:  Sun Sep 23 19:44:49 +0000 2018
>Originator:     Greg Oster
>Release:        NetBSD 8.0_RC2
>Organization:
>Environment:
System: NetBSD thog 8.0_RC2 NetBSD 8.0_RC2 (THOG.gdb) #0: Fri Jun 29 15:10:23 CST 2018 oster@thog:/u1/builds/build183/src/obj/amd64/u1/builds/build183/src/sys/arch/amd64/compile/THOG.gdb amd64
Architecture: x86_64
Machine: amd64
>Description:

The nouveau driver occasionally panics for no good reason.  It can panic
when X11 is being used, and it can panic when no-one is on the console.

Panic looks like:

uvm_fault(0xffffffff819b7d80, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8114d302 cs 0x8 rflags 0x10282 cr2 0x70 ilevel 0x8 rsp 0xffff80013ce5bdd0
curlwp 0xfffffe843b5a0080 pid 0.16 lowest kstack 0xffff80013ce592c0
panic: trap
cpu2: Begin traceback...
vpanic() at netbsd:vpanic+0x219
vpanic() at netbsd:vpanic
trap() at netbsd:trap+0x2b9
--- trap (number 6) ---
nouveau_fence_update() at netbsd:nouveau_fence_update+0x10
nouveau_fence_done() at netbsd:nouveau_fence_done+0x29
nouveau_bo_fence_signalled() at netbsd:nouveau_bo_fence_signalled+0x18
ttm_bo_wait() at netbsd:ttm_bo_wait+0x90
ttm_bo_cleanup_refs_and_unlock() at netbsd:ttm_bo_cleanup_refs_and_unlock+0x66
ttm_bo_delayed_delete() at netbsd:ttm_bo_delayed_delete+0x175
ttm_bo_delayed_workqueue() at netbsd:ttm_bo_delayed_workqueue+0x2b
linux_worker() at netbsd:linux_worker+0xf9
workqueue_runlist() at netbsd:workqueue_runlist+0x59
workqueue_worker() at netbsd:workqueue_worker+0xb1
cpu2: End traceback...
uvm_fault(0xfffffe842f5fd5c0, 0x0, 2) -> e

fatal page fault in supervisor mode
dumping to dev 0,1 (offset=8425399, size=4189705):
trap type 6 code 0x2 rip 0xffffffff80cb5d7b cs 0x8 rflags 0x10296 cr2 0x84 ilevel 0x8 rsp 0xffff800d1u4m2p4 b2b90
curlwp 0xfffffe8403f36120 pid 885.2 lowest kstack 0xffff8001424b02c0
coretemp0: workqueue busy: updates stopped
coretemp1: workqueue busy: updates stopped
coretemp2: workqueue busy: updates stopped
coretemp3: workqueue busy: updates stopped



>How-To-Repeat:

Run the nouveau driver on NetBSD-8.0_RC2/amd64 using an NVIDIA GeForce GT 420:
...
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
nouveau0 at pci1 dev 0 function 0: vendor 10de product 0de2 (rev. 0xa1)
drm kern info: nouveau  [  DEVICE][nouveau0] BOOT0  : 0x0c1100a1
drm kern info: nouveau  [  DEVICE][nouveau0] Chipset: GF108 (NVC1)
drm kern info: nouveau  [  DEVICE][nouveau0] Family : NVC0
drm kern info: nouveau  [   VBIOS][nouveau0] checking PRAMIN for image...
drm kern info: nouveau  [   VBIOS][nouveau0] ... appears to be valid
drm kern info: nouveau  [   VBIOS][nouveau0] using image from PRAMIN
drm kern info: nouveau  [   VBIOS][nouveau0] BIT signature found
drm kern info: nouveau  [   VBIOS][nouveau0] version 70.08.1f.00.0c
nouveau0: interrupting at ioapic0 pin 16 (nouveau)
drm kern warning: nouveau W[     PFB][nouveau0][0x00000000][0xfffffe811d51b808] reclocking of this ram type unsupported
drm kern info: nouveau  [     PFB][nouveau0] RAM type: DDR3
drm kern info: nouveau  [     PFB][nouveau0] RAM size: 512 MiB
drm kern info: nouveau  [     PFB][nouveau0]    ZCOMP: 0 tags
drm kern info: nouveau  [    VOLT][nouveau0] GPU voltage: 900000uv
drm kern info: nouveau  [  PTHERM][nouveau0] FAN control: PWM
drm kern info: nouveau  [  PTHERM][nouveau0] fan management: automatic
drm kern info: nouveau  [  PTHERM][nouveau0] internal sensor: yes
drm kern info: nouveau  [     CLK][nouveau0] 03: core 50 MHz memory 135 MHz 
drm kern info: nouveau  [     CLK][nouveau0] 07: core 405 MHz memory 324 MHz 
drm kern info: nouveau  [     CLK][nouveau0] 0f: core 700 MHz memory 800 MHz 
drm kern info: nouveau  [     CLK][nouveau0] --: core 405 MHz memory 324 MHz 
Zone  kernel: Available graphics memory: 5504634 kiB
Zone   dma32: Available graphics memory: 2097152 kiB
drm kern info: nouveau  [     DRM] VRAM: 512 MiB
drm kern info: nouveau  [     DRM] GART: 1048576 MiB
drm kern info: nouveau  [     DRM] TMDS table version 2.0
drm kern info: nouveau  [     DRM] DCB version 4.0
drm kern info: nouveau  [     DRM] DCB outp 00: 01800302 00020030
drm kern info: nouveau  [     DRM] DCB outp 01: 02000300 00000000
drm kern info: nouveau  [     DRM] DCB outp 02: 08811392 00020020
drm kern info: nouveau  [     DRM] DCB outp 03: 04822310 00000000
drm kern info: nouveau  [     DRM] DCB conn 00: 00001030
drm kern info: nouveau  [     DRM] DCB conn 01: 00002161
drm kern info: nouveau  [     DRM] DCB conn 02: 00000200
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
drm kern info: nouveau  [     DRM] MM: using COPY0 for buffer copies
nouveaufb0 at nouveau0
nouveau0: info: registered panic notifier
nouveaufb0: framebuffer at 0xffff8001400b4000, size 1920x1200, depth 32, stride 7680
...


and then wait for the boom.  The panic may happen in hours or days.


>Fix:
  Please.  I have a kernel with full debug symbols and a couple of crash dumps
related to this if someone wants additional information from them.

>Release-Note:

>Audit-Trail:
From: coypu@sdf.org
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Tue, 10 Jul 2018 18:05:22 +0000

 Addendum: note that Linux doesn't destroy spin locks, so we'll possibly
 need some logic to guard this.

From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Fri, 3 Aug 2018 17:40:41 -0600

 Traceback from gdb kernel:

 (gdb) bt
 #0  cpu_reboot (howto=260, bootstr=0x0)
     at /u1/builds/build185/src/sys/arch/amd64/amd64/machdep.c:710
 #1  0xffffffff80ceece2 in vpanic (fmt=0xffffffff81207070 "trap",
     ap=0xffff80013ce5bbb8)
     at /u1/builds/build185/src/sys/kern/subr_prf.c:342
 #2  0xffffffff80ceeaba in panic (fmt=0xffffffff81207070 "trap")
     at /u1/builds/build185/src/sys/kern/subr_prf.c:258
 #3  0xffffffff80228bfd in trap (frame=0xffff80013ce5bce0)
     at /u1/builds/build185/src/sys/arch/amd64/amd64/trap.c:336
 #4  0xffffffff8021f61f in alltraps ()
 #5  0xffffffff8114d577 in nouveau_fence_update (chan=0x0)
     at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c:132
 #6  0xffffffff8114d72d in nouveau_fence_done (fence=0xfffffe834add5c48)
     at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c:171
 #7  0xffffffff811419f5 in nouveau_bo_fence_signalled (sync_obj=0xfffffe834add5c48)
     at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_bo.c:1566
 #8  0xffffffff8119841a in ttm_bo_wait (bo=0xfffffe82f9fc0408,
     lazy=false, interruptible=false, no_wait=true)
     at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:1671
 #9  0xffffffff81195d15 in ttm_bo_cleanup_refs_and_unlock (bo=0xfffffe82f9fc0408,
     interruptible=false, no_wait_gpu=true)
     at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:516
 #10 0xffffffff81196108 in ttm_bo_delayed_delete (bdev=0xfffffe811d500160,
     remove_all=false)
     at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:621
 #11 0xffffffff811961da in ttm_bo_delayed_workqueue (work=0xfffffe811d500520)
     at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:650
 #12 0xffffffff80abf6a9 in linux_worker (wk=0xfffffe811d500520,
     arg=0xfffffe843e620f80)
     at /u1/builds/build185/src/sys/external/bsd/common/linux/linux_work.c:505
 #13 0xffffffff80cf85ef in workqueue_runlist (wq=0xfffffe843b5b7d00,
     list=0xfffffe843b5b7d70)
     at /u1/builds/build185/src/sys/kern/subr_workqueue.c:106
 #14 0xffffffff80cf86b2 in workqueue_worker (cookie=0xfffffe843b5b7d00)
     at /u1/builds/build185/src/sys/kern/subr_workqueue.c:133
 #15 0xffffffff80208747 in lwp_trampoline ()
 #16 0x0000000000000000 in ?? ()
 (gdb)
 ...
 (gdb) list
 166
 167     bool
 168     nouveau_fence_done(struct nouveau_fence *fence)
 169     {
 170             if (fence->channel)
 171                     nouveau_fence_update(fence->channel);
 172             return !fence->channel;
 173     }
 174
 175     static int
 (gdb) down
 #5  0xffffffff8114d577 in nouveau_fence_update (chan=0x0)
     at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c:132
 132             struct nouveau_fence_chan *fctx = chan->fence;
 (gdb) list
 127     }
 128
 129     static void
 130     nouveau_fence_update(struct nouveau_channel *chan)
 131     {
 132             struct nouveau_fence_chan *fctx = chan->fence;
 133             struct nouveau_fence *fence, *fnext;
 134
 135             spin_lock(&fctx->lock);
 136             list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
 (gdb) print chan
 $11 = (struct nouveau_channel *) 0x0
 (gdb) 

 "huh?"

 We just checked fence->channel for non-zero before the call to
 nouveau_fence_update(), and now it's suddenly zero?  Methinks there 
 are some locking issues happening here if the rug is getting pulled
 out that fast!  Also: are there other uses of fence->channel where it
 could suddenly change from something to 0 and cause issues?
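
 The check-then-use window described above can be reproduced
 deterministically in userland.  What follows is only a minimal sketch,
 not driver code: struct channel, struct fence, signal_from_other_cpu()
 and racy_caller_passes_null() are hypothetical stand-ins, and the
 concurrent signaller is simulated by a direct call placed between the
 two loads of the pointer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real nouveau structures. */
struct channel { int id; };
struct fence { struct channel *channel; };

/* Simulates a second CPU signalling the fence, which nulls fence->channel. */
static void
signal_from_other_cpu(struct fence *f)
{
	f->channel = NULL;
}

/*
 * Same shape as nouveau_fence_done() in the listing: the pointer is
 * loaded once for the test and loaded again for the call.  Returns true
 * if the callee would have been handed a NULL channel.
 */
static bool
racy_caller_passes_null(struct fence *f)
{
	assert(f != NULL);

	if (f->channel) {			/* load 1: non-NULL */
		signal_from_other_cpu(f);	/* worst-case interleaving */
		return f->channel == NULL;	/* load 2: what the callee gets */
	}
	return false;	/* already signalled: the callee is never invoked */
}
```

 With a fence whose channel is set, racy_caller_passes_null() reports
 true, which corresponds to the chan=0x0 in frame #5 even though the
 test on line 170 passed.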

 (the machine worked fine for 8 days before this panic...)

 Later...

 Greg Oster

From: Greg Oster <oster@netbsd.org>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org, oster@netbsd.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Fri, 3 Aug 2018 20:53:16 -0600

 Just fell over again, so that's twice now today.  There seem to be (at
 least) two different failure modes: one where I can get a kernel trace,
 and one where it's a fast trip to reboot.

 uvm_fault(0xffffffff819b7d80, 0x0, 1) -> e
 fatal page fault in supervisor mode
 trap type 6 code 0 rip 0xffffffff8114d577 cs 0x8 rflags 0x10282 cr2 0x70 ilevel 0x8 rsp 0xffff80013ce5bdd0
 curlwp 0xfffffe843b5a0080 pid 0.16 lowest kstack 0xffff80013ce592c0
 panic: trap
 cpu1: Begin traceback...
 vpanic() at netbsd:vpanic+0x219
 vpanic() at netbsd:vpanic
 trap() at netbsd:trap+0x2b9
 --- trap (number 6) ---
 nouveau_fence_update() at netbsd:nouveau_fence_update+0x10
 nouveau_fence_done() at netbsd:nouveau_fence_done+0x29
 nouveau_bo_fence_signalled() at netbsd:nouveau_bo_fence_signalled+0x18
 ttm_bo_wait() at netbsd:ttm_bo_wait+0x90
 ttm_bo_cleanup_refs_and_unlock() at netbsd:ttm_bo_cleanup_refs_and_unlock+0x66
 ttm_bo_delayed_delete() at netbsd:ttm_bo_delayed_delete+0x175
 ttm_bo_delayed_workqueue() at netbsd:ttm_bo_delayed_workqueue+0x2b
 linux_worker() at netbsd:linux_worker+0xf9
 workqueue_runlist() at netbsd:workqueue_runlist+0x59
 workqueue_worker() at netbsd:workqueue_worker+0xb1
 cpu1: End traceback...
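
 On amd64, cr2 holds the address whose access faulted.  Given that gdb
 showed chan=0x0 at the `chan->fence' load, a cr2 of 0x70 is what one
 would expect if the fence member sits 0x70 bytes into the channel
 structure.  The sketch below only illustrates that arithmetic: struct
 example_channel and fault_address() are hypothetical, and the padding
 is chosen to match the observed cr2, not taken from the real
 struct nouveau_channel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout; the real struct nouveau_channel differs. */
struct example_channel {
	char other_members[0x70];	/* padding chosen only to match cr2 */
	void *fence;
};

/* Compile-time check that the illustrative member really lands at 0x70. */
_Static_assert(offsetof(struct example_channel, fence) == 0x70,
    "fence member expected at offset 0x70");

/* Address the CPU would try to read when evaluating base->fence. */
static uintptr_t
fault_address(uintptr_t base)
{
	return base + offsetof(struct example_channel, fence);
}
```

 fault_address(0) evaluates to 0x70, so a small non-zero cr2 like this
 usually points at a member access through a NULL structure pointer
 rather than at a wild pointer.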


 Later...

 Greg Oster

From: Taylor R Campbell <campbell@mumble.net>
To: Greg Oster <oster@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Fri, 10 Aug 2018 23:26:31 +0000

 Please try the attached patch and let me know if it helps.

 Content-Disposition: attachment; filename="53441.patch"

 From ef5781c793d890187ac766847a0228296c8edc2c Mon Sep 17 00:00:00 2001
 From: Taylor R Campbell <riastradh@NetBSD.org>
 Date: Fri, 10 Aug 2018 22:29:26 +0000
 Subject: [PATCH] Attempt to sort out race between nouveau_fence_done/signal.

 ---
  .../bsd/drm2/dist/drm/nouveau/nouveau_fence.c      | 28 ++++++++++++++++++----
  1 file changed, 23 insertions(+), 5 deletions(-)

 diff --git a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 index 2a83285e07da..49bdb96273dd 100644
 --- a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 +++ b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 @@ -126,21 +126,27 @@ nouveau_fence_work(struct nouveau_fence *fence,
  	spin_unlock(&fctx->lock);
  }
  
 -static void
 -nouveau_fence_update(struct nouveau_channel *chan)
 +static bool
 +nouveau_fence_update(struct nouveau_channel *chan,
 +    struct nouveau_fence *fence0)
  {
  	struct nouveau_fence_chan *fctx = chan->fence;
  	struct nouveau_fence *fence, *fnext;
 +	bool signalled = false;	/* Did we signal fence0?  */
  
  	spin_lock(&fctx->lock);
  	list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
  		if (fctx->read(chan) < fence->sequence)
  			break;
  
 +		if (fence == fence0)
 +			signalled = true;
  		nouveau_fence_signal(fence);
  		nouveau_fence_unref(&fence);
  	}
  	spin_unlock(&fctx->lock);
 +
 +	return signalled;
  }
  
  int
 @@ -167,9 +173,21 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
  bool
  nouveau_fence_done(struct nouveau_fence *fence)
  {
 -	if (fence->channel)
 -		nouveau_fence_update(fence->channel);
 -	return !fence->channel;
 +	struct nouveau_channel *chan;
 +
 +	/*
 +	 * The lock under which the fence transitions from the
 +	 * not-signalled state to the signalled state is stored in the
 +	 * channel.  The way the signalled state is indicated is by
 +	 * nulling the pointer to the channel.  Soooo...  We load the
 +	 * channel pointer once, and hope that the reference counting
 +	 * mechanism for fences keeps the fence from dying too.
 +	 */
 +	chan = fence->channel;
 +	__insn_barrier();
 +	if (chan)
 +		return nouveau_fence_update(chan, fence);
 +	return !chan;
  }
  
  static int
 -- 
 2.11.0
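
 The idea in the patch (load the channel pointer once, then let the
 update routine report whether it signalled this particular fence) can
 be sketched in userland C11 roughly as follows.  This is an
 illustration under stated assumptions, not the driver code: struct
 chan, struct fence, update() and done() are hypothetical names, an
 array stands in for the pending list, the spin lock is omitted because
 the sketch is single-threaded, and a C11 atomic load takes the place of
 the plain load plus __insn_barrier() used in the kernel.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the driver structures. */
struct chan {
	unsigned completed;	/* last sequence number the "GPU" finished */
};

struct fence {
	_Atomic(struct chan *) channel;	/* NULL once signalled */
	unsigned sequence;
};

/*
 * Signals every completed fence at the head of pending[] and reports
 * whether f0 was one of them.  Locking is omitted in this sketch.
 */
static bool
update(struct chan *c, struct fence *pending[], size_t n, struct fence *f0)
{
	bool signalled = false;

	assert(c != NULL);	/* the caller passes only its own snapshot */

	for (size_t i = 0; i < n; i++) {
		if (c->completed < pending[i]->sequence)
			break;
		if (pending[i] == f0)
			signalled = true;
		/* Signalling is represented by nulling the channel pointer. */
		atomic_store_explicit(&pending[i]->channel, NULL,
		    memory_order_release);
	}
	return signalled;
}

static bool
done(struct fence *f, struct fence *pending[], size_t n)
{
	/* Single load: the test and the call below use the same value. */
	struct chan *c = atomic_load_explicit(&f->channel,
	    memory_order_acquire);

	if (c != NULL)
		return update(c, pending, n, f);
	return true;	/* channel already NULL: signalled earlier */
}
```

 Because done() never re-reads f->channel after the test, a concurrent
 signaller cannot turn the argument into NULL between the check and the
 call.  Whether the channel itself stays alive for that long is a
 separate question, as the comment in the patch notes.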

From: Greg Oster <oster@netbsd.org>
To: Taylor R Campbell <campbell@mumble.net>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Fri, 10 Aug 2018 21:05:16 -0600

 On Fri, 10 Aug 2018 23:26:31 +0000
 Taylor R Campbell <campbell@mumble.net> wrote:

 > Please try the attached patch and let me know if it helps.

 Now running a kernel with the patch.  Will let you know what
 happens.

 Thanks!

 Later...

 Greg Oster


From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Sat, 11 Aug 2018 15:08:44 -0600

 Keeled over again early this afternoon (I wasn't even sitting at the
 machine at the time).

 uvm_fault(0xfffffe8433abd8c0, 0x0, 1) -> e
 fatal page fault in supervisor mode
 trap type 6 code 0 rip 0xffffffff8114d7d5 cs 0x8 rflags 0x13282 cr2 0x8 ilevel 0 rsp 0xffff800141d3ac20
 curlwp 0xfffffe843792d260 pid 161.1 lowest kstack 0xffff800141d382c0
 panic: trap
 cpu0: Begin traceback...
 vpanic() at netbsd:vpanic+0x219
 vpanic() at netbsd:vpanic
 trap() at netbsd:trap+0x2b9
 --- trap (number 6) ---
 nouveau_fence_wait_uevent() at netbsd:nouveau_fence_wait_uevent+0x21
 nouveau_fence_wait() at netbsd:nouveau_fence_wait+0x5e
 nouveau_bo_fence_wait() at netbsd:nouveau_bo_fence_wait+0x2c
 ttm_bo_wait() at netbsd:ttm_bo_wait+0x1a0
 nouveau_gem_ioctl_cpu_prep() at netbsd:nouveau_gem_ioctl_cpu_prep+0xa2
 drm_ioctl() at netbsd:drm_ioctl+0x248
 sys_ioctl() at netbsd:sys_ioctl+0x4eb
 sy_call() at netbsd:sy_call+-0x2918aa
 sy_invoke() at netbsd:sy_invoke+0xd5
 syscall() at netbsd:syscall+0xff
 --- syscall (number 54) ---
 73cc0a0fedfa:
 cpu0: End traceback...

 I have a crash dump if details from there would be interesting.

 Thanks.

 Later...

 Greg Oster

From: Taylor R Campbell <campbell@mumble.net>
To: Greg Oster <oster@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Thu, 16 Aug 2018 05:37:54 +0000

 This is a multi-part message in MIME format.
 --=_73WqrbnRAUdWsUzKH5e7RXeLfJD5xyxZ

 Please revert the previous patch, and try the attached patch instead.

 --=_73WqrbnRAUdWsUzKH5e7RXeLfJD5xyxZ
 Content-Type: text/plain; charset="ISO-8859-1"; name="53441-v2"
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: attachment; filename="53441-v2.patch"

 diff --git a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c b/sys/e=
 xternal/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 index 2a83285e07da..da0864f2f13d 100644
 --- a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 +++ b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 @@ -29,6 +29,9 @@
  #include <sys/cdefs.h>
  __KERNEL_RCSID(0, "$NetBSD: nouveau_fence.c,v 1.4 2016/04/13 07:57:15 rias=
 tradh Exp $");
 =20
 +#include <sys/types.h>
 +#include <sys/xcall.h>
 +
  #include <drm/drmP.h>
 =20
  #include <asm/param.h>
 @@ -41,6 +44,12 @@ __KERNEL_RCSID(0, "$NetBSD: nouveau_fence.c,v 1.4 2016/0=
 4/13 07:57:15 riastradh
 =20
  #include <engine/fifo.h>
 =20
 +/*
 + * struct fence_work
 + *
 + *	State for a work action scheduled when a fence is completed.
 + *	Will call func(data) at some point after that happens.
 + */
  struct fence_work {
  	struct work_struct base;
  	struct list_head head;
 @@ -48,101 +57,289 @@ struct fence_work {
  	void *data;
  };
 =20
 +/*
 + * nouveau_fence_channel_acquire(fence)
 + *
 + *	Try to return the channel associated with fence.
 + */
 +static struct nouveau_channel *
 +nouveau_fence_channel_acquire(struct nouveau_fence *fence)
 +{
 +	struct nouveau_channel *chan;
 +	struct nouveau_fence_chan *fctx;
 +
 +	/*
 +	 * Block cross-calls while we examine fence.  If we observe
 +	 * that fence->done is false, then the channel cannot be
 +	 * destroyed even by another CPU until after kpreempt_enable.
 +	 */
 +	kpreempt_disable();
 +	if (fence->done) {
 +		chan =3D NULL;
 +	} else {
 +		chan =3D fence->channel;
 +		fctx =3D chan->fence;
 +		atomic_inc_uint(&fctx->refcnt);
 +	}
 +	kpreempt_enable();
 +
 +	return chan;
 +}
 +
 +/*
 + * nouveau_fence_gc_grab(fctx, list)
 + *
 + *	Move all of channel's done fences to list.
 + *
 + *	Caller must hold channel's fence lock.
 + */
 +static void
 +nouveau_fence_gc_grab(struct nouveau_fence_chan *fctx, struct list_head *l=
 ist)
 +{
 +	struct list_head *node, *next;
 +
 +	BUG_ON(!spin_is_locked(&fctx->lock));
 +
 +	list_for_each_safe(node, next, &fctx->done) {
 +		list_move_tail(node, list);
 +	}
 +}
 +
 +/*
 + * nouveau_fence_gc_free(list)
 + *
 + *	Unreference all of the fences in the list.
 + *
 + *	Caller MUST NOT hold the fences' channel's fence lock.
 + */
 +static void
 +nouveau_fence_gc_free(struct list_head *list)
 +{
 +	struct nouveau_fence *fence, *next;
 +
 +	list_for_each_entry_safe(fence, next, list, head) {
 +		list_del(&fence->head);
 +		nouveau_fence_unref(&fence);
 +	}
 +}
 +
 +/*
 + * nouveau_fence_channel_release(channel)
 + *
 + *	Release the channel acquired with nouveau_fence_channel_acquire.
 + */
 +static void
 +nouveau_fence_channel_release(struct nouveau_channel *chan)
 +{
 +	struct nouveau_fence_chan *fctx =3D chan->fence;
 +	unsigned old, new;
 +
 +	do {
 +		old =3D fctx->refcnt;
 +		if (old =3D=3D 0) {
 +			spin_lock(&fctx->lock);
 +			if (atomic_dec_uint_nv(&fctx->refcnt) =3D=3D 0)
 +				DRM_SPIN_WAKEUP_ALL(&fctx->waitqueue,
 +				    &fctx->lock);
 +			spin_unlock(&fctx->lock);
 +			return;
 +		}
 +		new =3D old - 1;
 +	} while (atomic_cas_uint(&fctx->refcnt, old, new) !=3D old);
 +}
 +
 +/*
 + * nouveau_fence_signal(fence)
 + *
 + *	Schedule all the work for fence's completion, mark it done, and
 + *	move it from the pending list to the done list.
 + *
 + *	Caller must hold fence's channel's fence lock.
 + */
  static void
  nouveau_fence_signal(struct nouveau_fence *fence)
  {
 +	struct nouveau_channel *chan __diagused =3D fence->channel;
 +	struct nouveau_fence_chan *fctx __diagused =3D chan->fence;
  	struct fence_work *work, *temp;
 =20
 +	BUG_ON(!spin_is_locked(&fctx->lock));
 +	BUG_ON(fence->done);
 +
 +	/* Schedule all the work for this fence.  */
  	list_for_each_entry_safe(work, temp, &fence->work, head) {
  		schedule_work(&work->base);
  		list_del(&work->head);
  	}
 =20
 -	fence->channel =3D NULL;
 -	list_del(&fence->head);
 +	/* Note that the fence is done.  */
 +	fence->done =3D true;
 +
 +	/* Move it from the pending list to the done list.  */
 +	list_move_tail(&fence->head, &fctx->done);
 +}
 +
 +static void
 +nouveau_fence_context_del_xc(void *a, void *b)
 +{
  }
 =20
 +/*
 + * nouveau_fence_context_del(fctx)
 + *
 + *	Artificially complete all fences in fctx, wait for their work
 + *	to drain, and destroy the memory associated with fctx.
 + */
  void
  nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
  {
  	struct nouveau_fence *fence, *fnext;
 +	struct list_head done_list;
 +	int ret __diagused;
 +
 +	INIT_LIST_HEAD(&done_list);
 +
 +	/* Signal all the fences in fctx.  */
  	spin_lock(&fctx->lock);
  	list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
  		nouveau_fence_signal(fence);
  	}
 +	nouveau_fence_gc_grab(fctx, &done_list);
 +	spin_unlock(&fctx->lock);
 +
 +	/* Release any fences that we signalled.  */
 +	nouveau_fence_gc_free(&done_list);
 +
 +	/* Wait for the workqueue to drain.  */
 +	flush_scheduled_work();
 +
 +	/* Wait for nouveau_fence_channel_acquire to complete on all CPUs.  */
 +	xc_wait(xc_broadcast(0, nouveau_fence_context_del_xc, NULL, NULL));
 +
 +	/* Wait for any references to drain.  */
 +	spin_lock(&fctx->lock);
 +	DRM_SPIN_WAIT_NOINTR_UNTIL(ret, &fctx->waitqueue, &fctx->lock,
 +	    fctx->refcnt =3D=3D 0);
 +	BUG_ON(ret);
  	spin_unlock(&fctx->lock);
 +
 +	/* Make sure there are no more fences on the list.  */
 +	BUG_ON(!list_empty(&fctx->done));
 +	BUG_ON(!list_empty(&fctx->flip));
 +	BUG_ON(!list_empty(&fctx->pending));
 +
 +	/* Destroy the fence context.  */
 +	DRM_DESTROY_WAITQUEUE(&fctx->waitqueue);
  	spin_lock_destroy(&fctx->lock);
  }
 =20
 +/*
 + * nouveau_fence_context_new(fctx)
 + *
 + *	Initialize the state fctx for all fences on a channel.
 + */
  void
  nouveau_fence_context_new(struct nouveau_fence_chan *fctx)
  {
 +
  	INIT_LIST_HEAD(&fctx->flip);
  	INIT_LIST_HEAD(&fctx->pending);
 +	INIT_LIST_HEAD(&fctx->done);
  	spin_lock_init(&fctx->lock);
 +	DRM_INIT_WAITQUEUE(&fctx->waitqueue, "nvfnchan");
 +	fctx->refcnt =3D 0;
  }
 =20
 +/*
 + * nouveau_fence_work_handler(kwork)
 + *
 + *	Work handler for nouveau_fence_work.
 + */
  static void
  nouveau_fence_work_handler(struct work_struct *kwork)
  {
  	struct fence_work *work =3D container_of(kwork, typeof(*work), base);
 +
  	work->func(work->data);
  	kfree(work);
  }
 =20
 +/*
 + * nouveau_fence_work(fence, func, data)
 + *
 + *	Arrange to call func(data) after fence is completed.  If fence
 + *	is already completed, call it immediately.  If memory is
 + *	scarce, synchronously wait for the fence and call it.
 + */
  void
  nouveau_fence_work(struct nouveau_fence *fence,
  		   void (*func)(void *), void *data)
  {
 -	struct nouveau_channel *chan =3D fence->channel;
 +	struct nouveau_channel *chan;
  	struct nouveau_fence_chan *fctx;
  	struct fence_work *work =3D NULL;
 =20
 -	if (nouveau_fence_done(fence)) {
 -		func(data);
 -		return;
 -	}
 -
 +	if ((chan =3D nouveau_fence_channel_acquire(fence)) =3D=3D NULL)
 +		goto now0;
  	fctx =3D chan->fence;
 +
  	work =3D kmalloc(sizeof(*work), GFP_KERNEL);
 -	if (!work) {
 +	if (work =3D=3D NULL) {
  		WARN_ON(nouveau_fence_wait(fence, false, false));
 -		func(data);
 -		return;
 +		goto now1;
  	}
 =20
  	spin_lock(&fctx->lock);
 -	if (!fence->channel) {
 +	if (fence->done) {
  		spin_unlock(&fctx->lock);
 -		kfree(work);
 -		func(data);
 -		return;
 +		goto now2;
  	}
 -
  	INIT_WORK(&work->base, nouveau_fence_work_handler);
  	work->func =3D func;
  	work->data =3D data;
  	list_add(&work->head, &fence->work);
 +	if (atomic_dec_uint_nv(&fctx->refcnt) =3D=3D 0)
 +		DRM_SPIN_WAKEUP_ALL(&fctx->waitqueue, &fctx->lock);
  	spin_unlock(&fctx->lock);
 +	return;
 +
 +now2:	kfree(work);
 +now1:	nouveau_fence_channel_release(chan);
 +now0:	func(data);
  }
 =20
 +/*
 + * nouveau_fence_update(chan)
 + *
 + *	Test all fences on chan for completion.  For any that are
 + *	completed, mark them as such and schedule work for them.
 + *
 + *	Caller must hold chan's fence lock.
 + */
  static void
  nouveau_fence_update(struct nouveau_channel *chan)
  {
  	struct nouveau_fence_chan *fctx =3D chan->fence;
  	struct nouveau_fence *fence, *fnext;
 =20
 -	spin_lock(&fctx->lock);
 +	BUG_ON(!spin_is_locked(&fctx->lock));
  	list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
  		if (fctx->read(chan) < fence->sequence)
  			break;
 -
  		nouveau_fence_signal(fence);
 -		nouveau_fence_unref(&fence);
  	}
 -	spin_unlock(&fctx->lock);
 +	BUG_ON(!spin_is_locked(&fctx->lock));
  }
 =20
 +/*
 + * nouveau_fence_emit(fence, chan)
 + *
 + *	- Initialize fence.
 + *	- Set its timeout to 15 sec from now.
 + *	- Assign it the next sequence number on channel.
 + *	- Submit it to the device with the device-specific emit routine.
 + *	- If that succeeds, add it to the list of pending fences on chan.
 + */
  int
  nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *ch=
 an)
  {
 @@ -151,7 +348,9 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct =
 nouveau_channel *chan)
 =20
  	fence->channel  =3D chan;
  	fence->timeout  =3D jiffies + (15 * HZ);
 +	spin_lock(&fctx->lock);
  	fence->sequence =3D ++fctx->sequence;
 +	spin_unlock(&fctx->lock);
 =20
  	ret =3D fctx->emit(fence);
  	if (!ret) {
 @@ -164,77 +363,130 @@ nouveau_fence_emit(struct nouveau_fence *fence, stru=
 ct nouveau_channel *chan)
  	return ret;
  }
 =20
 +/*
 + * nouveau_fence_done_locked(fence, chan)
 + *
 + *	Test whether fence, which must be on chan, is done.  If it is
 + *	not marked as done, poll all fences on chan first.
 + *
 + *	Caller must hold chan's fence lock.
 + */
 +static bool
 +nouveau_fence_done_locked(struct nouveau_fence *fence,
 +    struct nouveau_channel *chan)
 +{
 +	struct nouveau_fence_chan *fctx __diagused =3D chan->fence;
 +
 +	BUG_ON(!spin_is_locked(&fctx->lock));
 +	BUG_ON(fence->channel !=3D chan);
 +
 +	/* If it's not done, poll it for changes.  */
 +	if (!fence->done)
 +		nouveau_fence_update(chan);
 +
 +	/* Check, possibly again, whether it is done now.  */
 +	return fence->done;
 +}
 +
 +/*
 + * nouveau_fence_done(fence)
 + *
 + *	Test whether fence is done.  If it is not marked as done, poll
 + *	all fences on its channel first.  Caller MUST NOT hold the
 + *	fence lock.
 + */
  bool
  nouveau_fence_done(struct nouveau_fence *fence)
  {
 -	if (fence->channel)
 -		nouveau_fence_update(fence->channel);
 -	return !fence->channel;
 +	struct nouveau_channel *chan;
 +	struct nouveau_fence_chan *fctx;
 +	struct list_head done_list;
 +	bool done;
 +
 +	if ((chan =3D nouveau_fence_channel_acquire(fence)) =3D=3D NULL)
 +		return true;
 +
 +	INIT_LIST_HEAD(&done_list);
 +
 +	fctx =3D chan->fence;
 +	spin_lock(&fctx->lock);
 +	done =3D nouveau_fence_done_locked(fence, chan);
 +	nouveau_fence_gc_grab(fctx, &done_list);
 +	spin_unlock(&fctx->lock);
 +
 +	nouveau_fence_channel_release(chan);
 +
 +	nouveau_fence_gc_free(&done_list);
 +
 +	return done;
  }
 =20
 +/*
 + * nouveau_fence_wait_uevent_handler(data, index)
 + *
 + *	Nouveau uevent handler for fence completion.  data is a
 + *	nouveau_fence_chan pointer.  Simply wake up all threads waiting
 + *	for completion of any fences on the channel.  Does not mark
 + *	fences as completed -- threads must poll fences for completion.
 + */
  static int
  nouveau_fence_wait_uevent_handler(void *data, int index)
  {
 -	struct nouveau_fence_priv *priv =3D data;
 -#ifdef __NetBSD__
 -	spin_lock(&priv->waitlock);
 -	/* XXX Set a flag...  */
 -	DRM_SPIN_WAKEUP_ALL(&priv->waitqueue, &priv->waitlock);
 -	spin_unlock(&priv->waitlock);
 -#else
 -	wake_up_all(&priv->waiting);
 -#endif
 +	struct nouveau_fence_chan *fctx =3D data;
 +
 +	spin_lock(&fctx->lock);
 +	DRM_SPIN_WAKEUP_ALL(&fctx->waitqueue, &fctx->lock);
 +	spin_unlock(&fctx->lock);
 +
  	return NVKM_EVENT_KEEP;
  }
 =20
 +/*
 + * nouveau_fence_wait_uevent(fence, chan, intr)
 + *
 + *	Wait using a nouveau event for completion of fence on chan.
 + *	Wait interruptibly iff intr is true.
 + */
  static int
 -nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr)
 -
 +nouveau_fence_wait_uevent(struct nouveau_fence *fence,
 +    struct nouveau_channel *chan, bool intr)
  {
 -	struct nouveau_channel *chan =3D fence->channel;
  	struct nouveau_fifo *pfifo =3D nouveau_fifo(chan->drm->device);
 -	struct nouveau_fence_priv *priv =3D chan->drm->fence;
 +	struct nouveau_fence_chan *fctx =3D chan->fence;
  	struct nouveau_eventh *handler;
 +	struct list_head done_list;
  	int ret =3D 0;
 =20
 +	BUG_ON(fence->channel !=3D chan);
 +
  	ret =3D nouveau_event_new(pfifo->uevent, 0,
  				nouveau_fence_wait_uevent_handler,
 -				priv, &handler);
 +				fctx, &handler);
  	if (ret)
  		return ret;
 =20
  	nouveau_event_get(handler);
 =20
 +	INIT_LIST_HEAD(&done_list);
 +
  	if (fence->timeout) {
  		unsigned long timeout =3D fence->timeout - jiffies;
 =20
  		if (time_before(jiffies, fence->timeout)) {
 -#ifdef __NetBSD__
 -			spin_lock(&priv->waitlock);
 +			spin_lock(&fctx->lock);
  			if (intr) {
  				DRM_SPIN_TIMED_WAIT_UNTIL(ret,
 -				    &priv->waitqueue, &priv->waitlock,
 +				    &fctx->waitqueue, &fctx->lock,
  				    timeout,
 -				    nouveau_fence_done(fence));
 +				    nouveau_fence_done_locked(fence, chan));
  			} else {
  				DRM_SPIN_TIMED_WAIT_NOINTR_UNTIL(ret,
 -				    &priv->waitqueue, &priv->waitlock,
 +				    &fctx->waitqueue, &fctx->lock,
  				    timeout,
 -				    nouveau_fence_done(fence));
 -			}
 -			spin_unlock(&priv->waitlock);
 -#else
 -			if (intr) {
 -				ret =3D wait_event_interruptible_timeout(
 -						priv->waiting,
 -						nouveau_fence_done(fence),
 -						timeout);
 -			} else {
 -				ret =3D wait_event_timeout(priv->waiting,
 -						nouveau_fence_done(fence),
 -						timeout);
 +				    nouveau_fence_done_locked(fence, chan));
  			}
 -#endif
 +			nouveau_fence_gc_grab(fctx, &done_list);
 +			spin_unlock(&fctx->lock);
  		}
 =20
  		if (ret >=3D 0) {
 @@ -243,50 +495,53 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence=
 , bool intr)
  				ret =3D -EBUSY;
  		}
  	} else {
 -#ifdef __NetBSD__
 -		spin_lock(&priv->waitlock);
 -		if (intr) {
 -			DRM_SPIN_WAIT_UNTIL(ret, &priv->waitqueue,
 -			    &priv->waitlock,
 -			    nouveau_fence_done(fence));
 -		} else {
 -			DRM_SPIN_WAIT_NOINTR_UNTIL(ret, &priv->waitqueue,
 -			    &priv->waitlock,
 -			    nouveau_fence_done(fence));
 -		}
 -		spin_unlock(&priv->waitlock);
 -#else
 +		spin_lock(&fctx->lock);
  		if (intr) {
 -			ret =3D wait_event_interruptible(priv->waiting,
 -					nouveau_fence_done(fence));
 +			DRM_SPIN_WAIT_UNTIL(ret, &fctx->waitqueue,
 +			    &fctx->lock,
 +			    nouveau_fence_done_locked(fence, chan));
  		} else {
 -			wait_event(priv->waiting, nouveau_fence_done(fence));
 +			DRM_SPIN_WAIT_NOINTR_UNTIL(ret, &fctx->waitqueue,
 +			    &fctx->lock,
 +			    nouveau_fence_done_locked(fence, chan));
  		}
 -#endif
 +		nouveau_fence_gc_grab(fctx, &done_list);
 +		spin_unlock(&fctx->lock);
  	}
 =20
  	nouveau_event_ref(NULL, &handler);
 +
 +	nouveau_fence_gc_free(&done_list);
 +
  	if (unlikely(ret < 0))
  		return ret;
 =20
  	return 0;
  }
 =20
 +/*
 + * nouveau_fence_wait(fence, lazy, intr)
 + *
 + *	Wait for fence to complete.  Wait interruptibly iff intr is
 + *	true.  If lazy is true, may sleep, either for a single tick or
 + *	for an interrupt; otherwise will busy-wait.
 + */
  int
  nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
  {
 -	struct nouveau_channel *chan =3D fence->channel;
 -	struct nouveau_fence_priv *priv =3D chan ? chan->drm->fence : NULL;
 -#ifndef __NetBSD__
 -	unsigned long sleep_time =3D NSEC_PER_MSEC / 1000;
 -	ktime_t t;
 -#endif
 +	struct nouveau_channel *chan;
 +	struct nouveau_fence_priv *priv;
 +	unsigned long delay_usec =3D 1;
  	int ret =3D 0;
 =20
 +	if ((chan =3D nouveau_fence_channel_acquire(fence)) =3D=3D NULL)
 +		goto out0;
 +
 +	priv =3D chan->drm->fence;
  	while (priv && priv->uevent && lazy && !nouveau_fence_done(fence)) {
 -		ret =3D nouveau_fence_wait_uevent(fence, intr);
 +		ret =3D nouveau_fence_wait_uevent(fence, chan, intr);
  		if (ret < 0)
 -			return ret;
 +			goto out1;
  	}
 =20
  	while (!nouveau_fence_done(fence)) {
 @@ -295,33 +550,19 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool =
 lazy, bool intr)
  			break;
  		}
 =20
 -#ifdef __NetBSD__
 -		if (lazy)
 -			kpause("nvfencep", intr, 1, NULL);
 -		else
 -			DELAY(1);
 -#else
 -		__set_current_state(intr ? TASK_INTERRUPTIBLE :
 -					   TASK_UNINTERRUPTIBLE);
 -		if (lazy) {
 -			t =3D ktime_set(0, sleep_time);
 -			schedule_hrtimeout(&t, HRTIMER_MODE_REL);
 -			sleep_time *=3D 2;
 -			if (sleep_time > NSEC_PER_MSEC)
 -				sleep_time =3D NSEC_PER_MSEC;
 -		}
 -
 -		if (intr && signal_pending(current)) {
 -			ret =3D -ERESTARTSYS;
 -			break;
 +		if (lazy && delay_usec >=3D 1000*hztoms(1)) {
 +			/* XXX errno NetBSD->Linux */
 +			ret =3D -kpause("nvfencew", intr, 1, NULL);
 +			if (ret !=3D -EWOULDBLOCK)
 +				break;
 +		} else {
 +			DELAY(delay_usec);
 +			delay_usec *=3D 2;
  		}
 -#endif
  	}
 =20
 -#ifndef __NetBSD__
 -	__set_current_state(TASK_RUNNING);
 -#endif
 -	return ret;
 +out1:	nouveau_fence_channel_release(chan);
 +out0:	return ret;
  }
 =20
  int
 @@ -331,13 +572,14 @@ nouveau_fence_sync(struct nouveau_fence *fence, struc=
 t nouveau_channel *chan)
  	struct nouveau_channel *prev;
  	int ret =3D 0;
 =20
 -	prev =3D fence ? fence->channel : NULL;
 -	if (prev) {
 +	if (fence !=3D NULL &&
 +	    (prev =3D nouveau_fence_channel_acquire(fence)) !=3D NULL) {
  		if (unlikely(prev !=3D chan && !nouveau_fence_done(fence))) {
  			ret =3D fctx->sync(fence, prev, chan);
  			if (unlikely(ret))
  				ret =3D nouveau_fence_wait(fence, true, false);
  		}
 +		nouveau_fence_channel_release(prev);
  	}
 =20
  	return ret;
 @@ -347,12 +589,14 @@ static void
  nouveau_fence_del(struct kref *kref)
  {
  	struct nouveau_fence *fence =3D container_of(kref, typeof(*fence), kref);
 +
  	kfree(fence);
  }
 =20
  void
  nouveau_fence_unref(struct nouveau_fence **pfence)
  {
 +
  	if (*pfence)
  		kref_put(&(*pfence)->kref, nouveau_fence_del);
  	*pfence =3D NULL;
 @@ -361,6 +605,7 @@ nouveau_fence_unref(struct nouveau_fence **pfence)
  struct nouveau_fence *
  nouveau_fence_ref(struct nouveau_fence *fence)
  {
 +
  	if (fence)
  		kref_get(&fence->kref);
  	return fence;
 @@ -382,6 +627,7 @@ nouveau_fence_new(struct nouveau_channel *chan, bool sy=
 smem,
 =20
  	INIT_LIST_HEAD(&fence->work);
  	fence->sysmem =3D sysmem;
 +	fence->done =3D false;
  	kref_init(&fence->kref);
 =20
  	ret =3D nouveau_fence_emit(fence, chan);
 diff --git a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h b/sys/e=
 xternal/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
 index f6f12ba1f38f..a0c32455bd55 100644
 --- a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
 +++ b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
 @@ -9,6 +9,7 @@ struct nouveau_fence {
  	struct kref kref;
 =20
  	bool sysmem;
 +	bool done;
 =20
  	struct nouveau_channel *channel;
  	unsigned long timeout;
 @@ -27,9 +28,15 @@ void nouveau_fence_work(struct nouveau_fence *, void (*)=
 (void *), void *);
  int  nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
  int  nouveau_fence_sync(struct nouveau_fence *, struct nouveau_channel *);
 =20
 +/*
 + * struct nouveau_fence_chan:
 + *
 + *	State common to all fences in a single nouveau_channel.
 + */
  struct nouveau_fence_chan {
  	struct list_head pending;
  	struct list_head flip;
 +	struct list_head done;
 =20
  	int  (*emit)(struct nouveau_fence *);
  	int  (*sync)(struct nouveau_fence *, struct nouveau_channel *,
 @@ -39,9 +46,16 @@ struct nouveau_fence_chan {
  	int  (*sync32)(struct nouveau_channel *, u64, u32);
 =20
  	spinlock_t lock;
 +	drm_waitqueue_t waitqueue;
 +	volatile unsigned refcnt;
  	u32 sequence;
  };
 =20
 +/*
 + * struct nouveau_fence_priv:
 + *
 + *	Device-specific operations on fences.
 + */
  struct nouveau_fence_priv {
  	void (*dtor)(struct nouveau_drm *);
  	bool (*suspend)(struct nouveau_drm *);
 @@ -49,12 +63,6 @@ struct nouveau_fence_priv {
  	int  (*context_new)(struct nouveau_channel *);
  	void (*context_del)(struct nouveau_channel *);
 =20
 -#ifdef __NetBSD__
 -	spinlock_t waitlock;
 -	drm_waitqueue_t waitqueue;
 -#else
 -	wait_queue_head_t waiting;
 -#endif
  	bool uevent;
  };
 =20
 diff --git a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c b/=
 sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c
 index 0bf784f0f11b..d4e6b8fa9992 100644
 --- a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c
 +++ b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c
 @@ -216,11 +216,6 @@ nv84_fence_destroy(struct nouveau_drm *drm)
  {
  	struct nv84_fence_priv *priv =3D drm->fence;
 =20
 -#ifdef __NetBSD__
 -	spin_lock_destroy(&priv->base.waitlock);
 -	DRM_DESTROY_WAITQUEUE(&priv->base.waitqueue);
 -#endif
 -
  	nouveau_bo_unmap(priv->bo_gart);
  	if (priv->bo_gart)
  		nouveau_bo_unpin(priv->bo_gart);
 @@ -250,12 +245,6 @@ nv84_fence_create(struct nouveau_drm *drm)
  	priv->base.context_new =3D nv84_fence_context_new;
  	priv->base.context_del =3D nv84_fence_context_del;
 =20
 -#ifdef __NetBSD__
 -	spin_lock_init(&priv->base.waitlock);
 -	DRM_INIT_WAITQUEUE(&priv->base.waitqueue, "nvfenceq");
 -#else
 -	init_waitqueue_head(&priv->base.waiting);
 -#endif
  	priv->base.uevent =3D true;
 =20
  	ret =3D nouveau_bo_new(drm->dev, 16 * (pfifo->max + 1), 0,

 --=_73WqrbnRAUdWsUzKH5e7RXeLfJD5xyxZ--
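
The final polling loop of nouveau_fence_wait in the patch above busy-waits with a delay that doubles on every pass, and switches to sleeping for one tick once the delay has grown to a tick's length and the caller is lazy. The following is a minimal user-space sketch of that schedule only, not the driver code. The names `struct backoff`, `backoff_init`, `backoff_next`, and `enum wait_kind` are hypothetical. Unlike the patch, the sketch stops doubling once the delay reaches one tick, so that a non-lazy caller cannot overflow the counter.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; not taken from the NetBSD source. */
enum wait_kind { WAIT_SPIN, WAIT_SLEEP };

struct backoff {
	unsigned long delay_usec;	/* length of the next busy-wait */
	unsigned long tick_usec;	/* length of one scheduler tick */
};

static void
backoff_init(struct backoff *b, unsigned long tick_usec)
{
	assert(tick_usec > 0);
	b->delay_usec = 1;
	b->tick_usec = tick_usec;
}

/*
 * Decide how to wait before the next poll.  Returns WAIT_SLEEP, with
 * *usec set to one tick, once the busy-wait has grown to a tick and
 * the caller may sleep (lazy).  Otherwise returns WAIT_SPIN with
 * *usec set to the busy-wait length, and doubles the next delay
 * until it reaches a tick (assumption: the cap is not in the patch).
 */
static enum wait_kind
backoff_next(struct backoff *b, bool lazy, unsigned long *usec)
{
	if (lazy && b->delay_usec >= b->tick_usec) {
		*usec = b->tick_usec;
		return WAIT_SLEEP;
	}
	*usec = b->delay_usec;
	if (b->delay_usec < b->tick_usec)
		b->delay_usec *= 2;
	return WAIT_SPIN;
}
```

A caller would loop on its completion test, and between tests either spin for *usec microseconds or sleep for one tick depending on the return value.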

From: Greg Oster <oster@netbsd.org>
To: Taylor R Campbell <campbell@mumble.net>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Thu, 16 Aug 2018 07:28:41 -0600

 On Thu, 16 Aug 2018 05:37:54 +0000
 Taylor R Campbell <campbell@mumble.net> wrote:

 > Please revert the previous patch, and try the attached patch instead.

 Re-patched, and running with the new kernel.  Will let you know how this
 one goes...

 Thanks!

 Later...

 Greg Oster

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:06:51 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu Aug 23 01:06:51 UTC 2018

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c
 	    nouveau_fence.h nouveau_nv84_fence.c

 Log Message:
 Rewrite nouveau_fence in an attempt to make it make sense.

 PR kern/53441

 XXX pullup-7
 XXX pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 cvs rdiff -u -r1.2 -r1.3 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:10:04 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu Aug 23 01:10:04 UTC 2018

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c
 	    nouveau_fence.h

 Log Message:
 Fences may last longer than their channels.

 - Use a reference count on the nouveau_fence_chan object.
 - Acquire it with kpreemption disabled.
 - Use xcall to wait for kpreempt-disabled sections to complete.

 PR kern/53441

 XXX pullup-7
 XXX pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.5 -r1.6 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 cvs rdiff -u -r1.3 -r1.4 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
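
The reference-counting scheme described in this log message can be illustrated with a rough user-space analogue. This is a sketch under stated assumptions, not the committed code: all names (`struct chan_ctx`, `struct fence`, `fence_ctx_acquire`, `ctx_release`, `fence_signal`, `ctx_destroy`) are hypothetical, and a per-fence pthread mutex stands in for the kpreempt_disable/xcall combination, which has no direct user-space equivalent. The point it shows is the ordering: a reference can only be taken while the fence is not yet done, and the destroyer waits for the count to drain before tearing the context down.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* All names here are hypothetical; this is not the NetBSD code. */
struct chan_ctx {
	pthread_mutex_t lock;
	pthread_cond_t drained;		/* broadcast when refcnt hits 0 */
	unsigned refcnt;		/* protected by lock */
};

struct fence {
	pthread_mutex_t lock;		/* protects done */
	struct chan_ctx *ctx;
	bool done;
};

static void
ctx_init(struct chan_ctx *ctx)
{
	pthread_mutex_init(&ctx->lock, NULL);
	pthread_cond_init(&ctx->drained, NULL);
	ctx->refcnt = 0;
}

static void
fence_init(struct fence *f, struct chan_ctx *ctx)
{
	pthread_mutex_init(&f->lock, NULL);
	f->ctx = ctx;
	f->done = false;
}

/* Return f's context with a reference held, or NULL if f is done. */
static struct chan_ctx *
fence_ctx_acquire(struct fence *f)
{
	struct chan_ctx *ctx = NULL;

	pthread_mutex_lock(&f->lock);
	if (!f->done) {
		ctx = f->ctx;
		pthread_mutex_lock(&ctx->lock);
		ctx->refcnt++;
		pthread_mutex_unlock(&ctx->lock);
	}
	pthread_mutex_unlock(&f->lock);
	return ctx;
}

/* Drop a reference; wake the destroyer if it was the last one. */
static void
ctx_release(struct chan_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	assert(ctx->refcnt > 0);
	if (--ctx->refcnt == 0)
		pthread_cond_broadcast(&ctx->drained);
	pthread_mutex_unlock(&ctx->lock);
}

/* Mark f done, so that later acquires return NULL. */
static void
fence_signal(struct fence *f)
{
	pthread_mutex_lock(&f->lock);
	f->done = true;
	pthread_mutex_unlock(&f->lock);
}

/* Call only after every fence on ctx has been signalled. */
static void
ctx_destroy(struct chan_ctx *ctx)
{
	pthread_mutex_lock(&ctx->lock);
	while (ctx->refcnt != 0)
		pthread_cond_wait(&ctx->drained, &ctx->lock);
	pthread_mutex_unlock(&ctx->lock);
	pthread_cond_destroy(&ctx->drained);
	pthread_mutex_destroy(&ctx->lock);
}
```

Because fence_signal and fence_ctx_acquire serialize on the fence's own lock, any acquire that saw the fence as not done has already bumped the count by the time fence_signal returns, so ctx_destroy cannot miss it.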

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:10:21 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu Aug 23 01:10:21 UTC 2018

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c
 	    nouveau_fence.h

 Log Message:
 Defer nouveau_fence_unref until spin unlock.

 - kfree while holding a spin lock is not a good idea.
 - Make sure we GC every time we might signal fences.

 PR kern/53441

 XXX pullup-7
 XXX pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 cvs rdiff -u -r1.4 -r1.5 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.
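
The grab-then-free pattern this log message describes can be sketched in user space as follows. This is only an illustration with hypothetical names (`struct queue`, `mark_done`, `gc_grab`, `gc_free`, `collect`), using a pthread mutex in place of the kernel spin lock and free() in place of kfree; the driver's own helpers are nouveau_fence_gc_grab and nouveau_fence_gc_free in the patch earlier in this report. Finished items are detached onto a private list while the lock is held, and the allocator is only called after the lock has been dropped.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* All names here are hypothetical; this is not the NetBSD code. */
struct node {
	struct node *next;
};

struct queue {
	pthread_mutex_t lock;
	struct node *done;	/* finished items, protected by lock */
};

static void
queue_init(struct queue *q)
{
	pthread_mutex_init(&q->lock, NULL);
	q->done = NULL;
}

/* Put a finished item on the done list. */
static void
mark_done(struct queue *q, struct node *n)
{
	assert(n != NULL);
	pthread_mutex_lock(&q->lock);
	n->next = q->done;
	q->done = n;
	pthread_mutex_unlock(&q->lock);
}

/* Detach the whole done list.  Caller must hold q->lock. */
static struct node *
gc_grab(struct queue *q)
{
	struct node *list = q->done;

	q->done = NULL;
	return list;
}

/* Free every node on list.  Caller must NOT hold the queue lock. */
static size_t
gc_free(struct node *list)
{
	size_t n = 0;

	while (list != NULL) {
		struct node *next = list->next;

		free(list);
		list = next;
		n++;
	}
	return n;
}

/* Grab under the lock, free after dropping it; returns count freed. */
static size_t
collect(struct queue *q)
{
	struct node *list;

	pthread_mutex_lock(&q->lock);
	list = gc_grab(q);
	pthread_mutex_unlock(&q->lock);

	return gc_free(list);
}
```

Splitting the work into a locked grab and an unlocked free keeps the critical section short and keeps the allocator out of it entirely.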

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:10:29 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu Aug 23 01:10:28 UTC 2018

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c

 Log Message:
 Attempt to make sense of return values of nouveau_fence_wait.

 PR kern/53441

 XXX pullup-7
 XXX pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.7 -r1.8 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:10:36 +0000

 Module Name:	src
 Committed By:	riastradh
 Date:		Thu Aug 23 01:10:36 UTC 2018

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c

 Log Message:
 Fix edge case of reference counting, oops.

 PR kern/53441

 XXX pullup-7
 XXX pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.8 -r1.9 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/53441 CVS commit: [netbsd-8] src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Fri, 31 Aug 2018 17:35:51 +0000

 Module Name:	src
 Committed By:	martin
 Date:		Fri Aug 31 17:35:51 UTC 2018

 Modified Files:
 	src/sys/external/bsd/drm2/dist/drm/nouveau [netbsd-8]: nouveau_fence.c
 	    nouveau_fence.h nouveau_nv84_fence.c

 Log Message:
 Pull up following revision(s) (requested by riastradh in ticket #996):

 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c: revision 1.3
 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h: revision 1.3
 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h: revision 1.4
 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h: revision 1.5
 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.5
 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.6
 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.7
 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.8
 	sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.9

 Rewrite nouveau_fence in an attempt to make it make sense.
 PR kern/53441
 XXX pullup-7
 XXX pullup-8

 Fences may last longer than their channels.
 - Use a reference count on the nouveau_fence_chan object.
 - Acquire it with kpreemption disabled.
 - Use xcall to wait for kpreempt-disabled sections to complete.
 PR kern/53441
 XXX pullup-7
 XXX pullup-8

 Defer nouveau_fence_unref until spin unlock.
 - kfree while holding a spin lock is not a good idea.
 - Make sure we GC every time we might signal fences.
 PR kern/53441
 XXX pullup-7
 XXX pullup-8

 Attempt to make sense of return values of nouveau_fence_wait.
 PR kern/53441
 XXX pullup-7
 XXX pullup-8

 Fix edge case of reference counting, oops.
 PR kern/53441
 XXX pullup-7
 XXX pullup-8


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.4.10.1 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
 cvs rdiff -u -r1.2 -r1.2.24.1 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
 cvs rdiff -u -r1.2 -r1.2.10.1 \
     src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sun, 23 Sep 2018 14:07:05 +0000
State-Changed-Why:
Setting a reminder to see if there are panics after a few days/weeks. (I don't think you provided feedback after the new commits)


From: Greg Oster <oster@netbsd.org>
To: maya@NetBSD.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
 netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/53441 (nouveau panic in 8.0_RC2 amd64)
Date: Sun, 23 Sep 2018 13:36:56 -0600

 On Sun, 23 Sep 2018 14:07:06 +0000 (UTC)
 maya@NetBSD.org wrote:

 > Synopsis: nouveau panic in 8.0_RC2 amd64
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: maya@NetBSD.org
 > State-Changed-When: Sun, 23 Sep 2018 14:07:05 +0000
 > State-Changed-Why:
 > Setting a reminder to see if there are panics after a few days/weeks.
 > (I don't think you provided feedback after the new commits)

 I have had zero panics since the new commits, and since things were
 pulled up to 8.0. 

 Can call this one done.

 Thanks!

 Later...

 Greg Oster

State-Changed-From-To: feedback->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sun, 23 Sep 2018 19:44:49 +0000
State-Changed-Why:
Reported fixed, thanks riastradh, thanks for the report, go.


>Unformatted:
