NetBSD Problem Report #53441
From oster@fween.ca Tue Jul 10 16:11:38 2018
Return-Path: <oster@fween.ca>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 2A2F67A159
for <gnats-bugs@gnats.NetBSD.org>; Tue, 10 Jul 2018 16:11:38 +0000 (UTC)
Message-Id: <20180710145412.0934952D379@thog.fween.ca>
Date: Tue, 10 Jul 2018 08:54:12 -0600 (CST)
From: oster@netbsd.org
Reply-To: oster@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: nouveau panic in 8.0_RC2 amd64
X-Send-Pr-Version: 3.95
>Number: 53441
>Category: kern
>Synopsis: nouveau panic in 8.0_RC2 amd64
>Confidential: no
>Severity: critical
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Tue Jul 10 16:15:00 +0000 2018
>Closed-Date: Sun Sep 23 19:44:49 +0000 2018
>Last-Modified: Sun Sep 23 19:44:49 +0000 2018
>Originator: Greg Oster
>Release: NetBSD 8.0_RC2
>Organization:
>Environment:
System: NetBSD thog 8.0_RC2 NetBSD 8.0_RC2 (THOG.gdb) #0: Fri Jun 29 15:10:23 CST 2018 oster@thog:/u1/builds/build183/src/obj/amd64/u1/builds/build183/src/sys/arch/amd64/compile/THOG.gdb amd64
Architecture: x86_64
Machine: amd64
>Description:
The nouveau driver occasionally panics for no good reason. It can panic
when X11 is being used, and it can panic when no-one is on the console.
Panic looks like:
uvm_fault(0xffffffff819b7d80, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8114d302 cs 0x8 rflags 0x10282 cr2 0x70 ilevel 0x8 rsp 0xffff80013ce5bdd0
curlwp 0xfffffe843b5a0080 pid 0.16 lowest kstack 0xffff80013ce592c0
panic: trap
cpu2: Begin traceback...
vpanic() at netbsd:vpanic+0x219
vpanic() at netbsd:vpanic
trap() at netbsd:trap+0x2b9
--- trap (number 6) ---
nouveau_fence_update() at netbsd:nouveau_fence_update+0x10
nouveau_fence_done() at netbsd:nouveau_fence_done+0x29
nouveau_bo_fence_signalled() at netbsd:nouveau_bo_fence_signalled+0x18
ttm_bo_wait() at netbsd:ttm_bo_wait+0x90
ttm_bo_cleanup_refs_and_unlock() at netbsd:ttm_bo_cleanup_refs_and_unlock+0x66
ttm_bo_delayed_delete() at netbsd:ttm_bo_delayed_delete+0x175
ttm_bo_delayed_workqueue() at netbsd:ttm_bo_delayed_workqueue+0x2b
linux_worker() at netbsd:linux_worker+0xf9
workqueue_runlist() at netbsd:workqueue_runlist+0x59
workqueue_worker() at netbsd:workqueue_worker+0xb1
cpu2: End traceback...
uvm_fault(0xfffffe842f5fd5c0, 0x0, 2) -> e
fatal page fault in supervisor mode
dumping to dev 0,1 (offset=8425399, size=4189705):
trap type 6 code 0x2 rip 0xffffffff80cb5d7b cs 0x8 rflags 0x10296 cr2 0x84 ilevel 0x8 rsp 0xffff8001424b2b90
curlwp 0xfffffe8403f36120 pid 885.2 lowest kstack 0xffff8001424b02c0
coretemp0: workqueue busy: updates stopped
coretemp1: workqueue busy: updates stopped
coretemp2: workqueue busy: updates stopped
coretemp3: workqueue busy: updates stopped
>How-To-Repeat:
Run the nouveau driver on NetBSD-8.0_RC2/amd64 using an NVIDIA GeForce GT 420:
...
pci1 at ppb0 bus 1
pci1: i/o space, memory space enabled, rd/line, wr/inv ok
nouveau0 at pci1 dev 0 function 0: vendor 10de product 0de2 (rev. 0xa1)
drm kern info: nouveau [ DEVICE][nouveau0] BOOT0 : 0x0c1100a1
drm kern info: nouveau [ DEVICE][nouveau0] Chipset: GF108 (NVC1)
drm kern info: nouveau [ DEVICE][nouveau0] Family : NVC0
drm kern info: nouveau [ VBIOS][nouveau0] checking PRAMIN for image...
drm kern info: nouveau [ VBIOS][nouveau0] ... appears to be valid
drm kern info: nouveau [ VBIOS][nouveau0] using image from PRAMIN
drm kern info: nouveau [ VBIOS][nouveau0] BIT signature found
drm kern info: nouveau [ VBIOS][nouveau0] version 70.08.1f.00.0c
nouveau0: interrupting at ioapic0 pin 16 (nouveau)
drm kern warning: nouveau W[ PFB][nouveau0][0x00000000][0xfffffe811d51b808] reclocking of this ram type unsupported
drm kern info: nouveau [ PFB][nouveau0] RAM type: DDR3
drm kern info: nouveau [ PFB][nouveau0] RAM size: 512 MiB
drm kern info: nouveau [ PFB][nouveau0] ZCOMP: 0 tags
drm kern info: nouveau [ VOLT][nouveau0] GPU voltage: 900000uv
drm kern info: nouveau [ PTHERM][nouveau0] FAN control: PWM
drm kern info: nouveau [ PTHERM][nouveau0] fan management: automatic
drm kern info: nouveau [ PTHERM][nouveau0] internal sensor: yes
drm kern info: nouveau [ CLK][nouveau0] 03: core 50 MHz memory 135 MHz
drm kern info: nouveau [ CLK][nouveau0] 07: core 405 MHz memory 324 MHz
drm kern info: nouveau [ CLK][nouveau0] 0f: core 700 MHz memory 800 MHz
drm kern info: nouveau [ CLK][nouveau0] --: core 405 MHz memory 324 MHz
Zone kernel: Available graphics memory: 5504634 kiB
Zone dma32: Available graphics memory: 2097152 kiB
drm kern info: nouveau [ DRM] VRAM: 512 MiB
drm kern info: nouveau [ DRM] GART: 1048576 MiB
drm kern info: nouveau [ DRM] TMDS table version 2.0
drm kern info: nouveau [ DRM] DCB version 4.0
drm kern info: nouveau [ DRM] DCB outp 00: 01800302 00020030
drm kern info: nouveau [ DRM] DCB outp 01: 02000300 00000000
drm kern info: nouveau [ DRM] DCB outp 02: 08811392 00020020
drm kern info: nouveau [ DRM] DCB outp 03: 04822310 00000000
drm kern info: nouveau [ DRM] DCB conn 00: 00001030
drm kern info: nouveau [ DRM] DCB conn 01: 00002161
drm kern info: nouveau [ DRM] DCB conn 02: 00000200
drm: Supports vblank timestamp caching Rev 2 (21.10.2013).
drm: Driver supports precise vblank timestamp query.
drm kern info: nouveau [ DRM] MM: using COPY0 for buffer copies
nouveaufb0 at nouveau0
nouveau0: info: registered panic notifier
nouveaufb0: framebuffer at 0xffff8001400b4000, size 1920x1200, depth 32, stride 7680
...
and then wait for the boom. The panic may happen in hours or days.
>Fix:
Please. I have a kernel with full debug symbols and a couple of crash dumps
related to this if someone wants additional information from them.
>Release-Note:
>Audit-Trail:
From: coypu@sdf.org
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Tue, 10 Jul 2018 18:05:22 +0000
Addendum: note linux doesn't destroy spin locks, so we'll need some
logic to guard this possibly.
From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Fri, 3 Aug 2018 17:40:41 -0600
Traceback from gdb kernel:
(gdb) bt
#0 cpu_reboot (howto=260, bootstr=0x0)
    at /u1/builds/build185/src/sys/arch/amd64/amd64/machdep.c:710
#1 0xffffffff80ceece2 in vpanic (fmt=0xffffffff81207070 "trap",
    ap=0xffff80013ce5bbb8)
    at /u1/builds/build185/src/sys/kern/subr_prf.c:342
#2 0xffffffff80ceeaba in panic (fmt=0xffffffff81207070 "trap")
    at /u1/builds/build185/src/sys/kern/subr_prf.c:258
#3 0xffffffff80228bfd in trap (frame=0xffff80013ce5bce0)
    at /u1/builds/build185/src/sys/arch/amd64/amd64/trap.c:336
#4 0xffffffff8021f61f in alltraps ()
#5 0xffffffff8114d577 in nouveau_fence_update (chan=0x0)
    at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c:132
#6 0xffffffff8114d72d in nouveau_fence_done (fence=0xfffffe834add5c48)
    at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c:171
#7 0xffffffff811419f5 in nouveau_bo_fence_signalled
    (sync_obj=0xfffffe834add5c48)
    at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_bo.c:1566
#8 0xffffffff8119841a in ttm_bo_wait (bo=0xfffffe82f9fc0408,
    lazy=false, interruptible=false, no_wait=true)
    at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:1671
#9 0xffffffff81195d15 in ttm_bo_cleanup_refs_and_unlock
    (bo=0xfffffe82f9fc0408, interruptible=false, no_wait_gpu=true)
    at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:516
#10 0xffffffff81196108 in ttm_bo_delayed_delete
    (bdev=0xfffffe811d500160, remove_all=false)
    at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:621
#11 0xffffffff811961da in ttm_bo_delayed_workqueue
    (work=0xfffffe811d500520)
    at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/ttm/ttm_bo.c:650
#12 0xffffffff80abf6a9 in linux_worker (wk=0xfffffe811d500520,
    arg=0xfffffe843e620f80)
    at /u1/builds/build185/src/sys/external/bsd/common/linux/linux_work.c:505
#13 0xffffffff80cf85ef in workqueue_runlist (wq=0xfffffe843b5b7d00,
    list=0xfffffe843b5b7d70)
    at /u1/builds/build185/src/sys/kern/subr_workqueue.c:106
#14 0xffffffff80cf86b2 in workqueue_worker (cookie=0xfffffe843b5b7d00)
    at /u1/builds/build185/src/sys/kern/subr_workqueue.c:133
#15 0xffffffff80208747 in lwp_trampoline ()
#16 0x0000000000000000 in ?? ()
(gdb)
...
(gdb) list
166
167 bool
168 nouveau_fence_done(struct nouveau_fence *fence)
169 {
170 if (fence->channel)
171 nouveau_fence_update(fence->channel);
172 return !fence->channel;
173 }
174
175 static int
(gdb) down
#5 0xffffffff8114d577 in nouveau_fence_update (chan=0x0)
at /u1/builds/build185/src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c:132
132 struct nouveau_fence_chan *fctx = chan->fence;
(gdb) list
127 }
128
129 static void
130 nouveau_fence_update(struct nouveau_channel *chan)
131 {
132 struct nouveau_fence_chan *fctx = chan->fence;
133 struct nouveau_fence *fence, *fnext;
134
135 spin_lock(&fctx->lock);
136 list_for_each_entry_safe(fence, fnext, &fctx->pending,
head) {
(gdb) print chan
$11 = (struct nouveau_channel *) 0x0
(gdb)
"huh?"
We just checked fence->channel for non-zero before the call to
nouveau_fence_update(), and now it's suddenly zero? Methinks there
are some locking issues happening here if the rug is getting pulled
out that fast! Also: are there other uses of fence->channel where it
could suddenly change from something to 0 and cause issues?
(the machine worked fine for 8 days before this panic...)
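The double read described above can be pictured with a minimal sketch. It uses hypothetical stand-in types (`demo_fence`, `demo_channel`), not the real nouveau structures, and it assumes the pointer is cleared by another thread when the fence is signalled. Loading the pointer once and using only that snapshot avoids handing NULL to the update routine; it does not by itself keep the channel alive, which would need a reference count or a lock.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real nouveau structures. */
struct demo_channel {
	int updates;		/* counts calls to demo_update() */
};

struct demo_fence {
	/* Non-NULL while pending; set to NULL once signalled. */
	_Atomic(struct demo_channel *) channel;
};

/* Dereferences chan, so it must never be handed NULL. */
static void
demo_update(struct demo_channel *chan)
{
	assert(chan != NULL);
	chan->updates++;
}

/*
 * Load the channel pointer exactly once before the test, then test and
 * use that same snapshot.  A concurrent store of NULL between the test
 * and the call can no longer turn the argument into NULL.
 */
static bool
demo_fence_done(struct demo_fence *fence)
{
	struct demo_channel *chan = atomic_load(&fence->channel);

	if (chan != NULL)
		demo_update(chan);

	/* Re-read only to report the current state, never to dereference. */
	return atomic_load(&fence->channel) == NULL;
}
```

Calling `demo_fence_done()` on a pending fence runs one update and reports false; after the pointer has been cleared it reports true without touching the channel.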
Later...
Greg Oster
From: Greg Oster <oster@netbsd.org>
To: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, oster@netbsd.org
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Fri, 3 Aug 2018 20:53:16 -0600
Just fell over again.. so twice now today. Seems there are (at least)
two different failure modes - one where I can get a kernel trace, and
one where it's a fast trip to reboot....
uvm_fault(0xffffffff819b7d80, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8114d577 cs 0x8 rflags 0x10282 cr2 0x70 ilevel 0x8 rsp 0xffff80013ce5bdd0
curlwp 0xfffffe843b5a0080 pid 0.16 lowest kstack 0xffff80013ce592c0
panic: trap
cpu1: Begin traceback...
vpanic() at netbsd:vpanic+0x219
vpanic() at netbsd:vpanic
trap() at netbsd:trap+0x2b9
--- trap (number 6) ---
nouveau_fence_update() at netbsd:nouveau_fence_update+0x10
nouveau_fence_done() at netbsd:nouveau_fence_done+0x29
nouveau_bo_fence_signalled() at netbsd:nouveau_bo_fence_signalled+0x18
ttm_bo_wait() at netbsd:ttm_bo_wait+0x90
ttm_bo_cleanup_refs_and_unlock() at netbsd:ttm_bo_cleanup_refs_and_unlock+0x66
ttm_bo_delayed_delete() at netbsd:ttm_bo_delayed_delete+0x175
ttm_bo_delayed_workqueue() at netbsd:ttm_bo_delayed_workqueue+0x2b
linux_worker() at netbsd:linux_worker+0xf9
workqueue_runlist() at netbsd:workqueue_runlist+0x59
workqueue_worker() at netbsd:workqueue_worker+0xb1
cpu1: End traceback...
Later...
Greg Oster
From: Taylor R Campbell <campbell@mumble.net>
To: Greg Oster <oster@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Fri, 10 Aug 2018 23:26:31 +0000
Please try the attached patch and let me know if it helps.
From ef5781c793d890187ac766847a0228296c8edc2c Mon Sep 17 00:00:00 2001
From: Taylor R Campbell <riastradh@NetBSD.org>
Date: Fri, 10 Aug 2018 22:29:26 +0000
Subject: [PATCH] Attempt to sort out race between nouveau_fence_done/signal.
---
.../bsd/drm2/dist/drm/nouveau/nouveau_fence.c | 28 ++++++++++++++++++----
1 file changed, 23 insertions(+), 5 deletions(-)
diff --git a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
index 2a83285e07da..49bdb96273dd 100644
--- a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
+++ b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
@@ -126,21 +126,27 @@ nouveau_fence_work(struct nouveau_fence *fence,
spin_unlock(&fctx->lock);
}

-static void
-nouveau_fence_update(struct nouveau_channel *chan)
+static bool
+nouveau_fence_update(struct nouveau_channel *chan,
+ struct nouveau_fence *fence0)
{
struct nouveau_fence_chan *fctx = chan->fence;
struct nouveau_fence *fence, *fnext;
+ bool signalled = false; /* Did we signal fence0? */

spin_lock(&fctx->lock);
list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
if (fctx->read(chan) < fence->sequence)
break;

+ if (fence == fence0)
+ signalled = true;
nouveau_fence_signal(fence);
nouveau_fence_unref(&fence);
}
spin_unlock(&fctx->lock);
+
+ return signalled;
}

int
@@ -167,9 +173,21 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *chan)
bool
nouveau_fence_done(struct nouveau_fence *fence)
{
- if (fence->channel)
- nouveau_fence_update(fence->channel);
- return !fence->channel;
+ struct nouveau_channel *chan;
+
+ /*
+ * The lock under which the fence transitions from the
+ * not-signalled state to the signalled state is stored in the
+ * channel. The way the signalled state is indicated is by
+ * nulling the pointer to the channel. Soooo... We load the
+ * channel pointer once, and hope that the reference counting
+ * mechanism for fences keeps the fence from dying too.
+ */
+ chan = fence->channel;
+ __insn_barrier();
+ if (chan)
+ return nouveau_fence_update(chan, fence);
+ return !chan;
}

static int
--
2.11.0
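The changed return value in this patch (report whether one particular fence was signalled during the scan) can be illustrated in isolation. The following is a simplified sketch under stated assumptions: `demo_fence`, `demo_update`, and the singly linked list are hypothetical, and the locking and reference counting of the real driver are left out. It only shows the scan logic: signal every pending fence whose sequence number has been reached, and tell the caller whether its own fence was among them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a pending fence; oldest fence comes first. */
struct demo_fence {
	uint32_t sequence;		/* sequence number the fence waits for */
	bool signalled;			/* set once the fence has completed */
	struct demo_fence *next;	/* next pending fence, or NULL */
};

/*
 * Signal and unlink every pending fence whose sequence number is at or
 * below `reached`.  Returns true if `target` was signalled by this call.
 * No locking here; a real driver would hold the list lock throughout.
 */
static bool
demo_update(struct demo_fence **pending, uint32_t reached,
    const struct demo_fence *target)
{
	bool hit = false;

	assert(pending != NULL);
	while (*pending != NULL && (*pending)->sequence <= reached) {
		struct demo_fence *f = *pending;

		*pending = f->next;	/* unlink from the pending list */
		f->next = NULL;
		f->signalled = true;
		if (f == target)
			hit = true;
	}
	return hit;
}
```

Because the answer comes from the scan itself, the caller does not have to re-read shared state after the lock is dropped in order to learn whether its fence completed.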
From: Greg Oster <oster@netbsd.org>
To: Taylor R Campbell <campbell@mumble.net>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Fri, 10 Aug 2018 21:05:16 -0600
On Fri, 10 Aug 2018 23:26:31 +0000
Taylor R Campbell <campbell@mumble.net> wrote:
> Please try the attached patch and let me know if it helps.
Now running a kernel with the patch. Will let you know what
happens.
Thanks!
Later...
Greg Oster
From: Greg Oster <oster@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Sat, 11 Aug 2018 15:08:44 -0600
Keeled over again early this afternoon (I wasn't even sitting at the
machine at the time).
uvm_fault(0xfffffe8433abd8c0, 0x0, 1) -> e
fatal page fault in supervisor mode
trap type 6 code 0 rip 0xffffffff8114d7d5 cs 0x8 rflags 0x13282 cr2 0x8 ilevel 0 rsp 0xffff800141d3ac20
curlwp 0xfffffe843792d260 pid 161.1 lowest kstack 0xffff800141d382c0
panic: trap
cpu0: Begin traceback...
vpanic() at netbsd:vpanic+0x219
vpanic() at netbsd:vpanic
trap() at netbsd:trap+0x2b9
--- trap (number 6) ---
nouveau_fence_wait_uevent() at netbsd:nouveau_fence_wait_uevent+0x21
nouveau_fence_wait() at netbsd:nouveau_fence_wait+0x5e
nouveau_bo_fence_wait() at netbsd:nouveau_bo_fence_wait+0x2c
ttm_bo_wait() at netbsd:ttm_bo_wait+0x1a0
nouveau_gem_ioctl_cpu_prep() at netbsd:nouveau_gem_ioctl_cpu_prep+0xa2
drm_ioctl() at netbsd:drm_ioctl+0x248
sys_ioctl() at netbsd:sys_ioctl+0x4eb
sy_call() at netbsd:sy_call+-0x2918aa
sy_invoke() at netbsd:sy_invoke+0xd5
syscall() at netbsd:syscall+0xff
--- syscall (number 54) ---
73cc0a0fedfa:
cpu0: End traceback...
I have a crash dump if details from there would be interesting.
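A note on reading the fault address: cr2 is 0x8 here and was 0x70 in the earlier panics, and such small values are what a member access through a NULL structure pointer produces, since the faulting address is the base pointer plus the member's offset. The sketch below uses a hypothetical two-pointer structure (`demo_obj`); which real structure and member are involved in this panic is not established by the log and would have to come from the crash dump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, for illustration only. */
struct demo_obj {
	void *first;	/* offset 0 */
	void *second;	/* offset sizeof(void *) on common ABIs */
};

static_assert(offsetof(struct demo_obj, first) == 0,
    "first member sits at offset 0");

/*
 * Address the CPU would access for a member at `offset` inside an
 * object located at `base`.  With base == 0 (a NULL pointer) the
 * result is simply the member offset, which is what shows up in cr2.
 */
static uintptr_t
demo_member_address(uintptr_t base, size_t offset)
{
	return base + offset;
}
```

With `base` set to 0, `demo_member_address(0, offsetof(struct demo_obj, second))` yields the offset of `second`, so a cr2 of 0x8 on amd64 would match a read of the second pointer-sized member of a NULL object under this assumed layout.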
Thanks.
Later...
Greg Oster
From: Taylor R Campbell <campbell@mumble.net>
To: Greg Oster <oster@NetBSD.org>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Thu, 16 Aug 2018 05:37:54 +0000
Please revert the previous patch, and try the attached patch instead.
diff --git a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
index 2a83285e07da..da0864f2f13d 100644
--- a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
+++ b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
@@ -29,6 +29,9 @@
#include <sys/cdefs.h>
__KERNEL_RCSID(0, "$NetBSD: nouveau_fence.c,v 1.4 2016/04/13 07:57:15 riastradh Exp $");

+#include <sys/types.h>
+#include <sys/xcall.h>
+
#include <drm/drmP.h>

#include <asm/param.h>
@@ -41,6 +44,12 @@ __KERNEL_RCSID(0, "$NetBSD: nouveau_fence.c,v 1.4 2016/04/13 07:57:15 riastradh

#include <engine/fifo.h>

+/*
+ * struct fence_work
+ *
+ * State for a work action scheduled when a fence is completed.
+ * Will call func(data) at some point after that happens.
+ */
struct fence_work {
struct work_struct base;
struct list_head head;
@@ -48,101 +57,289 @@ struct fence_work {
void *data;
};

+/*
+ * nouveau_fence_channel_acquire(fence)
+ *
+ * Try to return the channel associated with fence.
+ */
+static struct nouveau_channel *
+nouveau_fence_channel_acquire(struct nouveau_fence *fence)
+{
+ struct nouveau_channel *chan;
+ struct nouveau_fence_chan *fctx;
+
+ /*
+ * Block cross-calls while we examine fence. If we observe
+ * that fence->done is false, then the channel cannot be
+ * destroyed even by another CPU until after kpreempt_enable.
+ */
+ kpreempt_disable();
+ if (fence->done) {
+ chan =3D NULL;
+ } else {
+ chan =3D fence->channel;
+ fctx =3D chan->fence;
+ atomic_inc_uint(&fctx->refcnt);
+ }
+ kpreempt_enable();
+
+ return chan;
+}
+
+/*
+ * nouveau_fence_gc_grab(fctx, list)
+ *
+ * Move all of channel's done fences to list.
+ *
+ * Caller must hold channel's fence lock.
+ */
+static void
+nouveau_fence_gc_grab(struct nouveau_fence_chan *fctx, struct list_head *l=
ist)
+{
+ struct list_head *node, *next;
+
+ BUG_ON(!spin_is_locked(&fctx->lock));
+
+ list_for_each_safe(node, next, &fctx->done) {
+ list_move_tail(node, list);
+ }
+}
+
+/*
+ * nouveau_fence_gc_free(list)
+ *
+ * Unreference all of the fences in the list.
+ *
+ * Caller MUST NOT hold the fences' channel's fence lock.
+ */
+static void
+nouveau_fence_gc_free(struct list_head *list)
+{
+ struct nouveau_fence *fence, *next;
+
+ list_for_each_entry_safe(fence, next, list, head) {
+ list_del(&fence->head);
+ nouveau_fence_unref(&fence);
+ }
+}
+
+/*
+ * nouveau_fence_channel_release(channel)
+ *
+ * Release the channel acquired with nouveau_fence_channel_acquire.
+ */
+static void
+nouveau_fence_channel_release(struct nouveau_channel *chan)
+{
+ struct nouveau_fence_chan *fctx =3D chan->fence;
+ unsigned old, new;
+
+ do {
+ old =3D fctx->refcnt;
+ if (old =3D=3D 0) {
+ spin_lock(&fctx->lock);
+ if (atomic_dec_uint_nv(&fctx->refcnt) =3D=3D 0)
+ DRM_SPIN_WAKEUP_ALL(&fctx->waitqueue,
+ &fctx->lock);
+ spin_unlock(&fctx->lock);
+ return;
+ }
+ new =3D old - 1;
+ } while (atomic_cas_uint(&fctx->refcnt, old, new) !=3D old);
+}
+
+/*
+ * nouveau_fence_signal(fence)
+ *
+ * Schedule all the work for fence's completion, mark it done, and
+ * move it from the pending list to the done list.
+ *
+ * Caller must hold fence's channel's fence lock.
+ */
static void
nouveau_fence_signal(struct nouveau_fence *fence)
{
+ struct nouveau_channel *chan __diagused =3D fence->channel;
+ struct nouveau_fence_chan *fctx __diagused =3D chan->fence;
struct fence_work *work, *temp;
=20
+ BUG_ON(!spin_is_locked(&fctx->lock));
+ BUG_ON(fence->done);
+
+ /* Schedule all the work for this fence. */
list_for_each_entry_safe(work, temp, &fence->work, head) {
schedule_work(&work->base);
list_del(&work->head);
}
=20
- fence->channel =3D NULL;
- list_del(&fence->head);
+ /* Note that the fence is done. */
+ fence->done =3D true;
+
+ /* Move it from the pending list to the done list. */
+ list_move_tail(&fence->head, &fctx->done);
+}
+
+static void
+nouveau_fence_context_del_xc(void *a, void *b)
+{
}
=20
+/*
+ * nouveau_fence_context_del(fctx)
+ *
+ * Artificially complete all fences in fctx, wait for their work
+ * to drain, and destroy the memory associated with fctx.
+ */
void
nouveau_fence_context_del(struct nouveau_fence_chan *fctx)
{
struct nouveau_fence *fence, *fnext;
+ struct list_head done_list;
+ int ret __diagused;
+
+ INIT_LIST_HEAD(&done_list);
+
+ /* Signal all the fences in fctx. */
spin_lock(&fctx->lock);
list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
nouveau_fence_signal(fence);
}
+ nouveau_fence_gc_grab(fctx, &done_list);
+ spin_unlock(&fctx->lock);
+
+ /* Release any fences that we signalled. */
+ nouveau_fence_gc_free(&done_list);
+
+ /* Wait for the workqueue to drain. */
+ flush_scheduled_work();
+
+ /* Wait for nouveau_fence_channel_acquire to complete on all CPUs. */
+ xc_wait(xc_broadcast(0, nouveau_fence_context_del_xc, NULL, NULL));
+
+ /* Wait for any references to drain. */
+ spin_lock(&fctx->lock);
+ DRM_SPIN_WAIT_NOINTR_UNTIL(ret, &fctx->waitqueue, &fctx->lock,
+ fctx->refcnt =3D=3D 0);
+ BUG_ON(ret);
spin_unlock(&fctx->lock);
+
+ /* Make sure there are no more fences on the list. */
+ BUG_ON(!list_empty(&fctx->done));
+ BUG_ON(!list_empty(&fctx->flip));
+ BUG_ON(!list_empty(&fctx->pending));
+
+ /* Destroy the fence context. */
+ DRM_DESTROY_WAITQUEUE(&fctx->waitqueue);
spin_lock_destroy(&fctx->lock);
}
=20
+/*
+ * nouveau_fence_context_new(fctx)
+ *
+ * Initialize the state fctx for all fences on a channel.
+ */
void
nouveau_fence_context_new(struct nouveau_fence_chan *fctx)
{
+
INIT_LIST_HEAD(&fctx->flip);
INIT_LIST_HEAD(&fctx->pending);
+ INIT_LIST_HEAD(&fctx->done);
spin_lock_init(&fctx->lock);
+ DRM_INIT_WAITQUEUE(&fctx->waitqueue, "nvfnchan");
+ fctx->refcnt =3D 0;
}
=20
+/*
+ * nouveau_fence_work_handler(kwork)
+ *
+ * Work handler for nouveau_fence_work.
+ */
static void
nouveau_fence_work_handler(struct work_struct *kwork)
{
struct fence_work *work =3D container_of(kwork, typeof(*work), base);
+
work->func(work->data);
kfree(work);
}
=20
+/*
+ * nouveau_fence_work(fence, func, data)
+ *
+ * Arrange to call func(data) after fence is completed. If fence
+ * is already completed, call it immediately. If memory is
+ * scarce, synchronously wait for the fence and call it.
+ */
void
nouveau_fence_work(struct nouveau_fence *fence,
void (*func)(void *), void *data)
{
- struct nouveau_channel *chan =3D fence->channel;
+ struct nouveau_channel *chan;
struct nouveau_fence_chan *fctx;
struct fence_work *work =3D NULL;
=20
- if (nouveau_fence_done(fence)) {
- func(data);
- return;
- }
-
+ if ((chan =3D nouveau_fence_channel_acquire(fence)) =3D=3D NULL)
+ goto now0;
fctx =3D chan->fence;
+
work =3D kmalloc(sizeof(*work), GFP_KERNEL);
- if (!work) {
+ if (work =3D=3D NULL) {
WARN_ON(nouveau_fence_wait(fence, false, false));
- func(data);
- return;
+ goto now1;
}
=20
spin_lock(&fctx->lock);
- if (!fence->channel) {
+ if (fence->done) {
spin_unlock(&fctx->lock);
- kfree(work);
- func(data);
- return;
+ goto now2;
}
-
INIT_WORK(&work->base, nouveau_fence_work_handler);
work->func =3D func;
work->data =3D data;
list_add(&work->head, &fence->work);
+ if (atomic_dec_uint_nv(&fctx->refcnt) =3D=3D 0)
+ DRM_SPIN_WAKEUP_ALL(&fctx->waitqueue, &fctx->lock);
spin_unlock(&fctx->lock);
+ return;
+
+now2: kfree(work);
+now1: nouveau_fence_channel_release(chan);
+now0: func(data);
}
=20
+/*
+ * nouveau_fence_update(chan)
+ *
+ * Test all fences on chan for completion. For any that are
+ * completed, mark them as such and schedule work for them.
+ *
+ * Caller must hold chan's fence lock.
+ */
static void
nouveau_fence_update(struct nouveau_channel *chan)
{
struct nouveau_fence_chan *fctx =3D chan->fence;
struct nouveau_fence *fence, *fnext;
=20
- spin_lock(&fctx->lock);
+ BUG_ON(!spin_is_locked(&fctx->lock));
list_for_each_entry_safe(fence, fnext, &fctx->pending, head) {
if (fctx->read(chan) < fence->sequence)
break;
-
nouveau_fence_signal(fence);
- nouveau_fence_unref(&fence);
}
- spin_unlock(&fctx->lock);
+ BUG_ON(!spin_is_locked(&fctx->lock));
}
=20
+/*
+ * nouveau_fence_emit(fence, chan)
+ *
+ * - Initialize fence.
+ * - Set its timeout to 15 sec from now.
+ * - Assign it the next sequence number on channel.
+ * - Submit it to the device with the device-specific emit routine.
+ * - If that succeeds, add it to the list of pending fences on chan.
+ */
int
nouveau_fence_emit(struct nouveau_fence *fence, struct nouveau_channel *ch=
an)
{
@@ -151,7 +348,9 @@ nouveau_fence_emit(struct nouveau_fence *fence, struct =
nouveau_channel *chan)
=20
fence->channel =3D chan;
fence->timeout =3D jiffies + (15 * HZ);
+ spin_lock(&fctx->lock);
fence->sequence =3D ++fctx->sequence;
+ spin_unlock(&fctx->lock);
=20
ret =3D fctx->emit(fence);
if (!ret) {
@@ -164,77 +363,130 @@ nouveau_fence_emit(struct nouveau_fence *fence, stru=
ct nouveau_channel *chan)
return ret;
}
=20
+/*
+ * nouveau_fence_done_locked(fence, chan)
+ *
+ * Test whether fence, which must be on chan, is done. If it is
+ * not marked as done, poll all fences on chan first.
+ *
+ * Caller must hold chan's fence lock.
+ */
+static bool
+nouveau_fence_done_locked(struct nouveau_fence *fence,
+ struct nouveau_channel *chan)
+{
+ struct nouveau_fence_chan *fctx __diagused =3D chan->fence;
+
+ BUG_ON(!spin_is_locked(&fctx->lock));
+ BUG_ON(fence->channel !=3D chan);
+
+ /* If it's not done, poll it for changes. */
+ if (!fence->done)
+ nouveau_fence_update(chan);
+
+ /* Check, possibly again, whether it is done now. */
+ return fence->done;
+}
+
+/*
+ * nouveau_fence_done(fence)
+ *
+ * Test whether fence is done. If it is not marked as done, poll
+ * all fences on its channel first. Caller MUST NOT hold the
+ * fence lock.
+ */
bool
nouveau_fence_done(struct nouveau_fence *fence)
{
- if (fence->channel)
- nouveau_fence_update(fence->channel);
- return !fence->channel;
+ struct nouveau_channel *chan;
+ struct nouveau_fence_chan *fctx;
+ struct list_head done_list;
+ bool done;
+
+ if ((chan =3D nouveau_fence_channel_acquire(fence)) =3D=3D NULL)
+ return true;
+
+ INIT_LIST_HEAD(&done_list);
+
+ fctx =3D chan->fence;
+ spin_lock(&fctx->lock);
+ done =3D nouveau_fence_done_locked(fence, chan);
+ nouveau_fence_gc_grab(fctx, &done_list);
+ spin_unlock(&fctx->lock);
+
+ nouveau_fence_channel_release(chan);
+
+ nouveau_fence_gc_free(&done_list);
+
+ return done;
}
=20
+/*
+ * nouveau_fence_wait_uevent_handler(data, index)
+ *
+ * Nouveau uevent handler for fence completion. data is a
+ * nouveau_fence_chan pointer. Simply wake up all threads waiting
+ * for completion of any fences on the channel. Does not mark
+ * fences as completed -- threads must poll fences for completion.
+ */
static int
nouveau_fence_wait_uevent_handler(void *data, int index)
{
- struct nouveau_fence_priv *priv =3D data;
-#ifdef __NetBSD__
- spin_lock(&priv->waitlock);
- /* XXX Set a flag... */
- DRM_SPIN_WAKEUP_ALL(&priv->waitqueue, &priv->waitlock);
- spin_unlock(&priv->waitlock);
-#else
- wake_up_all(&priv->waiting);
-#endif
+ struct nouveau_fence_chan *fctx =3D data;
+
+ spin_lock(&fctx->lock);
+ DRM_SPIN_WAKEUP_ALL(&fctx->waitqueue, &fctx->lock);
+ spin_unlock(&fctx->lock);
+
return NVKM_EVENT_KEEP;
}
=20
+/*
+ * nouveau_fence_wait_uevent(fence, chan, intr)
+ *
+ * Wait using a nouveau event for completion of fence on chan.
+ * Wait interruptibly iff intr is true.
+ */
static int
-nouveau_fence_wait_uevent(struct nouveau_fence *fence, bool intr)
-
+nouveau_fence_wait_uevent(struct nouveau_fence *fence,
+ struct nouveau_channel *chan, bool intr)
{
- struct nouveau_channel *chan =3D fence->channel;
struct nouveau_fifo *pfifo =3D nouveau_fifo(chan->drm->device);
- struct nouveau_fence_priv *priv =3D chan->drm->fence;
+ struct nouveau_fence_chan *fctx =3D chan->fence;
struct nouveau_eventh *handler;
+ struct list_head done_list;
int ret =3D 0;
=20
+ BUG_ON(fence->channel !=3D chan);
+
ret =3D nouveau_event_new(pfifo->uevent, 0,
nouveau_fence_wait_uevent_handler,
- priv, &handler);
+ fctx, &handler);
if (ret)
return ret;
=20
nouveau_event_get(handler);
=20
+ INIT_LIST_HEAD(&done_list);
+
if (fence->timeout) {
unsigned long timeout =3D fence->timeout - jiffies;
=20
if (time_before(jiffies, fence->timeout)) {
-#ifdef __NetBSD__
- spin_lock(&priv->waitlock);
+ spin_lock(&fctx->lock);
if (intr) {
DRM_SPIN_TIMED_WAIT_UNTIL(ret,
- &priv->waitqueue, &priv->waitlock,
+ &fctx->waitqueue, &fctx->lock,
timeout,
- nouveau_fence_done(fence));
+ nouveau_fence_done_locked(fence, chan));
} else {
DRM_SPIN_TIMED_WAIT_NOINTR_UNTIL(ret,
- &priv->waitqueue, &priv->waitlock,
+ &fctx->waitqueue, &fctx->lock,
timeout,
- nouveau_fence_done(fence));
- }
- spin_unlock(&priv->waitlock);
-#else
- if (intr) {
- ret =3D wait_event_interruptible_timeout(
- priv->waiting,
- nouveau_fence_done(fence),
- timeout);
- } else {
- ret =3D wait_event_timeout(priv->waiting,
- nouveau_fence_done(fence),
- timeout);
+ nouveau_fence_done_locked(fence, chan));
}
-#endif
+ nouveau_fence_gc_grab(fctx, &done_list);
+ spin_unlock(&fctx->lock);
}
=20
if (ret >=3D 0) {
@@ -243,50 +495,53 @@ nouveau_fence_wait_uevent(struct nouveau_fence *fence=
, bool intr)
ret =3D -EBUSY;
}
} else {
-#ifdef __NetBSD__
- spin_lock(&priv->waitlock);
- if (intr) {
- DRM_SPIN_WAIT_UNTIL(ret, &priv->waitqueue,
- &priv->waitlock,
- nouveau_fence_done(fence));
- } else {
- DRM_SPIN_WAIT_NOINTR_UNTIL(ret, &priv->waitqueue,
- &priv->waitlock,
- nouveau_fence_done(fence));
- }
- spin_unlock(&priv->waitlock);
-#else
+ spin_lock(&fctx->lock);
if (intr) {
- ret =3D wait_event_interruptible(priv->waiting,
- nouveau_fence_done(fence));
+ DRM_SPIN_WAIT_UNTIL(ret, &fctx->waitqueue,
+ &fctx->lock,
+ nouveau_fence_done_locked(fence, chan));
} else {
- wait_event(priv->waiting, nouveau_fence_done(fence));
+ DRM_SPIN_WAIT_NOINTR_UNTIL(ret, &fctx->waitqueue,
+ &fctx->lock,
+ nouveau_fence_done_locked(fence, chan));
}
-#endif
+ nouveau_fence_gc_grab(fctx, &done_list);
+ spin_unlock(&fctx->lock);
}
=20
nouveau_event_ref(NULL, &handler);
+
+ nouveau_fence_gc_free(&done_list);
+
if (unlikely(ret < 0))
return ret;
=20
return 0;
}
=20
+/*
+ * nouveau_fence_wait(fence, lazy, intr)
+ *
+ * Wait for fence to complete. Wait interruptibly iff intr is
+ * true. If lazy is true, may sleep, either for a single tick or
+ * for an interrupt; otherwise will busy-wait.
+ */
int
nouveau_fence_wait(struct nouveau_fence *fence, bool lazy, bool intr)
{
- struct nouveau_channel *chan =3D fence->channel;
- struct nouveau_fence_priv *priv =3D chan ? chan->drm->fence : NULL;
-#ifndef __NetBSD__
- unsigned long sleep_time =3D NSEC_PER_MSEC / 1000;
- ktime_t t;
-#endif
+ struct nouveau_channel *chan;
+ struct nouveau_fence_priv *priv;
+ unsigned long delay_usec =3D 1;
int ret =3D 0;
=20
+ if ((chan =3D nouveau_fence_channel_acquire(fence)) =3D=3D NULL)
+ goto out0;
+
+ priv =3D chan->drm->fence;
while (priv && priv->uevent && lazy && !nouveau_fence_done(fence)) {
- ret =3D nouveau_fence_wait_uevent(fence, intr);
+ ret =3D nouveau_fence_wait_uevent(fence, chan, intr);
if (ret < 0)
- return ret;
+ goto out1;
}
=20
while (!nouveau_fence_done(fence)) {
@@ -295,33 +550,19 @@ nouveau_fence_wait(struct nouveau_fence *fence, bool =
lazy, bool intr)
break;
}
=20
-#ifdef __NetBSD__
- if (lazy)
- kpause("nvfencep", intr, 1, NULL);
- else
- DELAY(1);
-#else
- __set_current_state(intr ? TASK_INTERRUPTIBLE :
- TASK_UNINTERRUPTIBLE);
- if (lazy) {
- t =3D ktime_set(0, sleep_time);
- schedule_hrtimeout(&t, HRTIMER_MODE_REL);
- sleep_time *=3D 2;
- if (sleep_time > NSEC_PER_MSEC)
- sleep_time =3D NSEC_PER_MSEC;
- }
-
- if (intr && signal_pending(current)) {
- ret =3D -ERESTARTSYS;
- break;
+ if (lazy && delay_usec >=3D 1000*hztoms(1)) {
+ /* XXX errno NetBSD->Linux */
+ ret =3D -kpause("nvfencew", intr, 1, NULL);
+ if (ret !=3D -EWOULDBLOCK)
+ break;
+ } else {
+ DELAY(delay_usec);
+ delay_usec *=3D 2;
}
-#endif
}
=20
-#ifndef __NetBSD__
- __set_current_state(TASK_RUNNING);
-#endif
- return ret;
+out1: nouveau_fence_channel_release(chan);
+out0: return ret;
}
=20
int
@@ -331,13 +572,14 @@ nouveau_fence_sync(struct nouveau_fence *fence, struc=
t nouveau_channel *chan)
struct nouveau_channel *prev;
int ret =3D 0;
=20
- prev =3D fence ? fence->channel : NULL;
- if (prev) {
+ if (fence !=3D NULL &&
+ (prev =3D nouveau_fence_channel_acquire(fence)) !=3D NULL) {
if (unlikely(prev !=3D chan && !nouveau_fence_done(fence))) {
ret =3D fctx->sync(fence, prev, chan);
if (unlikely(ret))
ret =3D nouveau_fence_wait(fence, true, false);
}
+ nouveau_fence_channel_release(prev);
}
=20
return ret;
@@ -347,12 +589,14 @@ static void
nouveau_fence_del(struct kref *kref)
{
struct nouveau_fence *fence =3D container_of(kref, typeof(*fence), kref);
+
kfree(fence);
}
=20
void
nouveau_fence_unref(struct nouveau_fence **pfence)
{
+
if (*pfence)
kref_put(&(*pfence)->kref, nouveau_fence_del);
*pfence =3D NULL;
@@ -361,6 +605,7 @@ nouveau_fence_unref(struct nouveau_fence **pfence)
struct nouveau_fence *
nouveau_fence_ref(struct nouveau_fence *fence)
{
+
if (fence)
kref_get(&fence->kref);
return fence;
@@ -382,6 +627,7 @@ nouveau_fence_new(struct nouveau_channel *chan, bool sy=
smem,
=20
INIT_LIST_HEAD(&fence->work);
fence->sysmem =3D sysmem;
+ fence->done =3D false;
kref_init(&fence->kref);
=20
ret =3D nouveau_fence_emit(fence, chan);
diff --git a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h b/sys/e=
xternal/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
index f6f12ba1f38f..a0c32455bd55 100644
--- a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
+++ b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
@@ -9,6 +9,7 @@ struct nouveau_fence {
struct kref kref;
=20
bool sysmem;
+ bool done;
=20
struct nouveau_channel *channel;
unsigned long timeout;
@@ -27,9 +28,15 @@ void nouveau_fence_work(struct nouveau_fence *, void (*)=
(void *), void *);
int nouveau_fence_wait(struct nouveau_fence *, bool lazy, bool intr);
int nouveau_fence_sync(struct nouveau_fence *, struct nouveau_channel *);
=20
+/*
+ * struct nouveau_fence_chan:
+ *
+ * State common to all fences in a single nouveau_channel.
+ */
struct nouveau_fence_chan {
struct list_head pending;
struct list_head flip;
+ struct list_head done;
=20
int (*emit)(struct nouveau_fence *);
int (*sync)(struct nouveau_fence *, struct nouveau_channel *,
@@ -39,9 +46,16 @@ struct nouveau_fence_chan {
int (*sync32)(struct nouveau_channel *, u64, u32);
=20
spinlock_t lock;
+ drm_waitqueue_t waitqueue;
+ volatile unsigned refcnt;
u32 sequence;
};
=20
+/*
+ * struct nouveau_fence_priv:
+ *
+ * Device-specific operations on fences.
+ */
struct nouveau_fence_priv {
void (*dtor)(struct nouveau_drm *);
bool (*suspend)(struct nouveau_drm *);
@@ -49,12 +63,6 @@ struct nouveau_fence_priv {
int (*context_new)(struct nouveau_channel *);
void (*context_del)(struct nouveau_channel *);
=20
-#ifdef __NetBSD__
- spinlock_t waitlock;
- drm_waitqueue_t waitqueue;
-#else
- wait_queue_head_t waiting;
-#endif
bool uevent;
};
=20
diff --git a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c b/=
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c
index 0bf784f0f11b..d4e6b8fa9992 100644
--- a/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c
+++ b/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c
@@ -216,11 +216,6 @@ nv84_fence_destroy(struct nouveau_drm *drm)
{
struct nv84_fence_priv *priv =3D drm->fence;
=20
-#ifdef __NetBSD__
- spin_lock_destroy(&priv->base.waitlock);
- DRM_DESTROY_WAITQUEUE(&priv->base.waitqueue);
-#endif
-
nouveau_bo_unmap(priv->bo_gart);
if (priv->bo_gart)
nouveau_bo_unpin(priv->bo_gart);
@@ -250,12 +245,6 @@ nv84_fence_create(struct nouveau_drm *drm)
priv->base.context_new =3D nv84_fence_context_new;
priv->base.context_del =3D nv84_fence_context_del;
=20
-#ifdef __NetBSD__
- spin_lock_init(&priv->base.waitlock);
- DRM_INIT_WAITQUEUE(&priv->base.waitqueue, "nvfenceq");
-#else
- init_waitqueue_head(&priv->base.waiting);
-#endif
priv->base.uevent =3D true;
=20
ret =3D nouveau_bo_new(drm->dev, 16 * (pfifo->max + 1), 0,
--=_73WqrbnRAUdWsUzKH5e7RXeLfJD5xyxZ--
From: Greg Oster <oster@netbsd.org>
To: Taylor R Campbell <campbell@mumble.net>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/53441: nouveau panic in 8.0_RC2 amd64
Date: Thu, 16 Aug 2018 07:28:41 -0600
On Thu, 16 Aug 2018 05:37:54 +0000
Taylor R Campbell <campbell@mumble.net> wrote:
> Please revert the previous patch, and try the attached patch instead.
Re-patched, and running with the new kernel. Will let you know how this
one goes...
Thanks!
Later...
Greg Oster
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:06:51 +0000
Module Name: src
Committed By: riastradh
Date: Thu Aug 23 01:06:51 UTC 2018
Modified Files:
src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c
nouveau_fence.h nouveau_nv84_fence.c
Log Message:
Rewrite nouveau_fence in an attempt to make it make sense.
PR kern/53441
XXX pullup-7
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
cvs rdiff -u -r1.2 -r1.3 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:10:04 +0000
Module Name: src
Committed By: riastradh
Date: Thu Aug 23 01:10:04 UTC 2018
Modified Files:
src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c
nouveau_fence.h
Log Message:
Fences may last longer than their channels.
- Use a reference count on the nouveau_fence_chan object.
- Acquire it with kpreemption disabled.
- Use xcall to wait for kpreempt-disabled sections to complete.
PR kern/53441
XXX pullup-7
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
cvs rdiff -u -r1.3 -r1.4 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
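The reference-counting scheme described in the log message above (take a
reference on the per-channel fence state, and have teardown wait for all
references to drain) can be illustrated with a small user-space sketch.
This is not the committed kernel code: struct ctx, ctx_acquire,
ctx_release and ctx_destroy are hypothetical names, and a pthread mutex
stands in for the kpreempt_disable/xcall pairing the driver uses to make
acquisition safe against concurrent teardown. The "dying" flag is also an
assumption of the sketch; the patch checks a per-fence "done" flag instead.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-channel fence state. */
struct ctx {
	pthread_mutex_t lock;
	pthread_cond_t drained;	/* signalled when refcnt drops to 0 */
	unsigned refcnt;	/* protected by lock */
	bool dying;		/* protected by lock */
};

static void
ctx_init(struct ctx *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->drained, NULL);
	c->refcnt = 0;
	c->dying = false;
}

/* Try to take a reference.  Fails once teardown has begun. */
static bool
ctx_acquire(struct ctx *c)
{
	bool ok;

	pthread_mutex_lock(&c->lock);
	ok = !c->dying;
	if (ok)
		c->refcnt++;
	pthread_mutex_unlock(&c->lock);
	return ok;
}

/* Drop a reference; wake the destroyer if it was the last one. */
static void
ctx_release(struct ctx *c)
{
	pthread_mutex_lock(&c->lock);
	assert(c->refcnt > 0);
	if (--c->refcnt == 0)
		pthread_cond_broadcast(&c->drained);
	pthread_mutex_unlock(&c->lock);
}

/* Refuse new references, then wait for existing ones to drain. */
static void
ctx_destroy(struct ctx *c)
{
	pthread_mutex_lock(&c->lock);
	c->dying = true;
	while (c->refcnt != 0)
		pthread_cond_wait(&c->drained, &c->lock);
	pthread_mutex_unlock(&c->lock);
	pthread_cond_destroy(&c->drained);
	pthread_mutex_destroy(&c->lock);
}
```

The point of the pattern is that ctx_destroy cannot free the state while
any thread that succeeded in ctx_acquire is still using it, which is the
property the panics in this PR suggest was missing.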
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:10:21 +0000
Module Name: src
Committed By: riastradh
Date: Thu Aug 23 01:10:21 UTC 2018
Modified Files:
src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c
nouveau_fence.h
Log Message:
Defer nouveau_fence_unref until spin unlock.
- kfree while holding a spin lock is not a good idea.
- Make sure we GC every time we might signal fences.
PR kern/53441
XXX pullup-7
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
cvs rdiff -u -r1.4 -r1.5 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
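The "defer the unref until after unlock" idea in the log message above
can be sketched in user space as a two-step garbage collection: detach
the completed items while holding the lock, then free them after the
lock is dropped. The names below (struct queue, queue_signal,
queue_gc_grab, queue_gc_free, queue_poll) are hypothetical and only
loosely mirror nouveau_fence_gc_grab/nouveau_fence_gc_free from the
patch; a pthread mutex stands in for the kernel spin lock, and free()
for the final kref_put/kfree.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

struct node {
	struct node *next;
};

struct queue {
	pthread_mutex_t lock;
	struct node *done;	/* completed items; protected by lock */
};

/* Mark an item completed.  Caller must hold q->lock. */
static void
queue_signal(struct queue *q, struct node *item)
{
	item->next = q->done;
	q->done = item;
}

/* Detach the whole done list.  Caller must hold q->lock. */
static struct node *
queue_gc_grab(struct queue *q)
{
	struct node *list = q->done;

	q->done = NULL;
	return list;
}

/* Free a detached list.  Caller must NOT hold q->lock. */
static unsigned
queue_gc_free(struct node *list)
{
	unsigned n = 0;

	while (list != NULL) {
		struct node *next = list->next;

		free(list);
		list = next;
		n++;
	}
	return n;
}

/* Collect under the lock, free outside it; returns the number freed. */
static unsigned
queue_poll(struct queue *q)
{
	struct node *list;

	pthread_mutex_lock(&q->lock);
	list = queue_gc_grab(q);
	pthread_mutex_unlock(&q->lock);

	return queue_gc_free(list);
}
```

Because the detached list is private to the caller once the lock is
released, the allocator is never entered with the lock held, and every
path that may signal items can finish with the same grab-then-free pair.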
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:10:29 +0000
Module Name: src
Committed By: riastradh
Date: Thu Aug 23 01:10:28 UTC 2018
Modified Files:
src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c
Log Message:
Attempt to make sense of return values of nouveau_fence_wait.
PR kern/53441
XXX pullup-7
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.7 -r1.8 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Taylor R Campbell" <riastradh@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53441 CVS commit: src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Thu, 23 Aug 2018 01:10:36 +0000
Module Name: src
Committed By: riastradh
Date: Thu Aug 23 01:10:36 UTC 2018
Modified Files:
src/sys/external/bsd/drm2/dist/drm/nouveau: nouveau_fence.c
Log Message:
Fix edge case of reference counting, oops.
PR kern/53441
XXX pullup-7
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.8 -r1.9 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/53441 CVS commit: [netbsd-8] src/sys/external/bsd/drm2/dist/drm/nouveau
Date: Fri, 31 Aug 2018 17:35:51 +0000
Module Name: src
Committed By: martin
Date: Fri Aug 31 17:35:51 UTC 2018
Modified Files:
src/sys/external/bsd/drm2/dist/drm/nouveau [netbsd-8]: nouveau_fence.c
nouveau_fence.h nouveau_nv84_fence.c
Log Message:
Pull up following revision(s) (requested by riastradh in ticket #996):
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c: revision 1.3
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h: revision 1.3
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h: revision 1.4
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h: revision 1.5
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.5
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.6
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.7
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.8
sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c: revision 1.9
Rewrite nouveau_fence in an attempt to make it make sense.
PR kern/53441
XXX pullup-7
XXX pullup-8
Fences may last longer than their channels.
- Use a reference count on the nouveau_fence_chan object.
- Acquire it with kpreemption disabled.
- Use xcall to wait for kpreempt-disabled sections to complete.
PR kern/53441
XXX pullup-7
XXX pullup-8
Defer nouveau_fence_unref until spin unlock.
- kfree while holding a spin lock is not a good idea.
- Make sure we GC every time we might signal fences.
PR kern/53441
XXX pullup-7
XXX pullup-8
Attempt to make sense of return values of nouveau_fence_wait.
PR kern/53441
XXX pullup-7
XXX pullup-8
Fix edge case of reference counting, oops.
PR kern/53441
XXX pullup-7
XXX pullup-8
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.4.10.1 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.c
cvs rdiff -u -r1.2 -r1.2.24.1 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_fence.h
cvs rdiff -u -r1.2 -r1.2.10.1 \
src/sys/external/bsd/drm2/dist/drm/nouveau/nouveau_nv84_fence.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->feedback
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sun, 23 Sep 2018 14:07:05 +0000
State-Changed-Why:
Setting a reminder to see if there are panics after a few days/weeks. (I don't think you provided feedback after the new commits)
From: Greg Oster <oster@netbsd.org>
To: maya@NetBSD.org
Cc: gnats-bugs@NetBSD.org, kern-bug-people@netbsd.org,
netbsd-bugs@netbsd.org, gnats-admin@netbsd.org
Subject: Re: kern/53441 (nouveau panic in 8.0_RC2 amd64)
Date: Sun, 23 Sep 2018 13:36:56 -0600
On Sun, 23 Sep 2018 14:07:06 +0000 (UTC)
maya@NetBSD.org wrote:
> Synopsis: nouveau panic in 8.0_RC2 amd64
>
> State-Changed-From-To: open->feedback
> State-Changed-By: maya@NetBSD.org
> State-Changed-When: Sun, 23 Sep 2018 14:07:05 +0000
> State-Changed-Why:
> Setting a reminder to see if there are panics after a few days/weeks.
> (I don't think you provided feedback after the new commits)
I have had zero panics since the new commits, and since things were
pulled up to 8.0.
Can call this one done.
Thanks!
Later...
Greg Oster
State-Changed-From-To: feedback->closed
State-Changed-By: maya@NetBSD.org
State-Changed-When: Sun, 23 Sep 2018 19:44:49 +0000
State-Changed-Why:
Reported fixed, thanks riastradh, thanks for the report, go.
>Unformatted: