NetBSD Problem Report #58531
From manu@netbsd.org Wed Jul 31 15:04:22 2024
Return-Path: <manu@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 520331A923C
for <gnats-bugs@gnats.NetBSD.org>; Wed, 31 Jul 2024 15:04:22 +0000 (UTC)
Message-Id: <20240731150421.D571884DA9@mail.netbsd.org>
Date: Wed, 31 Jul 2024 15:04:21 +0000 (UTC)
From: manu@netbsd.org
Reply-To: manu@netbsd.org
To: gnats-bugs@NetBSD.org
Subject: NetBSD 10.0 deadlock in nd_timer
X-Send-Pr-Version: 3.95
>Number: 58531
>Category: kern
>Synopsis: NetBSD 10.0 deadlock in nd_timer
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: kern-bug-people
>State: analyzed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Jul 31 15:05:00 +0000 2024
>Closed-Date:
>Last-Modified: Wed Sep 25 14:49:26 +0000 2024
>Originator: Emmanuel Dreyfus
>Release: NetBSD 10.0
>Organization:
NetBSD
>Environment:
System: NetBSD 10.0
Architecture: i386
Machine: i386
>Description:
Without LOCKDEBUG, the kernel freezes a few times a day.
With LOCKDEBUG, I get this panic:
[ 73171.2786945] cpu1[20 softclk/1]: hogging kernel lock
[ 73171.2786945] ipi_msg_cpu_handler(0,0,c84513c0,c09dc4f3,0,c846b000,1,1,dfc26000,dfe73f70) at netbsd:ipi_msg_cpu_handler+0x4c
[ 73171.2886997] ipi_cpu_handler(dfc26000,0,6,80,c84517c0,dfe6ac28,c0102c92,dfe6ab88,0,0) at netbsd:ipi_cpu_handler+0x90
[ 73171.2987044] x86_ipi_handler(dfe6ab88,0,0,0,0,0,0,0,0,0) at netbsd:x86_ipi_handler+0x6b
address 0x34 is invalid
address 0x30 is invalid
[ 73171.3187143] DDB lost frame for netbsd:Xresume_lapic_ipi+0x22, trying 0xdfe73f78
[ 73171.3187143] Xresume_lapic_ipi() at netbsd:Xresume_lapic_ipi+0x22
[ 73171.3287193] --- interrupt ---
[ 73171.3287193] 0:
[ 73171.3287193] Kernel lock error: _kernel_lock,266: spinout
[ 73171.3287193] lock address : netbsd:kernel_lock
[ 73171.3287193] type : spin
[ 73171.3287193] initialized : netbsd:main+0x5c
[ 73171.3287193] shared holds : 0 exclusive: 1
[ 73171.3287193] shares wanted: 0 exclusive: 1
[ 73171.3287193] relevant cpu : 0 last held: 1
[ 73171.3287193] relevant lwp : 0x00000000c856f300 last held: 0x00000000c859a940
[ 73171.3287193] last locked* : netbsd:nd_timer+0x30
[ 73171.3287193] unlocked : netbsd:ipintr+0x1f0
[ 73171.3287193] curcpu holds : 0 wanted by: 0x00000000c856f300
[ 73171.4010615] panic: LOCKDEBUG: Kernel lock error: _kernel_lock,266: spinout
[ 73171.4010615] cpu0: Begin traceback...
[ 73171.4010615] vpanic(c0f368cc,dfd7fc90,dfd7fcbc,c09f6e4d,c0f368cc,c0f2d347,c0e4fe00,10a,c0f308d6,1) at netbsd:vpanic+0x196
[ 73171.4010615] panic(c0f368cc,c0f2d347,c0e4fe00,10a,c0f308d6,1,c0f308d6,10a,c0e4fe00,6f9086) at netbsd:panic+0x18
[ 73171.4010615] lockdebug_abort1(6,c0f308d6,1,c42f513c,c0f308d6,4,0,0,c42e10c0,1) at netbsd:lockdebug_abort1+0xdb
[ 73171.4010615] _kernel_lock(1,c8aafbc4,c8c7d290,dfd7fd54,c0ba15ca,0,0,c92eca00,dfd7fd60,c0993ff7) at netbsd:_kernel_lock+0x2e3
[ 73171.4010615] filter_event(c9345240,c8b3c4c0,c8b3c510,3c,c8b3c4c0,dfd7fdb8,c0a1f69d,c8b3c518,0,2da) at netbsd:filter_event+0x101
[ 73171.4010615] knote(c8b3c518,0,2da,c0e4ff08,c0ab3cf4,6,0,c42e1210,c9345240,c8b3c4c0) at netbsd:knote+0x2e
[ 73171.4010615] selnotify(c8b3c510,0,0,c84518c0,c93df1c0,6,c0b94a90,1,2a,c8d03956) at netbsd:selnotify+0x2b
[ 73171.4010615] bpf_deliver(2a,2a,2,3d5,5,c0a0019b,1000,c92e22c0,1feb0,c8837834) at netbsd:bpf_deliver+0x30c
[ 73171.4010615] wm_send_common_locked(c84517c0,c8451080,dfe005c4,c856f300,c8451080,9ac900,0,c8839010,64,c8837834) at netbsd:wm_send_common_locked+0x41e
[ 73171.4010615] wm_handle_queue(c8839000,1,0,0,0,0,0,0,0,0) at netbsd:wm_handle_queue+0x260
[ 73171.4010615] softint_dispatch(c856f040,4,70796c47,68,63637553,737365,0,0,0,0) at netbsd:softint_dispatch+0xe6
[ 73171.4010615] Bad frame pointer: 0xc8615000
[ 73171.4010615] cpu0: End traceback...
[ 73171.4010615] fatal breakpoint trap in supervisor mode
[ 73171.4010615] trap type 1 code 0 eip 0xc0127eb4 cs 0x8 eflags 0x202 cr2 0xb7127000 ilevel 0x8 esp 0xdfd7fc74
[ 73171.4010615] curlwp 0xc856f300 pid 0 lid 3 lowest kstack 0xdfd7d2c0
Stopped in pid 0.3 (system) at netbsd:breakpoint+0x4: popl %ebp
db{0}> ps
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
848 848 3 1 180 c9414940 openssl select
334 334 3 0 180 c95d7cc0 sleep nanoslp
16719 16719 3 1 180 c94c6400 xargs wait
4911 4911 3 1 180 c94f3700 sh wait
14642 14642 3 1 180 c93ff900 ksh pause
12211 12211 3 0 180 c95d7a00 tclsh wait
23339 23339 3 0 180 c94c6c40 cron pipe_rd
20690 20690 3 1 180 c93b0080 cron pipe_rd
854 854 3 1 180 c899ab80 getty ttyraw
1115 1115 3 0 180 c92e4d40 cron nanoslp
958 1116 3 1 180 c92e47c0 named parked
958 1113 3 0 180 c8b40a00 named parked
958 1117 3 1 180 c93b0b80 timer parked
958 1105 3 0 180 c93b08c0 isc-net-0003 kqueue
958 1096 3 1 180 c93b0600 isc-net-0002 kqueue
958 812 3 0 180 c93b0340 isc-net-0001 kqueue
958 1013 3 1 180 c9350040 isc-net-0000 kqueue
958 958 3 1 180 c9350300 named sigwait
681 681 3 1 180 c8aed9c0 openvpn netio
680 680 3 1 180 c8b40740 openvpn poll
955 955 3 0 180 c8b401c0 parpd kqueue
828 828 3 1 180 c92e4240 sshd poll
511 511 3 1 180 c8b40480 syslogd kqueue
1 1 3 1 180 c8a7b6c0 init wait
0 181 3 0 200 c899a8c0 physiod physiod
0 204 3 1 200 c8a7bc40 pooldrain pooldrain
0 203 3 1 200 c899a340 ioflush syncer
0 202 3 0 200 c899a600 pgdaemon pgdaemon
0 199 3 0 200 c899a080 swwreboot swwreboot
0 197 3 0 200 c897ab40 usb1 usbevt
0 196 3 1 200 c899e0c0 usb0 usbevt
0 195 3 1 200 c8a7b980 npfgc0 npfgcw
0 194 3 1 200 c8a7b400 rt_free rt_free
0 193 3 1 200 c8a7b140 unpgc unpgc
0 192 3 1 200 c89afc00 key_timehandler key_timehandler
0 184 3 1 200 c89af940 carp_wqinput/1 carp_wqinput
0 183 3 0 200 c89af680 carp_wqinput/0 carp_wqinput
0 182 3 1 200 c89af3c0 icmp_wqinput/1 icmp_wqinput
0 31 3 0 200 c89af100 icmp_wqinput/0 icmp_wqinput
0 63 3 1 200 c899ebc0 rt_timer rt_timer
0 126 3 1 200 c899e900 vmem_rehash vmem_rehash
0 125 3 1 200 c899e640 coretemp1 coretemp1
0 124 3 1 200 c899e380 coretemp0 coretemp0
0 115 3 0 200 c897a880 entbutler entropy
0 114 3 1 240 c897a5c0 atabus0 atath
0 113 3 0 200 c897a300 wm7Reset wm7Reset
0 112 3 1 200 c897a040 wm7TxRx/1 wm7TxRx
0 111 3 0 200 c88f9d40 wm7TxRx/0 wm7TxRx
0 110 3 0 200 c88f9a80 wm6Reset wm6Reset
0 109 3 1 200 c88f97c0 wm6TxRx/1 wm6TxRx
0 108 3 0 200 c88f9500 wm6TxRx/0 wm6TxRx
0 107 3 0 200 c88f9240 wm5Reset wm5Reset
0 106 3 1 200 c88d3d00 wm5TxRx/1 wm5TxRx
0 105 3 0 200 c88d3a40 wm5TxRx/0 wm5TxRx
0 104 3 0 200 c88d3780 wm4Reset wm4Reset
0 1038d3200 wm4TxRx/0 wm4TxRx
0 101 3 0 200 c8894cc0 wm3Reset wm3Reset
0 100 3 1 200 c8894a00 wm3TxRx/1 wm3TxRx
0 99 3 0 200 c8894740 wm3TxRx/0 wm3TxRx
0 98 3 0 200 c8894480 wm2Reset wm2Reset
0 97 3 1 200 c88941c0 wm2TxRx/1 wm2TxRx
0 96 3 0 200 c8812c80 wm2TxR0 c8812440 wm1TxRx/0 wm1TxRx
0 27 3 0 200 c8812180 wm0Reset wm0Reset
0 26 3 1 200 c859dc40 wm0TxRx/1 wm0TxRx
0 25 3 0 200 c859d980 wm0TxRx/0 wm0TxRx
0 24 3 0 200 c859d6c0 usbtask-dr usbtsk
0 23 3 0 200 c859d400 usbtask-hc usbtsk
0 22 3 1 200 c859d140 xcal0 softbio/1
0 21 1 1 200 c859ac00 softser/1
0 > 20 7 1 200 c859a940 softclk/1
0 19 1 1 200 c859a680 softbio/1
0 18 1 1 200 c859a3c0 softnet/1
0 > 17 1 1 201 c859a100 idle/1
0 16 3 0 200 c857fbc0 sysmon smtaskq
0 15 3 0 200 c857f900 pmfsuspend pmfsuspend
0 14 3 0 200 c857f640 pmfevent pmfevent
0 13 3 0 200 c857f380 sopendfree sopendfr
0 12 3 0 200 c857c8571600 vdrain vdrain
0 8 3 0 200 c8571340 modunload mod_unld
0 7 3 0 200 c8571080 xcall/0 xcall
0 6 1 0 200 c856fb40 softser/0
0 5 1 0 200 c856f880 softclk/0
0 4 1 0 200 c856f5c0 softbio/0
0 > 3 7 0 200 c856f300 softnet/0
0 > 2 1 0 201 c856f040 idle/0
0 0 3 1 200 c42f5180 swapper uvm
>How-To-Repeat:
The machine is a relatively idle router; the freeze happens a few
times a day.
>Fix:
boot -1 seems an obvious and effective workaround. I am certain
we can do better.
>Release-Note:
>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: Emmanuel Dreyfus <manu@NetBSD.org>
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58531: NetBSD 10.0 deadlock in nd_timer
Date: Wed, 31 Jul 2024 15:19:27 +0000
> Date: Wed, 31 Jul 2024 15:04:21 +0000 (UTC)
> From: manu@netbsd.org
>
> 0 > 20 7 1 200 c859a940 softclk/1
Can you get `bt/a c859a940' output, or `mach cpu 1' and `bt' output?
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58531: NetBSD 10.0 deadlock in nd_timer
Date: Sun, 11 Aug 2024 13:05:09 +0000
It took some time, but here is another crash.
[ 41423.0297902] Kernel lock error: _kernel_lock,266: spinout
[ 41423.0297902] lock address : netbsd:kernel_lock
[ 41423.0297902] type : spin
[ 41423.0297902] initialized : netbsd:main+0x5c
[ 41423.0297902] shared holds : 0 exclusive: 1
[ 41423.0297902] shares wanted: 0 exclusive: 1
[ 41423.0297902] relevant cpu : 0 last held: 1
[ 41423.0297902] relevant lwp : 0x00000000c856f300 last held: 0x00000000c8841940
[ 41423.0297902] last locked* : netbsd:nd_timer+0x30
[ 41423.0297902] unlocked : netbsd:filter_event+0x12a
[ 41423.0297902] curcpu holds : 0 wanted by: 0x00000000c856f300
Like last time, the last holder is softclk/1:
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
0 > 20 7 1 200 c8841940 softclk/1
mutex_enter(c84519c0,2da,c8451ac0,c0ab8e91,0,c894a834,2,dfe47c70,c0ab8f16,c894a834) at netbsd:mutex_enter+0x553
wm_start(c894a834,dfe47c70,2,2a,c894a834,3cf31000,0,dfe47cc0,c0ac776c,c894a834) at netbsd:wm_start+0x24
if_transmit(c894a834,c8df7764,c894a834,2,c0a37af6,2,2,c8df7848,c894a834,c8df7764) at netbsd:if_transmit+0x151
ether_output(c894a834,c8df7764,dfe47cf0,0,7,dfe47d40,6,dfe47d3c,c8df7764,c894a834) at netbsd:ether_output+0x2ec
arprequest(c894bf63,dfe47e01,c8ed297c,0,c09dc303,2,1cb,dfe47e01,c894bf63,845236c1) at netbsd:arprequest+0x1a9
arp_llinfo_output(c894a834,dfe47de0,dfe47de0,dfe47e01,0,dfe47dec,c09f7b28,fffffffe,0,dfe47de0) at netbsd:arp_llinfo_output+0x164
nd_timer(c8ed28dc,c09c9dac,7,0,dfc26150,c84513c0,dfe701ec,c8841940,c846b004,10c) at netbsd:nd_timer+0x3c0
callout_softclock(0,0,c8582ccc,0,0,0,0,c0efabdb,84,3d8bdf84) at netbsd:callout_softclock+0xc3
softint_dispatch(c8841100,2,5b2ddee,3a411a81,c3caef3,f8229e46,address 0xdfe48000 is invalid 0,address 0xdfe48004 is invalid 0,address 0xdfe48008 is invalid 0,address 0xdfe4800c is invalid 0) at netbsd:softint_dispatch+0xe6
wm_start() is short. We wait for txq->txq_lock:
mutex_enter(txq->txq_lock);
if (!txq->txq_stopping)
wm_start_locked(ifp);
mutex_exit(txq->txq_lock);
I have already rebooted, hence I cannot tell which thread holds it. Let us wait
for the next crash.
--
Emmanuel Dreyfus
manu@netbsd.org
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58531: NetBSD 10.0 deadlock in nd_timer
Date: Sun, 11 Aug 2024 16:23:16 +0000
On Sun, Aug 11, 2024 at 01:05:09PM +0000, Emmanuel Dreyfus wrote:
> wm_start() is short. We wait for txq->txq_lock
> mutex_enter(txq->txq_lock);
> if (!txq->txq_stopping)
> wm_start_locked(ifp);
> mutex_exit(txq->txq_lock);
>
> I already rebooted hence I cannot tell what thread hold it. Let us wait the
> next crash.
New crash, backtrace starts by:
mutex_enter(c84519c0,2da,c8451ac0,c0ab8e91,0,c894a834,2,dfe47c70,c0ab8f16,c894a834) at netbsd:mutex_enter+0x555
wm_start(c894a834,dfe47c70,102,2a,c894a834,3cf31000,0,dfe47cc0,c0ac776c,c894a834) at netbsd:wm_start+0x24
show lock c84519c0 says:
lock address : c84519c0
type : spin
initialized : netbsd:wm_alloc_txrx_queues.part.0+0x4d
shared holds : 0 exclusive: 1
shares wanted: 0 exclusive: 1
relevant cpu : 0 last held: 0
relevant lwp : 0x00000000c856f300 last held: 0x00000000c856f300
last locked* : netbsd:wm_handle_queue+0x29
unlocked : netbsd:wm_intr_legacy+0x80
owner field : 0x0000000000010600 wait/spin: 0/1
Here is the last holder:
PID LID S CPU FLAGS STRUCT LWP * NAME WAIT
0 > 3 7 0 200 c856f300 softnet/0
Backtrace:
breakpoint(c0f3676c,5,0,0,c43cfc80,c43c7220,c0e4fe00,dfd7fc84,c0a027c7,c0f368cc) at netbsd:breakpoint+0x4
vpanic(c0f368cc,dfd7fc90,dfd7fcbc,c09f6e4d,c0f368cc,c0f2d347,c0e4fe00,10a,c0f308d6,1) at netbsd:vpanic+0x196
panic(c0f368cc,c0f2d347,c0e4fe00,10a,c0f308d6,1,c0f308d6,10a,c0e4fe00,10e916) at netbsd:panic+0x18
lockdebug_abort1(6,c0f308d6,1,c42f513c,c0f308d6,c8451380,0,0,c42e10c0,1) at netbsd:lockdebug_abort1+0xdb
_kernel_lock(1,c85b1d0c,c8451400,dfd7fd54,c0ba15ca,0,0,c8bd6f40,dfd7fd60,c0993ff7) at netbsd:_kernel_lock+0x2e3
filter_event(c9465c80,c9474400,c9474450,3c,c9474400,dfd7fdb8,c0a1f69d,c9474458,0,2da) at netbsd:filter_event+0x101
knote(c9474458,0,2da,c0e4ff08,c0ab3cf4,6,0,c42e1210,c9465c80,c9474400) at netbsd:knote+0x2e
selnotify(c9474450,0,0,c8451ac0,c940888c,6,c0b94a90,1,2a,c8e7da82) at netbsd:selnotify+0x2b
bpf_deliver(2a,2a,2,90c,5,c0a0019b,1000,c8e74188,1feb0,c894a834) at netbsd:bpf_deliver+0x30c
wm_send_common_locked(c84519c0,c89ad034,c89ad034,c89ad034,dfd7ff5c,ab9015,0,c8849010,64,c894a834) at netbsd:wm_send_common_locked+0x41e
wm_handle_queue(c8849000,61766e49,2064696c,61726150,6574656d,72,0,64616f4c,72724520,726f) at netbsd:wm_handle_queue+0x260
softint_dispatch(c856f040,4,70796c47,68,63637553,737365,0,0,0,0) at netbsd:softint_dispatch+0xe6
We have a deadlock here:
wm_handle_queue/wm_send_common_locked/bpf_deliver/selnotify/knote/filter_event waits for kernel_lock with txq->txq_lock held
softint_dispatch/callout_softclock/nd_timer/arp_llinfo_output/arprequest/ether_output/if_transmit/wm_start waits for txq->txq_lock with kernel_lock held.
--
Emmanuel Dreyfus
manu@netbsd.org
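The two stacks above form a classic lock-order reversal (an ABBA deadlock): each path holds the lock the other one is waiting for. As an illustration only, here is a minimal userland model, with hypothetical names, that records the acquisition order observed on each path and flags the cycle the way a lock-order checker would. It does not reproduce LOCKDEBUG itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical lock identifiers for this model. */
enum lock_id { KERNEL_LOCK_ID, TXQ_LOCK_ID, NLOCKS };

static const char *lock_name[NLOCKS] = { "kernel_lock", "txq_lock" };

/* order[a][b] becomes true once some path acquired b while holding a. */
static bool order[NLOCKS][NLOCKS];

/* Record that a path acquired `next` while holding `held`. */
static void
record_order(enum lock_id held, enum lock_id next)
{
	assert(held != next);
	order[held][next] = true;
}

/* With only two locks, a cycle exists iff both directions were seen. */
static bool
has_inversion(enum lock_id a, enum lock_id b)
{
	return order[a][b] && order[b][a];
}

/* Replay the two stacks from this PR and report whether they conflict. */
static bool
replay_pr58531(void)
{
	/* softnet/0: wm_handle_queue holds txq_lock, filter_event wants kernel_lock. */
	record_order(TXQ_LOCK_ID, KERNEL_LOCK_ID);
	/* softclk/1: nd_timer holds kernel_lock, wm_start wants txq_lock. */
	record_order(KERNEL_LOCK_ID, TXQ_LOCK_ID);

	bool inv = has_inversion(KERNEL_LOCK_ID, TXQ_LOCK_ID);
	if (inv)
		printf("lock order reversal: %s <-> %s\n",
		    lock_name[KERNEL_LOCK_ID], lock_name[TXQ_LOCK_ID]);
	return inv;
}
```

Calling replay_pr58531() reports the reversal; removing either recorded edge (i.e. fixing either path) makes it go away.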
From: Emmanuel Dreyfus <manu@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/58531: NetBSD 10.0 deadlock in nd_timer
Date: Tue, 13 Aug 2024 00:52:09 +0000
On Sun, Aug 11, 2024 at 04:25:01PM +0000, gnats-admin@netbsd.org wrote:
> We have a deadlock here:
>
> wm_handle_queue/wm_send_common_locked/bpf_deliver/selnotify/knote/filter_event waits for kernel_lock with txq->txq_lock held
>
> softint_dispatch/callout_softclock/nd_timer/arp_llinfo_output/arprequest/ether_output/if_transmit/wm_start waits for txq->txq_lock with kernel_lock held.
I do not see an easy way through this. Enabling NET_MPSAFE would remove
kernel_lock here, but I use MROUTING, PIM, altq and ppp, which are
known to be MP-unsafe according to src/doc/TODO.smpnet.
Is boot -1 the only workaround?
--
Emmanuel Dreyfus
manu@netbsd.org
From: matthew green <mrg@eterna23.net>
To: gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, manu@netbsd.org
Subject: re: kern/58531: NetBSD 10.0 deadlock in nd_timer
Date: Tue, 13 Aug 2024 11:10:53 +1000
> We have a deadlock here:
>
> wm_handle_queue/wm_send_common_locked/bpf_deliver/selnotify/knote/filter_event waits for kernel_lock with txq->txq_lock held
>
> softint_dispatch/callout_softclock/nd_timer/arp_llinfo_output/arprequest/ether_output/if_transmit/wm_start waits for txq->txq_lock with kernel_lock held
usually we'd want kernel lock taken before other locks, so in
this case the first one would already have kernel lock and
would just take a ref on the existing lock.
for the first case, see if you can ensure that kernel_lock is
held before taking txq_lock, perhaps only for the case that
calls the knote.
apparently we need to de-kernel-lock-ify select backends?
.mrg.
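The suggestion above amounts to a fixed lock hierarchy: kernel_lock before txq_lock, with a nested KERNEL_LOCK just taking another reference. A sketch under that assumption, using hypothetical ranks and helper names (not kernel APIs), checks each path against the hierarchy. Lock release is deliberately not modeled; each path starts from an empty state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock ranks for this model: lower ranks are taken first. */
enum { RANK_KERNEL_LOCK = 0, RANK_TXQ_LOCK = 1 };

/* What one LWP currently holds. */
struct held_state {
	int kernel_lock_count;	/* KERNEL_LOCK is recursive: count references */
	int max_rank_held;	/* -1 when nothing is held */
};

/* Try to acquire a lock of the given rank; false means a rank violation. */
static bool
acquire(struct held_state *st, int rank)
{
	assert(rank == RANK_KERNEL_LOCK || rank == RANK_TXQ_LOCK);

	if (rank == RANK_KERNEL_LOCK && st->kernel_lock_count > 0) {
		/* Already held: just take another reference. */
		st->kernel_lock_count++;
		return true;
	}
	if (rank <= st->max_rank_held)
		return false;	/* would invert the hierarchy */
	if (rank == RANK_KERNEL_LOCK)
		st->kernel_lock_count++;
	st->max_rank_held = rank;
	return true;
}

/* wm_handle_queue path as reported: txq_lock, then knote wants kernel_lock. */
static bool
send_path_as_reported(void)
{
	struct held_state st = { 0, -1 };

	return acquire(&st, RANK_TXQ_LOCK) && acquire(&st, RANK_KERNEL_LOCK);
}

/* Same path with kernel_lock taken up front; the knote then recurses. */
static bool
send_path_kernel_lock_first(void)
{
	struct held_state st = { 0, -1 };

	return acquire(&st, RANK_KERNEL_LOCK) &&
	    acquire(&st, RANK_TXQ_LOCK) &&
	    acquire(&st, RANK_KERNEL_LOCK);
}

/* nd_timer path: kernel_lock taken in nd_timer, then wm_start's txq_lock. */
static bool
timer_path(void)
{
	struct held_state st = { 0, -1 };

	return acquire(&st, RANK_KERNEL_LOCK) && acquire(&st, RANK_TXQ_LOCK);
}
```

The reported send path violates the hierarchy, while the kernel-lock-first variant and the nd_timer path both respect it, so they can no longer deadlock against each other.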
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: matthew green <mrg@eterna23.net>, gnats-bugs@netbsd.org
Cc: kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org, manu@netbsd.org, ozaki-r@NetBSD.org
Subject: Re: kern/58531: NetBSD 10.0 deadlock in nd_timer
Date: Tue, 13 Aug 2024 11:54:14 +0900
On 2024/08/13 10:10, matthew green wrote:
>> We have a deadlock here:
>>
>> wm_handle_queue/wm_send_common_locked/bpf_deliver/selnotify/knote/filter_event waits for kernel_lock with txq->txq_lock held
>>
>> softint_dispatch/callout_softclock/nd_timer/arp_llinfo_output/arprequest/ether_output/if_transmit/wm_start waits for txq->txq_lock with kernel_lock held
>
> usually we'd want kernel lock taken before other locks so in
> thie case, the first one would already have kernel lock and
> would just take a ref on the existing lock
I just discussed this with ozaki-r@. He pointed out that
wm(4) is now marked IFEF_MPSAFE regardless of NET_MPSAFE:
https://github.com/NetBSD/src/commit/2f5368b82e369741e8d99b3fd6cda9a14e76e550
As a result, if(4) routines do not take KERNEL_LOCK for wm(4).
These days, many drivers assert IFEF_MPSAFE unconditionally.
We must revisit all of them, if it matters...
> for the first case, see if you can ensure that kernel_lock is
> held before taking txq_lock, perhaps only for the case that
> calls the knote.
>
> apparently we need to de-kernel-lock-ify select backends?
Yeah, at least FILTEROP_MPSAFE should be turned on for bpf(4);
it has a fine-grained lock, and this flag bit seems to be
left untouched just accidentally.
manu@, can you please test whether the attached patch below
fixes your problem or not?
Thanks,
rin
----
diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index f2532265e34..4ae5e7653f0 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -1611,7 +1611,7 @@ filt_bpfread(struct knote *kn, long hint)
}
static const struct filterops bpfread_filtops = {
- .f_flags = FILTEROP_ISFD,
+ .f_flags = FILTEROP_ISFD | FILTEROP_MPSAFE,
.f_attach = NULL,
.f_detach = filt_bpfrdetach,
.f_event = filt_bpfread,
----
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58531 CVS commit: src/sys/net
Date: Thu, 15 Aug 2024 02:22:46 +0000
Module Name: src
Committed By: rin
Date: Thu Aug 15 02:22:46 UTC 2024
Modified Files:
src/sys/net: bpf.c
Log Message:
bpf: Mark bpfread_filtops FILTEROP_MPSAFE
Fix deadlock for non-NET_MPSAFE kernel, reported as
PR kern/58531 (thanks manu@ for test).
I've confirmed that there is no new regression for ATF with
any combination of -HEAD/netbsd-10 and default/NET_MPSAFE
rump kernels (aarch64).
Although, some problems have been reported on MP-safety for
bpf(4), PR kern/58596. But, it should take some time to fix.
At the moment, commit this part in advance.
OK ozaki-r@
To generate a diff of this commit:
cvs rdiff -u -r1.252 -r1.253 src/sys/net/bpf.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: open->needs-pullups
State-Changed-By: rin@NetBSD.org
State-Changed-When: Thu, 15 Aug 2024 02:39:52 +0000
State-Changed-Why:
[pullup-10 #784]
Patch seems applicable and necessary for netbsd-9 also, for which
if_vmx.c, usbnet.c, and if_ipsec.c advertise IFEF_MPSAFE regardless of
NET_MPSAFE. I will send a request after ATF runs.
From: Rin Okuyama <rokuyama.rk@gmail.com>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
netbsd-bugs@netbsd.org, gnats-admin@netbsd.org, rin@NetBSD.org,
manu@netbsd.org, ozaki-r@NetBSD.org, Taylor R Campbell <riastradh@NetBSD.org>
Cc:
Subject: Re: kern/58531 (NetBSD 10.0 deadlock in nd_timer)
Date: Thu, 15 Aug 2024 11:44:42 +0900
On 2024/08/15 11:39, rin@NetBSD.org wrote:
> Patch seems applicable and necessary for netbsd-9 also, for which
> if_vmx.c, usbnet.c, and if_ipsec.c advertise IFEF_MPSAFE regardless of
> NET_MPSAFE. I will send a request after ATF runs.
Oops, *not* applicable actually; kern_event.c uses the giant lock
unconditionally for netbsd-9.
If we wish to avoid deadlocks, we need to drop IFEF_MPSAFE for
non-NET_MPSAFE kernels?
Thanks,
rin
From: matthew green <mrg@eterna23.net>
To: Rin Okuyama <rokuyama.rk@gmail.com>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
netbsd-bugs@netbsd.org, gnats-admin@netbsd.org, rin@NetBSD.org,
manu@netbsd.org, ozaki-r@NetBSD.org,
Taylor R Campbell <riastradh@NetBSD.org>
Subject: re: kern/58531 (NetBSD 10.0 deadlock in nd_timer)
Date: Thu, 15 Aug 2024 13:42:17 +1000
> If we wish to avoid deadlocks, we need to drop IFEF_MPSAFE for
> non-NET_MPSAFE kernels?
it seems to me that !NET_MPSAFE kernels should probably ignore
IFEF_MPSAFE most of the time.
can we get the "unprotected" protocols at least "protected" so
we can turn it on all the time? (after fixing these sorts of
bugs too :-)
.mrg.
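A sketch of the policy floated above, assuming it would live in the helper that if(4) uses to decide whether to take KERNEL_LOCK around the driver (per Rin's note, that lock is skipped for interfaces flagged IFEF_MPSAFE). model_if_is_mpsafe(), model_if_transmit(), fake_wm_start(), and the flag value are hypothetical; this is not how sys/net/if.h behaves today.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_IFEF_MPSAFE	0x1	/* illustrative value only */

static int big_lock_depth;		/* simulated KERNEL_LOCK nesting */
static int big_lock_acquisitions;	/* total simulated KERNEL_LOCK calls */

static void
model_kernel_lock(void)
{
	big_lock_depth++;
	big_lock_acquisitions++;
}

static void
model_kernel_unlock(void)
{
	assert(big_lock_depth > 0);
	big_lock_depth--;
}

/* Proposed policy: honour IFEF_MPSAFE only in NET_MPSAFE kernels. */
static bool
model_if_is_mpsafe(int if_extflags)
{
#ifdef NET_MPSAFE
	return (if_extflags & MODEL_IFEF_MPSAFE) != 0;
#else
	(void)if_extflags;
	return false;
#endif
}

/* Model of the if(4) transmit wrapper: call the driver directly when the
 * interface counts as MP-safe, otherwise under the big lock. */
static void
model_if_transmit(int if_extflags, void (*start)(void))
{
	if (model_if_is_mpsafe(if_extflags)) {
		start();
	} else {
		model_kernel_lock();
		start();
		model_kernel_unlock();
	}
}

static void
fake_wm_start(void)
{
	/* wm_start() would take txq->txq_lock here. */
}
```

Built without -DNET_MPSAFE, the wrapper takes the big lock even for a driver advertising IFEF_MPSAFE. Whether that alone would be enough for wm(4) is left open in this thread.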
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58531 CVS commit: [netbsd-10] src/sys/net
Date: Thu, 22 Aug 2024 19:31:08 +0000
Module Name: src
Committed By: martin
Date: Thu Aug 22 19:31:08 UTC 2024
Modified Files:
src/sys/net [netbsd-10]: bpf.c
Log Message:
Pull up following revision(s) (requested by rin in ticket #784):
sys/net/bpf.c: revision 1.253
bpf: Mark bpfread_filtops FILTEROP_MPSAFE
Fix deadlock for non-NET_MPSAFE kernel, reported as
PR kern/58531 (thanks manu@ for test).
I've confirmed that there is no new regression for ATF with
any combination of -HEAD/netbsd-10 and default/NET_MPSAFE
rump kernels (aarch64).
Although, some problems have been reported on MP-safety for
bpf(4), PR kern/58596. But, it should take some time to fix.
At the moment, commit this part in advance.
OK ozaki-r@
To generate a diff of this commit:
cvs rdiff -u -r1.249.2.1 -r1.249.2.2 src/sys/net/bpf.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "Rin Okuyama" <rin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58531 CVS commit: src/sys/net
Date: Wed, 18 Sep 2024 23:20:20 +0000
Module Name: src
Committed By: rin
Date: Wed Sep 18 23:20:20 UTC 2024
Modified Files:
src/sys/net: if_tun.c
Log Message:
tun(4): Mark tunread_filtops `FILTEROP_MPSAFE`
Filter handlers have already been MP-safe since 2018:
https://mail-index.netbsd.org/source-changes/2018/08/06/msg097317.html
Note that we do not expect deadlocks similar to bpf(4) (PR kern/58531),
b/w KERNEL_LOCK and spin mutex for TX queue.
For tun(4), filt_tunread() acquires adaptive mutex. This is forbidden
when spin mutex is already held.
Such a path must have already been detected if present.
Thanks ozaki-r@ for discussion.
To generate a diff of this commit:
cvs rdiff -u -r1.176 -r1.177 src/sys/net/if_tun.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
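The tun(4) reasoning rests on the rule stated in the commit: an adaptive mutex may sleep, so it must not be entered while a spin mutex is held, and a LOCKDEBUG kernel would be expected to catch such a path. A minimal single-CPU model of that rule, with hypothetical names, shows why a filter taking an adaptive mutex could not legally run under a driver's spin txq_lock:

```c
#include <assert.h>
#include <stdbool.h>

/* Number of spin mutexes held on the (single) modeled CPU. */
static int spin_held;

static void
model_spin_enter(void)
{
	spin_held++;
}

static void
model_spin_exit(void)
{
	assert(spin_held > 0);
	spin_held--;
}

/* An adaptive mutex may sleep, so entering one is only legal when no
 * spin mutex is held; return the verdict instead of panicking. */
static bool
model_adaptive_enter_allowed(void)
{
	return spin_held == 0;
}

/* Hypothetical path: a driver holding its spin txq_lock delivers a
 * knote whose filter (like filt_tunread()) enters an adaptive mutex. */
static bool
tun_filter_under_txq_lock(void)
{
	bool ok;

	model_spin_enter();		/* stands in for txq_lock */
	ok = model_adaptive_enter_allowed();
	model_spin_exit();
	return ok;
}
```

Since that combination is already illegal, no KERNEL_LOCK vs. txq_lock cycle of the bpf(4) kind is expected for tun(4).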
From: "Martin Husemann" <martin@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/58531 CVS commit: [netbsd-10] src/sys/net
Date: Sat, 21 Sep 2024 12:17:27 +0000
Module Name: src
Committed By: martin
Date: Sat Sep 21 12:17:27 UTC 2024
Modified Files:
src/sys/net [netbsd-10]: if_tun.c
Log Message:
Pull up following revision(s) (requested by rin in ticket #899):
sys/net/if_tun.c: revision 1.177
tun(4): Mark tunread_filtops `FILTEROP_MPSAFE`
Filter handlers have already been MP-safe since 2018:
https://mail-index.netbsd.org/source-changes/2018/08/06/msg097317.html
Note that we do not expect deadlocks similar to bpf(4) (PR kern/58531),
b/w KERNEL_LOCK and spin mutex for TX queue.
For tun(4), filt_tunread() acquires adaptive mutex. This is forbidden
when spin mutex is already held.
Such a path must have already been detected if present.
Thanks ozaki-r@ for discussion.
To generate a diff of this commit:
cvs rdiff -u -r1.173.4.2 -r1.173.4.3 src/sys/net/if_tun.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
State-Changed-From-To: needs-pullups->analyzed
State-Changed-By: rin@NetBSD.org
State-Changed-When: Wed, 25 Sep 2024 14:49:26 +0000
State-Changed-Why:
Pulled up to netbsd-10. Leave this `analyzed` for a while:
- Aren't there similar deadlock scenarios?
- Is netbsd-9 not affected?
- How about mrg@'s proposal?
>Unformatted:
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.