NetBSD Problem Report #57171

From www@netbsd.org  Fri Jan  6 17:13:41 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E17AD1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Fri,  6 Jan 2023 17:13:41 +0000 (UTC)
Message-Id: <20230106171339.6B2951A923C@mollari.NetBSD.org>
Date: Fri,  6 Jan 2023 17:13:39 +0000 (UTC)
From: joel.bertrand@systella.fr
Reply-To: joel.bertrand@systella.fr
To: gnats-bugs@NetBSD.org
Subject: altqd takes 100% of a CPU when daemon is stopped
X-Send-Pr-Version: www-1.0

>Number:         57171
>Category:       kern
>Synopsis:       altqd takes 100% of a CPU when daemon is stopped
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          feedback
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Jan 06 17:15:00 +0000 2023
>Closed-Date:    
>Last-Modified:  Tue Aug 01 12:05:01 +0000 2023
>Originator:     BERTRAND Joël
>Release:        10.0_BETA
>Organization:
Systella SAS
>Environment:
NetBSD legendre.systella.fr 10.0_BETA NetBSD 10.0_BETA (CUSTOM) #4: Fri Jan  6 08:06:42 CET 2023  root@legendre.systella.fr:/usr/src/netbsd-10/obj/sys/arch/amd64/compile/CUSTOM amd64

>Description:
I have seen this issue for a very long time (since NetBSD 7, if I remember correctly).

I installed NetBSD 10.0 BETA from scratch in a KVM virtual machine; the host runs Devuan Linux. I see the same issue on my main server.
The test VM has only one Ethernet adapter (wm0).

The running kernel is a custom kernel (GENERIC plus ALTQ support).

I put the following in /etc/altq.conf:
interface wm0 bandwidth 1000M priq

class priq wm0 high_class_lan NULL priority 1
class priq wm0 low_class_lan NULL priority 0 default

filter wm0 high_class_lan 192.168.10.250 0 0 0 17
filter wm0 high_class_lan 0 0 192.168.10.250 0 17
filter wm0 high_class_lan 192.168.10.253 0 0 0 17
filter wm0 high_class_lan 0 0 192.168.10.253 0 17
filter wm0 high_class_lan 0 10000 0 0 17
filter wm0 high_class_lan 0 0 0 10000 17

# QoS
conditioner wm0 af41_agr0 <mark 0xb8>
filter wm0 af41_agr0 0 10000 0 0 17
filter wm0 af41_agr0 0 0 0 10000 17
filter wm0 af41_agr0 0 0 192.168.10.250 0 17
filter wm0 af41_agr0 0 0 192.168.10.253 0 17

I can start altqd with /etc/rc.d/altqd onestart, and it runs as expected.

netbsd-test1# /etc/rc.d/altqd onestart
Starting altqd.
netbsd-test1# 

Now, I want to stop altqd with /etc/rc.d/altqd onestop:

netbsd-test1# /etc/rc.d/altqd onestop
Stopping altqd.

Please note that the command does not finish. I have to interrupt it with ^C:

Stopping altqd.
^C
netbsd-test1# 

Now the altqd daemon takes 100% of a CPU:
load averages:  0.56,  0.15,  0.05;               up 0+00:05:12        18:05:53
21 processes: 19 sleeping, 2 on CPU
CPU states: 50.0% user,  0.0% nice,  0.0% system,  0.0% interrupt, 50.0% idle
Memory: 66M Act, 10M Exec, 41M File, 848M Free
Swap: 1020M Total, 1020M Free / Pools: 36M Used

  PID USERNAME PRI NICE   SIZE   RES STATE       TIME   WCPU    CPU COMMAND
 1575 root      31    0    20M 1552K CPU/0       1:19 96.54% 95.72% altqd
 4025 root      43    0    24M 2200K CPU/1       0:00  0.00%  0.00% top
    0 root      96    0     0K   29M sccomp/0    0:00  0.00%  0.00% [system]
 1316 root      85    0    73M 5624K wait/0      0:00  0.00%  0.00% login
 1279 postfix   85    0    59M 4396K kqueue/1    0:00  0.00%  0.00% qmgr
 1210 postfix   85    0    59M 4364K kqueue/1    0:00  0.00%  0.00% pickup
  950 root      85    0    73M 3176K poll/1      0:00  0.00%  0.00% sshd
 1205 root      85    0    59M 2884K kqueue/0    0:00  0.00%  0.00% master
  568 root      85    0    34M 2384K kqueue/1    0:00  0.00%  0.00% syslogd
 1452 root      85    0    20M 2184K wait/0      0:00  0.00%  0.00% sh
  453 _dhcpcd   85    0    19M 2176K poll/1      0:00  0.00%  0.00% dhcpcd
  458 root      85    0    19M 1732K poll/1      0:00  0.00%  0.00% dhcpcd
 1322 root      85    0    20M 1724K nanosl/1    0:00  0.00%  0.00% cron
 1239 root      85    0    20M 1684K ttyraw/1    0:00  0.00%  0.00% getty
 1323 root      85    0    20M 1680K ttyraw/0    0:00  0.00%  0.00% getty
 1135 root      85    0    20M 1676K ttyraw/1    0:00  0.00%  0.00% getty
    1 root      85    0    20M 1644K wait/0      0:00  0.00%  0.00% init
 1331 root      85    0    24M 1520K kqueue/1    0:00  0.00%  0.00% inetd


netbsd-test1# gdb -p 1575
GNU gdb (GDB) 11.0.50.20200914-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "x86_64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word".
Attaching to process 1575
Reading symbols from /usr/sbin/altqd...
(No debugging symbols found in /usr/sbin/altqd)
Reading symbols from /usr/lib/libutil.so.7...
(No debugging symbols found in /usr/lib/libutil.so.7)
Reading symbols from /usr/lib/libm.so.0...
(No debugging symbols found in /usr/lib/libm.so.0)
Reading symbols from /usr/lib/libc.so.12...
(No debugging symbols found in /usr/lib/libc.so.12)
--Type <RET> for more, q to quit, c to continue without paging-- 
Reading symbols from /usr/libexec/ld.elf_so...
(No debugging symbols found in /usr/libexec/ld.elf_so)
[Switching to LWP 1575 of process 1575]
0x0000000011e05649 in qop_clear ()
(gdb) bt
#0  0x0000000011e05649 in qop_clear ()
#1  0x0000000011e05774 in qop_delete_if ()
#2  0x0000000011e058c6 in qcmd_destroyall ()
#3  0x0000000011e12c45 in main ()
(gdb)

crash(8) returns:
crash> bt/t 0t1575
crash: kvm_read(0x7f7fff66bd58, 8): kvm_read: Bad address
trace: pid 1575 lid 1575
crash> ps
PID    LID S CPU     FLAGS       STRUCT LWP *               NAME WAIT
4317 >4317 7   1       100   fffff6299b7a0100              crash
1575 >1575 7   0     40100   fffff6299b881300              altqd
1452  1452 3   0       180   fffff62960d5c6c0                 sh wait
1323  1323 3   0       180   fffff6299c0ac280              getty ttyraw
1239  1239 3   1       180   fffff6299b881740              getty ttyraw
1135  1135 3   1       180   fffff6299c0d5b40              getty ttyraw
1316  1316 3   0       180   fffff6299c0d5700              login wait
1322  1322 3   1       180   fffff6299b717080               cron nanoslp
1331  1331 3   1       180   fffff62999c09140              inetd kqueue
1279  1279 3   1       180   fffff6299b881b80               qmgr kqueue
1210  1210 3   1       180   fffff6299ad700c0             pickup kqueue
1205  1205 3   0       180   fffff6299b7174c0             master kqueue
950    950 3   1       180   fffff6299ad70940               sshd poll
937    937 3   0       180   fffff6299b7a0540             powerd kqueue
568    568 3   1       180   fffff6299b9da340            syslogd kqueue
368    368 3   0       180   fffff6299b7b2480             dhcpcd poll
466    466 3   1       180   fffff6299b7b2040             dhcpcd poll
458    458 3   1       180   fffff6299b9da780             dhcpcd poll
453    453 3   1       180   fffff6299b9dabc0             dhcpcd poll
1        1 3   0       180   fffff6299c143240               init wait
0      162 3   0       200   fffff6299c0ac6c0            physiod physiod
0      206 3   0       200   fffff6299c1fc500          pooldrain pooldrain
0      205 3   0       200   fffff6299c0d52c0            ioflush syncer
0      204 3   1       200   fffff6299c0acb00           pgdaemon pgdaemon
0      201 3   1       200   fffff6299c143ac0          swwreboot swwreboot
0      200 3   1       200   fffff6299c1fc0c0      iscsi_cleanup cleanup
0      199 3   0       200   fffff6299c1fc940          atapibus0 sccomp
0      196 3   0       200   fffff6299c1e7900               usb3 usbevt
0      195 3   1       200   fffff6299c1e74c0               usb2 usbevt
0      194 3   0       200   fffff6299c1e7080               usb1 usbevt
0      193 3   0       200   fffff6299c1d1100               usb0 usbevt
0      192 3   1       200   fffff6299c143680             npfgc0 npfgcw
0      177 3   1       200   fffff6299c166a80            rt_free rt_free
0       94 3   1       200   fffff6299c166640              unpgc unpgc
0       31 3   1       200   fffff6299c166200    key_timehandler key_timehandler
0       63 3   1       200   fffff6299c151a40    icmp6_wqinput/1 icmp6_wqinput
0      126 3   0       200   fffff6299c151600    icmp6_wqinput/0 icmp6_wqinput
0      125 3   1       200   fffff6299c1511c0          nd6_timer nd6_timer
0      124 3   1       200   fffff6299c17ca00    carp6_wqinput/1 carp6_wqinput
0      123 3   0       200   fffff6299c17c5c0    carp6_wqinput/0 carp6_wqinput
0      122 3   1       200   fffff6299c17c180     carp_wqinput/1 carp_wqinput
0      121 3   0       200   fffff6299c1279c0     carp_wqinput/0 carp_wqinput
0      120 3   1       200   fffff6299c127580     icmp_wqinput/1 icmp_wqinput
0      119 3   0       200   fffff6299c127140     icmp_wqinput/0 icmp_wqinput
0      118 3   1       200   fffff6299c1d1980           rt_timer rt_timer
0      117 3   0       200   fffff6299c1d1540        vmem_rehash vmem_rehash
0      108 3   1       200   fffff62960e8c8c0          entbutler entropy
0      107 3   0       240   fffff62960e8c480            atabus5 atath
0      106 3   1       240   fffff62960e8c040            atabus4 atath
0      105 3   0       240   fffff62960f6bbc0            atabus3 atath
0      104 3   1       240   fffff62960f6b780            atabus2 atath
0      103 3   0       240   fffff62960f6b340            atabus1 atath
0      102 3   0       240   fffff62960e6bb80            atabus0 atath
0      101 3   0       200   fffff62960e6b740         usbtask-dr usbtsk
0      100 3   0       200   fffff62960e6b300         usbtask-hc usbtsk
0       99 3   1       200   fffff62960d33b40              viomb balloon
0       98 3   0       200   fffff62960d33700           wm0Reset wm0Reset
0       97 3   1       200   fffff62960d332c0          wm0TxRx/1 wm0TxRx
0       96 3   0       200   fffff62960d5cb00          wm0TxRx/0 wm0TxRx
0       29 3   0       200   fffff62960d5c280               pms0 pmsreset
0       28 3   1       200   fffff62960b2dac0            xcall/1 xcall
0       27 1   1       200   fffff62960b2d680          softser/1
0       26 1   1       200   fffff62960b2d240          softclk/1
0       25 1   1       200   fffff62960aeca80          softbio/1
0       24 1   1       200   fffff62960aec640          softnet/1
0       23 1   1       201   fffff62960aec200             idle/1
0       22 3   0       200   fffff6299c356a40           lnxsyswq lnxsyswq
0       21 3   0       200   fffff6299c356600           lnxubdwq lnxubdwq
0       20 3   0       200   fffff6299c3561c0           lnxpwrwq lnxpwrwq
0       19 3   0       200   fffff6299c371a00           lnxlngwq lnxlngwq
0       18 3   0       200   fffff6299c3715c0           lnxhipwq lnxhipwq
0       17 3   0       200   fffff6299c371180           lnxrcugc lnxrcugc
0       16 3   0       200   fffff6299c37e9c0             sysmon smtaskq
0       15 3   0       200   fffff6299c37e580         pmfsuspend pmfsuspend
0       14 3   0       200   fffff6299c37e140           pmfevent pmfevent
0       13 3   0       200   fffff6299c39b980         sopendfree sopendfr
0       12 3   0       200   fffff6299c39b540             ifwdog ifwdog
0       11 3   0       200   fffff6299c39b100            iflnkst iflnkst
0       10 3   0       200   fffff6299c5c2940           nfssilly nfssilly
0        9 3   0       200   fffff6299c5c2500             vdrain vdrain
0        8 3   0       200   fffff6299c5c20c0          modunload mod_unld
0        7 3   0       200   fffff6299c5f3900            xcall/0 xcall
0        6 1   0       200   fffff6299c5f34c0          softser/0
0        5 1   0       200   fffff6299c5f3080          softclk/0
0        4 1   0       200   fffff6299c61d8c0          softbio/0
0        3 1   0       200   fffff6299c61d480          softnet/0
0        2 1   0       201   fffff6299c61d040             idle/0
0        0 3   0       200   ffffffff8188a740            swapper uvm
crash> ps/w
PID   LID          COMMAND     EMUL  PRI WAIT-MSG    WAIT-CHANNEL
4317 >4317            crash   netbsd   43              0
1575 >1575            altqd   netbsd   28              0
1452  1452               sh   netbsd   43 wait         fffff6299bf75098
1323  1323            getty   netbsd   43 ttyraw       fffff6299b112b08
1239  1239            getty   netbsd   43 ttyraw       fffff6299b852a88
1135  1135            getty   netbsd   43 ttyraw       fffff6299be68988
1316  1316            login   netbsd   43 wait         fffff6299c1fb3d8
1322  1322             cron   netbsd   43 nanoslp      fffff6299b717080
1331  1331            inetd   netbsd   43 kqueue       fffff629944036a0
1279  1279             qmgr   netbsd   43 kqueue       fffff62995050a60
1210  1210           pickup   netbsd   43 kqueue       fffff629945028e0
1205  1205           master   netbsd   43 kqueue       fffff62997a42b20
950    950             sshd   netbsd   43 poll         fffff6299c321700
937    937           powerd   netbsd   43 kqueue       fffff6299a9f19a0
568    568          syslogd   netbsd   43 kqueue       fffff6299bf98fa0
368    368           dhcpcd   netbsd   43 poll         fffff6299c6fd400
466    466           dhcpcd   netbsd   43 poll         fffff6299c321700
458    458           dhcpcd   netbsd   43 poll         fffff6299c321700
453    453           dhcpcd   netbsd   43 poll         fffff6299c321700
1        1             init   netbsd   43 wait         fffff6299c1fb058
0      162           system   netbsd  123 physiod      fffff6299be164c8
0      206           system   netbsd  125 pooldrain    ffffffff8190f980
0      205           system   netbsd  124 syncer       fffff6299c0d52c0
0      204           system   netbsd  126 pgdaemon     ffffffff8190d548
0      201           system   netbsd   43 swwreboot    fffff6299c024088
0      200           system   netbsd   96 cleanup      ffffffff819834f0
0      199           system   netbsd   96 sccomp       ffffd38002448ba0
0      196           system   netbsd   96 usbevt       fffff62960f674b8
0      195           system   netbsd   96 usbevt       ffffd38001f2a478
0      194           system   netbsd   96 usbevt       ffffd38001f28478
0      193           system   netbsd   96 usbevt       ffffd38001f26478
0      192           system   netbsd   96 npfgcw       fffff6299c14f7c8
0      177           system   netbsd  222 rt_free      fffff6299c2829c8
0       94           system   netbsd   96 unpgc        ffffffff81980ab0
0       31           system   netbsd  222 key_timehandler fffff6299c282888
0       63           system   netbsd  222 icmp6_wqinput fffff6299c1a8ec8
0      126           system   netbsd  222 icmp6_wqinput fffff6299c1a8e88
0      125           system   netbsd  222 nd6_timer    fffff6299c282748
0      124           system   netbsd  222 carp6_wqinput fffff6299c1a88c8
0      123           system   netbsd  222 carp6_wqinput fffff6299c1a8888
0      122           system   netbsd  222 carp_wqinput fffff6299c1a8448
0      121           system   netbsd  222 carp_wqinput fffff6299c1a8408
0      120           system   netbsd  222 icmp_wqinput fffff6299c1f5e88
0      119           system   netbsd  222 icmp_wqinput fffff6299c1f5e48
0      118           system   netbsd  222 rt_timer     fffff6299c282608
0      117           system   netbsd  125 vmem_rehash  fffff6299c2824c8
0      108           system   netbsd   43 entropy      ffffffff818b21a8
0      107           system   netbsd   96 atath        ffffd3800244aba0
0      106           system   netbsd   96 atath        ffffd3800244a3b0
0      105           system   netbsd   96 atath        ffffd38002449bc0
0      104           system   netbsd   96 atath        ffffd380024493d0
0      103           system   netbsd   96 atath        ffffd38002448be0
0      102           system   netbsd   96 atath        ffffd380024483f0
0      101           system   netbsd   96 usbtsk       ffffffff818d52d8
0      100           system   netbsd   96 usbtsk       ffffffff818d5298
0       99           system   netbsd    0 balloon      fffff62960db6608
0       98           system   netbsd  222 wm0Reset     fffff62960c67708
0       97           system   netbsd  222 wm0TxRx      fffff62960da30c8
0       96           system   netbsd  222 wm0TxRx      fffff62960da3088
0       29           system   netbsd   96 pmsreset     fffff62960ca4e94
0       28           system   netbsd  127 xcall        ffffd3806191c010
0       27           system   netbsd  223              0
0       26           system   netbsd  220              0
0       25           system   netbsd  221              0
0       24           system   netbsd  222              0
0       23           system   netbsd    0              0
0       22           system   netbsd   43 lnxsyswq     fffff6299c6fdc08
0       21           system   netbsd   43 lnxubdwq     fffff6299c6fdb08
0       20           system   netbsd   43 lnxpwrwq     fffff6299c6fda08
0       19           system   netbsd   43 lnxlngwq     fffff6299c6fd908
0       18           system   netbsd   43 lnxhipwq     fffff6299c6fd808
0       17           system   netbsd   43 lnxrcugc     ffffffff818b0788
0       16           system   netbsd   96 smtaskq      ffffffff818f5f60
0       15           system   netbsd   43 pmfsuspend   fffff6299c633808
0       14           system   netbsd   43 pmfevent     fffff6299c6336c8
0       13           system   netbsd   96 sopendfr     ffffffff81980a30
0       12           system   netbsd  222 ifwdog       fffff6299c633588
0       11           system   netbsd  222 iflnkst      fffff6299c633448
0       10           system   netbsd   43 nfssilly     fffff6299c633308
0        9           system   netbsd  125 vdrain       ffffffff81981c30
0        8           system   netbsd  125 mod_unld     ffffffff819738b0
0        7           system   netbsd  127 xcall        ffffffff8183bcd0
0        6           system   netbsd  223              0
0        5           system   netbsd  220              0
0        4           system   netbsd  221              0
0        3           system   netbsd  222              0
0        2           system   netbsd    0              0
0        0           system   netbsd  125 uvm          ffffffff8188a740
crash>  continue
netbsd-test1# 

kill -9 stops altqd. That is an improvement in NetBSD 10.0: on NetBSD 9, altqd could not be killed even with kill -9.
>How-To-Repeat:
Build a kernel with ALTQ support:
GENERIC config file with
options     ALTQ        # Manipulate network interfaces' output queues
options     ALTQ_BLUE   # Stochastic Fair Blue
options     ALTQ_CBQ    # Class-Based Queueing
options     ALTQ_CDNR   # Diffserv Traffic Conditioner
options     ALTQ_FIFOQ  # First-In First-Out Queue
options     ALTQ_FLOWVALVE  # RED/flow-valve (red-penalty-box)
options     ALTQ_HFSC   # Hierarchical Fair Service Curve
options     ALTQ_LOCALQ # Local queueing discipline
options     ALTQ_PRIQ   # Priority Queueing
options     ALTQ_RED    # Random Early Detection
options     ALTQ_RIO    # RED with IN/OUT
options     ALTQ_WFQ    # Weighted Fair Queueing

Configure altq.conf, then start and stop the altqd daemon.

>Fix:

>Release-Note:

>Audit-Trail:
From: Taylor R Campbell <riastradh@NetBSD.org>
To: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/57171: altqd takes 100% of a CPU when daemon is stopped
Date: Fri, 6 Jan 2023 19:28:49 +0000

 Can you do the following:

 1. When altqd is hanging and taking 100% CPU, hit ^T at the terminal
    and show what it prints?

 2. Install the debug.tgz set (or debug.tar.xz) in the VM, try
    attaching gdb again, and do `bt' and `info locals'?

 It might be stuck in logic in userland.  Possible there's some bug
 with the data structure management in qop.c that sends it into an
 infinite loop.

From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/57171: altqd takes 100% of a CPU when daemon is stopped
Date: Sat, 07 Jan 2023 04:18:32 +0700

     Date:        Fri,  6 Jan 2023 19:30:02 +0000 (UTC)
     From:        Taylor R Campbell <riastradh@NetBSD.org>
     Message-ID:  <20230106193002.6D4751A923A@mollari.NetBSD.org>

   |  1. When altqd is hanging and taking 100% CPU, hit ^T at the terminal
   |     and show what it prints?

 Unless altqd is running in the same process group as the terminal (which
 seems unlikely for a daemon started by rc.d scripts) that isn't going to
 help.

   |  It might be stuck in logic in userland.

 From the output shown, that's almost certain (no kernel stack backtrace
 is possible, as there's no kernel stack at the time).

   | Possible there's some bug with the data structure management
   | in qop.c that sends it into an infinite loop.

 Seems plausible, qop_clear() (where things are being stuck, apparently)
 has 2 loops, neither of which is manifestly certain to terminate, though
 what actually happens depends upon what is actually in the lists being
 manipulated.

 But in the output interface case, if every element on the list has a
 child, then nothing happens, the list will never change, and hence
 never become empty, and thus, the code will loop forever.  But of course,
 only if every entry on the list has a child, if that's impossible, then
 each iteration must delete something.

 For the input interface case, there's a similar potential problem, if
 nothing on the list has a parent that is root, and root has no child,
 then nothing changes, and if the list is not empty, it will loop forever.
 Again, I have no idea if this is possible.   [Also, to me, the
 qop_delete_class(root) inside the while(!LIST_EMPTY()) loop looks
 very suspicious.]

 kre
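
The output-interface case described above can be modelled in a few lines of C. This is a minimal sketch, not the code in qop.c: `struct node`, `clear_leaf_first` and the single-child simplification are assumptions made for illustration. It deletes leaves first, as the message describes, and adds a progress check so that a pass which removes nothing returns an error instead of spinning.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a class record; NOT the real struct classinfo. */
struct node {
	LIST_ENTRY(node) next;
	struct node *parent;
	struct node *child;	/* non-NULL means "not a leaf" (one child only) */
};
LIST_HEAD(nodelist, node);

/*
 * Leaf-first teardown with a progress check.
 * Returns 0 once the list is empty, or -1 if a full pass over the list
 * removed nothing, which is the situation that would otherwise loop forever.
 */
int
clear_leaf_first(struct nodelist *head)
{
	struct node *n;
	int removed;

	assert(head != NULL);
	while (!LIST_EMPTY(head)) {
		removed = 0;
		LIST_FOREACH(n, head, next) {
			if (n->child != NULL)
				continue;	/* skip entries that still have a child */
			LIST_REMOVE(n, next);
			if (n->parent != NULL && n->parent->child == n)
				n->parent->child = NULL;
			removed = 1;
			break;		/* list changed: restart from the head */
		}
		if (!removed)
			return -1;	/* every entry has a child: no progress */
	}
	return 0;
}
```

With a root and one leaf the function returns 0; if every entry on the list claims to have a child, it returns -1 on the first pass.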

From: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, gnats-admin@netbsd.org,
        netbsd-bugs@netbsd.org
Cc: 
Subject: Re: kern/57171: altqd takes 100% of a CPU when daemon is stopped
Date: Sat, 7 Jan 2023 15:03:11 +0100

 Taylor R Campbell wrote:
 > The following reply was made to PR kern/57171; it has been noted by GNATS.
 > 
 > From: Taylor R Campbell <riastradh@NetBSD.org>
 > To: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
 > Cc: gnats-bugs@NetBSD.org
 > Subject: Re: kern/57171: altqd takes 100% of a CPU when daemon is stopped
 > Date: Fri, 6 Jan 2023 19:28:49 +0000
 > 
 >  Can you do the following:
 >  
 >  1. When altqd is hanging and taking 100% CPU, hit ^T at the terminal
 >     and show what it prints?

 netbsd-test1# /etc/rc.d/altqd onestart
 Starting altqd.
 netbsd-test1# /etc/rc.d/altqd onestop
 Stopping altqd.
 [ 1918.6360379] load: 0.08  cmd: sleep 2081 [nanoslp] 0.00u 0.00s 0% 1520k
 sleep: Between -1 and 0 seconds left out of the original 0 and a bit
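
The `^T` output above names `sleep` rather than altqd, which fits the point raised earlier in the thread that `^T` only reaches the terminal's foreground process group. A minimal sketch of that check in C, assuming POSIX `tcgetpgrp()` and `getpgrp()`; the helper name `in_foreground_pgrp` is hypothetical and not part of altqd:

```c
#include <assert.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Returns 1 if the calling process is in the foreground process group of
 * the terminal open on fd, 0 if it is in another group, and -1 if fd is
 * not the controlling terminal (for example in a detached daemon).
 */
int
in_foreground_pgrp(int fd)
{
	pid_t fg;

	assert(fd >= 0);
	fg = tcgetpgrp(fd);
	if (fg == (pid_t)-1)
		return -1;	/* no controlling terminal on this descriptor */
	return fg == getpgrp();
}
```

A daemon started from an rc.d script would be expected to get 0 or -1 here, so a status request typed at the terminal goes to whatever the script is running in the foreground.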

 >  2. Install the debug.tgz set (or debug.tar.xz) in the VM, try
 >     attaching gdb again, and do `bt' and `info locals'?

 netbsd-test1# gdb -p 1779
 GNU gdb (GDB) 11.0.50.20200914-git
 Copyright (C) 2020 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.
 Type "show copying" and "show warranty" for details.
 This GDB was configured as "x86_64--netbsd".
 Type "show configuration" for configuration details.
 For bug reporting instructions, please see:
 <https://www.gnu.org/software/gdb/bugs/>.
 Find the GDB manual and other documentation resources online at:
     <http://www.gnu.org/software/gdb/documentation/>.

 For help, type "help".
 Type "apropos word" to search for commands related to "word".
 Attaching to process 1779
 Reading symbols from /usr/sbin/altqd...
 (No debugging symbols found in /usr/sbin/altqd)
 Reading symbols from /usr/lib/libutil.so.7...
 Reading symbols from /usr/libdata/debug//usr/lib/libutil.so.7.24.debug...
 Reading symbols from /usr/lib/libm.so.0...
 Reading symbols from /usr/libdata/debug//usr/lib/libm.so.0.12.debug...
 Reading symbols from /usr/lib/libc.so.12...
 Reading symbols from /usr/libdata/debug//usr/lib/libc.so.12.220.debug...
 Reading symbols from /usr/libexec/ld.elf_so...
 Reading symbols from /usr/libdata/debug//libexec/ld.elf_so.debug...
 [Switching to LWP 1779 of process 1779]
 0x00000000a9e0564b in qop_clear ()
 (gdb) bt
 #0  0x00000000a9e0564b in qop_clear ()
 #1  0x00000000a9e05774 in qop_delete_if ()
 #2  0x00000000a9e058c6 in qcmd_destroyall ()
 #3  0x00000000a9e12c45 in main ()
 (gdb) info locals
 No symbol table info available.
 (gdb)

 	Please note that there is no altqd or altqd.debug in debug.tar.xz.

 >  It might be stuck in logic in userland.  Possible there's some bug
 >  with the data structure management in qop.c that sends it into an
 >  infinite loop.
 >  
 > 

From: Robert Elz <kre@munnari.OZ.AU>
To: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
Cc: gnats-bugs@netbsd.org
Subject: Re: kern/57171: altqd takes 100% of a CPU when daemon is stopped
Date: Sat, 07 Jan 2023 23:13:00 +0700

     Date:        Sat, 7 Jan 2023 15:03:11 +0100
     From:        =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
     Message-ID:  <51e63d08-760d-819c-ff82-22527578c76d@systella.fr>

   | 	Please note that there is no altqd or altqd.debug in debug.tar.xz.

 That's very odd - these are sets you downloaded?

 It is certainly there in both 10_BETA and HEAD debug.tar.xz sets
 in builds I have done for myself (which shouldn't be vastly different
 than the NetBSD builds I think):

 tar tzf debug.tar.xz | grep altq
 ./usr/lib/librumpnet_altq_g.a
 ./usr/libdata/debug/usr/lib/librumpnet_altq.so.0.0.debug
 ./usr/libdata/debug/usr/sbin/altqd.debug
 ./usr/libdata/debug/usr/sbin/altqstat.debug

 The results are the same for both HEAD and 10_BETA (ie: nothing has
 changed since 10 was branched that changes this).

 kre

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 19 Jul 2023 19:58:05 +0000
State-Changed-Why:
feedback requested about debug data


From: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
        gnats-admin@netbsd.org, riastradh@NetBSD.org
Cc: 
Subject: Re: kern/57171 (altqd takes 100% of a CPU when daemon is stopped)
Date: Sat, 29 Jul 2023 09:47:54 +0200

 riastradh@NetBSD.org wrote:
 > Synopsis: altqd takes 100% of a CPU when daemon is stopped
 > 
 > State-Changed-From-To: open->feedback
 > State-Changed-By: riastradh@NetBSD.org
 > State-Changed-When: Wed, 19 Jul 2023 19:58:05 +0000
 > State-Changed-Why:
 > feedback requested about debug data
 > 
 > 
 > 

 	I have rebuilt my tree, then started and stopped altqd.

 legendre# gdb /usr/sbin/altqd 22610
 GNU gdb (GDB) 11.0.50.20200914-git
 Copyright (C) 2020 Free Software Foundation, Inc.
 License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
 This is free software: you are free to change and redistribute it.
 There is NO WARRANTY, to the extent permitted by law.
 Type "show copying" and "show warranty" for details.
 This GDB was configured as "x86_64--netbsd".
 Type "show configuration" for configuration details.
 For bug reporting instructions, please see:
 <https://www.gnu.org/software/gdb/bugs/>.
 Find the GDB manual and other documentation resources online at:
      <http://www.gnu.org/software/gdb/documentation/>.

 For help, type "help".
 Type "apropos word" to search for commands related to "word"...
 Reading symbols from /usr/sbin/altqd...
 Reading symbols from /usr/libdata/debug//usr/sbin/altqd.debug...
 Attaching to program: /usr/sbin/altqd, process 22610
 Reading symbols from /usr/lib/libutil.so.7...
 Reading symbols from /usr/libdata/debug//usr/lib/libutil.so.7.24.debug...
 Reading symbols from /usr/lib/libm.so.0...
 Reading symbols from /usr/libdata/debug//usr/lib/libm.so.0.12.debug...
 Reading symbols from /usr/lib/libc.so.12...
 --Type <RET> for more, q to quit, c to continue without paging--
 Reading symbols from /usr/libdata/debug//usr/lib/libc.so.12.220.debug...
 Reading symbols from /usr/libexec/ld.elf_so...
 Reading symbols from /usr/libdata/debug//usr/libexec/ld.elf_so.debug...
 [Switching to LWP 22610 of process 22610]
 qop_clear (ifinfo=0x7253a46be480)
      at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:467
 467                     while (!LIST_EMPTY(&ifinfo->cllist)) {
 (gdb) bt
 #0  qop_clear (ifinfo=0x7253a46be480)
      at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:467
 #1  0x0000000152a05774 in qop_delete_if (ifinfo=0x7253a46be480)
      at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:394
 #2  0x0000000152a058c6 in qcmd_destroyall ()
      at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:204
 #3  0x0000000152a12c45 in main (argc=<optimized out>, argv=<optimized out>)
      at /usr/src/netbsd-10/src/usr.sbin/altq/altqd/altqd.c:313
 (gdb) info locals
 root = <optimized out>
 clinfo = <optimized out>
 (gdb)

 	Best regards,

 	JKB

From: =?UTF-8?Q?BERTRAND_Jo=c3=abl?= <joel.bertrand@systella.fr>
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
        gnats-admin@netbsd.org, riastradh@NetBSD.org
Cc: 
Subject: Re: kern/57171 (altqd takes 100% of a CPU when daemon is stopped)
Date: Tue, 1 Aug 2023 12:03:45 +0200

 [Switching to LWP 6565 of process 6565]
 qop_clear (ifinfo=0x75c271ca3480)
     at /usr/src/netbsd-10/src/usr.sbin/altq/libaltq/qop.c:468
 468                             LIST_FOREACH(clinfo, &ifinfo->cllist, next)
 (gdb) print *ifinfo
 $1 = {next = {le_next = 0x75c271ca3360, le_prev = 0xfa418b88 <qop_iflist>},
   ifname = 0x75c271ccc0d0 "_re0", bandwidth = 0, ifmtu = 1500, ifindex = 4,
   enabled = 0, cllist = {lh_first = 0x75c271ca34e0}, fltr_rules = {
     lh_first = 0x0}, resv_class = 0x0, qdisc = 0xfa418740 <cdnr_qdisc>,
   private = 0x0, enable_hook = 0x0, delete_hook = 0x0}
 (gdb) print ifinfo->cllist->lh_first
 $2 = (struct classinfo *) 0x75c271ca34e0
 (gdb) print *ifinfo->cllist->lh_first
 $3 = {next = {le_next = 0x0, le_prev = 0x75c271ca34a8}, handle = 0,
   clname = 0x75c271ca60e0 "cdnr_root", ifinfo = 0x75c271ca3480, parent = 0x0,
   sibling = 0x0, child = 0x0, fltrlist = {lh_first = 0x0}, private = 0x0,
   delete_hook = 0x0}
 (gdb) print ifinfo->cllist->lh_first->next.le_next
 $4 = (struct classinfo *) 0x0

 	I rebuilt from a clean tree, but I still get <optimized out> for the locals.

 	Best regards,

 	JKB
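
The gdb output above shows an input interface (`_re0`) whose class list holds a single entry, `cdnr_root`, with no parent and no child. The following C sketch models one reading of the input-interface loop discussed in the thread; it is not the code in qop.c. `struct cls`, `clear_from_root`, the iteration bound and the `root_delete_ok` flag are all assumptions. The report does not establish why the list never empties; the flag only illustrates that, in this state, removing the root is the sole way the loop can make progress.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical stand-in for a class record; NOT the real struct classinfo. */
struct cls {
	LIST_ENTRY(cls) next;
	struct cls *parent;
	struct cls *child;
};
LIST_HEAD(clslist, cls);

/*
 * Parent-first teardown, bounded by max_iter so the model cannot spin.
 * root_delete_ok is an assumed switch: 0 models a root deletion that
 * fails and leaves the root on the list.
 * Returns the number of passes needed to empty the list, or -1 if the
 * list is still non-empty after max_iter passes.
 */
int
clear_from_root(struct clslist *head, struct cls *root, int root_delete_ok,
    int max_iter)
{
	struct cls *c;
	int i, root_listed = 1;

	assert(max_iter > 0);
	for (i = 0; i < max_iter; i++) {
		if (LIST_EMPTY(head))
			return i;
		LIST_FOREACH(c, head, next) {
			if (c->parent == root) {
				LIST_REMOVE(c, next);
				if (root->child == c)
					root->child = NULL;
				break;	/* list changed: restart */
			}
		}
		/* Only the root is left to remove once it has no child. */
		if (root_listed && root->child == NULL && root_delete_ok) {
			LIST_REMOVE(root, next);
			root_listed = 0;
		}
	}
	return LIST_EMPTY(head) ? max_iter : -1;
}
```

Starting from the state gdb printed (only the root on the list), the model finishes in one pass when the root can be removed and reports -1 when it cannot.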

>Unformatted:
