NetBSD Problem Report #57193
From ocb@l25.fi Sat Jan 21 19:01:21 2023
Return-Path: <ocb@l25.fi>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id CFC3E1A923B
for <gnats-bugs@gnats.NetBSD.org>; Sat, 21 Jan 2023 19:01:21 +0000 (UTC)
Message-Id: <20230121190058.036E72C0E59@r.l25.fi>
Date: Sat, 21 Jan 2023 20:00:57 +0100 (CET)
From: ocb@l25.fi
Reply-To: ocb@l25.fi
To: gnats-bugs@NetBSD.org
Subject: blacklistd(8) endless loop
X-Send-Pr-Version: 3.95
>Number: 57193
>Category: bin
>Synopsis: blacklistd runs into endless loop when multiple equal entries are in database
>Confidential: no
>Severity: non-critical
>Priority: low
>Responsible: christos
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Jan 21 19:05:00 +0000 2023
>Last-Modified: Sun Apr 23 12:50:01 +0000 2023
>Originator: ocb@l25.fi
>Release: NetBSD 9.2
>Organization:
>Environment:
System: NetBSD vps 9.2 NetBSD 9.2 (KERNEL1VM) #2: Sun Jul 10 17:11:03 CEST 2022 build@r:/home/build/obj/sys/arch/amd64/compile/KERNEL1VM amd64
Architecture: x86_64
Machine: amd64
>Description:
it appears to me that blacklistd(8) on 9.2 runs into endless loop when it tries to delete an address from blacklistd database which has been added multiple times. the reason it is added multiple times is possibly due to fast multi-threaded ssh brute force attackers.
another possible reason is from what man blacklistctl(8) says; 'Sometimes the reported number of failed attempts can exceed the number of attempts that blacklistd(8) is configured to block. This can happen either because the rule has been removed manually, or because there were more attempts in flight while the rule block was being added. This condition is normal; in that case blacklistd(8) will first attempt to remove the existing rule, and then it will re-add it to make sure that there is only one rule active.'
(gdb) bt
#0 0x00007bb81715dccc in hash4 (keyarg=<optimized out>, len=<optimized out>) at /usr/src/lib/libc/db/hash/hash_func.c:188
#1 0x00007bb81715a8a7 in __call_hash (len=288, k=0x7f7fffe67800 "\020\002", hashp=0x7bb817db8000) at /usr/src/lib/libc/db/hash/hash.c:901
#2 hash_access (hashp=0x7bb817db8000, action=HASH_DELETE, key=0x7f7fffe67660, val=0x0) at /usr/src/lib/libc/db/hash/hash.c:591
#3 0x0000000102e070a0 in state_del (db=0x7bb817ddf140, c=0x7f7fffe67800) at /usr/src/external/bsd/blacklist/bin/state.c:125
#4 0x0000000102e027c3 in update () at /usr/src/external/bsd/blacklist/bin/blacklistd.c:320
#5 0x0000000102e033aa in main (argc=0, argv=0x7f7fffe67a58) at /usr/src/external/bsd/blacklist/bin/blacklistd.c:564
# blacklistctl dump -n | wc -l
160
# blacklistctl dump -n | awk '{a[$0]++}END{for(i in a){if(a[i]-1)print i,a[i]}}'
123.30.187.208/32:22 4/5 2022/11/18 00:04:07 3
201.217.194.32/32:22 4/5 2022/11/17 12:42:46 3
^ there are three completely equal entries with equal times for above two ip addresses.
according to the stack trace, state_del is called with c=0x7f7fffe67800
(gdb) x/2 0x7f7fffe67800
0x7f7fffe67800: 0x16000210 0x20c2d9c9
0x20c2d9c9 = 201.217.194.32
it appears that it loops when trying to delete one of the duplicate entries. this situation has happened two times already in past two weeks before last blacklistd restart.
when this situation happens, blacklistd stops processing new failed login events from openssh and starts consuming cpu.
USER PID %CPU %MEM VSZ RSS TTY STAT STARTED TIME COMMAND
root 4412 94.7 0.1 21928 1244 ? Os 9Jan23 15671:59.44 /sbin/blacklistd
i have not found the code responsible for this yet, but will keep looking. will test current in the coming days, even though im not sure why is this happening yet and how to reproduce it. above thoughts are not confirmed.
--
(gdb) info registers
rax 0x72df0a52 1927219794
rbx 0x103009de8 4345339368
rcx 0xcec05492 3468711058
rdx 0x5 5
rsi 0x7f7fffe678fb 140187730868475
rdi 0x7f7fffe678fa 140187730868474
rbp 0x7f7fffe67a10 0x7f7fffe67a10
rsp 0x7f7fffe67970 0x7f7fffe67970
r8 0x7f7fffe678f5 140187730868469
r9 0x7f7fffe677f8 140187730868216
r10 0x0 0
r11 0x0 0
r12 0x10300a340 4345340736
r13 0x1 1
r14 0x7c252d55d278 136499116233336
r15 0x1 1
rip 0x102e033aa 0x102e033aa <main+1863>
eflags 0xa82 [ SF IF OF ]
cs 0x47 71
ss 0x3f 63
ds 0x23 35
es 0x23 35
fs 0x0 0
gs 0x0 0
fs_base <unavailable>
gs_base <unavailable>
(gdb) fr 0
#0 0x00007bb81715dccc in hash4 (keyarg=<optimized out>, len=<optimized out>)
at /usr/src/lib/libc/db/hash/hash_func.c:188
188 in /usr/src/lib/libc/db/hash/hash_func.c
(gdb) info locals
key = <optimized out>
loop = 5
len = <optimized out>
key = <optimized out>
loop = <optimized out>
keyarg = <optimized out>
h = <optimized out>
key = <optimized out>
loop = <optimized out>
(gdb) fr 1
#1 0x00007bb81715a8a7 in __call_hash (len=288, k=0x7f7fffe67800 "\020\002", hashp=0x7bb817db8000)
at /usr/src/lib/libc/db/hash/hash.c:901
901 /usr/src/lib/libc/db/hash/hash.c: No such file or directory.
(gdb) info locals
n = <optimized out>
bucket = <optimized out>
n = <optimized out>
bucket = <optimized out>
(gdb) fr 2
#2 hash_access (hashp=0x7bb817db8000, action=HASH_DELETE, key=0x7f7fffe67660, val=0x0)
at /usr/src/lib/libc/db/hash/hash.c:591
591 in /usr/src/lib/libc/db/hash/hash.c
(gdb) info locals
rbufp = <optimized out>
bufp = 0x103009de8
save_bufp = <optimized out>
bp = <optimized out>
n = <optimized out>
ndx = <optimized out>
off = 4096
size = 288
kp = 0x7f7fffe67800 "\020\002"
pageno = <optimized out>
(gdb) fr 3
#3 0x0000000102e070a0 in state_del (db=0x7bb817ddf140, c=0x7f7fffe67800) at /usr/src/external/bsd/blacklist/bin/state.c:125
125 /usr/src/external/bsd/blacklist/bin/state.c: No such file or directory.
(gdb) info locals
rv = 0
k = {data = 0x7f7fffe67800, size = 288}
__func__ = "state_del"
(gdb) fr 4
#4 0x0000000102e027c3 in update () at /usr/src/external/bsd/blacklist/bin/blacklistd.c:320
320 /usr/src/external/bsd/blacklist/bin/blacklistd.c: No such file or directory.
(gdb) info locals
when = 1668771766
ts = {tv_sec = 1673277103, tv_nsec = 240446134}
c = {c_ss = {ss_len = 16 '\020', ss_family = 2 '\002', __ss_pad1 = "\000\026\311\331\302 ", __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, c_lmask = 32, c_port = 22, c_proto = 6, c_family = 2, c_uid = 0, c_nfail = 5,
c_name = "blacklistd-vps", '\000' <repeats 113 times>, c_rmask = -1, c_duration = 86400}
dbi = {count = 4, last = 1668685366, id = '\000' <repeats 63 times>}
f = 1
n = 0
buf = "\226]@\371~\177\000\000\000\000\000\027\270{\000\000\000\340\337\027\270{\000\000\000\344\337\027\270{\000\000\000\350\337\027\270{\000\000\000\354\337\027\270{\000\000@\f@\371~\177\000\000\000\344\337\027\270{\000\000`x\346\377\177\177\000\000\000\340\337\027\270{\000\000\000\200\333\027\001", '\000' <repeats 12 times>, "\354\337\027\270{\000\000\001\000\000\000\000\000\000\000\260\236\000\027\270{\000\000\000\000\000\000\000\000\000"
ss = 0x7f7fffe67800
__func__ = "update"
(gdb) fr 5
#5 0x0000000102e033aa in main (argc=0, argv=0x7f7fffe67a58) at /usr/src/external/bsd/blacklist/bin/blacklistd.c:564
564 in /usr/src/external/bsd/blacklist/bin/blacklistd.c
(gdb) info locals
t = 0
c = -1
tout = 15000
flags = 4196354
flush = 0
restore = 0
ret = 1
spath = 0x0
blsock = 0x0
nblsock = 0
maxblsock = 0
pfd = 0x7bb817ddf0f0
bl = 0x7bb817ddf000
nfd = 1
maxfd = 10
>How-To-Repeat:
>Fix:
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: port-amd64-maintainer->christos
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Sun, 22 Jan 2023 20:50:50 +0000
Responsible-Changed-Why:
over to blocklistd maintainer
From: ocb@l25.fi
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/57193 (blacklistd runs into endless loop when multiple equal entries are in database)
Date: Sun, 23 Apr 2023 13:25:38 +0200 (CEST)
this may have been caused by misconfigured npf which did not correctly add IP addresses to blacklist database, therefore it encountered what blacklistd manual says or an unknown race condition with a prerequisite that the source address is connecting with many threads. probability to encounter this condition should increase if nfail is set to a larger number.
was not able to reproduce on current, but did not test much.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.