NetBSD Problem Report #57193

From ocb@l25.fi  Sat Jan 21 19:01:21 2023
Return-Path: <ocb@l25.fi>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id CFC3E1A923B
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 21 Jan 2023 19:01:21 +0000 (UTC)
Message-Id: <20230121190058.036E72C0E59@r.l25.fi>
Date: Sat, 21 Jan 2023 20:00:57 +0100 (CET)
From: ocb@l25.fi
Reply-To: ocb@l25.fi
To: gnats-bugs@NetBSD.org
Subject: blacklistd(8) endless loop 
X-Send-Pr-Version: 3.95

>Number:         57193
>Category:       bin
>Synopsis:       blacklistd runs into endless loop when multiple equal entries are in database
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    christos
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jan 21 19:05:00 +0000 2023
>Last-Modified:  Sun Apr 23 12:50:01 +0000 2023
>Originator:     ocb@l25.fi
>Release:        NetBSD 9.2
>Organization:

>Environment:


System: NetBSD vps 9.2 NetBSD 9.2 (KERNEL1VM) #2: Sun Jul 10 17:11:03 CEST 2022 build@r:/home/build/obj/sys/arch/amd64/compile/KERNEL1VM amd64
Architecture: x86_64
Machine: amd64
>Description:

it appears to me that blacklistd(8) on 9.2 runs into endless loop when it tries to delete an address from blacklistd database which has been added multiple times. the reason it is added multiple times is possibly due to fast multi-threaded ssh brute force attackers. 

another possible reason is from what man blacklistctl(8) says; 'Sometimes the reported number of failed attempts can exceed the number of attempts that blacklistd(8) is configured to block. This can happen either because the rule has been removed manually, or because there were more attempts in flight while the rule block was being added. This condition is normal; in that case blacklistd(8) will first attempt to remove the existing rule, and then it will re-add it to make sure that there is only one rule active.'

(gdb) bt
#0  0x00007bb81715dccc in hash4 (keyarg=<optimized out>, len=<optimized out>) at /usr/src/lib/libc/db/hash/hash_func.c:188
#1  0x00007bb81715a8a7 in __call_hash (len=288, k=0x7f7fffe67800 "\020\002", hashp=0x7bb817db8000) at /usr/src/lib/libc/db/hash/hash.c:901
#2  hash_access (hashp=0x7bb817db8000, action=HASH_DELETE, key=0x7f7fffe67660, val=0x0) at /usr/src/lib/libc/db/hash/hash.c:591
#3  0x0000000102e070a0 in state_del (db=0x7bb817ddf140, c=0x7f7fffe67800) at /usr/src/external/bsd/blacklist/bin/state.c:125
#4  0x0000000102e027c3 in update () at /usr/src/external/bsd/blacklist/bin/blacklistd.c:320
#5  0x0000000102e033aa in main (argc=0, argv=0x7f7fffe67a58) at /usr/src/external/bsd/blacklist/bin/blacklistd.c:564

# blacklistctl dump -n | wc -l
   160

# blacklistctl dump -n | awk '{a[$0]++}END{for(i in a){if(a[i]-1)print i,a[i]}}'
   123.30.187.208/32:22           4/5     2022/11/18 00:04:07 3
   201.217.194.32/32:22           4/5     2022/11/17 12:42:46 3

^ there are three completely equal entries with equal times for above two ip addresses.

according to the stack trace, state_del is called with c=0x7f7fffe67800

(gdb) x/2 0x7f7fffe67800
0x7f7fffe67800:   0x16000210      0x20c2d9c9

0x20c2d9c9 = 201.217.194.32

it appears that it loops when trying to delete one of the duplicate entries. this situation has happened two times already in past two weeks before last blacklistd restart.

when this situation happens, blacklistd stops processing new failed login events from openssh and starts consuming cpu.

USER  PID %CPU %MEM   VSZ  RSS TTY STAT STARTED        TIME COMMAND
root 4412 94.7  0.1 21928 1244 ?   Os    9Jan23 15671:59.44 /sbin/blacklistd

i have not found the code responsible for this yet, but will keep looking. will test current in the coming days, even though im not sure why is this happening yet and how to reproduce it. above thoughts are not confirmed.

--

(gdb) info registers
rax            0x72df0a52          1927219794
rbx            0x103009de8         4345339368
rcx            0xcec05492          3468711058
rdx            0x5                 5
rsi            0x7f7fffe678fb      140187730868475
rdi            0x7f7fffe678fa      140187730868474
rbp            0x7f7fffe67a10      0x7f7fffe67a10
rsp            0x7f7fffe67970      0x7f7fffe67970
r8             0x7f7fffe678f5      140187730868469
r9             0x7f7fffe677f8      140187730868216
r10            0x0                 0
r11            0x0                 0
r12            0x10300a340         4345340736
r13            0x1                 1
r14            0x7c252d55d278      136499116233336
r15            0x1                 1
rip            0x102e033aa         0x102e033aa <main+1863>
eflags         0xa82               [ SF IF OF ]
cs             0x47                71
ss             0x3f                63
ds             0x23                35
es             0x23                35
fs             0x0                 0
gs             0x0                 0
fs_base        <unavailable>
gs_base        <unavailable>

(gdb) fr 0
#0  0x00007bb81715dccc in hash4 (keyarg=<optimized out>, len=<optimized out>)
    at /usr/src/lib/libc/db/hash/hash_func.c:188
188     in /usr/src/lib/libc/db/hash/hash_func.c

(gdb) info locals
key = <optimized out>
loop = 5
len = <optimized out>
key = <optimized out>
loop = <optimized out>
keyarg = <optimized out>
h = <optimized out>
key = <optimized out>
loop = <optimized out>

(gdb) fr 1
#1  0x00007bb81715a8a7 in __call_hash (len=288, k=0x7f7fffe67800 "\020\002", hashp=0x7bb817db8000)
    at /usr/src/lib/libc/db/hash/hash.c:901
901     /usr/src/lib/libc/db/hash/hash.c: No such file or directory.
(gdb) info locals
n = <optimized out>
bucket = <optimized out>
n = <optimized out>
bucket = <optimized out>

(gdb) fr 2
#2  hash_access (hashp=0x7bb817db8000, action=HASH_DELETE, key=0x7f7fffe67660, val=0x0)
    at /usr/src/lib/libc/db/hash/hash.c:591
591     in /usr/src/lib/libc/db/hash/hash.c

(gdb) info locals
rbufp = <optimized out>
bufp = 0x103009de8
save_bufp = <optimized out>
bp = <optimized out>
n = <optimized out>
ndx = <optimized out>
off = 4096
size = 288
kp = 0x7f7fffe67800 "\020\002"
pageno = <optimized out>

(gdb) fr 3
#3  0x0000000102e070a0 in state_del (db=0x7bb817ddf140, c=0x7f7fffe67800) at /usr/src/external/bsd/blacklist/bin/state.c:125
125     /usr/src/external/bsd/blacklist/bin/state.c: No such file or directory.
(gdb) info locals
rv = 0
k = {data = 0x7f7fffe67800, size = 288}
__func__ = "state_del"

(gdb) fr 4
#4  0x0000000102e027c3 in update () at /usr/src/external/bsd/blacklist/bin/blacklistd.c:320
320     /usr/src/external/bsd/blacklist/bin/blacklistd.c: No such file or directory.

(gdb) info locals
when = 1668771766
ts = {tv_sec = 1673277103, tv_nsec = 240446134}
c = {c_ss = {ss_len = 16 '\020', ss_family = 2 '\002', __ss_pad1 = "\000\026\311\331\302 ", __ss_align = 0, __ss_pad2 = '\000' <repeats 111 times>}, c_lmask = 32, c_port = 22, c_proto = 6, c_family = 2, c_uid = 0, c_nfail = 5,
  c_name = "blacklistd-vps", '\000' <repeats 113 times>, c_rmask = -1, c_duration = 86400}
dbi = {count = 4, last = 1668685366, id = '\000' <repeats 63 times>}
f = 1
n = 0
buf = "\226]@\371~\177\000\000\000\000\000\027\270{\000\000\000\340\337\027\270{\000\000\000\344\337\027\270{\000\000\000\350\337\027\270{\000\000\000\354\337\027\270{\000\000@\f@\371~\177\000\000\000\344\337\027\270{\000\000`x\346\377\177\177\000\000\000\340\337\027\270{\000\000\000\200\333\027\001", '\000' <repeats 12 times>, "\354\337\027\270{\000\000\001\000\000\000\000\000\000\000\260\236\000\027\270{\000\000\000\000\000\000\000\000\000"
ss = 0x7f7fffe67800
__func__ = "update"

(gdb) fr 5
#5  0x0000000102e033aa in main (argc=0, argv=0x7f7fffe67a58) at /usr/src/external/bsd/blacklist/bin/blacklistd.c:564
564     in /usr/src/external/bsd/blacklist/bin/blacklistd.c

(gdb) info locals
t = 0
c = -1
tout = 15000
flags = 4196354
flush = 0
restore = 0
ret = 1
spath = 0x0
blsock = 0x0
nblsock = 0
maxblsock = 0
pfd = 0x7bb817ddf0f0
bl = 0x7bb817ddf000
nfd = 1
maxfd = 10


>How-To-Repeat:

>Fix:


>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: port-amd64-maintainer->christos
Responsible-Changed-By: riastradh@NetBSD.org
Responsible-Changed-When: Sun, 22 Jan 2023 20:50:50 +0000
Responsible-Changed-Why:
over to blocklistd maintainer


From: ocb@l25.fi
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57193 (blacklistd runs into endless loop when multiple equal entries are in database)
Date: Sun, 23 Apr 2023 13:25:38 +0200 (CEST)

 this may have been caused by misconfigured npf which did not correctly add IP addresses to blacklist database, therefore it encountered what blacklistd manual says or an unknown race condition with a prerequisite that the source address is connecting with many threads. probability to encounter this condition should increase if nfail is set to a larger number.

 was not able to reproduce on current, but did not test much.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2023 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.