NetBSD Problem Report #57179
From www@netbsd.org Mon Jan 9 18:53:40 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 9B37D1A9239
for <gnats-bugs@gnats.NetBSD.org>; Mon, 9 Jan 2023 18:53:40 +0000 (UTC)
Message-Id: <20230109185339.2DBAE1A923A@mollari.NetBSD.org>
Date: Mon, 9 Jan 2023 18:53:39 +0000 (UTC)
From: cmeerw@cmeerw.org
Reply-To: cmeerw@cmeerw.org
To: gnats-bugs@NetBSD.org
Subject: occasional pkg_add core dumps
X-Send-Pr-Version: www-1.0
>Number: 57179
>Category: bin
>Synopsis: occasional pkg_add core dumps
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: bin-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Mon Jan 09 18:55:00 +0000 2023
>Closed-Date: Sat Feb 03 15:42:18 +0000 2024
>Last-Modified: Sat Feb 03 15:42:18 +0000 2024
>Originator: Christof Meerwald
>Release: 10.0_BETA
>Organization:
>Environment:
NetBSD arm-netbsd 10.0_BETA NetBSD 10.0_BETA (GENERIC64) #0: Wed Jan 4 12:17:54 UTC 2023 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm
>Description:
I am seeing occasional core dumps from pkg_add, e.g.
gdb /usr/sbin/pkg_add /usr/pkg/pkg_add.core
GNU gdb (GDB) 11.0.50.20200914-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/pkg_add...
(No debugging symbols found in /usr/sbin/pkg_add)
[New process 1969]
Core was generated by `pkg_add'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x0000f05b79f80098 in free () from /usr/lib/libc.so.12
(gdb) where
#0 0x0000f05b79f80098 in free () from /usr/lib/libc.so.12
#1 0x0000f05b7a7996bc in fetch_close () from /usr/lib/libfetch.so.3
#2 0x0000f05b7a798e84 in fetch_cache_put () from /usr/lib/libfetch.so.3
#3 0x0000f05b7a794558 in ?? () from /usr/lib/libfetch.so.3
#4 0x0000f05b7a79a010 in fetchIO_close () from /usr/lib/libfetch.so.3
#5 0x000000000315eabc in fetch_archive_close ()
#6 0x0000f05b7a3d9214 in ?? () from /usr/lib/libarchive.so.4
#7 0x0000f05b7a3d9420 in ?? () from /usr/lib/libarchive.so.4
#8 0x0000f05b7a3d94bc in ?? () from /usr/lib/libarchive.so.4
#9 0x0000f05b7a3da1ac in ?? () from /usr/lib/libarchive.so.4
#10 0x0000000003156f40 in pkg_do ()
#11 0x0000000003158178 in check_dependencies ()
#12 0x0000000003157d78 in pkg_do ()
#13 0x00000000031583a4 in pkg_perform ()
#14 0x000000000316227c in main ()
This was from doing a "pkg_add groff" on an aarch64 system (Free Oracle Cloud instance, but also seen it on a local Raspberry Pi 3)
>How-To-Repeat:
Not sure if it's always reproducible, but have seen it on two separate systems, both when doing "pkg_add groff"
Note, that I had changed PKG_PATH to use http (instead of https, see PR #57124):
http://cdn.NetBSD.org/pub/pkgsrc/packages/NetBSD/aarch64/10.0/All
>Fix:
>Release-Note:
>Audit-Trail:
State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 11 Dec 2023 01:34:44 +0000
State-Changed-Why:
Can you install the debug.tgz (or .tar.xz) set and see if you can get
more information from the stack trace, such as the local variables with
`frame 0', `frame 1', `frame 2', ..., and `info locals'?
Guessing this is a double-free, so all we'll need is the line number of
the free return address, but just in case the extra info won't hurt.
From: Christof Meerwald <cmeerw@cmeerw.org>
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, riastradh@netbsd.org
Subject: Re: bin/57179 (occasional pkg_add core dumps)
Date: Mon, 11 Dec 2023 20:31:30 +0100
Ok, just did some debugging on this now.
To reproduce the issue it seems to be important that PKG_PATH is
actually set to
http://cdn.NetBSD.org/pub/pkgsrc/packages/NetBSD/aarch64/10.0/All
A consequence of this is that this URL actually redirects to
http://cdn.netbsd.org/pub/pkgsrc/packages/NetBSD/aarch64/10.0/All/
So we'll actually see two different hosts in the connection cache:
cdn.NetBSD.org and cdn.netbsd.org
Source code I am referring to is
http://cvsweb.netbsd.org/bsdweb.cgi/src/external/bsd/fetch/dist/libfetch/common.c?annotate=1.4&only_with_tag=MAIN
First bug I noticed is in "fetch_cache_get" where last_conn is
initialized to NULL, but never updated. This is probably just a
resource/memory leak as we'll always get into the "else" branch in
line 382 (and throw away the initial part of the connection_cache).
But the main issue (the one that is then resulting in the core dumps)
is in fetch_cache_put. There is actually two parts to it.
First part (the one that leads to the memory corruption) is that after
closing a connection in line 421, on the next iteration, "last" will
be set to that closed connection. So if we then also close the next
connection on that next iteration, the "next_cached" link in the list
isn't updated correctly (as we are updating the "next_cached" of the
closed connection). This then leads to the core dump on the next call
to "fetch_cache_put".
Now the remaining issue is, why are we even closing two connections
from the connection_cache in one single call to fetch_cache_put? The
issue here is that once we reach the host_count limit, we don't reset
the "host_count" and ultimately close all remaining connections in
connection_cache (even if those connections are for different hosts
that haven't reached the host_count limit). In my case
connection_cache contained 4 connections for "cdn.NetBSD.org",
followed by a connection for "cdn.netbsd.org". When trying to put
another "cdn.NetBSD.org" connection into the cache, it realised that
the fourth connection in the cache is over the host limit, closed it,
and continued to the last "cdn.netbsd.org" connection. But as
host_count wasn't decremented, it then proceeded to also close that
"cdn.netbsd.org" connection (resulting in the linked-list corruption).
Hope that helps.
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57179 CVS commit: src/external/bsd/fetch/dist/libfetch
Date: Thu, 28 Dec 2023 19:55:46 -0500
Module Name: src
Committed By: christos
Date: Fri Dec 29 00:55:46 UTC 2023
Modified Files:
src/external/bsd/fetch/dist/libfetch: common.c
Log Message:
PR/57179: Christof Meerwald: Fix bugs in fetch_cache_{get,put}.
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/external/bsd/fetch/dist/libfetch/common.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Christof Meerwald <cmeerw@cmeerw.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/57179 (occasional pkg_add core dumps)
Date: Thu, 1 Feb 2024 19:59:08 +0100
Finally managed to test on NetBSD HEAD and I am still seeing core
dumps.
I believe the loop in fetch_cache_put should look somewhat like this:
for (iter = connection_cache; iter; ) {
++global_count;
if (strcmp(conn->cache_url->host, iter->cache_url->host) == 0)
++host_count;
if (global_count < cache_global_limit &&
host_count < cache_per_host_limit) {
oiter = NULL;
last = iter;
} else {
--global_count;
if (host_count >= cache_per_host_limit)
--host_count;
if (last != NULL)
last->next_cached = iter->next_cached;
else
connection_cache = iter->next_cached;
oiter = iter;
}
iter = iter->next_cached;
if (oiter)
(*oiter->cache_close)(oiter);
}
I have moved "last = iter;" into the "then" branch (so "last" won't
end up pointing to an entry that has been closed), and I have added a
conditional "--host_count" into the "else" branch, so we don't end up
closing all remaining cache entries (with mismatched host names).
This is completely untested as I don't have the system set up to
re-compile the library.
Christof
--
https://cmeerw.org sip:cmeerw at cmeerw.org
mailto:cmeerw at cmeerw.org xmpp:cmeerw at cmeerw.org
From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/57179 CVS commit: src/external/bsd/fetch/dist/libfetch
Date: Fri, 2 Feb 2024 17:19:05 -0500
Module Name: src
Committed By: christos
Date: Fri Feb 2 22:19:05 UTC 2024
Modified Files:
src/external/bsd/fetch/dist/libfetch: common.c common.h fetch.3 ftp.c
http.c
Log Message:
Sync with pkgsrc and try to fix more memory corruption from PR/57179.
To generate a diff of this commit:
cvs rdiff -u -r1.6 -r1.7 src/external/bsd/fetch/dist/libfetch/common.c
cvs rdiff -u -r1.2 -r1.3 src/external/bsd/fetch/dist/libfetch/common.h
cvs rdiff -u -r1.4 -r1.5 src/external/bsd/fetch/dist/libfetch/fetch.3 \
src/external/bsd/fetch/dist/libfetch/http.c
cvs rdiff -u -r1.7 -r1.8 src/external/bsd/fetch/dist/libfetch/ftp.c
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: Christof Meerwald <cmeerw@cmeerw.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/57179 (occasional pkg_add core dumps)
Date: Sat, 3 Feb 2024 11:31:44 +0100
Seems to be fixed now, I am no longer seeing any core dumps.
Thank you very much.
State-Changed-From-To: feedback->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Sat, 03 Feb 2024 15:42:18 +0000
State-Changed-Why:
Confirmed fixed, thanks!
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.