NetBSD Problem Report #57179

From www@netbsd.org  Mon Jan  9 18:53:40 2023
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 9B37D1A9239
	for <gnats-bugs@gnats.NetBSD.org>; Mon,  9 Jan 2023 18:53:40 +0000 (UTC)
Message-Id: <20230109185339.2DBAE1A923A@mollari.NetBSD.org>
Date: Mon,  9 Jan 2023 18:53:39 +0000 (UTC)
From: cmeerw@cmeerw.org
Reply-To: cmeerw@cmeerw.org
To: gnats-bugs@NetBSD.org
Subject: occasional pkg_add core dumps
X-Send-Pr-Version: www-1.0

>Number:         57179
>Category:       bin
>Synopsis:       occasional pkg_add core dumps
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Jan 09 18:55:00 +0000 2023
>Closed-Date:    Sat Feb 03 15:42:18 +0000 2024
>Last-Modified:  Sat Feb 03 15:42:18 +0000 2024
>Originator:     Christof Meerwald
>Release:        10.0_BETA
>Organization:
>Environment:
NetBSD arm-netbsd 10.0_BETA NetBSD 10.0_BETA (GENERIC64) #0: Wed Jan  4 12:17:54 UTC 2023  mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/evbarm/compile/GENERIC64 evbarm
>Description:
I am seeing occasional core dumps from pkg_add, e.g.

gdb /usr/sbin/pkg_add /usr/pkg/pkg_add.core 
GNU gdb (GDB) 11.0.50.20200914-git
Copyright (C) 2020 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Type "show copying" and "show warranty" for details.
This GDB was configured as "aarch64--netbsd".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<https://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
    <http://www.gnu.org/software/gdb/documentation/>.

For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from /usr/sbin/pkg_add...
(No debugging symbols found in /usr/sbin/pkg_add)
[New process 1969]
Core was generated by `pkg_add'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000f05b79f80098 in free () from /usr/lib/libc.so.12
(gdb) where
#0  0x0000f05b79f80098 in free () from /usr/lib/libc.so.12
#1  0x0000f05b7a7996bc in fetch_close () from /usr/lib/libfetch.so.3
#2  0x0000f05b7a798e84 in fetch_cache_put () from /usr/lib/libfetch.so.3
#3  0x0000f05b7a794558 in ?? () from /usr/lib/libfetch.so.3
#4  0x0000f05b7a79a010 in fetchIO_close () from /usr/lib/libfetch.so.3
#5  0x000000000315eabc in fetch_archive_close ()
#6  0x0000f05b7a3d9214 in ?? () from /usr/lib/libarchive.so.4
#7  0x0000f05b7a3d9420 in ?? () from /usr/lib/libarchive.so.4
#8  0x0000f05b7a3d94bc in ?? () from /usr/lib/libarchive.so.4
#9  0x0000f05b7a3da1ac in ?? () from /usr/lib/libarchive.so.4
#10 0x0000000003156f40 in pkg_do ()
#11 0x0000000003158178 in check_dependencies ()
#12 0x0000000003157d78 in pkg_do ()
#13 0x00000000031583a4 in pkg_perform ()
#14 0x000000000316227c in main ()

This was from doing a "pkg_add groff" on an aarch64 system (Free Oracle Cloud instance, but also seen it on a local Raspberry Pi 3)

>How-To-Repeat:
Not sure if it's always reproducible, but have seen it on two separate systems, both when doing "pkg_add groff"

Note, that I had changed PKG_PATH to use http (instead of https, see PR #57124):

http://cdn.NetBSD.org/pub/pkgsrc/packages/NetBSD/aarch64/10.0/All
>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Mon, 11 Dec 2023 01:34:44 +0000
State-Changed-Why:
Can you install the debug.tgz (or .tar.xz) set and see if you can get
more information from the stack trace, such as the local variables with
`frame 0', `frame 1', `frame 2', ..., and `info locals'?

Guessing this is a double-free, so all we'll need is the line number of
the free return address, but just in case the extra info won't hurt.


From: Christof Meerwald <cmeerw@cmeerw.org>
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, riastradh@netbsd.org
Subject: Re: bin/57179 (occasional pkg_add core dumps)
Date: Mon, 11 Dec 2023 20:31:30 +0100

 Ok, just did some debugging on this now.

 To reproduce the issue it seems to be important that PKG_PATH is
 actually set to

   http://cdn.NetBSD.org/pub/pkgsrc/packages/NetBSD/aarch64/10.0/All

 A consequence of this is that this URL actually redirects to

   http://cdn.netbsd.org/pub/pkgsrc/packages/NetBSD/aarch64/10.0/All/

 So we'll actually see two different hosts in the connection cache:
 cdn.NetBSD.org and cdn.netbsd.org

 Source code I am referring to is
 http://cvsweb.netbsd.org/bsdweb.cgi/src/external/bsd/fetch/dist/libfetch/common.c?annotate=1.4&only_with_tag=MAIN

 First bug I noticed is in "fetch_cache_get" where last_conn is
 initialized to NULL, but never updated. This is probably just a
 resource/memory leak as we'll always get into the "else" branch in
 line 382 (and throw away the initial part of the connection_cache).

 But the main issue (the one that is then resulting in the core dumps)
 is in fetch_cache_put. There is actually two parts to it.

 First part (the one that leads to the memory corruption) is that after
 closing a connection in line 421, on the next iteration, "last" will
 be set to that closed connection. So if we then also close the next
 connection on that next iteration, the "next_cached" link in the list
 isn't updated correctly (as we are updating the "next_cached" of the
 closed connection). This then leads to the core dump on the next call
 to "fetch_cache_put".

 Now the remaining issue is, why are we even closing two connections
 from the connection_cache in one single call to fetch_cache_put? The
 issue here is that once we reach the host_count limit, we don't reset
 the "host_count" and ultimately close all remaining connections in
 connection_cache (even if those connections are for different hosts
 that haven't reached the host_count limit). In my case
 connection_cache contained 4 connections for "cdn.NetBSD.org",
 followed by a connection for "cdn.netbsd.org". When trying to put
 another "cdn.NetBSD.org" connection into the cache, it realised that
 the fourth connection in the cache is over the host limit, closed it,
 and continued to the last "cdn.netbsd.org" connection. But as
 host_count wasn't decremented, it then proceeded to also close that
 "cdn.netbsd.org" connection (resulting in the linked-list corruption).

 Hope that helps.

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57179 CVS commit: src/external/bsd/fetch/dist/libfetch
Date: Thu, 28 Dec 2023 19:55:46 -0500

 Module Name:	src
 Committed By:	christos
 Date:		Fri Dec 29 00:55:46 UTC 2023

 Modified Files:
 	src/external/bsd/fetch/dist/libfetch: common.c

 Log Message:
 PR/57179: Christof Meerwald: Fix bugs in fetch_cache_{get,put}.


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 src/external/bsd/fetch/dist/libfetch/common.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Christof Meerwald <cmeerw@cmeerw.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57179 (occasional pkg_add core dumps)
Date: Thu, 1 Feb 2024 19:59:08 +0100

 Finally managed to test on NetBSD HEAD and I am still seeing core
 dumps.

 I believe the loop in fetch_cache_put should look somewhat like this:

 	for (iter = connection_cache; iter; ) {
 		++global_count;
 		if (strcmp(conn->cache_url->host, iter->cache_url->host) == 0)
 			++host_count;
 		if (global_count < cache_global_limit &&
 		    host_count < cache_per_host_limit) {
 			oiter = NULL;
 			last = iter;
 		} else {
 			--global_count;
 			if (host_count >= cache_per_host_limit)
 				--host_count;
 			if (last != NULL)
 				last->next_cached = iter->next_cached;
 			else
 				connection_cache = iter->next_cached;
 			oiter = iter;
 		}
 		iter = iter->next_cached;
 		if (oiter)
 			(*oiter->cache_close)(oiter);
 	}

 I have moved "last = iter;" into the "then" branch (so "last" won't
 end up pointing to an entry that has been closed), and I have added a
 conditional "--host_count" into the "else" branch, so we don't end up
 closing all remaining cache entries (with mismatched host names).

 This is completely untested as I don't have the system set up to
 re-compile the library.


 Christof

 -- 

 https://cmeerw.org                             sip:cmeerw at cmeerw.org
 mailto:cmeerw at cmeerw.org                   xmpp:cmeerw at cmeerw.org

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/57179 CVS commit: src/external/bsd/fetch/dist/libfetch
Date: Fri, 2 Feb 2024 17:19:05 -0500

 Module Name:	src
 Committed By:	christos
 Date:		Fri Feb  2 22:19:05 UTC 2024

 Modified Files:
 	src/external/bsd/fetch/dist/libfetch: common.c common.h fetch.3 ftp.c
 	    http.c

 Log Message:
 Sync with pkgsrc and try to fix more memory corruption from PR/57179.


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 src/external/bsd/fetch/dist/libfetch/common.c
 cvs rdiff -u -r1.2 -r1.3 src/external/bsd/fetch/dist/libfetch/common.h
 cvs rdiff -u -r1.4 -r1.5 src/external/bsd/fetch/dist/libfetch/fetch.3 \
     src/external/bsd/fetch/dist/libfetch/http.c
 cvs rdiff -u -r1.7 -r1.8 src/external/bsd/fetch/dist/libfetch/ftp.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Christof Meerwald <cmeerw@cmeerw.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/57179 (occasional pkg_add core dumps)
Date: Sat, 3 Feb 2024 11:31:44 +0100

 Seems to be fixed now, I am no longer seeing any core dumps.

 Thank you very much.

State-Changed-From-To: feedback->closed
State-Changed-By: wiz@NetBSD.org
State-Changed-When: Sat, 03 Feb 2024 15:42:18 +0000
State-Changed-Why:
Confirmed fixed, thanks!


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.