NetBSD Problem Report #58208

From www@netbsd.org  Sun Apr 28 13:45:03 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id B6D501A923A
	for <gnats-bugs@gnats.NetBSD.org>; Sun, 28 Apr 2024 13:45:03 +0000 (UTC)
Message-Id: <20240428134432.47C651A923A@mollari.NetBSD.org>
Date: Sun, 28 Apr 2024 13:44:32 +0000 (UTC)
From: campbell+netbsd@mumble.net
Reply-To: campbell+netbsd@mumble.net
To: gnats-bugs@NetBSD.org
Subject: ctype(3) provides poor runtime feedback of abuse
X-Send-Pr-Version: www-1.0

>Number:         58208
>Category:       lib
>Synopsis:       ctype(3) provides poor runtime feedback of abuse
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sun Apr 28 13:50:00 +0000 2024
>Originator:     Taylor R Campbell
>Release:        current, 10, 9, ...
>Organization:
The NetBSD isfoundation
>Environment:
>Description:
The ctype(3) functions, such as isprint/isdigit/isalpha and toupper/tolower, have a singularly troublesome specification: Their argument has type int, but they are only defined on inputs that are either (a) the value of the EOF macro (which on NetBSD is -1), or (b) representable by unsigned char.  In other words, there are exactly 257 allowed inputs: {-1, 0, 1, 2, 3, ..., 255}.  Any other inputs lead to undefined behaviour.

This is because they are meant for use with stdio functions like fgetc(3):

int ch;
while ((ch = fgetc(fp)) != EOF) {
        if (isspace(ch))
                continue;
        ...
}

Using them to process arbitrary strings via `char *' requires explicit conversion to `unsigned char':

char *s;
for (s = ...; *s != '\0'; s++) {
        if (isspace((unsigned char)*s))
                continue;
        ...
}

Without this conversion, on machines where char is signed such as x86, char values outside the 7-bit US-ASCII range are either (a) undefined behaviour, or (b) in the case of the all-bits-set octet, conflated with EOF.

Our definitions are crafted to trigger a compiler warning to detect this use of char inputs, but it doesn't always work -- it has been broken on netbsd-9.  And in C++ it doesn't apply (separate issue).  Plus, in a misguided attempt to pacify this legitimate warning, some code is written to do nonsense like isspace((int)*s), which suppresses the warning without fixing the problem, because integers like -7 are preserved instead of being converted to 249.

Currently, when inputs below -1 are passed in, the ctype(3) functions read out whatever memory precedes the ctype/tolower/toupper tables, either statically linked into libc for the C locale, or dynamically allocated for other locales.  This may lead to crashes, but more likely it just leads to confusing nondeterministic outputs, like <https://github.com/ledger/ledger/issues/2338>.
>How-To-Repeat:
char s[] = {0xb5, 0};
// optionally: setlocale(LC_ALL, "");
printf("%d\n", isspace(s[0]));
>Fix:
On machines with signed char, we should allocate a guard page before the ctype/tolower/toupper tables so that attempts to pass in negative values other than EOF lead to immediate SIGSEGV rather than to silent corruption of outputs or leakage of unrelated memory.

We can also arrange to have the out-of-line functions check for inputs below -1 and abort another way, but that's not enough because the ABI allows direct access to the tables via ctype_inline.h.

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.