NetBSD Problem Report #56121
From woods@xentastic.weird.com Thu Apr 22 01:18:17 2021
Return-Path: <woods@xentastic.weird.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 68A0D1A9253
for <gnats-bugs@gnats.NetBSD.org>; Thu, 22 Apr 2021 01:18:17 +0000 (UTC)
Message-Id: <20210422011812.B0E3B511CA@xentastic.weird.com>
Date: Wed, 21 Apr 2021 18:18:12 -0700 (PDT)
From: "Greg A. Woods" <woods@planix.ca>
Reply-To: "Greg A. Woods" <woods@planix.ca>
To: gnats-bugs@NetBSD.org
Subject: struct kinfo_lwp is inconsistent in NUL-termination of its char arrays, ps assumes all are NUL-terminated!
X-Send-Pr-Version: 3.95
>Number: 56121
>Category: kern
>Synopsis: struct kinfo_lwp is inconsistent in NUL-termination of its char arrays, ps assumes all are NUL-terminated!
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: simonb
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Thu Apr 22 01:20:00 +0000 2021
>Last-Modified: Sat May 01 20:35:01 +0000 2021
>Originator: Greg A. Woods
>Release: NetBSD 9.99.81
>Organization:
Planix, Inc.; Kelowna, BC; Canada
>Environment:
System: NetBSD xentastic 9.99.81 NetBSD 9.99.81 (XEN3_DOM0) #14: Wed Apr 21 12:25:35 PDT 2021 woods@xentastic:/build/woods/xentastic/current-amd64-amd64-obj/build/src/sys/arch/amd64/compile/XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:
I wondered why the WCHAN column in ps(1) output was truncated,
and thought I might widen it to the widest (struct
kinfo_lwp).l_wmesg string and quickly found out this field is
not currently a string, but just a fixed size character array,
and is filled with strncpy() from an identically sized array in
the kernel, which is itself not NUL terminated.
I then noticed that ps(1) passes a pointer to l_wmesg to its
internal strprintorsetwidth() which (when also called with
WIDTHMODE) promptly passes that same pointer to strlen().
I tried my idea to change ps(1) to print an unlimited-width
WCHAN column, only to see garbage characters, proving there was
something quite wrong. (see the final diff hunk below)
A the moment there's a guaranteed NUL byte at the end of struct
kinfo_lwp (i.e. in l_name), courtesy strlcpy(), so nothing
terrible happens.
I would suggest that since the l_name field already has a
padded-out width, and since it is already filled with strlcpy()
and thus always NUL terminated, and since there are pad fields
just before l_wmesg, that l_wmesg also be padded (replacing one
of the pad fields), AND that it also be filled with strlcpy().
>How-To-Repeat:
by reading code, and testing changes
>Fix:
--- sys/sys/sysctl.h.~1.231.~ 2021-03-07 17:24:18.000000000 -0800
+++ sys/sys/sysctl.h 2021-04-21 17:41:09.797635650 -0700
@@ -373,7 +373,7 @@
*/
#define KI_NGROUPS 16
#define KI_MAXCOMLEN 24 /* extra for 8 byte alignment */
-#define KI_WMESGLEN 8
+#define KI_WMESGLEN 12 /* extra for NUL and 8 byte alignment */
#define KI_MAXLOGNAME 24 /* extra for 8 byte alignment */
#define KI_MAXEMULLEN 16
#define KI_LNAMELEN 20 /* extra 4 for alignment */
@@ -566,8 +566,7 @@
uint8_t l_usrpri; /* U_CHAR: User-priority based on l_cpu and p_nice. */
int8_t l_stat; /* CHAR: S* process status. */
int8_t l_pad1; /* fill out to 4-byte boundary */
- int32_t l_pad2; /* .. and then to an 8-byte boundary */
- char l_wmesg[KI_WMESGLEN]; /* wchan message */
+ char l_wmesg[KI_WMESGLEN]; /* wchan message (with NUL) */
uint64_t l_wchan; /* PTR: sleep address. */
uint64_t l_cpuid; /* LONG: CPU id */
uint32_t l_rtime_sec; /* STRUCT TIMEVAL: Real time. */
@@ -575,7 +574,7 @@
uint32_t l_cpticks; /* INT: ticks during l_swtime */
uint32_t l_pctcpu; /* FIXPT_T: cpu usage for ps */
uint32_t l_pid; /* PID_T: process identifier */
- char l_name[KI_LNAMELEN]; /* CHAR[]: name, may be empty */
+ char l_name[KI_LNAMELEN]; /* CHAR[]: name, may be empty (with NUL) */
};
/*
--- sys/kern/init_sysctl.c.~1.227.~ 2021-03-07 17:23:04.000000000 -0800
+++ sys/kern/init_sysctl.c 2021-04-21 18:04:42.321587746 -0700
@@ -1576,7 +1576,7 @@
kl->l_priority = lwp_eprio(l);
kl->l_usrpri = l->l_priority;
if (l->l_wchan)
- strncpy(kl->l_wmesg, l->l_wmesg, sizeof(kl->l_wmesg));
+ strlcpy(kl->l_wmesg, l->l_wmesg, sizeof(kl->l_wmesg));
COND_SET_VALUE(kl->l_wchan, PTRTOUINT64(l->l_wchan), allowaddr);
kl->l_cpuid = cpu_index(l->l_cpu);
bintime2timeval(&l->l_rtime, &tv);
This change isn't necessary (note it does not and should not be changed
to use strlcpy()!), but I think it is a much safer coding practice and
more consistent with related code:
--- kern_proc.c.~1.262.~ 2021-03-07 17:23:06.000000000 -0800
+++ kern_proc.c 2021-04-21 18:01:35.169135140 -0700
@@ -2692,7 +2692,7 @@
KASSERT(l != NULL);
lwp_lock(l);
if (l->l_wchan)
- strncpy(ep->e_wmesg, l->l_wmesg, WMESGLEN);
+ strncpy(ep->e_wmesg, l->l_wmesg, sizeof(ep->e_wmesg));
lwp_unlock(l);
}
ep->e_ppid = p->p_ppid;
This final diff hunk is what I was trying to achieve, and with the
kernel fixes it works, but without them it prints garbage in the wchan
field: (note KI_LNAMELEN includes the NUL so maybe lname() should be
similarly fixed)
--- bin/ps/print.c.~1.132.~ 2020-05-30 15:08:18.000000000 -0700
+++ bin/ps/print.c 2021-04-21 17:58:09.223885510 -0700
@@ -936,7 +936,8 @@
v = ve->var;
if (l->l_wmesg[0]) {
strprintorsetwidth(v, l->l_wmesg, mode);
- v->width = min(v->width, KI_WMESGLEN);
+ if (termwidth != UNLIMITED)
+ v->width = min(v->width, 8); /* xxx or KI_WMESGLEN-1? */
} else {
if (mode == PRINTMODE)
(void)printf("%-*s", v->width, "-");
>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: kern-bug-people->simonb
Responsible-Changed-By: simonb@NetBSD.org
Responsible-Changed-When: Thu, 22 Apr 2021 01:58:34 +0000
Responsible-Changed-Why:
I'll take care of this one.
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: kern/56121: struct kinfo_lwp is inconsistent in NUL-termination
of its char arrays, ps assumes all are NUL-terminated!
Date: Sat, 1 May 2021 20:34:46 +0000
The following response got misfiled by gnats:
From: Simon Burge <simonb@NetBSD.org>
To: "Greg A. Woods" <woods@planix.ca>
Cc: gnats-bugs@NetBSD.org
Subject: Re: struct kinfo_lwp is inconsistent in NUL-termination of its char arrays, ps assumes all are NUL-terminated!
Hi Greg,
"Greg A. Woods" wrote:
> >Number: 56121
> >Category: kern
> >Synopsis: struct kinfo_lwp is inconsistent in NUL-termination of =
its char arrays, ps assumes all are NUL-terminated!
> --- sys/sys/sysctl.h.~1.231.~ 2021-03-07 17:24:18.000000000 -0800
> +++ sys/sys/sysctl.h 2021-04-21 17:41:09.797635650 -0700
> @@ -373,7 +373,7 @@
> */
> #define KI_NGROUPS 16
> #define KI_MAXCOMLEN 24 /* extra for 8 byte alignment */
> -#define KI_WMESGLEN 8
> +#define KI_WMESGLEN 12 /* extra for NUL and 8 byte alignment */
> #define KI_MAXLOGNAME 24 /* extra for 8 byte alignment */
> #define KI_MAXEMULLEN 16
> #define KI_LNAMELEN 20 /* extra 4 for alignment */
> @@ -566,8 +566,7 @@
> uint8_t l_usrpri; /* U_CHAR: User-priority based on l_cpu and p_nic
e. */
> int8_t l_stat; /* CHAR: S* process status. */
> int8_t l_pad1; /* fill out to 4-byte boundary */
> - int32_t l_pad2; /* .. and then to an 8-byte boundary */
> - char l_wmesg[KI_WMESGLEN]; /* wchan message */
> + char l_wmesg[KI_WMESGLEN]; /* wchan message (with NUL) */
> uint64_t l_wchan; /* PTR: sleep address. */
> uint64_t l_cpuid; /* LONG: CPU id */
> uint32_t l_rtime_sec; /* STRUCT TIMEVAL: Real time. */
This part changes the structure of kinfo_proc2 in a non-backwards
compatible way by moving the offset of l_wmesg[*]. I think we just need
to handle the fixed sized buffer directly in ps(1). I'll look a bit
more.
Cheers,
Simon.
[*] For example you should be able to use any ps(1) binary from amd64
or i386 from NetBSD 1.5 onwards on today's -current amd64 kernel.
>Unformatted:
(Contact us)
$NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.