NetBSD Problem Report #56121

From woods@xentastic.weird.com  Thu Apr 22 01:18:17 2021
Return-Path: <woods@xentastic.weird.com>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 68A0D1A9253
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 22 Apr 2021 01:18:17 +0000 (UTC)
Message-Id: <20210422011812.B0E3B511CA@xentastic.weird.com>
Date: Wed, 21 Apr 2021 18:18:12 -0700 (PDT)
From: "Greg A. Woods" <woods@planix.ca>
Reply-To: "Greg A. Woods" <woods@planix.ca>
To: gnats-bugs@NetBSD.org
Subject: struct kinfo_lwp is inconsistent in NUL-termination of its char arrays, ps assumes all are NUL-terminated!
X-Send-Pr-Version: 3.95

>Number:         56121
>Category:       kern
>Synopsis:       struct kinfo_lwp is inconsistent in NUL-termination of its char arrays, ps assumes all are NUL-terminated!
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    simonb
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Apr 22 01:20:00 +0000 2021
>Last-Modified:  Sat May 01 20:35:01 +0000 2021
>Originator:     Greg A. Woods
>Release:        NetBSD 9.99.81
>Organization:
Planix, Inc.; Kelowna, BC; Canada
>Environment:
System: NetBSD xentastic 9.99.81 NetBSD 9.99.81 (XEN3_DOM0) #14: Wed Apr 21 12:25:35 PDT 2021 woods@xentastic:/build/woods/xentastic/current-amd64-amd64-obj/build/src/sys/arch/amd64/compile/XEN3_DOM0 amd64
Architecture: x86_64
Machine: amd64
>Description:

	I wondered why the WCHAN column in ps(1) output was truncated,
	and thought I might widen it to the widest (struct
	kinfo_lwp).l_wmesg string and quickly found out this field is
	not currently a string, but just a fixed size character array,
	and is filled with strncpy() from an identically sized array in
	the kernel, which is itself not NUL terminated.

	I then noticed that ps(1) passes a pointer to l_wmesg to its
	internal strprintorsetwidth() which (when also called with
	WIDTHMODE) promptly passes that same pointer to strlen().

	I tried my idea to change ps(1) to print an unlimited-width
	WCHAN column, only to see garbage characters, proving there was
	something quite wrong.  (see the final diff hunk below)

	A the moment there's a guaranteed NUL byte at the end of struct
	kinfo_lwp (i.e. in l_name), courtesy strlcpy(), so nothing
	terrible happens.

	I would suggest that since the l_name field already has a
	padded-out width, and since it is already filled with strlcpy()
	and thus always NUL terminated, and since there are pad fields
	just before l_wmesg, that l_wmesg also be padded (replacing one
	of the pad fields), AND that it also be filled with strlcpy().

>How-To-Repeat:

	by reading code, and testing changes

>Fix:

--- sys/sys/sysctl.h.~1.231.~	2021-03-07 17:24:18.000000000 -0800
+++ sys/sys/sysctl.h	2021-04-21 17:41:09.797635650 -0700
@@ -373,7 +373,7 @@
  */
 #define	KI_NGROUPS	16
 #define	KI_MAXCOMLEN	24	/* extra for 8 byte alignment */
-#define	KI_WMESGLEN	8
+#define	KI_WMESGLEN	12	/* extra for NUL and 8 byte alignment */
 #define	KI_MAXLOGNAME	24	/* extra for 8 byte alignment */
 #define	KI_MAXEMULLEN	16
 #define	KI_LNAMELEN	20	/* extra 4 for alignment */
@@ -566,8 +566,7 @@
 	uint8_t l_usrpri;		/* U_CHAR: User-priority based on l_cpu and p_nice. */
 	int8_t	l_stat;			/* CHAR: S* process status. */
 	int8_t	l_pad1;			/* fill out to 4-byte boundary */
-	int32_t	l_pad2;			/* .. and then to an 8-byte boundary */
-	char	l_wmesg[KI_WMESGLEN];	/* wchan message */
+	char	l_wmesg[KI_WMESGLEN];	/* wchan message (with NUL) */
 	uint64_t l_wchan;		/* PTR: sleep address. */
 	uint64_t l_cpuid;		/* LONG: CPU id */
 	uint32_t l_rtime_sec;		/* STRUCT TIMEVAL: Real time. */
@@ -575,7 +574,7 @@
 	uint32_t l_cpticks;		/* INT: ticks during l_swtime */
 	uint32_t l_pctcpu;		/* FIXPT_T: cpu usage for ps */
 	uint32_t l_pid;			/* PID_T: process identifier */
-	char	l_name[KI_LNAMELEN];	/* CHAR[]: name, may be empty */
+	char	l_name[KI_LNAMELEN];	/* CHAR[]: name, may be empty (with NUL) */
 };

 /*
--- sys/kern/init_sysctl.c.~1.227.~	2021-03-07 17:23:04.000000000 -0800
+++ sys/kern/init_sysctl.c	2021-04-21 18:04:42.321587746 -0700
@@ -1576,7 +1576,7 @@
 	kl->l_priority = lwp_eprio(l);
 	kl->l_usrpri = l->l_priority;
 	if (l->l_wchan)
-		strncpy(kl->l_wmesg, l->l_wmesg, sizeof(kl->l_wmesg));
+		strlcpy(kl->l_wmesg, l->l_wmesg, sizeof(kl->l_wmesg));
 	COND_SET_VALUE(kl->l_wchan, PTRTOUINT64(l->l_wchan), allowaddr);
 	kl->l_cpuid = cpu_index(l->l_cpu);
 	bintime2timeval(&l->l_rtime, &tv);


This change isn't necessary (note it does not and should not be changed
to use strlcpy()!), but I think it is a much safer coding practice and
more consistent with related code:


--- kern_proc.c.~1.262.~	2021-03-07 17:23:06.000000000 -0800
+++ kern_proc.c	2021-04-21 18:01:35.169135140 -0700
@@ -2692,7 +2692,7 @@
 		KASSERT(l != NULL);
 		lwp_lock(l);
 		if (l->l_wchan)
-			strncpy(ep->e_wmesg, l->l_wmesg, WMESGLEN);
+			strncpy(ep->e_wmesg, l->l_wmesg, sizeof(ep->e_wmesg));
 		lwp_unlock(l);
 	}
 	ep->e_ppid = p->p_ppid;





This final diff hunk is what I was trying to achieve, and with the
kernel fixes it works, but without them it prints garbage in the wchan
field:  (note KI_LNAMELEN includes the NUL so maybe lname() should be
similarly fixed)


--- bin/ps/print.c.~1.132.~	2020-05-30 15:08:18.000000000 -0700
+++ bin/ps/print.c	2021-04-21 17:58:09.223885510 -0700
@@ -936,7 +936,8 @@
 	v = ve->var;
 	if (l->l_wmesg[0]) {
 		strprintorsetwidth(v, l->l_wmesg, mode);
-		v->width = min(v->width, KI_WMESGLEN);
+		if (termwidth != UNLIMITED)
+			v->width = min(v->width, 8); /* xxx or KI_WMESGLEN-1? */
 	} else {
 		if (mode == PRINTMODE)
 			(void)printf("%-*s", v->width, "-");

>Release-Note:

>Audit-Trail:

Responsible-Changed-From-To: kern-bug-people->simonb
Responsible-Changed-By: simonb@NetBSD.org
Responsible-Changed-When: Thu, 22 Apr 2021 01:58:34 +0000
Responsible-Changed-Why:
I'll take care of this one.


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56121: struct kinfo_lwp is inconsistent in NUL-termination
 of its char arrays, ps assumes all are NUL-terminated!
Date: Sat, 1 May 2021 20:34:46 +0000

 The following response got misfiled by gnats:

 From: Simon Burge <simonb@NetBSD.org>
 To: "Greg A. Woods" <woods@planix.ca>
 Cc: gnats-bugs@NetBSD.org
 Subject: Re: struct kinfo_lwp is inconsistent in NUL-termination of its char arrays, ps assumes all are NUL-terminated!

  Hi Greg,

  "Greg A. Woods" wrote:

  > >Number:         56121
  > >Category:       kern
  > >Synopsis:       struct kinfo_lwp is inconsistent in NUL-termination of =
  its char arrays, ps assumes all are NUL-terminated!

  > --- sys/sys/sysctl.h.~1.231.~        2021-03-07 17:24:18.000000000 -0800
  > +++ sys/sys/sysctl.h 2021-04-21 17:41:09.797635650 -0700
  > @@ -373,7 +373,7 @@
  >   */
  >  #define     KI_NGROUPS      16
  >  #define     KI_MAXCOMLEN    24      /* extra for 8 byte alignment */
  > -#define     KI_WMESGLEN     8
  > +#define     KI_WMESGLEN     12      /* extra for NUL and 8 byte alignment */
  >  #define     KI_MAXLOGNAME   24      /* extra for 8 byte alignment */
  >  #define     KI_MAXEMULLEN   16
  >  #define     KI_LNAMELEN     20      /* extra 4 for alignment */
  > @@ -566,8 +566,7 @@
  >      uint8_t l_usrpri;               /* U_CHAR: User-priority based on l_cpu and p_nic
  e. */
  >      int8_t  l_stat;                 /* CHAR: S* process status. */
  >      int8_t  l_pad1;                 /* fill out to 4-byte boundary */
  > -    int32_t l_pad2;                 /* .. and then to an 8-byte boundary */
  > -    char    l_wmesg[KI_WMESGLEN];   /* wchan message */
  > +    char    l_wmesg[KI_WMESGLEN];   /* wchan message (with NUL) */
  >      uint64_t l_wchan;               /* PTR: sleep address. */
  >      uint64_t l_cpuid;               /* LONG: CPU id */
  >      uint32_t l_rtime_sec;           /* STRUCT TIMEVAL: Real time. */

  This part changes the structure of kinfo_proc2 in a non-backwards
  compatible way by moving the offset of l_wmesg[*].  I think we just need
  to handle the fixed sized buffer directly in ps(1).  I'll look a bit
  more.

  Cheers,
  Simon.

  [*] For example you should be able to use any ps(1) binary from amd64
      or i386 from NetBSD 1.5 onwards on today's -current amd64 kernel.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.46 2020/01/03 16:35:01 leot Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.