NetBSD Problem Report #59029
From www@netbsd.org Fri Jan 24 22:17:41 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits)
client-signature RSA-PSS (2048 bits))
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id CCB181A923A
for <gnats-bugs@gnats.NetBSD.org>; Fri, 24 Jan 2025 22:17:40 +0000 (UTC)
Message-Id: <20250124221739.B18A81A923B@mollari.NetBSD.org>
Date: Fri, 24 Jan 2025 22:17:39 +0000 (UTC)
From: david@gutteridge.ca
Reply-To: david@gutteridge.ca
To: gnats-bugs@NetBSD.org
Subject: cut(1) -n argument doesn't work (presently unsupported, though documented)
X-Send-Pr-Version: www-1.0
>Number: 59029
>Category: bin
>Synopsis: cut(1) -n argument doesn't work (presently unsupported, though documented)
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: gutteridge
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Jan 24 22:20:00 +0000 2025
>Closed-Date: Tue Mar 04 04:16:49 +0000 2025
>Last-Modified: Tue Mar 04 04:16:49 +0000 2025
>Originator: David H. Gutteridge
>Release: current
>Organization:
TNF
>Environment:
>Description:
The -n argument of cut(1) doesn't work; it's unimplemented. It is
documented in the man page and usage(), but code inspection shows it
was never actually implemented in NetBSD's version of the command.
(There's a comment in cut.c that says
/* Since we don't support multi-byte characters, the -c and -b
options are equivalent, and the -n option is meaningless. */
though that's not actually correct, as multi-byte characters have been
supported since 2007. But -n support simply wasn't added then.)
>How-To-Repeat:
Try to use "cut -b X,Y,Z -n foo.txt" on a file with an encoding that
supports multi-byte characters and happens to have some around the
boundaries.
>Fix:
Add missing functionality to cut(1). TBD. (FreeBSD and OpenBSD have it.)
>Release-Note:
>Audit-Trail:
From: "David H. Gutteridge" <gutteridge@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59029 CVS commit: src/tests/usr.bin/cut
Date: Fri, 24 Jan 2025 22:23:38 +0000
Module Name: src
Committed By: gutteridge
Date: Fri Jan 24 22:23:38 UTC 2025
Modified Files:
src/tests/usr.bin/cut: t_cut.sh
Log Message:
t_cut.sh: add a test case for -n (PR bin/59029)
To generate a diff of this commit:
cvs rdiff -u -r1.2 -r1.3 src/tests/usr.bin/cut/t_cut.sh
Please note that diffs are not public domain; they are subject to the
copyright notices on the relevant files.
From: "David H. Gutteridge" <gutteridge@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59029 CVS commit: src/tests/usr.bin/cut
Date: Fri, 24 Jan 2025 22:26:41 +0000
Module Name: src
Committed By: gutteridge
Date: Fri Jan 24 22:26:41 UTC 2025
Modified Files:
src/tests/usr.bin/cut: t_cut.sh
Log Message:
t_cut.sh: fix argument ordering of new test case, whoops
(PR bin/59029)
To generate a diff of this commit:
cvs rdiff -u -r1.3 -r1.4 src/tests/usr.bin/cut/t_cut.sh
Responsible-Changed-From-To: bin-bug-people->gutteridge
Responsible-Changed-By: gutteridge@NetBSD.org
Responsible-Changed-When: Mon, 27 Jan 2025 22:15:57 +0000
Responsible-Changed-Why:
Take this (fairly obscure) bug so it doesn't get lost.
From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59029: cut(1) -n argument doesn't work (presently unsupported, though documented)
Date: Wed, 12 Feb 2025 21:53:38 -0500
The existing NetBSD code is a touch ungainly to modify to handle this
context (depending on the approach) since it uses a preprocessor
"trick", and would be uglier with macro-wrapped conditionals. Comparing
other implementations, I see OpenBSD simply made "cut -b -n" be handled
as "cut -c". FreeBSD, on the other hand, uses a distinct function
"b_n_cut" with unique logic. (Neither offers anything useful I can find
for test suite content. It would be good to find an illustration of
where the two approaches give varied output.)
From: Robert Elz <kre@munnari.OZ.AU>
To: gnats-bugs@netbsd.org
Cc: gutteridge@netbsd.org
Subject: Re: bin/59029: cut(1) -n argument doesn't work (presently unsupported, though documented)
Date: Thu, 13 Feb 2025 11:46:16 +0700
    Date:        Thu, 13 Feb 2025 02:55:01 +0000 (UTC)
    From:        "David H. Gutteridge via gnats" <gnats-admin@NetBSD.org>
    Message-ID:  <20250213025501.B38C81A923C@mollari.NetBSD.org>
| It would be good to find an illustration of
| where the two approaches give varied output.)
My guess (without testing it) is that the difference would show in a
file that is all single-byte (e.g., ASCII) characters up to some point,
and then, at offset (say) 100, contains
100     A   B   XX  YYY ZZ  C   D   E   F   G   H
where a repeated letter denotes one character with a multi-byte encoding
(XX is a single two-byte character, not two X chars), and the spaces are
just padding for this e-mail.
In that scheme, using -b the bytes would count
100     A   B   XX  YYY ZZ  C   D   E   F   G   H
        ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^
       100 101 102 104 107 109 110 111 112 113 114
(with the missing byte numbers being the additional bytes
needed to encode the multi-byte characters, which don't easily
fit in this display, unless I added more lines).
But using -c the counts would be
100     A   B   XX  YYY ZZ  C   D   E   F   G   H
        ^   ^   ^   ^   ^   ^   ^   ^   ^   ^   ^
       100 101 102 103 104 105 106 107 108 109 110
Specifying 109 as a position in a -b list means cut at the 'C', whereas
specifying it in the -c list means cut at the 'G'. In this case -n
is irrelevant, as no multi-byte character would be broken, but it is
clear that using code for -c to implement the user's -b is simply wrong,
regardless of -n being given or not.
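To make the counting above concrete, here is a minimal C sketch. It is not taken from any cut(1) implementation. It assumes valid UTF-8 input and 1-based positions (as in a cut list); the helper names utf8_len, byte_offset_of_char and make_line, and the use of é, € and ñ as stand-ins for XX, YYY and ZZ, are illustrative assumptions.

```c
#include <stddef.h>
#include <string.h>

/*
 * Length in bytes of the UTF-8 sequence introduced by lead byte c.
 * Assumes valid UTF-8; a stray continuation byte counts as length 1.
 */
static size_t
utf8_len(unsigned char c)
{
    if (c >= 0xF0)
        return 4;
    if (c >= 0xE0)
        return 3;
    if (c >= 0xC0)
        return 2;
    return 1;
}

/*
 * 1-based byte offset at which the n-th character (1-based) starts,
 * or 0 if the string has fewer than n characters.
 */
static size_t
byte_offset_of_char(const char *s, size_t n)
{
    size_t len = strlen(s), pos = 0, ch = 0;

    while (pos < len) {
        if (++ch == n)
            return pos + 1;
        pos += utf8_len((unsigned char)s[pos]);
    }
    return 0;
}

/*
 * Build the example line: 99 single-byte filler characters, then
 * A B XX YYY ZZ C D E F G H, with 'A' at byte offset 100.
 * buf must hold at least 115 bytes.
 */
static void
make_line(char *buf)
{
    memset(buf, '.', 99);
    /* The literal is split so "\xB1" is not read as "\xB1C". */
    strcpy(buf + 99, "AB\xC3\xA9\xE2\x82\xAC\xC3\xB1" "CDEFGH");
}
```

With buf filled by make_line(), byte 109 (buf[108]) is 'C', while byte_offset_of_char(buf, 109) returns 113, which is where 'G' sits. That is the same -b versus -c discrepancy as in the two diagrams.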
I'd assume the "special logic" you noted in the FreeBSD code is to handle
the case where a -b list includes 105 - that is, a byte offset right in the
middle of the Y character. In that case, without -n, the cut would
just happen there, right in the middle of the Y, but with -n the cut needs
to be either before Y or after it, that is, at offset 104 or 107 (which
one is selected is probably entirely up to the coder).
Neither the standard -b nor -c algorithm would get that right. If you're
looking for an implementation to import to improve ours, pick FreeBSD's.
kre
From: "David H. Gutteridge" <gutteridge@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59029 CVS commit: src/tests/usr.bin/cut
Date: Sat, 22 Feb 2025 23:38:09 +0000
Module Name: src
Committed By: gutteridge
Date: Sat Feb 22 23:38:09 UTC 2025
Modified Files:
src/tests/usr.bin/cut: t_cut.sh
Log Message:
t_cut.sh: correct a test case for -n (PR bin/59029)
The test case added before was based on how a version of GNU coreutils
cut(1) -- as patched by Red Hat to accept this flag -- behaved, but in
fact it, and OpenBSD's implementation, too, doesn't distinguish between
-c and -b -n, which doesn't align with the meaning/implementation used
in commercial Unix variants that originally offered -n (e.g., Solaris
and Tru64).
The new version of the test case clearly illustrates the differences
between interpretations of this flag (Solaris, FreeBSD, RHEL/Fedora,
OpenBSD).
To generate a diff of this commit:
cvs rdiff -u -r1.4 -r1.5 src/tests/usr.bin/cut/t_cut.sh
From: "David H. Gutteridge" <david@gutteridge.ca>
To: gnats-bugs@netbsd.org, kre@munnari.OZ.AU
Cc:
Subject: Re: bin/59029: cut(1) -n argument doesn't work (presently unsupported, though documented)
Date: Mon, 03 Mar 2025 20:00:19 -0500
On Thu, 13 Feb 2025 at 11:46:16 +0700, Robert Elz wrote:
[...]
> Neither the standard -b nor -c algorithm would get that right. If you're
> looking for an implementation to import to improve ours, pick FreeBSD's.
Yes, that's quite right. I looked at the AT&T "Advanced Software
Technology" code that's used in (open-sourced) Solaris (which I infer
is the origin of this) and FreeBSD's intent is to match that logic.
Given the NetBSD test input:
$ cat d_utf8.in=20
foo=C3=84:bar:=C3=84baz
Foo=C3=84:Bar:=C3=84Baz
FOo=C3=84:BAr:=C3=84BAz
FOO=C3=84:BAR:=C3=84BAZ
On implementations like Fedora/RHEL that equate -b -n to -c:
$ cut -b 5,6,7 d_utf8.in=20
=EF=BF=BD:b
=EF=BF=BD:B
=EF=BF=BD:B
=EF=BF=BD:B
$ cut -b 5,6,7 -n d_utf8.in=20
:ba
:Ba
:BA
:BA
$ cut -c 5,6,7 d_utf8.in=20
:ba
:Ba
:BA
:BA
On FreeBSD and Solaris:
$ cut -b 5,6,7 d_utf8.in=20
=EF=BF=BD:b
=EF=BF=BD:B
=EF=BF=BD:B
=EF=BF=BD:B
$ cut -b 5,6,7 -n d_utf8.in=20
=C3=84:b
=C3=84:B
=C3=84:B
=C3=84:B
$ cut -c 5,6,7 d_utf8.in=20
:ba
:Ba
:BA
:BA
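The FreeBSD/Solaris results above are consistent with the -n rule as described (paraphrased here) in FreeBSD's cut(1) manual page: a character is output only if at least one of its bytes is selected and, after a prefix of zero or more unselected bytes, all of its remaining bytes are selected. The following is a minimal C sketch of the three behaviours, not FreeBSD's b_n_cut. It assumes valid UTF-8 instead of using the locale's multibyte functions, and the names cut_b, cut_c, cut_b_n and the sel[] array (indexed by 1-based position) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Byte length of the UTF-8 sequence introduced by lead byte c. */
static size_t
utf8_len(unsigned char c)
{
    return c >= 0xF0 ? 4 : c >= 0xE0 ? 3 : c >= 0xC0 ? 2 : 1;
}

/* Nonzero if 1-based position pos is in the list; sel[0] is unused. */
static int
selected(const unsigned char *sel, size_t nsel, size_t pos)
{
    return pos < nsel && sel[pos] != 0;
}

/* -b LIST: keep the selected bytes, splitting characters if need be. */
static void
cut_b(const char *line, const unsigned char *sel, size_t nsel, char *out)
{
    size_t len = strlen(line), i, o = 0;

    for (i = 0; i < len; i++)
        if (selected(sel, nsel, i + 1))
            out[o++] = line[i];
    out[o] = '\0';
}

/* -c LIST: positions count characters, not bytes. */
static void
cut_c(const char *line, const unsigned char *sel, size_t nsel, char *out)
{
    size_t len = strlen(line), pos = 0, ch = 0, o = 0, clen;

    while (pos < len) {
        clen = utf8_len((unsigned char)line[pos]);
        if (clen > len - pos)       /* truncated sequence at end of line */
            clen = len - pos;
        if (selected(sel, nsel, ++ch)) {
            memcpy(out + o, line + pos, clen);
            o += clen;
        }
        pos += clen;
    }
    out[o] = '\0';
}

/*
 * -b LIST -n: keep a whole character only when its selected bytes form
 * a non-empty run that extends to the character's last byte.
 */
static void
cut_b_n(const char *line, const unsigned char *sel, size_t nsel, char *out)
{
    size_t len = strlen(line), pos = 0, o = 0, clen, k;
    int seen, ok;

    while (pos < len) {
        clen = utf8_len((unsigned char)line[pos]);
        if (clen > len - pos)
            clen = len - pos;
        seen = 0;
        ok = 1;
        for (k = 0; k < clen; k++) {
            if (selected(sel, nsel, pos + k + 1))
                seen = 1;
            else if (seen)
                ok = 0;             /* unselected byte after a selected one */
        }
        if (seen && ok) {
            memcpy(out + o, line + pos, clen);
            o += clen;
        }
        pos += clen;
    }
    out[o] = '\0';
}
```

For the first test line with bytes 5, 6 and 7 selected, cut_b yields the lone byte 0x84 followed by ":b" (shown as a replacement character on a terminal), cut_b_n yields "Ä:b", and cut_c yields ":ba".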
I've pulled over the change sets from FreeBSD to apply to ours. (Still
adjusting/cleaning some things up.)
Thanks,
Dave
From: "David H. Gutteridge" <gutteridge@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59029 CVS commit: src/usr.bin/cut
Date: Tue, 4 Mar 2025 03:54:19 +0000
Module Name: src
Committed By: gutteridge
Date: Tue Mar 4 03:54:19 UTC 2025
Modified Files:
src/usr.bin/cut: cut.1 cut.c
Log Message:
cut(1): implement the -n option (for use with -b)
This command had long advertised the existence of -n (in its usage
message and man page) but had never implemented it. Here we borrow the
implementation written by Tim J. Robbins for FreeBSD, which provides
most code changes and almost all documentation changes applied here. We
also borrow some options handling simplifications from OpenBSD, with
some minor tweaks to code and documentation by me.
Addresses PR bin/59029. This is a pretty obscure feature, it seems, so
it's unlikely it will be pulled up to stable branches.
To generate a diff of this commit:
cvs rdiff -u -r1.18 -r1.19 src/usr.bin/cut/cut.1
cvs rdiff -u -r1.30 -r1.31 src/usr.bin/cut/cut.c
From: "David H. Gutteridge" <gutteridge@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc:
Subject: PR/59029 CVS commit: src/tests/usr.bin/cut
Date: Tue, 4 Mar 2025 03:55:39 +0000
Module Name: src
Committed By: gutteridge
Date: Tue Mar 4 03:55:39 UTC 2025
Modified Files:
src/tests/usr.bin/cut: t_cut.sh
Log Message:
t_cut.sh: reflect that PR bin/59029 has been addressed
To generate a diff of this commit:
cvs rdiff -u -r1.5 -r1.6 src/tests/usr.bin/cut/t_cut.sh
State-Changed-From-To: open->closed
State-Changed-By: gutteridge@NetBSD.org
State-Changed-When: Tue, 04 Mar 2025 04:16:49 +0000
State-Changed-Why:
Issue resolved. Not necessary to pull up.
>Unformatted:
Copyright © 1994-2025
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.