NetBSD Problem Report #59657
From www@netbsd.org Wed Sep 17 01:21:24 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
client-signature RSA-PSS (2048 bits) client-digest SHA256)
(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 31F6D1A923C
for <gnats-bugs@gnats.NetBSD.org>; Wed, 17 Sep 2025 01:21:24 +0000 (UTC)
Message-Id: <20250917012123.11FBB1A923D@mollari.NetBSD.org>
Date: Wed, 17 Sep 2025 01:21:23 +0000 (UTC)
From: pr@xn--rvztrtkrfrgp-bbb7j2b8f0b9d7a21oft.com
Reply-To: pr@xn--rvztrtkrfrgp-bbb7j2b8f0b9d7a21oft.com
To: gnats-bugs@NetBSD.org
Subject: syslogd outputs BOM in the message
X-Send-Pr-Version: www-1.0
>Number: 59657
>Category: bin
>Synopsis: syslogd outputs BOM in the message
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: bin-bug-people
>State: open
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Wed Sep 17 01:25:00 +0000 2025
>Last-Modified: Mon Oct 27 23:35:01 +0000 2025
>Originator: Benedek Gergely
>Release: 10.99.12
>Organization:
>Environment:
NetBSD funcube 10.99.12 NetBSD 10.99.12 (FUNCUBE) #68: Sat Jul 26 15:34:52 BST 2025 potato@funcube:/usr/obj/sys/arch/amd64/compile/FUNCUBE amd64
>Description:
SYSLOG(3) and RFC 5424 similarly state:
"If the msgfmt contains UTF-8 characters, then it has to start with
a Byte Order Mark."
The BOM then appears, unexpectedly, as a prefix of every such message logged:
2025-09-17T01:59:10.205820+01:00 funcube potato - - - <feff>Árvíztűrő tükörfúrógép
>How-To-Repeat:
syslogd -o rfc5424 -d
logger $(printf "\xEF\xBB\xBF%s" "Árvíztűrő tükörfúrógép")
tail -n 1 /var/log/messages | xxd
00000000: 3230 3235 2d30 392d 3137 5430 313a 3539 2025-09-17T01:59
00000010: 3a31 302e 3230 3538 3230 2b30 313a 3030 :10.205820+01:00
00000020: 2066 756e 6375 6265 2070 6f74 6174 6f20 funcube potato
00000030: 2d20 2d20 2d20 efbb bfc3 8172 76c3 ad7a - - - .....rv..z
00000040: 74c5 b172 c591 2074 c3bc 6bc3 b672 66c3 t..r.. t..k..rf.
00000050: ba72 c3b3 67c3 a970 0a .r..g..p.
>Fix:
Index: ./usr.sbin/syslogd/syslogd.c
===================================================================
RCS file: /cvsroot/src/usr.sbin/syslogd/syslogd.c,v
retrieving revision 1.147
diff -u -r1.147 syslogd.c
--- ./usr.sbin/syslogd/syslogd.c 9 Nov 2024 16:31:31 -0000 1.147
+++ ./usr.sbin/syslogd/syslogd.c 17 Sep 2025 01:08:30 -0000
@@ -1243,6 +1243,7 @@
DPRINTF(D_DATA, "UTF-8 BOM\n");
utf8allowed = true;
p += 3;
+ start += 3; /* skip BOM in output */
}

if (*p != '\0' && !utf8allowed) {
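The idea behind the patch, treating the BOM as a signal and then moving the output start past its three bytes, can be sketched on its own. This is a minimal illustration, not syslogd's code: the helper name `skip_utf8_bom` is hypothetical, and it assumes a NUL-terminated message.

```c
#include <stdbool.h>
#include <string.h>

/* UTF-8 encoding of U+FEFF, the byte order mark (RFC 5424: BOM = %xEF.BB.BF). */
static const char utf8_bom[] = "\xEF\xBB\xBF";

/*
 * Hypothetical helper: if msg starts with a UTF-8 BOM, set *utf8 to true
 * and return a pointer just past the BOM; otherwise set *utf8 to false and
 * return msg unchanged.  The BOM still tells the caller (via *utf8) that
 * the body is UTF-8, but it is no longer part of the text written out.
 */
static const char *
skip_utf8_bom(const char *msg, bool *utf8)
{
	size_t n = sizeof(utf8_bom) - 1;	/* 3 bytes */

	/* strncmp stops at a NUL, so short messages are handled safely. */
	if (strncmp(msg, utf8_bom, n) == 0) {
		*utf8 = true;
		return msg + n;
	}
	*utf8 = false;
	return msg;
}
```

A caller would write the message from the returned pointer, which is what `start += 3` achieves in the patch.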
>Audit-Trail:
From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org,
netbsd-bugs@netbsd.org
Subject: Re: bin/59657: syslogd outputs BOM in the message
Date: Fri, 19 Sep 2025 12:16:39 -0400
> SYSLOG(3) and rfc5424 similarly state:
>
> "If the msgfmt contains UTF-8 characters, then it has to start with
> a Byte Order Mark."
>
> The BOM is unexpected as a prefix for every message logged:
>
> 2025-09-17T01:59:10.205820+01:00 funcube potato - - - <feff>Árvíztűrő tükörfúrógép
>> How-To-Repeat:
> syslogd -o rfc5424 -d
>
> logger $(printf "\xEF\xBB\xBF%s" "Árvíztűrő tükörfúrógép")
>
> tail -n 1 /var/log/messages | xxd
> 00000000: 3230 3235 2d30 392d 3137 5430 313a 3539 2025-09-17T01:59
> 00000010: 3a31 302e 3230 3538 3230 2b30 313a 3030 :10.205820+01:00
> 00000020: 2066 756e 6375 6265 2070 6f74 6174 6f20 funcube potato
> 00000030: 2d20 2d20 2d20 efbb bfc3 8172 76c3 ad7a - - - .....rv..z
> 00000040: 74c5 b172 c591 2074 c3bc 6bc3 b672 66c3 t..r.. t..k..rf.
> 00000050: ba72 c3b3 67c3 a970 0a .r..g..p.
Why do you say that? The ABNF in the RFC says:
      MSG             = MSG-ANY / MSG-UTF8
      MSG-ANY         = *OCTET ; not starting with BOM
      MSG-UTF8        = BOM UTF-8-STRING
      BOM             = %xEF.BB.BF
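Read literally, the grammar gives a receiver a simple decision: a MSG that begins with the BOM is MSG-UTF8, and what follows should be a valid UTF-8 string; anything else is MSG-ANY, to be treated as opaque octets. A minimal sketch of that decision follows. The names (`classify_msg`, `utf8_valid`, `enum msg_kind`) are hypothetical, and the validator is written for illustration: it rejects truncated and overlong sequences, surrogates and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

enum msg_kind { KIND_ANY, KIND_UTF8, KIND_UTF8_INVALID };

/* Return true if s[0..len) is well-formed UTF-8. */
static bool
utf8_valid(const unsigned char *s, size_t len)
{
	size_t i = 0;

	while (i < len) {
		unsigned char c = s[i];
		unsigned long cp;
		size_t n;		/* number of continuation bytes */

		if (c < 0x80) {		/* ASCII */
			i++;
			continue;
		}
		if (c >= 0xC2 && c <= 0xDF) {
			n = 1; cp = c & 0x1F;
		} else if (c >= 0xE0 && c <= 0xEF) {
			n = 2; cp = c & 0x0F;
		} else if (c >= 0xF0 && c <= 0xF4) {
			n = 3; cp = c & 0x07;
		} else {
			return false;	/* stray continuation, C0/C1 or F5..FF */
		}
		if (len - i - 1 < n)
			return false;	/* truncated sequence */
		for (size_t k = 1; k <= n; k++) {
			if ((s[i + k] & 0xC0) != 0x80)
				return false;
			cp = (cp << 6) | (s[i + k] & 0x3F);
		}
		/* Reject overlong forms, surrogates and values past U+10FFFF. */
		if ((n == 2 && cp < 0x800) || (n == 3 && cp < 0x10000) ||
		    (cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
			return false;
		i += n + 1;
	}
	return true;
}

/* MSG-UTF8 if the message starts with the BOM, MSG-ANY otherwise. */
static enum msg_kind
classify_msg(const char *msg, size_t len)
{
	const unsigned char *s = (const unsigned char *)msg;

	if (len < 3 || memcmp(s, "\xEF\xBB\xBF", 3) != 0)
		return KIND_ANY;	/* opaque octets */
	return utf8_valid(s + 3, len - 3) ? KIND_UTF8 : KIND_UTF8_INVALID;
}
```

Returning a separate "invalid" kind rather than falling back to MSG-ANY is a choice made for this sketch; the four grammar lines quoted above do not say what to do with a BOM followed by invalid UTF-8.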
Now in practice according to ChatGPT:
- Almost all modern syslog implementations do not emit a BOM, even for UTF-8 content.
- Many receivers are tolerant and just assume UTF-8 without requiring BOM.
- Some parsers can actually get confused if a BOM is present.
And:
- RFC 5424 says the BOM is required if you send UTF-8 MSG.
- In practice, it's usually skipped, and interoperability tends to be better without it.
If your tool (msgfmt) prepends a BOM automatically, you should check the
target syslog receiver. If it understands RFC 5424 to the letter, the
BOM is technically correct. But if you're aiming for compatibility with
common syslog daemons (rsyslog, syslog-ng, journald forwarders), skipping
the BOM is typically safer.
Perhaps adding a flag to select the behavior? What should the default be?
christos
From: pr@xn--rvztrtkrfrgp-bbb7j2b8f0b9d7a21oft.com
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: bin/59657: syslogd outputs BOM in the message
Date: Mon, 27 Oct 2025 23:23:28 +0000
On Sun, Sep 21, 2025 at 09:25:01AM +0000, Christos Zoulas via gnats wrote:
> Now in practice according to ChatGPT:
> Almost all modern syslog implementations do not emit a BOM, even for UTF-8 content.
>
> Many receivers are tolerant and just assume UTF-8 without requiring BOM.
>
ChatGPT, that fount of knowledge.
I'd call the FreeBSD implementation the opposite of tolerant of UTF-8:
/*
 * Removes characters from log messages that are unsafe to display.
 * TODO: Permit UTF-8 strings that include a BOM per RFC 5424?
 */
static void
parsemsg_remove_unsafe_characters(const char *in, char *out, size_t outlen)
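That TODO asks whether BOM-tagged UTF-8 should be let through such a filter. One way it could behave is sketched here. This is a hypothetical function (`remove_unsafe`), not FreeBSD's `parsemsg_remove_unsafe_characters`, and the escaping rules (`^X` for control bytes, `\ooo` octal for high bytes) are choices made for the example. High bytes pass through only when the message starts with a BOM, the BOM itself is dropped, and the bytes after it are not checked for well-formed UTF-8.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical sanitizer: copy `in` to `out` (size outlen, always
 * NUL-terminated, truncated if too small).  Control bytes other than tab
 * become ^X notation.  Bytes >= 0x80 are copied through only when the
 * message begins with a UTF-8 BOM; otherwise they become \ooo escapes.
 * The BOM itself is not copied.
 */
static void
remove_unsafe(const char *in, char *out, size_t outlen)
{
	const unsigned char *p = (const unsigned char *)in;
	bool utf8 = false;
	size_t o = 0;

	if (outlen == 0)
		return;
	if (strncmp(in, "\xEF\xBB\xBF", 3) == 0) {
		utf8 = true;
		p += 3;			/* drop the BOM from the output */
	}
	for (; *p != '\0'; p++) {
		char esc[5];		/* longest form: "\377" plus NUL */
		size_t n;

		if (*p == 0x7F || (*p < 0x20 && *p != '\t')) {
			esc[0] = '^';
			esc[1] = (char)(*p ^ 0x40);	/* 0x01 -> 'A', 0x7F -> '?' */
			n = 2;
		} else if (*p >= 0x80 && !utf8) {
			n = (size_t)snprintf(esc, sizeof(esc), "\\%03o",
			    (unsigned)*p);
		} else {
			esc[0] = (char)*p;
			n = 1;
		}
		if (o + n >= outlen)	/* keep room for the NUL */
			break;
		memcpy(out + o, esc, n);
		o += n;
	}
	out[o] = '\0';
}
```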
>
> And:
> RFC 5424 says the BOM is required if you send UTF-8 MSG.
>
Yes, that was a terrible idea, as was allowing an optional BOM in
UTF-8 in the first place.
Apropos, it's a can of worms:
https://corp.unicode.org/pipermail/unicode/2020-June/008713.html
>
> Perhaps adding a flag to select the behavior? What should the default be?
The BOM is the flag; we don't need to show the flag as a prefix of
every message. We don't notice the optional UTF-8 BOM (EF BB BF) at the
beginning of files because editors etc. don't show it.
It has just occurred to me, though, that I'm only thinking about this
appearing locally, not about it being passed on to a remote syslogd. We
shouldn't strip it in that case (the BOM is now the remote syslogd's
problem).
So it'll need to be preserved for F_FORW and F_TLS, but not for local
destinations likely to be read by a human, such as F_FILE, F_WALL, etc.
This is actually much messier than I first thought.
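The split proposed above, keeping the BOM on forwarding paths and dropping it for local, human-read destinations, might be sketched as follows. The enum is a hypothetical stand-in that only borrows the F_* names used in this thread; syslogd's real destination-type constants, their values, and the full list of local types are not reproduced here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for syslogd's destination types (names from this PR). */
enum dest_type { F_FILE, F_WALL, F_FORW, F_TLS };

/*
 * Network destinations pass the message on untouched (the BOM becomes the
 * remote syslogd's concern); local, human-read ones get it stripped.
 */
static bool
keeps_bom(enum dest_type t)
{
	switch (t) {
	case F_FORW:
	case F_TLS:
		return true;
	default:
		return false;
	}
}

/* Return the text to write for destination t. */
static const char *
msg_for_dest(enum dest_type t, const char *msg)
{
	if (!keeps_bom(t) && strncmp(msg, "\xEF\xBB\xBF", 3) == 0)
		return msg + 3;
	return msg;
}
```

Keeping the decision in one predicate would make it easy to extend the set of local types, or to put the behavior behind a configuration option.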
At the end of the day, it is just something that can be filtered out by
whoever is viewing it.
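For the viewer side, a minimal sketch of such a filter removes every EF BB BF sequence from a line in place. It searches the whole line rather than only its start because, in the log file, the BOM follows the RFC 5424 header (offset 0x36 in the xxd dump above). The helper name `strip_boms` is hypothetical, and it assumes every U+FEFF in a log line is a BOM marker: a zero-width no-break space deliberately placed inside a message would be removed as well.

```c
#include <string.h>

/*
 * Remove every UTF-8 BOM (EF BB BF) from the NUL-terminated string s,
 * in place, and return s.
 */
static char *
strip_boms(char *s)
{
	static const char bom[] = "\xEF\xBB\xBF";
	char *r = s;		/* read position */
	char *w = s;		/* write position */

	while (*r != '\0') {
		if (strncmp(r, bom, 3) == 0) {
			r += 3;	/* skip the three BOM bytes */
			continue;
		}
		*w++ = *r++;
	}
	*w = '\0';
	return s;
}
```

Called on each line read with fgets(3), it would turn a small stdin-to-stdout program into a filter for the output of `tail -n 1 /var/log/messages`.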
emacs:
2025-10-27T23:08:58.611141+00:00 funcube potato - - - dr<U+f8>mst<U+f8>rre
2025-10-27T23:09:00.741754+00:00 funcube potato - - - _drømstørre
vim:
2025-10-27T23:08:58.611141+00:00 funcube potato - - - dr<U+f8>mst<U+f8>rre
2025-10-27T23:09:00.741754+00:00 funcube potato - - - <feff>drømstørre
Cheers,
Ben
Copyright © 1994-2026
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.