NetBSD Problem Report #59657

From www@netbsd.org  Wed Sep 17 01:21:24 2025
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits) server-digest SHA256
	 client-signature RSA-PSS (2048 bits) client-digest SHA256)
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 31F6D1A923C
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 17 Sep 2025 01:21:24 +0000 (UTC)
Message-Id: <20250917012123.11FBB1A923D@mollari.NetBSD.org>
Date: Wed, 17 Sep 2025 01:21:23 +0000 (UTC)
From: pr@xn--rvztrtkrfrgp-bbb7j2b8f0b9d7a21oft.com
Reply-To: pr@xn--rvztrtkrfrgp-bbb7j2b8f0b9d7a21oft.com
To: gnats-bugs@NetBSD.org
Subject: syslogd outputs BOM in the message
X-Send-Pr-Version: www-1.0

>Number:         59657
>Category:       bin
>Synopsis:       syslogd outputs BOM in the message
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    bin-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Sep 17 01:25:00 +0000 2025
>Last-Modified:  Mon Oct 27 23:35:01 +0000 2025
>Originator:     Benedek Gergely
>Release:        10.99.12
>Organization:
>Environment:
NetBSD funcube 10.99.12 NetBSD 10.99.12 (FUNCUBE) #68: Sat Jul 26 15:34:52 BST 2025  potato@funcube:/usr/obj/sys/arch/amd64/compile/FUNCUBE amd64
>Description:
SYSLOG(3) and rfc5424 similarly state: 

"If the msgfmt contains UTF-8 characters, then it has to start with 
a Byte Order Mark."


The BOM is unexpected as a prefix for every message logged:

2025-09-17T01:59:10.205820+01:00 funcube potato - - - <feff>Árvíztűrő tükörfúrógép
>How-To-Repeat:
syslogd -o rfc5424 -d

logger $(printf "\xEF\xBB\xBF%s" "Árvíztűrő tükörfúrógép")



tail -n 1 /var/log/messages | xxd
00000000: 3230 3235 2d30 392d 3137 5430 313a 3539  2025-09-17T01:59
00000010: 3a31 302e 3230 3538 3230 2b30 313a 3030  :10.205820+01:00
00000020: 2066 756e 6375 6265 2070 6f74 6174 6f20   funcube potato 
00000030: 2d20 2d20 2d20 efbb bfc3 8172 76c3 ad7a  - - - .....rv..z
00000040: 74c5 b172 c591 2074 c3bc 6bc3 b672 66c3  t..r.. t..k..rf.
00000050: ba72 c3b3 67c3 a970 0a                   .r..g..p.
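
The dump above can also be checked programmatically. The following is a
minimal sketch, assuming nothing beyond standard C; find_utf8_bom() is a
hypothetical helper and not part of syslogd or any NetBSD tool. It scans a
buffer for the three BOM octets EF BB BF. For the logged line shown, the
BOM sits at offset 0x36, directly after the "- - - " fields.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return the offset of the first UTF-8 BOM
 * (octets EF BB BF) in buf, or -1 if the buffer contains none.
 */
static long
find_utf8_bom(const char *buf, size_t len)
{
	static const char bom[] = "\xEF\xBB\xBF";
	const size_t bomlen = sizeof(bom) - 1;	/* exclude the trailing NUL */

	assert(buf != NULL);
	for (size_t i = 0; i + bomlen <= len; i++) {
		if (memcmp(buf + i, bom, bomlen) == 0)
			return (long)i;
	}
	return -1;
}
```

A non-negative result for the last line of /var/log/messages means the BOM
was written to the file along with the message text.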


>Fix:
Index: ./usr.sbin/syslogd/syslogd.c
===================================================================
RCS file: /cvsroot/src/usr.sbin/syslogd/syslogd.c,v
retrieving revision 1.147
diff -u -r1.147 syslogd.c
--- ./usr.sbin/syslogd/syslogd.c        9 Nov 2024 16:31:31 -0000       1.147
+++ ./usr.sbin/syslogd/syslogd.c        17 Sep 2025 01:08:30 -0000
@@ -1243,6 +1243,7 @@
                DPRINTF(D_DATA, "UTF-8 BOM\n");
                utf8allowed = true;
                p += 3;
+               start += 3;  /* skip BOM in output */
        }

        if (*p != '\0' && !utf8allowed) {
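
Judging by its comment, the added line advances the start of the emitted
text past the BOM, in the same way that p is already advanced for
validation. A minimal stand-alone sketch of that idea follows;
skip_utf8_bom() is a hypothetical name and is not a function in syslogd.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* UTF-8 encoding of U+FEFF, the BOM in RFC 5424's MSG grammar. */
static const char utf8_bom[] = "\xEF\xBB\xBF";

/*
 * Hypothetical helper, not part of syslogd.c: report whether msg starts
 * with a UTF-8 BOM and return a pointer to the text that follows it.
 * Without a BOM, the original pointer is returned unchanged.
 */
static const char *
skip_utf8_bom(const char *msg, bool *had_bom)
{
	assert(msg != NULL && had_bom != NULL);

	/* strncmp stops at a NUL, so short strings are safe here. */
	*had_bom = strncmp(msg, utf8_bom, 3) == 0;
	return *had_bom ? msg + 3 : msg;
}
```

The had_bom result plays the role of utf8allowed in the patch context: the
BOM still marks the message as UTF-8 even when it is not written out.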

>Audit-Trail:
From: Christos Zoulas <christos@zoulas.com>
To: gnats-bugs@netbsd.org
Cc: gnats-admin@netbsd.org,
 netbsd-bugs@netbsd.org
Subject: Re: bin/59657: syslogd outputs BOM in the message
Date: Fri, 19 Sep 2025 12:16:39 -0400

 --Apple-Mail=_E881427E-F680-4FDC-972E-7B2A6D942B37
 Content-Type: multipart/alternative;
 	boundary="Apple-Mail=_8D6F4724-65D5-4C18-9E64-8931C7F899E3"


 --Apple-Mail=_8D6F4724-65D5-4C18-9E64-8931C7F899E3
 Content-Transfer-Encoding: quoted-printable
 Content-Type: text/plain;
 	charset=utf-8



 > SYSLOG(3) and rfc5424 similarly state:
 >
 > "If the msgfmt contains UTF-8 characters, then it has to start with
 > a Byte Order Mark."
 >
 >
 > The BOM is unexpected as a prefix for every message logged:
 >
 > 2025-09-17T01:59:10.205820+01:00 funcube potato - - - <feff>Árvíztűrő tükörfúrógép
 >> How-To-Repeat:
 > syslogd -o rfc5424 -d
 >
 > logger $(printf "\xEF\xBB\xBF%s" "Árvíztűrő tükörfúrógép")
 >
 >
 >
 > tail -n 1 /var/log/messages | xxd
 > 00000000: 3230 3235 2d30 392d 3137 5430 313a 3539  2025-09-17T01:59
 > 00000010: 3a31 302e 3230 3538 3230 2b30 313a 3030  :10.205820+01:00
 > 00000020: 2066 756e 6375 6265 2070 6f74 6174 6f20   funcube potato
 > 00000030: 2d20 2d20 2d20 efbb bfc3 8172 76c3 ad7a  - - - .....rv..z
 > 00000040: 74c5 b172 c591 2074 c3bc 6bc3 b672 66c3  t..r.. t..k..rf.
 > 00000050: ba72 c3b3 67c3 a970 0a                   .r..g..p.
 >

 Why do you say that? The BNF in the RFC says:

       MSG             = MSG-ANY / MSG-UTF8
       MSG-ANY         = *OCTET ; not starting with BOM
       MSG-UTF8        = BOM UTF-8-STRING
       BOM             = %xEF.BB.BF


 Now in practice according to ChatGPT:
 Almost all modern syslog implementations do not emit a BOM, even for
 UTF-8 content.

 Many receivers are tolerant and just assume UTF-8 without requiring BOM.

 Some parsers can actually get confused if a BOM is present.

 And:
 RFC 5424 says the BOM is required if you send UTF-8 MSG.

 In practice, it’s usually skipped, and interoperability tends to
 be better without it.

 If your tool (msgfmt) prepends a BOM automatically, you should check the
 target syslog receiver. If it understands RFC 5424 to the letter, the
 BOM is technically correct. But if you’re aiming for
 compatibility with common syslog daemons (rsyslog, syslog-ng, journald
 forwarders), skipping the BOM is typically safer.

 Perhaps adding a flag to select the behavior? What should the default
 be?

 christos

 >
 >> Fix:
 > Index: ./usr.sbin/syslogd/syslogd.c
 > ===================================================================
 > RCS file: /cvsroot/src/usr.sbin/syslogd/syslogd.c,v
 > retrieving revision 1.147
 > diff -u -r1.147 syslogd.c
 > --- ./usr.sbin/syslogd/syslogd.c        9 Nov 2024 16:31:31 -0000       1.147
 > +++ ./usr.sbin/syslogd/syslogd.c        17 Sep 2025 01:08:30 -0000
 > @@ -1243,6 +1243,7 @@
 >                DPRINTF(D_DATA, "UTF-8 BOM\n");
 >                utf8allowed = true;
 >                p += 3;
 > +               start += 3;  /* skip BOM in output */
 >        }
 >
 >        if (*p != '\0' && !utf8allowed) {


 --Apple-Mail=_8D6F4724-65D5-4C18-9E64-8931C7F899E3--

 --Apple-Mail=_E881427E-F680-4FDC-972E-7B2A6D942B37--

From: pr@xn--rvztrtkrfrgp-bbb7j2b8f0b9d7a21oft.com
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: bin/59657: syslogd outputs BOM in the message
Date: Mon, 27 Oct 2025 23:23:28 +0000

 On Sun, Sep 21, 2025 at 09:25:01AM +0000, Christos Zoulas via gnats wrote:
 >  Now in practice according to ChatGPT:
 >  Almost all modern syslog implementations do not emit a BOM, even for
 >  UTF-8 content.
 >  
 >  Many receivers are tolerant and just assume UTF-8 without requiring BOM.
 >  

 ChatGPT, that fount of knowledge.

 I'd call the FreeBSD implementation the opposite of tolerant of UTF-8:

 	/*
 	 * Removes characters from log messages that are unsafe to display.
 	 * TODO: Permit UTF-8 strings that include a BOM per RFC 5424?
 	 */
 	static void
 	parsemsg_remove_unsafe_characters(const char *in, char *out, size_t outlen)

 >  
 >  And:
 >  RFC 5424 says the BOM is required if you send UTF-8 MSG.
 >  

 Yes, that was a terrible idea, along with adding an optional BOM to
 UTF-8.

 Apropos, it's a can of worms:
 https://corp.unicode.org/pipermail/unicode/2020-June/008713.html

 >  
 >  Perhaps adding a flag to select the behavior? What should the default
 >  be?

 The BOM is the flag; we don't need to show the flag as a prefix for 
 every message. We don't notice the optional UTF-8 BOM "" at the
 beginning of files because editors etc. don't show it.

 Although it has just occurred to me that I was only thinking about this
 appearing locally, not about it being passed on to a remote syslogd. We
 shouldn't strip it in that case (the BOM is then the remote syslogd's
 problem).

 So it will need to be preserved for F_FORW and F_TLS, but not for local
 destinations that are likely to be read by a human, such as F_FILE,
 F_WALL etc.

 This is actually much messier than I first thought.

 At the end of the day it is just something that can be filtered out by
 whoever is viewing it.

 emacs:
 2025-10-27T23:08:58.611141+00:00 funcube potato - - - dr<U+f8>mst<U+f8>rre
 2025-10-27T23:09:00.741754+00:00 funcube potato - - - _drømstørre
 vim:
 2025-10-27T23:08:58.611141+00:00 funcube potato - - - dr<U+f8>mst<U+f8>rre
 2025-10-27T23:09:00.741754+00:00 funcube potato - - - <feff>drømstørre

 Cheers,
 Ben
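
The per-destination handling suggested in the last message (keep the BOM
for F_FORW and F_TLS, drop it for F_FILE, F_WALL and similar) could be
sketched roughly as below. This is only an illustration under stated
assumptions: the enum and both function names are hypothetical, and the
enum values merely mirror the F_* destination types named in the thread
rather than syslogd's real definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins for syslogd's F_FILE, F_WALL, F_FORW, F_TLS. */
enum dest_type { DEST_FILE, DEST_WALL, DEST_FORW, DEST_TLS };

/* Remote destinations get the message unchanged, BOM included. */
static bool
dest_keeps_bom(enum dest_type t)
{
	return t == DEST_FORW || t == DEST_TLS;
}

/*
 * Hypothetical helper: return the text to emit for a given destination.
 * For local, human-readable destinations a leading UTF-8 BOM is skipped;
 * otherwise the original pointer is returned.
 */
static const char *
msg_for_dest(const char *msg, enum dest_type t)
{
	assert(msg != NULL);

	if (!dest_keeps_bom(t) && strncmp(msg, "\xEF\xBB\xBF", 3) == 0)
		return msg + 3;
	return msg;
}
```

Keeping the decision in one predicate means a later flag for the default
behavior, as raised earlier in the thread, would only need to change
dest_keeps_bom().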

Copyright © 1994-2026 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.