NetBSD Problem Report #58306

From www@netbsd.org  Tue Jun  4 14:55:30 2024
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)
	 key-exchange X25519 server-signature RSA-PSS (2048 bits)
	 client-signature RSA-PSS (2048 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id E9C8B1A9238
	for <gnats-bugs@gnats.NetBSD.org>; Tue,  4 Jun 2024 14:55:29 +0000 (UTC)
Message-Id: <20240604145528.BC7FE1A923A@mollari.NetBSD.org>
Date: Tue,  4 Jun 2024 14:55:28 +0000 (UTC)
From: matyalatte@gmail.com
Reply-To: matyalatte@gmail.com
To: gnats-bugs@NetBSD.org
Subject: procfs does not remove dot segments from executable paths
X-Send-Pr-Version: www-1.0

>Number:         58306
>Category:       kern
>Synopsis:       procfs does not remove dot segments from executable paths
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Tue Jun 04 15:00:00 +0000 2024
>Closed-Date:    
>Last-Modified:  Wed Jun 05 12:31:13 +0000 2024
>Originator:     matyalatte
>Release:        10.0
>Organization:
>Environment:
NetBSD  10.0 NetBSD 10.0 (GENERIC) #0: Thu Mar 28 08:33:33 UTC 2024 mkrepro@mkrepro.NetBSD.org:/usr/src/sys/arch/amd64/compile/GENERIC amd64
>Description:
Hi, I'm making cross-platform utilities to get environment information.
https://github.com/matyalatte/c-env-utils

It uses readlink("/proc/curproc/exe", path, PATH_MAX) to get the executable path on NetBSD.
I noticed that the returned path keeps dot segments ("." and "..") when the binary is run via a relative path such as "./myexe".

I think the exe path should be normalized.
>How-To-Repeat:
1. Compile the following code.

```
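// myexe.c
#include <limits.h>
#include <stdio.h>
#include <unistd.h>

int main(void) {
    char path[PATH_MAX];
    // readlink(2) does not NUL-terminate, so leave room and terminate by hand.
    ssize_t len = readlink("/proc/curproc/exe", path, sizeof(path) - 1);
    if (len == -1) {
        perror("readlink");
        return 1;
    }
    path[len] = '\0';
    printf("exe path: %s\n", path);
    return 0;
}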
```

2. Run it with relative paths

```
$ ./myexe
exe path: /home/myname/myrepo/build/./myexe
$ ../build/myexe
exe path: /home/myname/myrepo/build/../build/myexe
$ ../build/./myexe
exe path: /home/myname/myrepo/build/../build/./myexe
```

3. What I expected

```
$ ./myexe
exe path: /home/myname/myrepo/build/myexe
$ ../build/myexe
exe path: /home/myname/myrepo/build/myexe
$ ../build/./myexe
exe path: /home/myname/myrepo/build/myexe
```
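
The expected output above amounts to a purely lexical removal of dot segments. As a rough sketch only, a hypothetical helper `remove_dot_segments` (not an existing libc or NetBSD function) could do this in userland for an absolute path. It never consults the file system, so a ".." that follows a symlinked directory may name a different file than the one the kernel actually reached.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: lexically remove "." and ".." segments from an
 * absolute path, in place. Symlinks are not resolved. Paths that do not
 * start with '/' are left untouched.
 */
static void
remove_dot_segments(char *path)
{
	char *src = path, *dst = path;

	if (path[0] != '/')
		return;

	while (*src != '\0') {
		/* Skip the run of slashes before the next segment. */
		while (*src == '/')
			src++;
		if (*src == '\0')
			break;

		size_t len = strcspn(src, "/");
		if (len == 1 && src[0] == '.') {
			/* ".": drop the segment. */
		} else if (len == 2 && src[0] == '.' && src[1] == '.') {
			/* "..": back up over the previous segment, if any. */
			while (dst > path && *--dst != '/')
				continue;
		} else {
			/* Ordinary segment: copy it down behind a single '/'. */
			*dst++ = '/';
			memmove(dst, src, len);
			dst += len;
		}
		src += len;
	}

	/* Everything was removed: the result is the root directory. */
	if (dst == path)
		*dst++ = '/';
	*dst = '\0';
}
```

Applied to the buffer filled by readlink() in step 1 (after NUL-terminating it), this would turn the paths from step 2 into the ones listed in step 3.
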
>Fix:

>Release-Note:

>Audit-Trail:

State-Changed-From-To: open->feedback
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Tue, 04 Jun 2024 22:14:34 +0000
State-Changed-Why:
feedback requested


From: Taylor R Campbell <riastradh@NetBSD.org>
To: matyalatte@gmail.com
Cc: gnats-bugs@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/58306: procfs does not remove dot segments from executable paths
Date: Tue, 4 Jun 2024 22:11:15 +0000

 > From: matyalatte@gmail.com
 > Date: Tue, 4 Jun 2024 15:00:00 +0000
 >
 > It uses readlink("/proc/curproc/exe", path, PATH_MAX) to get an executable path on NetBSD.
 > Then, I noticed the function does not remove dot segments when running the binary with a relative path like "./myexe."
 >
 > I think the exe path should be normalized.

 Why should the path be normalized like this?  Why can't you just use
 realpath(3) on the result if you want it to be normalized?  Do other
 kernels normalize it in procfs itself, and conceal the actual pathname
 that was passed to execve?  Are there important applications that rely
 on normalization?

 The Linux man page doesn't say anything about normalization:

 https://www.man7.org/linux/man-pages/man5/proc.5.html

        /proc/pid/exe
               Under Linux 2.2 and later, this file is a symbolic link
               containing the actual pathname of the executed command.
               This symbolic link can be dereferenced normally;
               attempting to open it will open the executable.  You can
               even type /proc/pid/exe to run another copy of the same
               executable that is being run by process pid.  If the
               pathname has been unlinked, the symbolic link will contain
               the string ' (deleted)' appended to the original pathname.
               In a multithreaded process, the contents of this symbolic
               link are not available if the main thread has already
               terminated (typically by calling pthread_exit(3)).

               Permission to dereference or read (readlink(2)) this
               symbolic link is governed by a ptrace access mode
               PTRACE_MODE_READ_FSCREDS check; see ptrace(2).

               Under Linux 2.0 and earlier, /proc/pid/exe is a pointer to
               the binary which was executed, and appears as a symbolic
               link.  A readlink(2) call on this file under Linux 2.0
               returns a string in the format:

                   [device]:inode

               For example, [0301]:1502 would be inode 1502 on device
               major 03 (IDE, MFM, etc. drives) minor 01 (first partition
               on the first drive).

               find(1) with the -inum option can be used to locate the
               file.

 The Linux kernel documentation on file systems, which may be more
 authoritative, doesn't say anything about normalization either:

 https://www.kernel.org/doc/html/latest/filesystems/api-summary.html#the-proc-filesystem

    Table 1-1: Process specific entries in /proc

    File         Content
    ...
    exe          Link to the executable of this process

 Normalization, of course, can change over time, depending on directory
 content and file system mounts.

From: まちゃラテ <matyalatte@gmail.com>
To: gnats-bugs@netbsd.org
Cc: riastradh@netbsd.org
Subject: Re: kern/58306: procfs does not remove dot segments from executable paths
Date: Wed, 5 Jun 2024 21:26:31 +0900


 Thanks for the reply.

 > Why should the path be normalized like this?
 Because other Unix variants do it.
 I've tested on Ubuntu 20.04, FreeBSD 14.0 (with procfs enabled), and
 OpenIndiana 2023.10.
 They returned resolved exe paths from readlink(), even when I ran the
 binary through a symlink, like this:
 ```
 $ cd /home/myname/myrepo
 $ ln -s ./build/myexe mylink
 $ ./build/../mylink
 exe path: /home/myname/myrepo/build/myexe
 ```

 > The Linux man page doesn't say anything about normalization
 I think "the actual pathname" in the man page means the normalized path,
 though I'm not sure.

 Well, I just thought it could be a bug because it behaves differently
 from other Unix variants.
 I don't know whether there is a rational reason for procfs to normalize
 paths, or whether it's just a matter of preference.
 It's OK to close this PR if this is the expected behavior.


State-Changed-From-To: feedback->open
State-Changed-By: riastradh@NetBSD.org
State-Changed-When: Wed, 05 Jun 2024 12:31:13 +0000
State-Changed-Why:
feedback received


>Unformatted:

$NetBSD: query-full-pr,v 1.47 2022/09/11 19:34:41 kim Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2024 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.