NetBSD Problem Report #46955

From Wolfgang.Stukenbrock@nagler-company.com  Fri Sep 14 08:09:30 2012
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id BB76463B907
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 14 Sep 2012 08:09:29 +0000 (UTC)
Message-Id: <20120914080920.135791E80A9@test-s0.nagler-company.com>
Date: Fri, 14 Sep 2012 10:09:20 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Subject: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
X-Send-Pr-Version: 3.95

>Number:         46955
>Category:       kern
>Synopsis:       process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Sep 14 08:10:00 +0000 2012
>Closed-Date:    Mon Nov 26 08:00:54 +0000 2018
>Last-Modified:  Mon Nov 26 08:00:54 +0000 2018
>Originator:     Dr. Wolfgang Stukenbrock
>Release:        NetBSD 5.1.2
>Organization:
Dr. Nagler & Company GmbH
>Environment:


System: NetBSD e003 5.1.2 NetBSD 5.1.2 (NSW-E003) #0: Mon Aug 27 10:33:31 CEST 2012  wgstuken@e003:/usr/src/sys/arch/amd64/compile/NSW-E003 amd64
Architecture: x86_64
Machine: amd64
>Description:
	The system freezes over nigth "sometimes" - no login possible, reset required.
	I've added some cron jobs that will report vmstat, ps and df output to syslog that run every30 minutes.
	They report the following:
Sep 11 03:00:00 e003 vmstat:  procs    memory      page                       disks   faults      cpu 
Sep 11 03:00:00 e003 vmstat:  r b w    avm    fre  flt  re  pi   po   fr   sr c0 w0   in   sy  cs us sy id 
Sep 11 03:00:00 e003 vmstat:  0 0 0 2552356 198788  69   0   0    0    0    0  0  3    6  300  60  0  0 100 
Sep 11 03:00:01 e003 vmstat:  0 0 0 2552384 198760  11   0   0    0    0    0  0  1    3 1081  91  0 0 100 
Sep 11 03:00:02 e003 vmstat:  0 0 0 2552384 198760   0   0   0    0    0    0  0  2    4   36  45  0 0 100 
Sep 11 03:01:01 e003 ps:  UID   PID  PPID NLWP WCHAN   STAT     TIME   RSZ    VSZ COMMAND 
Sep 11 03:01:01 e003 ps:    0     0     0   44 -       OKl  37:52.18 19156      0 [system] 
Sep 11 03:01:01 e003 ps:    0     1     0    1 wait    Is    0:00.01  1044   6520 init  
Sep 11 03:01:01 e003 ps:    0   117     1    5 sigwait Isl   1:40.33 77952 109276 /usr/sbin/named -c /var/named/named.conf -4  
Sep 11 03:01:01 e003 ps:    0   165     1    1 kqueue  Ss    0:02.33  1068   6544 /usr/sbin/syslogd -s  
Sep 11 03:01:01 e003 ps:    0   172     1    1 nanoslp Ss    0:06.33  1004   6504 /usr/sbin/ipmon -Dps  
Sep 11 03:01:01 e003 ps:    0   188     1    1 select  Ss    0:00.59  1228   6532 /usr/sbin/rpcbind -l  
Sep 11 03:01:01 e003 ps:    0   201     1    1 select  Ss    0:08.96  1036   5464 /usr/sbin/ypbind  
Sep 11 03:01:01 e003 ps: 1002   297   775    1 select  I     0:00.01  3372  35760 sshd: wgstuken@ttyp0  
Sep 11 03:01:01 e003 ps:    0   311     1    1 pause   Ss    0:13.74  5420  11320 /usr/sbin/ntpd  
Sep 11 03:01:01 e003 ps:    0   313     1    1 select  Is    0:00.00  2080  24808 /usr/sbin/sshd  
Sep 11 03:01:01 e003 ps:    0   380   313    1 netio   Is    0:00.01  4564  35760 sshd: wgstuken [priv]  
Sep 11 03:01:01 e003 ps:    0   407   313    1 netio   Is    0:00.01  4564  35760 sshd: wgstuken [priv]  
Sep 11 03:01:01 e003 ps:   12   459   462    1 kqueue  I     0:01.46  2540  12144 qmgr -l -t unix -u  
Sep 11 03:01:01 e003 ps: 1002   460   407    1 select  I     0:00.00  3372  35760 sshd: wgstuken@ttyp2  
Sep 11 03:01:01 e003 ps:    0   462     1    1 kqueue  Ss    0:02.53  2368  12028 /usr/libexec/postfix/master  
Sep 11 03:01:01 e003 ps:  500   509     1    1 wait    Is    0:00.00   796   4416 /etc/pkg/sbin/watch_server -b -p /var/run/nswSW.pid -c /etc/pkg/watch-server.cfg  
Sep 11 03:01:01 e003 ps:  500   518   509    1 select  I     0:00.06   956   4416 /etc/pkg/sbin/watch_server -b -p /var/run/nswSW.pid -c /etc/pkg/watch-server.cfg  
Sep 11 03:01:01 e003 ps:    0   542     1    1 kqueue  Is    0:00.01  1544   7616 /usr/sbin/inetd -l  
Sep 11 03:01:01 e003 ps:    0   575     1    1 nanoslp Ss    0:01.64  1056   5468 /usr/sbin/cron  
Sep 11 03:01:01 e003 ps:    0   601   313    1 netio   Is    0:00.01  4564  35760 sshd: wgstuken [priv]  
Sep 11 03:01:01 e003 ps: 1002   666   792    1 select  I     0:00.00  3372  35760 sshd: wgstuken@ttyp1  
Sep 11 03:01:01 e003 ps:    0   712     1    1 select  Ss    0:06.41 17604 176944 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:    0   743     1    3 select  Ssl   0:02.15  5360  35056 /usr/pkg/sbin/bacula-fd -c /etc/pkg/bacula/bacula-fd.conf -g bacula  
Sep 11 03:01:01 e003 ps:    0   775   313    1 netio   Is    0:00.02  4508  35760 sshd: wgstuken [priv]  
Sep 11 03:01:01 e003 ps:    0   792   313    1 netio   Is    0:00.01  4564  35760 sshd: wgstuken [priv]  
Sep 11 03:01:01 e003 ps:    0   875   313    1 netio   Is    0:00.01  4564  35760 sshd: wgstuken [priv]  
Sep 11 03:01:01 e003 ps: 1002   894   380    1 select  I     0:00.00  3372  35760 sshd: wgstuken@ttyp5  
Sep 11 03:01:01 e003 ps: 1002  1005   875    1 select  I     0:00.00  3372  35760 sshd: wgstuken@ttyp4  
Sep 11 03:01:01 e003 ps: 1002  1115   601    1 select  I     0:00.00  3372  35760 sshd: wgstuken@ttyp3  
Sep 11 03:01:01 e003 ps:    0  1214   313    1 netio   Is    0:00.01  4588  35760 sshd: wgstuken [priv]  
Sep 11 03:01:01 e003 ps: 1002  1360  1214    1 select  I     0:00.01  3372  35760 sshd: wgstuken@ttyp6  
Sep 11 03:01:01 e003 ps:  802  1440   712    1 semwait I     0:00.73 10000 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  2426   712    1 semwait I     0:01.13  9944 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  2585   712    1 semwait I     0:00.34  9804 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  2607   712    1 socket  I     0:00.00  4440 115164 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  3683   712    1 semwait I     0:01.00  9920 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  4193   712    1 semwait I     0:00.88 10048 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  6004   712    1 semwait I     0:00.90  9968 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  7014   712    1 semwait I     0:01.02 10004 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  7061   712    1 semwait I     0:01.00  9948 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  7202   712    1 semwait I     0:01.11  9940 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:    0  7543 15040    1 tstile  D     0:00.03  1424  10132 runtar NSW-backup /usr/pkg/bin/gtar --create --file /dev/null --directory /var/log --one-file-system --listed-incremental /var/amanda/gnutar-lists/e003_var_log_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._var_log.20120910233003.exclude .  
Sep 11 03:01:01 e003 ps:  802  7768   712    1 semwait I     0:00.98 10024 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802  7929   712    1 semwait I     0:00.41  9960 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:    0 10744  7543    1 -       ZW    0:00.00     0      0 (sh) 
Sep 11 03:01:01 e003 ps:  802 11307   712    1 semwait I     0:00.41  9924 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  800 11876 26376    1 wait    I     0:00.00  1576  10160 /usr/pkg/libexec/sendsize amandad bsdtcp  
Sep 11 03:01:01 e003 ps:  802 11955   712    1 kqueue  S     0:00.28  9936 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:    0 12002   575    1 piperd  S     0:00.00  1200   5468 cron: running job  
Sep 11 03:01:01 e003 ps:  802 13211   712    1 semwait I     0:00.95  9944 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  800 13326 11876    1 piperd  I     0:00.00  1364  10192 /usr/pkg/libexec/sendsize amandad bsdtcp  
Sep 11 03:01:01 e003 ps:    0 13686 15453    1 -       ZW    0:00.00     0      0 (sh) 
Sep 11 03:01:01 e003 ps:  802 14111   712    1 semwait I     0:01.02 10000 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:    0 15029 25894    1 -       ZW    0:00.00     0      0 (sh) 
Sep 11 03:01:01 e003 ps:  800 15040 11876    1 piperd  I     0:00.00  1364  10192 /usr/pkg/libexec/sendsize amandad bsdtcp  
Sep 11 03:01:01 e003 ps:  802 15174   712    1 semwait I     0:01.07  9992 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:    0 15453 19855    1 tstile  D     0:00.03  1424  10132 runtar NSW-backup /usr/pkg/bin/gtar --create --file /dev/null --directory / --one-file-system --listed-incremental /var/amanda/gnutar-lists/e003__0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._.20120910233003.exclude .  
Sep 11 03:01:01 e003 ps:  802 15908   712    1 semwait I     0:00.89  9920 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  500 16327 12002    1 wait    Ss    0:00.00  1140   6664 /bin/sh -c /bin/ps -axwwo "uid pid ppid nlwp wchan state time rsz vsz command" | logger -t ps -p local1.info  
Sep 11 03:01:01 e003 ps:  802 16924   712    1 semwait I     0:00.97  9972 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  802 19410   712    1 semwait I     0:00.36  9928 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  800 19855 11876    1 piperd  I     0:00.00  1352  10192 /usr/pkg/libexec/sendsize amandad bsdtcp  
Sep 11 03:01:01 e003 ps:   12 21041   462    1 kqueue  S     0:00.00  2432  12084 pickup -l -t fifo -u  
Sep 11 03:01:01 e003 ps:  500 25050 16327    1 piperd  S     0:00.00   804   4328 logger -t ps -p local1.info  
Sep 11 03:01:01 e003 ps:    0 25894 13326    1 tstile  D     0:00.00  1420  10132 runtar NSW-backup /usr/pkg/bin/gtar --create --file /dev/null --directory /home.stand --one-file-system --listed-incremental /var/amanda/gnutar-lists/e003_home.stand_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._home.stand.20120910233003.exclude .  
Sep 11 03:01:01 e003 ps:  800 26376   542    1 select  I     0:00.00  1588   9156 amandad bsdtcp amdump amindexd amidxtaped  
Sep 11 03:01:01 e003 ps:  802 28274   712    1 semwait I     0:00.91 10068 177968 /usr/pkg/sbin/httpd -k start  
Sep 11 03:01:01 e003 ps:  500 28676 16327    1 -       O     0:00.00   932   6552 /bin/ps -axwwo uid pid ppid nlwp wchan state time rsz vsz command  
Sep 11 03:01:01 e003 ps:    0   404   530    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   530   656    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   558   874    1 pause   I     0:00.01  1204   4508 -csh -m  
Sep 11 03:01:01 e003 ps:    0   656   558    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   662   404    1 ttyraw  I+    0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps: 1002   874   297    1 pause   Is    0:00.00  1232   4508 -csh  
Sep 11 03:01:01 e003 ps:    0   268   872    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   591   268    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   607   891    1 pause   I     0:00.01  1204   4508 -csh -m  
Sep 11 03:01:01 e003 ps:    0   851   591    1 ttyraw  I+    0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   872   607    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps: 1002   891   666    1 pause   Is    0:00.00  1232   4508 -csh  
Sep 11 03:01:01 e003 ps:    0   469   915    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   599   469    1 ttyraw  I+    0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   849  1037    1 pause   I     0:00.01  1204   4508 -csh -m  
Sep 11 03:01:01 e003 ps:    0   915   849    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps: 1002  1037   460    1 pause   Is    0:00.00  1232   4508 -csh  
Sep 11 03:01:01 e003 ps:    0   161   476    1 pause   I     0:00.01  1204   4508 -csh -m  
Sep 11 03:01:01 e003 ps:    0   357   803    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps: 1002   476  1115    1 pause   Is    0:00.00  1232   4508 -csh  
Sep 11 03:01:01 e003 ps:    0   489   807    1 ttyraw  I+    0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   803   161    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   807   357    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   436   562    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   562   750    1 pause   I     0:00.01  1204   4508 -csh -m  
Sep 11 03:01:01 e003 ps:    0   694   436    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   698   824    1 ttyraw  I+    0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps: 1002   750  1005    1 pause   Is    0:00.00  1232   4508 -csh  
Sep 11 03:01:01 e003 ps:    0   824   694    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   321   767    1 pause   I     0:00.01  1204   4508 -csh -m  
Sep 11 03:01:01 e003 ps:    0   393  1101    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps: 1002   767   894    1 pause   Is    0:00.00  1232   4508 -csh  
Sep 11 03:01:01 e003 ps:    0   785   321    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   836   393    1 ttyraw  I+    0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0  1101   785    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps: 1002   706  1360    1 pause   Is    0:00.00  1232   4508 -csh  
Sep 11 03:01:01 e003 ps:    0  1181   706    1 pause   I     0:00.01  1204   4508 -csh -m  
Sep 11 03:01:01 e003 ps:    0  1200  1181    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0  1448  1450    1 ttyraw  I+    0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0  1450  1200    1 pause   I     0:00.00  1204   4508 -sh (csh) 
Sep 11 03:01:01 e003 ps:    0   497     1    1 wait    I     0:00.00  1180   6664 /bin/sh /usr/pkg/bin/mysqld_safe --user=mysql --datadir=/var/mysql --pid-file=/var/mysql/e003.pid --skip-thread-priority  
Sep 11 03:01:01 e003 ps:  801   755   497   10 select  Il    2:57.45 20124  82068 /usr/pkg/libexec/mysqld --basedir=/usr/pkg --datadir=/var/mysql --user=mysql --skip-thread-priority --log-error=/var/mysql/e003.err --pid-file=/var/mysql/e003.pid  
Sep 11 03:01:01 e003 ps:    0   778     1    1 ttyraw  Is+   0:00.00  1136   6500 /usr/libexec/getty std.38400 console  
Sep 11 03:02:00 e003 df: Filesystem   1K-blocks       Used      Avail %Cap Mounted on 
Sep 11 03:02:00 e003 df: /dev/raid0a       96247      46238      45197  50% / 
Sep 11 03:02:00 e003 df: /dev/raid0f    16264494      59196   15392074   0% /var 
Sep 11 03:02:00 e003 df: /dev/raid0e     8132206    5899618    1825978  76% /usr 
Sep 11 03:02:00 e003 df: /dev/raid0g    30497948        164   28972888   0% /nc-www 
Sep 11 03:02:00 e003 df: /dev/raid0h    18737934         58   17800980   0% /home.stand 
Sep 11 03:02:00 e003 df: procfs                4          4          0 100% /proc 
Sep 11 03:02:00 e003 df: kernfs                1          1          0 100% /kern 
Sep 11 03:02:00 e003 df: tmpfs           1048576        736    1047840   0% /tmp 

	Next output is at Sep 12 09:00:01 from vmstat.
	The backup is started between 23:00 and 23:30 - the listing from 23:00 does not show 
	the amanda process, At 23:30 the state shown above is already reached.
	At 03:15 the daily cronjob is started and I think that this one will finaly lock up everything.

	The runtar command is a binary from amanda software.
	It will finaly execve to the gnutar binary and this has still not happen.
	In front of that it uses popen("<gnutar> --version 2>&1", "r") to determine
	the gnutar binary version.
	It does a fgets from the stream and than simply forget FILE* - no pclose() done on it - this looks
	like a bug in amanda-client software ... (I've send a bug report for too.)
	It then does a rename() of the debug-output file - so the rename-bug may also be relevant.
	The rename() happens in /tmp - this is tmpfs.

	The rename-bug-fix from 5.1_STABLE has been integrated into this kernel.
	updated files:
	  src/sys/ufs/lfs/lfs_vnops.c
	  src/usr.bin/pmap/pmap.h
	  src/sys/ufs/ufs/inode.h
	  src/sys/ufs/ufs/ufs_extern.h
	  src/sys/ufs/ufs/ufs_lookup.c
	  src/sys/ufs/ufs/ufs_vnops.c
	  src/sys/ufs/ufs/ufs_wapbl.c

	So either it does not fix the problem for 5.1.2 or there is another bug related popen() if
	there is no pclose().
	The zombi processes are there due to the fact that the call to pclose() is missing.
	I tend to the assumption, that the rename fix for 5.1_STABLE does not fix the problem in 5.1.2.
	(or perhaps does not fix the problem at all ...)
	Is it possible that the rename bug is still present in tmpfs?
	No patch has been applied to any tmpfs-related files.

	Accedently the debug files of amanda are in /tmp and are lost after reboot, so I don't know the
	famaous last debug-words of runtar.

>How-To-Repeat:
	Setup amanda client as backup system, try to backup some filesystems and wait.
	It will happen not all the time, but - at least for us - once a month.
>Fix:
	Not known till now. Event not realy known who to go on with further debugging.

>Release-Note:

>Audit-Trail:
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
Date: Fri, 14 Sep 2012 10:37:42 +0200

 Hi again,

 I've forgot to look into the Bug-database prior sending this one - sorry.

 In -current there seems to be a patch for the rename problem that I've 
 still not integrated in my 5.1.2. (PR 36681).

 I'm gooing to do this as next step.

 dholland stated that this fix will solve the rename problem in tmpfs.
 So I think it will solve our problem here too.

 The patch for PR 36681 seems to be missing in 5.1_STABLE too - at least 
 the line count of the source from 5.1.2 and 5.1_STABLE are both 1490 and 
 the patch adds around 1000 lines according to the cvs-web information to 
 the diff.
 It would be great to have a pullup of it to 5.x.

 best regards

 W. Stukenbrock

 gnats-admin@NetBSD.org wrote:

 > Thank you very much for your problem report.
 > It has the internal identification `kern/46955'.
 > The individual assigned to look at your
 > report is: kern-bug-people. 
 > 
 > 
 >>Category:       kern
 >>Responsible:    kern-bug-people
 >>Synopsis:       process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
 >>Arrival-Date:   Fri Sep 14 08:10:00 +0000 2012
 >>
 > 
 > 


From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize -
 rename bug in tmpfs ???
Date: Sat, 15 Sep 2012 16:57:53 +0000

 On Fri, Sep 14, 2012 at 08:40:04AM +0000, Wolfgang Stukenbrock wrote:
  >  I've forgot to look into the Bug-database prior sending this one - sorry.
  >  
  >  In -current there seems to be a patch for the rename problem that I've 
  >  still not integrated in my 5.1.2. (PR 36681).
  >  
  >  I'm gooing to do this as next step.

 That is probably a good idea, but the available patches only fix ffs.

  >  dholland stated that this fix will solve the rename problem in tmpfs.
  >  So I think it will solve our problem here too.

 I don't think there's a netbsd-5 version of the tmpfs rename patches,
 but I could have missed it going by.

 (Unfortunately, each filesystem's rename needs to be fixed separately,
 and they have historically been *all* wrong.)

 -- 
 David A. Holland
 dholland@netbsd.org

From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
Date: Mon, 17 Sep 2012 11:01:37 +0200

 Hi,

 it looks like that I've a working version for 5.1_STABLE now, but the 
 integration I've done is (very) ugly ...

 - I took the whole tmpfs code from current.
 - Need to modify two other makefiles (rump and ...) because on file in 
 tmpfs is gone and another (tmpfs_rename.c) came up.
 - need to "import" genfs_rename.c, because most of the implementation is 
 now there. (Done by a copy to the tmpfs-folder and an include statement 
   in the tmpfs source file.)
 - borrow some other new routines from genfs, because they are needed 
 now. (simply copied into the tmpfs sources in order to avoid changes in 
 other parts)
 - add a second param to VOP_UNLOCK() - added 0 as it was in the old code
 - need to add the INTERLOCK flag to the call of vget() - call semantic 
 has changed in -current.
 - revert KERN_NAME_MAX change by adding a define to NAME_MAX instead of 
 the assert "MAXNAMELEN == NAME_MAX"
 - borrow some new routines from kern_auth.c by copying into the tmpfs 
 sources ...

 And here is the main problem with my "fast-and-dirty" integration:
 It looks like nearly everything changed in kern_auth.c and some new 
 listener lists have been added that are not present in 5.1.x. I think 
 they cannot be simply used because the other code does not know then. To 
 add this knowledge everything kauth-related must be changed too in order 
 to work correctly - to much work ...
 So I've took the main auth-methods from the old implemantation useing 
 the "gereric" auth routine as done in 5.1.x , but I additionaly needed 
 to allow root (uid == 0) explicitly to allow chown calls to get the 
 install-root filesystem of the install-kernels up correctly.
 (Does this work in -current ???? Is there a listener that returns ALLOW 
 if root is trying to chown something? Or has the createion of the device 
 nodes in the install-kernels has been changed?)
 The auth-part is still not realy tested, but it seems to work as before 
 till now.

 If you are interested in this work, let me know, I will send it.

 But the auth-integration should be reworked again!
 Any hints how to revert the new-stuff into the old 5.1.x semantics 
 correctly?
 The "problem" is the auth-vnode part and all related definitions.


 W. Stukenbrock


 David Holland wrote:

 > The following reply was made to PR kern/46955; it has been noted by GNATS.
 > 
 > From: David Holland <dholland-bugs@netbsd.org>
 > To: gnats-bugs@NetBSD.org
 > Cc: 
 > Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize -
 >  rename bug in tmpfs ???
 > Date: Sat, 15 Sep 2012 16:57:53 +0000
 > 
 >  On Fri, Sep 14, 2012 at 08:40:04AM +0000, Wolfgang Stukenbrock wrote:
 >   >  I've forgot to look into the Bug-database prior sending this one - sorry.
 >   >  
 >   >  In -current there seems to be a patch for the rename problem that I've 
 >   >  still not integrated in my 5.1.2. (PR 36681).
 >   >  
 >   >  I'm gooing to do this as next step.
 >  
 >  That is probably a good idea, but the available patches only fix ffs.
 >  
 >   >  dholland stated that this fix will solve the rename problem in tmpfs.
 >   >  So I think it will solve our problem here too.
 >  
 >  I don't think there's a netbsd-5 version of the tmpfs rename patches,
 >  but I could have missed it going by.
 >  
 >  (Unfortunately, each filesystem's rename needs to be fixed separately,
 >  and they have historically been *all* wrong.)
 >  
 >  -- 
 >  David A. Holland
 >  dholland@netbsd.org
 >  
 > 
 > 


 -- 


 Dr. Nagler & Company GmbH
 Hauptstraße 9
 92253 Schnaittenbach

 Tel. +49 9622/71 97-42
 Fax +49 9622/71 97-50

 Wolfgang.Stukenbrock@nagler-company.com
 http://www.nagler-company.com


 Hauptsitz: Schnaittenbach
 Handelregister: Amberg HRB
 Gerichtsstand: Amberg
 Steuernummer: 201/118/51825
 USt.-ID-Nummer: DE 273143997
 Geschäftsführer: Dr. Martin Nagler, Prof. Dr. Dr. Karl-Kuno Kunze


State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 05 Dec 2013 15:49:25 +0000
State-Changed-Why:
rename() issues were fixed in -current.  I am not sure if anybody is
working on backporting them to netbsd-6 (or netbsd-5). 

Volunteers welcome :)


From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
Date: Mon, 20 Jan 2014 11:59:45 +0100

 Hi, I'm getting a reminder for this PR from time to time.
 But I don't know what I should do with it ...

 The last state (known to me) is, that there is a fix in -current and 
 there is no plan to port it to 6.x or 5.1.x.

 So this PR should either be set back to open or should be closed 
 (because there will be no patch/fix for 5.1.x at any time in the future).

 best regards

 W. Stukenbrock



State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 26 Apr 2014 05:46:03 +0000
State-Changed-Why:
I think rmind was suggesting you should finish your patches to merge the
fix into -5 :-)
but I'll set it so it stops nagging you.


From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
        dholland@NetBSD.org
Subject: Re: kern/46955 (process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???)
Date: Sun, 27 Apr 2014 10:37:06 +0200

 Thank you,

 I've had a try with tmpfs changes into 5.x, and it seems to fix most 
 issues, but accedently not all - I've had one dead lock in tmpfs again 
 some time ago.
 Due to my limited time and the massive changes in the fs-code and the 
 way process notifications takes place between 5.x and 6.x, I've stopped 
 working on it and started migration of all of our systems to 6.x. 
 (Currently suspended due to the dead-lock-problem described in PR 48733.)
 My ufs patch to 5.x seems to be stable (of cause, it is ugly ...), but I 
 think I've already send that one long time ago.

 best regards

 W. Stukenbrock

 dholland@NetBSD.org wrote:

 > Synopsis: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
 > 
 > State-Changed-From-To: feedback->open
 > State-Changed-By: dholland@NetBSD.org
 > State-Changed-When: Sat, 26 Apr 2014 05:46:03 +0000
 > State-Changed-Why:
 > I think rmind was suggesting you should finish your patches to merge the
 > fix into -5 :-)
 > but I'll set it so it stops nagging you.
 > 
 > 
 > 
 > 



From: David Holland <dholland@netbsd.org>
To: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/46955 (process deadlock (tstile) runing amanda sendsize -
 rename bug in tmpfs ???)
Date: Sun, 27 Apr 2014 21:11:45 +0000

 On Sun, Apr 27, 2014 at 10:37:06AM +0200, Wolfgang Stukenbrock wrote:
  > I've had a try with tmpfs changes into 5.x, and it seems to fix most
  > issues, but accedently not all - I've had one dead lock in tmpfs again some
  > time ago.

 There may be other issues. There were a bunch of stability fixes to
 tmpfs last fall; I thought many of them had gotten into 6.x but I'm
 told not. (rmind@ says he doesn't have time but can help out...)

 However, none of them AFAIK have been merged into 5.x, so there are a
 bunch of things that could have bitten you.

  > Due to my limited time and the massive changes in the fs-code and the way
  > process notifications takes place between 5.x and 6.x, I've stopped working
  > on it and started migration of all of our systems to 6.x. (Currently
  > suspended due to the dead-lock-problem described in PR 48733.)

 That seems reasonable. We are hoping to get 7.x branched any month
 now; it's definitely getting to the point where working on 5.x isn't
 all that worthwhile.

  > My ufs patch to 5.x seems to be stable (of cause, it is ugly ...), but I
  > think I've already send that one long time ago.

 Yup.

 -- 
 David A. Holland
 dholland@netbsd.org

State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 26 Nov 2018 08:00:54 +0000
State-Changed-Why:
-6 is now EOL so everything in here is fixed.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.9 2014/08/02 14:16:04 spz Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.