NetBSD Problem Report #46955
From Wolfgang.Stukenbrock@nagler-company.com Fri Sep 14 08:09:30 2012
Return-Path: <Wolfgang.Stukenbrock@nagler-company.com>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
by www.NetBSD.org (Postfix) with ESMTP id BB76463B907
for <gnats-bugs@gnats.NetBSD.org>; Fri, 14 Sep 2012 08:09:29 +0000 (UTC)
Message-Id: <20120914080920.135791E80A9@test-s0.nagler-company.com>
Date: Fri, 14 Sep 2012 10:09:20 +0200 (CEST)
From: Wolfgang.Stukenbrock@nagler-company.com
Reply-To: Wolfgang.Stukenbrock@nagler-company.com
To: gnats-bugs@gnats.NetBSD.org
Subject: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
X-Send-Pr-Version: 3.95
>Number: 46955
>Category: kern
>Synopsis: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: kern-bug-people
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Fri Sep 14 08:10:00 +0000 2012
>Closed-Date: Mon Nov 26 08:00:54 +0000 2018
>Last-Modified: Mon Nov 26 08:00:54 +0000 2018
>Originator: Dr. Wolfgang Stukenbrock
>Release: NetBSD 5.1.2
>Organization:
Dr. Nagler & Company GmbH
>Environment:
System: NetBSD e003 5.1.2 NetBSD 5.1.2 (NSW-E003) #0: Mon Aug 27 10:33:31 CEST 2012 wgstuken@e003:/usr/src/sys/arch/amd64/compile/NSW-E003 amd64
Architecture: x86_64
Machine: amd64
>Description:
The system "sometimes" freezes overnight - no login possible, reset required.
I've added some cron jobs that run every 30 minutes and report vmstat, ps and df output to syslog.
They report the following:
Sep 11 03:00:00 e003 vmstat: procs memory page disks faults cpu
Sep 11 03:00:00 e003 vmstat: r b w avm fre flt re pi po fr sr c0 w0 in sy cs us sy id
Sep 11 03:00:00 e003 vmstat: 0 0 0 2552356 198788 69 0 0 0 0 0 0 3 6 300 60 0 0 100
Sep 11 03:00:01 e003 vmstat: 0 0 0 2552384 198760 11 0 0 0 0 0 0 1 3 1081 91 0 0 100
Sep 11 03:00:02 e003 vmstat: 0 0 0 2552384 198760 0 0 0 0 0 0 0 2 4 36 45 0 0 100
Sep 11 03:01:01 e003 ps: UID PID PPID NLWP WCHAN STAT TIME RSZ VSZ COMMAND
Sep 11 03:01:01 e003 ps: 0 0 0 44 - OKl 37:52.18 19156 0 [system]
Sep 11 03:01:01 e003 ps: 0 1 0 1 wait Is 0:00.01 1044 6520 init
Sep 11 03:01:01 e003 ps: 0 117 1 5 sigwait Isl 1:40.33 77952 109276 /usr/sbin/named -c /var/named/named.conf -4
Sep 11 03:01:01 e003 ps: 0 165 1 1 kqueue Ss 0:02.33 1068 6544 /usr/sbin/syslogd -s
Sep 11 03:01:01 e003 ps: 0 172 1 1 nanoslp Ss 0:06.33 1004 6504 /usr/sbin/ipmon -Dps
Sep 11 03:01:01 e003 ps: 0 188 1 1 select Ss 0:00.59 1228 6532 /usr/sbin/rpcbind -l
Sep 11 03:01:01 e003 ps: 0 201 1 1 select Ss 0:08.96 1036 5464 /usr/sbin/ypbind
Sep 11 03:01:01 e003 ps: 1002 297 775 1 select I 0:00.01 3372 35760 sshd: wgstuken@ttyp0
Sep 11 03:01:01 e003 ps: 0 311 1 1 pause Ss 0:13.74 5420 11320 /usr/sbin/ntpd
Sep 11 03:01:01 e003 ps: 0 313 1 1 select Is 0:00.00 2080 24808 /usr/sbin/sshd
Sep 11 03:01:01 e003 ps: 0 380 313 1 netio Is 0:00.01 4564 35760 sshd: wgstuken [priv]
Sep 11 03:01:01 e003 ps: 0 407 313 1 netio Is 0:00.01 4564 35760 sshd: wgstuken [priv]
Sep 11 03:01:01 e003 ps: 12 459 462 1 kqueue I 0:01.46 2540 12144 qmgr -l -t unix -u
Sep 11 03:01:01 e003 ps: 1002 460 407 1 select I 0:00.00 3372 35760 sshd: wgstuken@ttyp2
Sep 11 03:01:01 e003 ps: 0 462 1 1 kqueue Ss 0:02.53 2368 12028 /usr/libexec/postfix/master
Sep 11 03:01:01 e003 ps: 500 509 1 1 wait Is 0:00.00 796 4416 /etc/pkg/sbin/watch_server -b -p /var/run/nswSW.pid -c /etc/pkg/watch-server.cfg
Sep 11 03:01:01 e003 ps: 500 518 509 1 select I 0:00.06 956 4416 /etc/pkg/sbin/watch_server -b -p /var/run/nswSW.pid -c /etc/pkg/watch-server.cfg
Sep 11 03:01:01 e003 ps: 0 542 1 1 kqueue Is 0:00.01 1544 7616 /usr/sbin/inetd -l
Sep 11 03:01:01 e003 ps: 0 575 1 1 nanoslp Ss 0:01.64 1056 5468 /usr/sbin/cron
Sep 11 03:01:01 e003 ps: 0 601 313 1 netio Is 0:00.01 4564 35760 sshd: wgstuken [priv]
Sep 11 03:01:01 e003 ps: 1002 666 792 1 select I 0:00.00 3372 35760 sshd: wgstuken@ttyp1
Sep 11 03:01:01 e003 ps: 0 712 1 1 select Ss 0:06.41 17604 176944 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 0 743 1 3 select Ssl 0:02.15 5360 35056 /usr/pkg/sbin/bacula-fd -c /etc/pkg/bacula/bacula-fd.conf -g bacula
Sep 11 03:01:01 e003 ps: 0 775 313 1 netio Is 0:00.02 4508 35760 sshd: wgstuken [priv]
Sep 11 03:01:01 e003 ps: 0 792 313 1 netio Is 0:00.01 4564 35760 sshd: wgstuken [priv]
Sep 11 03:01:01 e003 ps: 0 875 313 1 netio Is 0:00.01 4564 35760 sshd: wgstuken [priv]
Sep 11 03:01:01 e003 ps: 1002 894 380 1 select I 0:00.00 3372 35760 sshd: wgstuken@ttyp5
Sep 11 03:01:01 e003 ps: 1002 1005 875 1 select I 0:00.00 3372 35760 sshd: wgstuken@ttyp4
Sep 11 03:01:01 e003 ps: 1002 1115 601 1 select I 0:00.00 3372 35760 sshd: wgstuken@ttyp3
Sep 11 03:01:01 e003 ps: 0 1214 313 1 netio Is 0:00.01 4588 35760 sshd: wgstuken [priv]
Sep 11 03:01:01 e003 ps: 1002 1360 1214 1 select I 0:00.01 3372 35760 sshd: wgstuken@ttyp6
Sep 11 03:01:01 e003 ps: 802 1440 712 1 semwait I 0:00.73 10000 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 2426 712 1 semwait I 0:01.13 9944 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 2585 712 1 semwait I 0:00.34 9804 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 2607 712 1 socket I 0:00.00 4440 115164 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 3683 712 1 semwait I 0:01.00 9920 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 4193 712 1 semwait I 0:00.88 10048 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 6004 712 1 semwait I 0:00.90 9968 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 7014 712 1 semwait I 0:01.02 10004 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 7061 712 1 semwait I 0:01.00 9948 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 7202 712 1 semwait I 0:01.11 9940 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 0 7543 15040 1 tstile D 0:00.03 1424 10132 runtar NSW-backup /usr/pkg/bin/gtar --create --file /dev/null --directory /var/log --one-file-system --listed-incremental /var/amanda/gnutar-lists/e003_var_log_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._var_log.20120910233003.exclude .
Sep 11 03:01:01 e003 ps: 802 7768 712 1 semwait I 0:00.98 10024 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 7929 712 1 semwait I 0:00.41 9960 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 0 10744 7543 1 - ZW 0:00.00 0 0 (sh)
Sep 11 03:01:01 e003 ps: 802 11307 712 1 semwait I 0:00.41 9924 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 800 11876 26376 1 wait I 0:00.00 1576 10160 /usr/pkg/libexec/sendsize amandad bsdtcp
Sep 11 03:01:01 e003 ps: 802 11955 712 1 kqueue S 0:00.28 9936 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 0 12002 575 1 piperd S 0:00.00 1200 5468 cron: running job
Sep 11 03:01:01 e003 ps: 802 13211 712 1 semwait I 0:00.95 9944 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 800 13326 11876 1 piperd I 0:00.00 1364 10192 /usr/pkg/libexec/sendsize amandad bsdtcp
Sep 11 03:01:01 e003 ps: 0 13686 15453 1 - ZW 0:00.00 0 0 (sh)
Sep 11 03:01:01 e003 ps: 802 14111 712 1 semwait I 0:01.02 10000 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 0 15029 25894 1 - ZW 0:00.00 0 0 (sh)
Sep 11 03:01:01 e003 ps: 800 15040 11876 1 piperd I 0:00.00 1364 10192 /usr/pkg/libexec/sendsize amandad bsdtcp
Sep 11 03:01:01 e003 ps: 802 15174 712 1 semwait I 0:01.07 9992 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 0 15453 19855 1 tstile D 0:00.03 1424 10132 runtar NSW-backup /usr/pkg/bin/gtar --create --file /dev/null --directory / --one-file-system --listed-incremental /var/amanda/gnutar-lists/e003__0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._.20120910233003.exclude .
Sep 11 03:01:01 e003 ps: 802 15908 712 1 semwait I 0:00.89 9920 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 500 16327 12002 1 wait Ss 0:00.00 1140 6664 /bin/sh -c /bin/ps -axwwo "uid pid ppid nlwp wchan state time rsz vsz command" | logger -t ps -p local1.info
Sep 11 03:01:01 e003 ps: 802 16924 712 1 semwait I 0:00.97 9972 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 802 19410 712 1 semwait I 0:00.36 9928 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 800 19855 11876 1 piperd I 0:00.00 1352 10192 /usr/pkg/libexec/sendsize amandad bsdtcp
Sep 11 03:01:01 e003 ps: 12 21041 462 1 kqueue S 0:00.00 2432 12084 pickup -l -t fifo -u
Sep 11 03:01:01 e003 ps: 500 25050 16327 1 piperd S 0:00.00 804 4328 logger -t ps -p local1.info
Sep 11 03:01:01 e003 ps: 0 25894 13326 1 tstile D 0:00.00 1420 10132 runtar NSW-backup /usr/pkg/bin/gtar --create --file /dev/null --directory /home.stand --one-file-system --listed-incremental /var/amanda/gnutar-lists/e003_home.stand_0.new --sparse --ignore-failed-read --totals --exclude-from /tmp/amanda/sendsize._home.stand.20120910233003.exclude .
Sep 11 03:01:01 e003 ps: 800 26376 542 1 select I 0:00.00 1588 9156 amandad bsdtcp amdump amindexd amidxtaped
Sep 11 03:01:01 e003 ps: 802 28274 712 1 semwait I 0:00.91 10068 177968 /usr/pkg/sbin/httpd -k start
Sep 11 03:01:01 e003 ps: 500 28676 16327 1 - O 0:00.00 932 6552 /bin/ps -axwwo uid pid ppid nlwp wchan state time rsz vsz command
Sep 11 03:01:01 e003 ps: 0 404 530 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 530 656 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 558 874 1 pause I 0:00.01 1204 4508 -csh -m
Sep 11 03:01:01 e003 ps: 0 656 558 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 662 404 1 ttyraw I+ 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 1002 874 297 1 pause Is 0:00.00 1232 4508 -csh
Sep 11 03:01:01 e003 ps: 0 268 872 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 591 268 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 607 891 1 pause I 0:00.01 1204 4508 -csh -m
Sep 11 03:01:01 e003 ps: 0 851 591 1 ttyraw I+ 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 872 607 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 1002 891 666 1 pause Is 0:00.00 1232 4508 -csh
Sep 11 03:01:01 e003 ps: 0 469 915 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 599 469 1 ttyraw I+ 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 849 1037 1 pause I 0:00.01 1204 4508 -csh -m
Sep 11 03:01:01 e003 ps: 0 915 849 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 1002 1037 460 1 pause Is 0:00.00 1232 4508 -csh
Sep 11 03:01:01 e003 ps: 0 161 476 1 pause I 0:00.01 1204 4508 -csh -m
Sep 11 03:01:01 e003 ps: 0 357 803 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 1002 476 1115 1 pause Is 0:00.00 1232 4508 -csh
Sep 11 03:01:01 e003 ps: 0 489 807 1 ttyraw I+ 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 803 161 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 807 357 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 436 562 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 562 750 1 pause I 0:00.01 1204 4508 -csh -m
Sep 11 03:01:01 e003 ps: 0 694 436 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 698 824 1 ttyraw I+ 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 1002 750 1005 1 pause Is 0:00.00 1232 4508 -csh
Sep 11 03:01:01 e003 ps: 0 824 694 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 321 767 1 pause I 0:00.01 1204 4508 -csh -m
Sep 11 03:01:01 e003 ps: 0 393 1101 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 1002 767 894 1 pause Is 0:00.00 1232 4508 -csh
Sep 11 03:01:01 e003 ps: 0 785 321 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 836 393 1 ttyraw I+ 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 1101 785 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 1002 706 1360 1 pause Is 0:00.00 1232 4508 -csh
Sep 11 03:01:01 e003 ps: 0 1181 706 1 pause I 0:00.01 1204 4508 -csh -m
Sep 11 03:01:01 e003 ps: 0 1200 1181 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 1448 1450 1 ttyraw I+ 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 1450 1200 1 pause I 0:00.00 1204 4508 -sh (csh)
Sep 11 03:01:01 e003 ps: 0 497 1 1 wait I 0:00.00 1180 6664 /bin/sh /usr/pkg/bin/mysqld_safe --user=mysql --datadir=/var/mysql --pid-file=/var/mysql/e003.pid --skip-thread-priority
Sep 11 03:01:01 e003 ps: 801 755 497 10 select Il 2:57.45 20124 82068 /usr/pkg/libexec/mysqld --basedir=/usr/pkg --datadir=/var/mysql --user=mysql --skip-thread-priority --log-error=/var/mysql/e003.err --pid-file=/var/mysql/e003.pid
Sep 11 03:01:01 e003 ps: 0 778 1 1 ttyraw Is+ 0:00.00 1136 6500 /usr/libexec/getty std.38400 console
Sep 11 03:02:00 e003 df: Filesystem 1K-blocks Used Avail %Cap Mounted on
Sep 11 03:02:00 e003 df: /dev/raid0a 96247 46238 45197 50% /
Sep 11 03:02:00 e003 df: /dev/raid0f 16264494 59196 15392074 0% /var
Sep 11 03:02:00 e003 df: /dev/raid0e 8132206 5899618 1825978 76% /usr
Sep 11 03:02:00 e003 df: /dev/raid0g 30497948 164 28972888 0% /nc-www
Sep 11 03:02:00 e003 df: /dev/raid0h 18737934 58 17800980 0% /home.stand
Sep 11 03:02:00 e003 df: procfs 4 4 0 100% /proc
Sep 11 03:02:00 e003 df: kernfs 1 1 0 100% /kern
Sep 11 03:02:00 e003 df: tmpfs 1048576 736 1047840 0% /tmp
The next output is from vmstat at Sep 12 09:00:01.
The backup is started between 23:00 and 23:30 - the listing from 23:00 does not show
the amanda processes; at 23:30 the state shown above has already been reached.
At 03:15 the daily cron job is started, and I think that this one finally locks up everything.
The runtar command is a binary from the amanda software.
It will finally execve() the gnutar binary, but this has not happened yet.
Before that, it uses popen("<gnutar> --version 2>&1", "r") to determine
the gnutar binary version.
It does an fgets() from the stream and then simply forgets the FILE* - no pclose() is done on it - this looks
like a bug in the amanda-client software ... (I've sent a bug report for that too.)
It then does a rename() of the debug-output file - so the rename bug may also be relevant.
The rename() happens in /tmp - this is tmpfs.
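For illustration only, here is a minimal C sketch of the popen()/fgets() pattern described above with the missing pclose() added. This is not amanda's code: the helper name read_first_line() is hypothetical, and the command string passed in stands in for the gnutar version query.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/wait.h>

/*
 * Hypothetical helper (not taken from amanda): run `cmd` through popen(),
 * copy the first line of its output into buf, and reap the child with
 * pclose(). Returns the child's exit status, or -1 on failure or
 * abnormal termination.
 */
int
read_first_line(const char *cmd, char *buf, size_t len)
{
	FILE *fp;
	int status;

	assert(cmd != NULL && buf != NULL && len > 0);
	buf[0] = '\0';

	fp = popen(cmd, "r");
	if (fp == NULL)
		return -1;

	/* Read one line and strip the trailing newline, if any. */
	if (fgets(buf, (int)len, fp) != NULL)
		buf[strcspn(buf, "\n")] = '\0';

	/*
	 * pclose() closes the stream and waits for the child. If the FILE*
	 * is simply dropped instead, the child started by popen() stays
	 * around as a zombie once it exits.
	 */
	status = pclose(fp);
	if (status == -1 || !WIFEXITED(status))
		return -1;
	return WEXITSTATUS(status);
}
```

With the gnutar path substituted into the command string, a caller would get the first line of the version output in buf, and no "(sh)" zombie should remain afterwards. Whether the missing pclose() is related to the tstile hang is not established here.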
The rename-bug-fix from 5.1_STABLE has been integrated into this kernel.
updated files:
src/sys/ufs/lfs/lfs_vnops.c
src/usr.bin/pmap/pmap.h
src/sys/ufs/ufs/inode.h
src/sys/ufs/ufs/ufs_extern.h
src/sys/ufs/ufs/ufs_lookup.c
src/sys/ufs/ufs/ufs_vnops.c
src/sys/ufs/ufs/ufs_wapbl.c
So either it does not fix the problem for 5.1.2, or there is another bug related to popen() if
there is no pclose().
The zombie processes are there because the call to pclose() is missing.
I tend to assume that the rename fix for 5.1_STABLE does not fix the problem in 5.1.2
(or perhaps does not fix the problem at all ...).
Is it possible that the rename bug is still present in tmpfs?
No patch has been applied to any tmpfs-related files.
Unfortunately, the amanda debug files are in /tmp and are lost after a reboot, so I don't know the
famous last debug words of runtar.
>How-To-Repeat:
Set up an amanda client as the backup system, try to back up some filesystems and wait.
It does not happen every time, but - at least for us - once a month.
>Fix:
Not known so far. It is not even really clear how to go on with further debugging.
>Release-Note:
>Audit-Trail:
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
Date: Fri, 14 Sep 2012 10:37:42 +0200
Hi again,
I forgot to look into the bug database prior to sending this one - sorry.
In -current there seems to be a patch for the rename problem that I've
not yet integrated into my 5.1.2 (PR 36681).
I'm going to do this as the next step.
dholland stated that this fix will solve the rename problem in tmpfs.
So I think it will solve our problem here too.
The patch for PR 36681 seems to be missing in 5.1_STABLE too - at least
the line counts of the source from 5.1.2 and 5.1_STABLE are both 1490, and
the patch adds around 1000 lines according to the cvs-web information on
the diff.
It would be great to have a pullup of it to 5.x.
best regards
W. Stukenbrock
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc:
Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize -
rename bug in tmpfs ???
Date: Sat, 15 Sep 2012 16:57:53 +0000
On Fri, Sep 14, 2012 at 08:40:04AM +0000, Wolfgang Stukenbrock wrote:
> I forgot to look into the bug database prior to sending this one - sorry.
>
> In -current there seems to be a patch for the rename problem that I've
> not yet integrated into my 5.1.2 (PR 36681).
>
> I'm going to do this as the next step.
That is probably a good idea, but the available patches only fix ffs.
> dholland stated that this fix will solve the rename problem in tmpfs.
> So I think it will solve our problem here too.
I don't think there's a netbsd-5 version of the tmpfs rename patches,
but I could have missed it going by.
(Unfortunately, each filesystem's rename needs to be fixed separately,
and they have historically been *all* wrong.)
--
David A. Holland
dholland@netbsd.org
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
Date: Mon, 17 Sep 2012 11:01:37 +0200
Hi,
it looks like I have a working version for 5.1_STABLE now, but the
integration I've done is (very) ugly ...
- I took the whole tmpfs code from -current.
- I needed to modify two other makefiles (rump and ...) because one file in
tmpfs is gone and another (tmpfs_rename.c) has appeared.
- I needed to "import" genfs_rename.c, because most of the implementation is
now there. (Done by a copy to the tmpfs folder and an include statement
in the tmpfs source file.)
- I borrowed some other new routines from genfs, because they are needed
now. (Simply copied into the tmpfs sources in order to avoid changes in
other parts.)
- I added a second parameter to VOP_UNLOCK() - added 0 as it was in the old code.
- I needed to add the INTERLOCK flag to the call of vget() - the call semantics
have changed in -current.
- I reverted the KERN_NAME_MAX change by adding a define to NAME_MAX instead of
the assert "MAXNAMELEN == NAME_MAX".
- I borrowed some new routines from kern_auth.c by copying them into the tmpfs
sources ...
And here is the main problem with my "quick-and-dirty" integration:
It looks like nearly everything changed in kern_auth.c, and some new
listener lists have been added that are not present in 5.1.x. I think
they cannot simply be used because the other code does not know them. To
add this knowledge, everything kauth-related would have to be changed too in
order to work correctly - too much work ...
So I took the main auth methods from the old implementation, using
the "generic" auth routine as done in 5.1.x, but I additionally needed
to explicitly allow root (uid == 0) to make chown calls, in order to get the
install-root filesystem of the install kernels up correctly.
(Does this work in -current ???? Is there a listener that returns ALLOW
if root is trying to chown something? Or has the creation of the device
nodes in the install kernels been changed?)
The auth part is still not really tested, but it seems to work as before
so far.
If you are interested in this work, let me know and I will send it.
But the auth integration should be reworked again!
Any hints on how to correctly revert the new stuff to the old 5.1.x
semantics?
The "problem" is the auth-vnode part and all related definitions.
W. Stukenbrock
--
Dr. Nagler & Company GmbH
Hauptstraße 9
92253 Schnaittenbach
Tel. +49 9622/71 97-42
Fax +49 9622/71 97-50
Wolfgang.Stukenbrock@nagler-company.com
http://www.nagler-company.com
Head office: Schnaittenbach
Commercial register: Amberg HRB
Place of jurisdiction: Amberg
Tax number: 201/118/51825
VAT ID number: DE 273143997
Managing directors: Dr. Martin Nagler, Prof. Dr. Dr. Karl-Kuno Kunze
State-Changed-From-To: open->feedback
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Thu, 05 Dec 2013 15:49:25 +0000
State-Changed-Why:
rename() issues were fixed in -current. I am not sure if anybody is
working on backporting them to netbsd-6 (or netbsd-5).
Volunteers welcome :)
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@NetBSD.org, gnats-admin@NetBSD.org, netbsd-bugs@NetBSD.org
Subject: Re: kern/46955: process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???
Date: Mon, 20 Jan 2014 11:59:45 +0100
Hi, I'm getting a reminder for this PR from time to time.
But I don't know what I should do with it ...
The last state (known to me) is that there is a fix in -current and
there is no plan to port it to 6.x or 5.1.x.
So this PR should either be set back to open or should be closed
(because there will be no patch/fix for 5.1.x at any time in the future).
best regards
W. Stukenbrock
State-Changed-From-To: feedback->open
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Sat, 26 Apr 2014 05:46:03 +0000
State-Changed-Why:
I think rmind was suggesting you should finish your patches to merge the
fix into -5 :-)
but I'll set it so it stops nagging you.
From: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
To: gnats-bugs@NetBSD.org
Cc: kern-bug-people@netbsd.org, netbsd-bugs@netbsd.org, gnats-admin@netbsd.org,
dholland@NetBSD.org
Subject: Re: kern/46955 (process deadlock (tstile) runing amanda sendsize - rename bug in tmpfs ???)
Date: Sun, 27 Apr 2014 10:37:06 +0200
Thank you,
I've tried merging the tmpfs changes into 5.x, and it seems to fix most
issues, but unfortunately not all - I had one deadlock in tmpfs again
some time ago.
Due to my limited time and the massive changes in the fs code and in the
way process notifications take place between 5.x and 6.x, I've stopped
working on it and started migrating all of our systems to 6.x.
(Currently suspended due to the deadlock problem described in PR 48733.)
My ufs patch for 5.x seems to be stable (of course, it is ugly ...), but I
think I already sent that one a long time ago.
best regards
W. Stukenbrock
From: David Holland <dholland@netbsd.org>
To: Wolfgang Stukenbrock <wolfgang.stukenbrock@nagler-company.com>
Cc: gnats-bugs@NetBSD.org
Subject: Re: kern/46955 (process deadlock (tstile) runing amanda sendsize -
rename bug in tmpfs ???)
Date: Sun, 27 Apr 2014 21:11:45 +0000
On Sun, Apr 27, 2014 at 10:37:06AM +0200, Wolfgang Stukenbrock wrote:
> I've tried merging the tmpfs changes into 5.x, and it seems to fix most
> issues, but unfortunately not all - I had one deadlock in tmpfs again some
> time ago.
There may be other issues. There were a bunch of stability fixes to
tmpfs last fall; I thought many of them had gotten into 6.x but I'm
told not. (rmind@ says he doesn't have time but can help out...)
However, none of them AFAIK have been merged into 5.x, so there are a
bunch of things that could have bitten you.
> Due to my limited time and the massive changes in the fs code and in the way
> process notifications take place between 5.x and 6.x, I've stopped working
> on it and started migrating all of our systems to 6.x. (Currently
> suspended due to the deadlock problem described in PR 48733.)
That seems reasonable. We are hoping to get 7.x branched any month
now; it's definitely getting to the point where working on 5.x isn't
all that worthwhile.
> My ufs patch for 5.x seems to be stable (of course, it is ugly ...), but I
> think I already sent that one a long time ago.
Yup.
--
David A. Holland
dholland@netbsd.org
State-Changed-From-To: open->closed
State-Changed-By: dholland@NetBSD.org
State-Changed-When: Mon, 26 Nov 2018 08:00:54 +0000
State-Changed-Why:
-6 is now EOL so everything in here is fixed.
>Unformatted: