NetBSD Problem Report #46235

From dholland@netbsd.org  Thu Mar 22 00:36:26 2012
Return-Path: <dholland@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	by www.NetBSD.org (Postfix) with ESMTP id 0799563E458
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 22 Mar 2012 00:36:25 +0000 (UTC)
Message-Id: <20120322003635.9A2A514A179@mail.netbsd.org>
Date: Thu, 22 Mar 2012 00:36:35 +0000 (UTC)
From: dholland@netbsd.org
Reply-To: dholland@netbsd.org
To: gnats-bugs@gnats.NetBSD.org
Subject: beating on ftruncate kills the system
X-Send-Pr-Version: 3.95

>Number:         46235
>Category:       kern
>Synopsis:       beating on ftruncate kills the system
>Confidential:   no
>Severity:       critical
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Mar 22 00:40:00 +0000 2012
>Last-Modified:  Thu Mar 22 17:20:02 +0000 2012
>Originator:     David A. Holland
>Release:        NetBSD 5.99.65
>Organization:
>Environment:
System: NetBSD  5.99.65 NetBSD 5.99.65 (GENERIC) #65: Mon Feb 13 01:38:30 EST 2012  dholland@valkyrie:/disk/disk1/home/dholland/netbsd/quotafix/src/sys/arch/i386/compile/obj.i386/GENERIC i386
Architecture: i386
Machine: i386
>Description:

I wrote a simple test program to beat on ftruncate. It fairly reliably
kills the system in any of a number of ways. I have seen the following
failures, using a 5.99.65 qemu VM that I had lying around from testing
quota patches. I expect testing on real hardware (which is faster)
will make it die harder and faster; I also expect a multiprocessor
machine to have a similar effect.

(1)
fatal double fault in supervisor mode
trap type 13 code 80000000 eip c05656ae cs 8 eflags 202 cr2 c41e4fd0 ilevel 6
panic: trap
cpu0: Begin traceback...
printf_nolog(c0c2be4c,c3576f7c,c3576f7c,c05656ae,8,202,c41e4fd0,6,0,0) at netbsd:printf_nolog
trap_tss() at netbsd:trap_tss
--- trap via task gate ---
netbsd:static_vmems+0xe7c:
cpu0: End traceback...

(2)
system hangs; breaking into ddb reveals all or nearly all trunq
processes tstiled. It's not clear why.

(3)
system hangs; breaking into ddb reveals most trunq processes waiting
on uvn_fp1; a small number each on flt_noram1, tstile, flt_pmfail2.  I
saw one each waiting on vnode and flt_pmfail1. mount_mfs waits on
flt_noram1. cron, syslogd wait on uvn_fp1. (this one was done on an
mfs volume.)

(4)
system hangs; breaking into ddb shows infinite recursion starting
   uvm_fault_internal -> VOP_GETPAGES -> genfs_getpages ->
   uvm_pagermapout -> uvm_unmap_detach -> uvm_mapent_free ->
   pool_cache_put_paddr
and then looping through
   pool_cache_put_paddr -> pool_cache_put_slow -> pool_get ->
   pool_grow -> pool_page_alloc_meta -> vmem_alloc -> vmem_xalloc ->
   uvm_km_kmem_alloc -> pool_cache_put_paddr

(5)
system hangs; breaking into ddb shows many trunq processes waiting on
uvn_fp2, most of them showing no wchan and apparently runnable but not
running. Apparently the scheduler got wedged but it wasn't obvious
what broke.


The thing I was attempting to tickle is the lock release/reacquire
that's at line 354 of uvm_vnode.c in uvm_vnp_setsize(). ISTM that the
window that creates (though small) is dangerous. I'm not sure if I've
managed to hit it yet though.

In the qemu environment I've been using, the most reliable way to
trigger a crash is to run trunq on an mfs volume with the default 8M
size. A 16M mfs crashes less often but still crashes. (Note that I'm using
only 32M of RAM with qemu. I don't know if that's important, or what
happens with normal RAM sizes.) It is much less crashable on an ffs
volume; I'm not sure if this is because on-disk ffs is slower, or
because it's hitting mfs-specific wobblies, or what. Note that you
can't currently use tmpfs; tmpfs apparently objects to large sparse
files.

I'm also somewhat concerned about how slow trunq runs. ftruncate on
large files (sparse or not) has to do a fair amount of work, but even
so it seems to me that 128 ftruncates (the unit in which trunq prints
progress) should take milliseconds, not seconds. I suspect something
silly is going on, possibly large thundering herds.

>How-To-Repeat:

trunq -f testfile [-p numprocs] [-w]

You can also change the (otherwise fixed) random seed and the number
of loops done by each process, but there doesn't seem to be much need
to.

It works by forking off a bunch of processes, each of which loops
ftruncating the test file to a random length.

Large numbers of procs (400-800) tend to trigger crash (3); smaller
numbers (30-60ish) tend to trigger crash (2), but not always. I've
only seen crashes (1), (4), and (5) once each so far.

Adding -w, to fork a second batch of processes to write into the test
file at random offsets, doesn't seem to make it noticeably crashier,
at least so far.

   --- trunq.c ---
/*
 * Fork N subprocesses to beat on ftruncate.
 * With -w option, fork N more subprocesses that write into the file.
 */

#include <sys/types.h>
#include <sys/wait.h>
#include <err.h>
#include <fcntl.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

static void loop_ftruncate(int fd, unsigned numloops, unsigned myid) {
   unsigned i;
   off_t len;

   for (i=0; i<numloops; i++) {
      if (i % 256 == 0) {
	 write(STDOUT_FILENO, "t", 1);
      }
      len = random();
      if (ftruncate(fd, len) != 0) {
	 warn("truncater %u: ftruncate to %lld", myid, (long long)len);
      }
   }
}

static void loop_write(int fd, unsigned numloops, unsigned myid) {
   char buf[64];
   unsigned i;
   off_t pos;
   size_t len;
   ssize_t result;

   for (i=0; i<sizeof(buf); i++) {
      buf[i] = '@' + myid;
   }

   for (i=0; i<numloops; i++) {
      if (i % 256 == 0) {
	 write(STDOUT_FILENO, "w", 1);
      }
      pos = random();
      len = random() % sizeof(buf);
      result = pwrite(fd, buf, len, pos);
      if (result < 0) {
	 warn("writer #%u: pwrite %zu at %lld", myid, len, (long long)pos);
      }
      else if ((size_t)result != len) {
	 warnx("writer #%u: pwrite %zu at %lld: short count %zd of %zu",
	      myid, len, (long long)pos, result, len);
      }
   }
}

static void dowait(pid_t pid, const char *what, unsigned id, bool failed) {
   int ret, status;

   if (pid == -1) {
      /* this slot never forked; don't pass -1 (any child) to waitpid */
      return;
   }

   ret = waitpid(pid, &status, 0);
   if (ret < 0) {
      warn("%s %u (pid %d): waitpid", what, id, pid);
      return;
   }
   if (WIFEXITED(status)) {
      if (WEXITSTATUS(status) != 0) {
	 warnx("%s %u: exit %d", what, id, WEXITSTATUS(status));
      }
   }
   else if (WIFSIGNALED(status)) {
      if (!failed || WTERMSIG(status) != SIGTERM) {
	 warnx("%s %u: signal %d", what, id, WTERMSIG(status));
      }
   }
   else {
      warnx("%s %u: weird exit status %d", what, id, status);
   }
}

static void run_processes(int fd, unsigned numloops,
			  unsigned numprocs, bool dowriters) {
   unsigned i;
   pid_t tpids[numprocs];
   pid_t wpids[numprocs];
   unsigned long tseeds[numprocs];
   unsigned long wseeds[numprocs];
   bool failed;

   failed = false;

   for (i=0; i<numprocs; i++) {
      tpids[i] = -1;
      wpids[i] = -1;
   }

   /*
    * Do it this way so adding writers doesn't change the sequence of
    * sizes used by the truncaters.
    */
   for (i=0; i<numprocs; i++) {
      tseeds[i] = random();
   }
   for (i=0; i<numprocs; i++) {
      wseeds[i] = random();
   }

   for (i=0; i<numprocs; i++) {
      tpids[i] = fork();
      if (tpids[i] < 0) {
	 failed = true;
	 break;
      }
      if (tpids[i] == 0) {
	 /* child */
	 srandom(tseeds[i]);
	 loop_ftruncate(fd, numloops, i);
	 exit(0);
      }

      if (dowriters) {
	 wpids[i] = fork();
	 if (wpids[i] < 0) {
	    failed = true;
	    break;
	 }
	 if (wpids[i] == 0) {
	    /* child */
	    srandom(wseeds[i]);
	    loop_write(fd, numloops, i);
	    exit(0);
	 }
      }
   }

   if (failed) {
      for (i=0; i<numprocs; i++) {
	 if (tpids[i] != -1) {
	    kill(tpids[i], SIGTERM);
	 }
	 if (wpids[i] != -1) {
	    kill(wpids[i], SIGTERM);
	 }
      }
   }

   for (i=0; i<numprocs; i++) {
      dowait(tpids[i], "truncater", i, failed);
      if (dowriters) {
	 dowait(wpids[i], "writer", i, failed);
      }
   }
}

static void dotest(const char *file, unsigned numloops,
		   unsigned numprocs, bool dowriters) {
   int fd;

   fd = open(file, O_RDWR|O_CREAT|O_TRUNC, 0664);
   if (fd < 0) {
      err(EXIT_FAILURE, "%s", file);
   }

   run_processes(fd, numloops, numprocs, dowriters);

   if (close(fd)) {
      warn("%s: close", file);
   }
   if (unlink(file)) {
      warn("%s: unlink", file);
   }
}

static void usage(void) {
   fprintf(stderr, "usage: %s [options]\n", getprogname());
   fprintf(stderr, "   -f file      filename to use for testing [testfile]\n");
   fprintf(stderr, "   -l numloops  number of truncate loops to run [4096]\n");
   fprintf(stderr, "   -p numprocs  number of processes to fork [16]\n");
   fprintf(stderr, "   -r randseed  random seed [0x4159471]\n");
   fprintf(stderr, "   -w           write to file as well [false]\n");
}

int main(int argc, char *argv[]) {
   const char *file = "testfile";
   unsigned numloops = 4096;
   unsigned numprocs = 16;
   unsigned long seed = 0x4159471;
   bool dowriters = false;

   int ch;

   while ((ch = getopt(argc, argv, "f:l:r:p:w")) != -1) {
      switch (ch) {
       case 'f': file = optarg; break;
       case 'l': numloops = atoi(optarg); break;
       case 'p': numprocs = atoi(optarg); break;
       case 'r': seed = atol(optarg); break;
       case 'w': dowriters = true; break;
       default: usage(); return EXIT_FAILURE;
      }
   }
   if (optind != argc) {
      usage();
      return EXIT_FAILURE;
   }

   srandom(seed);
   dotest(file, numloops, numprocs, dowriters);
   return EXIT_SUCCESS;
}
   ------

>Fix:

No clue.

If the thing in uvm_vnode.c turns out to be the problem, fixing the
VOP_PUTPAGES interface to not release locks it didn't take (why is it
doing that?) would take care of it, but would probably be kind of
invasive for the -6 branch.

But, who knows if that's the problem... the observed symptoms could be
anything.

>Audit-Trail:
From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/46235: beating on ftruncate kills the system
Date: Thu, 22 Mar 2012 17:17:08 +0000

 On Thu, Mar 22, 2012 at 12:40:00AM +0000, dholland@netbsd.org wrote:
  > The thing I was attempting to tickle is the lock release/reacquire
  > that's at line 354 of uvm_vnode.c in uvm_vnp_setsize(). ISTM that the
  > window that creates (though small) is dangerous. I'm not sure if I've
  > managed to hit it yet though.

 I don't know if the crashes described in the original PR are connected
 to this particular issue or have deeper causes, but if you run trunq
 on msdosfs it dies almost instantly with:

 panic: kernel diagnostic assertion "vp->v_size <= newsize" failed: file "/disk/disk1/home/dholland/netbsd/quotafix/src/sys/uvm/uvm_vnode.c", line 369 

 which seems very likely to be a result of hitting that window. 

  > If the thing in uvm_vnode.c turns out to be the problem, fixing the
  > VOP_PUTPAGES interface to not release locks it didn't take (why is it
  > doing that?) would take care of it, but would probably be kind of
  > invasive for the -6 branch.

 This will not work. It seems that genfs_putpages cycles in and out of
 that lock many, many times. This seems unhealthy in itself, perhaps,
 but in any event it means no such quick fix is possible.

 I have no idea what would happen if the size assignment after that
 window were moved to before it, to make it atomic, but I expect
 nothing good as then genfs_putpages will be working past EOF. Probably
 the size model needs to be extended so it's possible to assign the
 size before calling uvn_put(), but I don't know what this entails.

 -- 
 David A. Holland
 dholland@netbsd.org
