NetBSD Problem Report #45129

From tron@zhadum.org.uk  Sat Jul  9 23:27:40 2011
Return-Path: <tron@zhadum.org.uk>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id B799263B8E2
	for <gnats-bugs@gnats.NetBSD.org>; Sat,  9 Jul 2011 23:27:40 +0000 (UTC)
Message-Id: <20110709232736.2F814F9A18@lyssa.zhadum.org.uk>
Date: Sun, 10 Jul 2011 00:27:36 +0100 (BST)
From: tron@zhadum.org.uk
Reply-To: tron@zhadum.org.uk
To: gnats-bugs@gnats.NetBSD.org
Subject: Write handling in puffs(4) broken
X-Send-Pr-Version: 3.95

>Number:         45129
>Category:       lib
>Synopsis:       Write handling in refuse(3) broken
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    tron
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Jul 09 23:30:00 +0000 2011
>Closed-Date:    Sun Jan 13 18:01:08 +0000 2013
>Last-Modified:  Sun Jan 13 18:01:08 +0000 2013
>Originator:     tron@zhadum.org.uk
>Release:        NetBSD 5.99.54 2011-07-09 sources
>Organization:
Matthias Scheler                                  http://zhadum.org.uk/
>Environment:
System: NetBSD lyssa.zhadum.org.uk 5.99.54 NetBSD 5.99.54 (LYSSA) #1: Sat Jul 9 20:32:01 BST 2011 tron@lyssa.zhadum.org.uk:/src/sys/compile/LYSSA i386
Architecture: i386
Machine: i386
>Description:
Writing to "fuse-ext2" ("pkgsrc/filesystems/fuse-ext2") works fine
under NetBSD/amd64 5.1_STABLE but fails with a "Protocol Error"
under NetBSD/i385 5.99.54.

Using debugging output I found out that the EPROTO error is return by this
bit of code in puffs_vnop_write() in "src/sys/fs/puffs/puffs_vnops.c":

                        if (write_msg->pvnr_resid > tomove) {
                                puffs_senderr(pmp, PUFFS_ERR_WRITE,
                                    E2BIG, "resid grew", VPTOPNC(ap->a_vp));
                                error = EPROTO;
                                break;
                        }

With more debugging output I found out that write_msg->pvnr_resid
gets clobbered by this call:

                        PUFFS_MSG_ENQUEUEWAIT2(pmp, park_write, vp->v_data,
                            NULL, error);

PUFFS_MSG_ENQUEUEWAIT2() is defined as follows:

#define PUFFS_MSG_ENQUEUEWAIT2(pmp, park, vp1, vp2, var)                \
do {                                                                    \
        puffs_msg_enqueue(pmp, park);                                   \
        var = puffs_msg_wait2(pmp, park, vp1, vp2);                     \
} while (/*CONSTCOND*/0)

I've changed the code in puffs_vnop_write() to call both functions
seperately and found out that it is puffs_msg_wait2() which clobbers
write_msg->pvnr_resid. puffs_msg_wait2() is relatively short
function that calls puffs_msg_wait() and never writes to park_write->preq
(which points to write_msg). So I guess the bug is in puffs_msg_wait().

Here are the difference to this function between "netbsd-5" (which
works) and HEAD (which doesn't work):

--- old	2011-07-10 00:10:04.000000000 +0100
+++ new	2011-07-10 00:09:43.000000000 +0100
@@ -1,23 +1,35 @@
 int
 puffs_msg_wait(struct puffs_mount *pmp, struct puffs_msgpark *park)
 {
+	lwp_t *l = curlwp;
+	proc_t *p = l->l_proc;
 	struct puffs_req *preq = park->park_preq; /* XXX: hmmm */
-	struct mount *mp = PMPTOMP(pmp);
+	sigset_t ss;
+	sigset_t oss;
 	int error = 0;
 	int rv;

+	/*
+	 * block unimportant signals.
+	 *
+	 * The set of "important" signals here was chosen to be same as
+	 * nfs interruptible mount.
+	 */
+	sigfillset(&ss);
+	sigdelset(&ss, SIGINT);
+	sigdelset(&ss, SIGTERM);
+	sigdelset(&ss, SIGKILL);
+	sigdelset(&ss, SIGHUP);
+	sigdelset(&ss, SIGQUIT);
+	mutex_enter(p->p_lock);
+	sigprocmask1(l, SIG_BLOCK, &ss, &oss);
+	mutex_exit(p->p_lock);
+
 	mutex_enter(&pmp->pmp_lock);
 	puffs_mp_reference(pmp);
 	mutex_exit(&pmp->pmp_lock);

 	mutex_enter(&park->park_mtx);
-	if ((park->park_flags & PARKFLAG_WANTREPLY) == 0
-	    || (park->park_flags & PARKFLAG_CALL)) {
-		mutex_exit(&park->park_mtx);
-		rv = 0;
-		goto skipwait;
-	}
-
 	/* did the response beat us to the wait? */
 	if (__predict_false((park->park_flags & PARKFLAG_DONE)
 	    || (park->park_flags & PARKFLAG_HASERROR))) {
@@ -26,6 +38,13 @@
 		goto skipwait;
 	}

+	if ((park->park_flags & PARKFLAG_WANTREPLY) == 0
+	    || (park->park_flags & PARKFLAG_CALL)) {
+		mutex_exit(&park->park_mtx);
+		rv = 0;
+		goto skipwait;
+	}
+
 	error = cv_wait_sig(&park->park_cv, &park->park_mtx);
 	DPRINTF(("puffs_touser: waiter for %p woke up with %d\n",
 	    park, error));
@@ -74,24 +93,14 @@
 		mutex_exit(&park->park_mtx);
 	}

-	/*
-	 * retake the lock and release.  This makes sure (haha,
-	 * I'm humorous) that we don't process the same vnode in
-	 * multiple threads due to the locks hacks we have in
-	 * puffs_lock().  In reality this is well protected by
-	 * the biglock, but once that's gone, well, hopefully
-	 * this will be fixed for real.  (and when you read this
-	 * comment in 2017 and subsequently barf, my condolences ;).
-	 */
-	if (rv == 0 && !fstrans_is_owner(mp)) {
-		fstrans_start(mp, FSTRANS_NORMAL);
-		fstrans_done(mp);
-	}
-
  skipwait:
 	mutex_enter(&pmp->pmp_lock);
 	puffs_mp_release(pmp);
 	mutex_exit(&pmp->pmp_lock);

+	mutex_enter(p->p_lock);
+	sigprocmask1(l, SIG_SETMASK, &oss, NULL);
+	mutex_exit(p->p_lock);
+
 	return rv;
 }

I can unfortunately not find the bug in it.

>How-To-Repeat:
1.) Mount "fuse-ext2".
2.) Try to copy a file to it. You'll get a "Protocol Error" message
    from "cp".

>Fix:
Not known.

>Release-Note:

>Audit-Trail:
From: Matthias Scheler <tron@zhadum.org.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/45129: Write handling in puffs(4) broken
Date: Sat, 23 Jul 2011 20:55:28 +0100

 On Sat, Jul 09, 2011 at 11:30:01PM +0000, gnats-admin@netbsd.org wrote:
 > Thank you very much for your problem report.
 > It has the internal identification `kern/45129'.
 > The individual assigned to look at your
 > report is: kern-bug-people. 
 > 
 > >Category:       kern
 > >Responsible:    kern-bug-people
 > >Synopsis:       Write handling in puffs(4) broken
 > >Arrival-Date:   Sat Jul 09 23:30:00 +0000 2011

 I've tracked the corruption down to this location in the code in line 817
 in function puffsop_msg() in "src/sys/fs/puffs/puffs_msgif.c":

 			memcpy(park->park_preq, preq, pth->pth_framelen);

 This line is probably supposed to copy the request back into the caller's
 structure after handling it. But for some reason the passed in copy
 is incomplete or corrupt.

 	Kind regards

 -- 
 Matthias Scheler                                  http://zhadum.org.uk/

Responsible-Changed-From-To: kern-bug-people->tron
Responsible-Changed-By: tron@NetBSD.org
Responsible-Changed-When: Sun, 30 Dec 2012 10:02:25 +0000
Responsible-Changed-Why:
I'll handle this PR.


From: "Matthias Scheler" <tron@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45129 CVS commit: src/lib/librefuse
Date: Sun, 30 Dec 2012 10:04:22 +0000

 Module Name:	src
 Committed By:	tron
 Date:		Sun Dec 30 10:04:22 UTC 2012

 Modified Files:
 	src/lib/librefuse: refuse.c

 Log Message:
 FUSE seems to allow short writes without errors but PUFFS doesn't. Work
 around this by returning ENOSPC in case of a short write to avoid protocol
 errors. This change is based on problem analysis provided by Antti Kantee.

 This fixes PR lib/45129 by myself.


 To generate a diff of this commit:
 cvs rdiff -u -r1.95 -r1.96 src/lib/librefuse/refuse.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->pending-pullups
State-Changed-By: tron@NetBSD.org
State-Changed-When: Sun, 30 Dec 2012 10:52:12 +0000
State-Changed-Why:
I've requested a pullup into the "netbsd-5" and "netbsd-6" branch.


From: "Jeff Rizzo" <riz@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45129 CVS commit: [netbsd-6] src/lib/librefuse
Date: Mon, 7 Jan 2013 15:54:01 +0000

 Module Name:	src
 Committed By:	riz
 Date:		Mon Jan  7 15:54:01 UTC 2013

 Modified Files:
 	src/lib/librefuse [netbsd-6]: refuse.c

 Log Message:
 Pull up following revision(s) (requested by tron in ticket #772):
 	lib/librefuse/refuse.c: revision 1.96
 FUSE seems to allow short writes without errors but PUFFS doesn't. Work
 around this by returning ENOSPC in case of a short write to avoid protocol
 errors. This change is based on problem analysis provided by Antti Kantee.
 This fixes PR lib/45129 by myself.


 To generate a diff of this commit:
 cvs rdiff -u -r1.95 -r1.95.2.1 src/lib/librefuse/refuse.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: "Manuel Bouyer" <bouyer@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/45129 CVS commit: [netbsd-5] src/lib/librefuse
Date: Sun, 13 Jan 2013 17:57:48 +0000

 Module Name:	src
 Committed By:	bouyer
 Date:		Sun Jan 13 17:57:48 UTC 2013

 Modified Files:
 	src/lib/librefuse [netbsd-5]: refuse.c

 Log Message:
 Pull up following revision(s) (requested by tron in ticket #1836):
 	lib/librefuse/refuse.c: revision 1.96 via patch
 FUSE seems to allow short writes without errors but PUFFS doesn't. Work
 around this by returning ENOSPC in case of a short write to avoid protocol
 errors. This change is based on problem analysis provided by Antti Kantee.
 This fixes PR lib/45129 by myself.


 To generate a diff of this commit:
 cvs rdiff -u -r1.89.4.2 -r1.89.4.3 src/lib/librefuse/refuse.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: pending-pullups->closed
State-Changed-By: tron@NetBSD.org
State-Changed-When: Sun, 13 Jan 2013 18:01:08 +0000
State-Changed-Why:
The bug has been fixed.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.