NetBSD Problem Report #53032

From www@NetBSD.org  Thu Feb 15 08:18:26 2018
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 971A67A16B
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 15 Feb 2018 08:18:26 +0000 (UTC)
Message-Id: <20180215081825.9F3F77A1FB@mollari.NetBSD.org>
Date: Thu, 15 Feb 2018 08:18:25 +0000 (UTC)
From: alexander.boettcher@genode-labs.com
Reply-To: alexander.boettcher@genode-labs.com
To: gnats-bugs@NetBSD.org
Subject: Missing error handling in rump vfs layer (sys/rump/librump/rumpvfs/vm_vfs.c)
X-Send-Pr-Version: www-1.0

>Number:         53032
>Category:       kern
>Synopsis:       Missing error handling in rump vfs layer (sys/rump/librump/rumpvfs/vm_vfs.c)
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Feb 15 08:20:00 +0000 2018
>Originator:     Alexander Boettcher
>Release:        CVS version: -D"20141216 2100UTC"
>Organization:
Genode Labs GmbH
>Environment:
Genode with ported rump kernel, using rump_vfs+ext2 implementation of NetBSD.



>Description:
This rump kernel issue was first reported on the rump kernel users mailing list [3], where I was asked to file it here.

Original post:
--------------

Hello,

On Genode we have been using rump kernels for a while [0] to leverage the
file system implementations. Lately we encountered rare (not reliably
reproducible) corruption of a few kB in written files when we heavily
stress the file system by writing several GB of data during the installation of a VM.

After some weeks of debugging and ruling out various causes (and also fixing issues on Genode's side), I could finally track down the root cause to missing functionality in src/sys/rump/librump/rumpvfs/vm_vfs.c.

Following the hint in vm_vfs.c that the code is similar to/based on src/sys/uvm/uvm_pager.c, I extended vm_vfs.c with the corresponding error handling part (correctly, at least I hope so).

It seems, correct me if I'm wrong, that pages were freed which still held
data that had not been written yet (as indicated by the b_error member of struct buf).
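To illustrate the suspected failure mode, here is a minimal, self-contained sketch. It is not the actual patch and does not use the real kernel structures: struct sketch_page, struct sketch_buf and sketch_write_done() are hypothetical names made up for this example, and only the "error" field is meant to play the role of b_error. The idea is that a write completion handler should only free pages when the buffer reports no error.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; NOT the real NetBSD structures. */
struct sketch_page {
	bool dirty;	/* page holds data that is not yet on disk */
	bool freed;	/* page was handed back to the allocator */
};

struct sketch_buf {
	int error;	/* plays the role of b_error in struct buf */
};

/*
 * Completion handler for a write request covering npages pages.
 * Returns the number of pages that were freed.
 */
static size_t
sketch_write_done(const struct sketch_buf *bp, struct sketch_page *pages,
    size_t npages)
{
	size_t i, nfreed = 0;

	assert(bp != NULL && pages != NULL);

	for (i = 0; i < npages; i++) {
		if (bp->error != 0) {
			/* Write failed: keep the page and leave it dirty. */
			pages[i].dirty = true;
			continue;
		}
		/* Write succeeded: the on-disk copy is current. */
		pages[i].dirty = false;
		pages[i].freed = true;
		nfreed++;
	}
	return nfreed;
}
```

Without the error check, the loop would free every page unconditionally, and the data of a failed write would be lost, which would match the corruption we observed. How the real code should treat the individual error cases is what the patch [2] takes over from uvm_pager.c.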

With the error handling change we could finally avoid the rare
corruption. (Original Genode issue: [1], just the patch: [2])

We would like to contribute this change back, if applicable and after
proper review by you. Since the relevant rump kernel code has not
changed for a long time, we hope the patch can be applied easily
upstream.

[0]
https://genode.org/documentation/release-notes/14.02#NetBSD_file_systems_using_rump_kernels
[1] https://github.com/genodelabs/genode/issues/2677
[2]
https://github.com/genodelabs/genode/blob/7c5552a4412e42527c6bbc1a347b8aa2237a941c/repos/dde_rump/patches/vm_vfs.patch
[3] https://www.freelists.org/post/rumpkernel-users/Missing-error-handling-in-rumpvfsvm-vfsc

>How-To-Repeat:
Run Genode and install Debian in a VM. As file system server we used the ported rump_fs server (rump kernel + vfs + ext2) to store the VM image. After installing packages, the VM guest starts reporting corrupted package data (caused by data that was never written to the VM image, as we found out).
>Fix:
With the error handling change we could finally avoid the rare
corruption. (Original Genode issue: [1], just the patch: [2])

[1] https://github.com/genodelabs/genode/issues/2677
[2]
https://github.com/genodelabs/genode/blob/7c5552a4412e42527c6bbc1a347b8aa2237a941c/repos/dde_rump/patches/vm_vfs.patch
