NetBSD Problem Report #56332

From www@netbsd.org  Wed Jul 28 01:59:32 2021
Return-Path: <www@netbsd.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
	(Client CN "mail.NetBSD.org", Issuer "mail.NetBSD.org CA" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id EEE6C1A921F
	for <gnats-bugs@gnats.NetBSD.org>; Wed, 28 Jul 2021 01:59:31 +0000 (UTC)
Message-Id: <20210728015931.05FEB1A9245@mollari.NetBSD.org>
Date: Wed, 28 Jul 2021 01:59:30 +0000 (UTC)
From: tnn@nygren.pp.se
Reply-To: tnn@nygren.pp.se
To: gnats-bugs@NetBSD.org
Subject: swap on 4k sector device uses only 1/8 of the configured capacity
X-Send-Pr-Version: www-1.0

>Number:         56332
>Category:       kern
>Synopsis:       swap on 4k sector device uses only 1/8 of the configured capacity
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Jul 28 02:00:00 +0000 2021
>Last-Modified:  Wed Aug 04 16:10:01 +0000 2021
>Originator:     Tobias Nygren
>Release:        9.99.87
>Organization:
>Environment:
>Description:
Swap assumes the block size is 512 bytes, so a wedge on a 4k sector device is undersized by a factor of 8.

>How-To-Repeat:
Set up 32 GiB of swap area on gpt on a 4k sector device:

ld4 at nvme0 nsid 1
ld4: 238 GB, 7752 cyl, 128 head, 63 sec, 4096 bytes/sect x 62514774 sectors
dk3 at ld4: 8388608 blocks at 50397440, type: swap

... but only 4 GiB is available.

# swapctl -lm
Device      1M-blocks     Used    Avail Capacity  Priority
/dev/dk3         4096        0     4096     0%    0
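
The 1/8 factor follows directly from the unit mismatch. A minimal sketch in
plain userland C (the helper name blocks_to_mib is made up for illustration
and is not a kernel interface): the wedge has 8388608 sectors, and the size
that comes out depends on whether each one is counted as 4096 or 512 bytes.

```c
#include <stdint.h>

/*
 * Hypothetical helper, not a NetBSD API: size in MiB of nblocks blocks
 * of blksize bytes each.
 */
static uint64_t
blocks_to_mib(uint64_t nblocks, uint32_t blksize)
{
	return nblocks * blksize / (1024 * 1024);
}
```

blocks_to_mib(8388608, 4096) gives 32768 (the configured 32 GiB), while
blocks_to_mib(8388608, 512) gives 4096, which matches the swapctl output above.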

>Fix:
The kernel should query the device for the correct block size when configuring swap.

>Audit-Trail:
From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56332: swap on 4k sector device uses only 1/8 of the configured capacity
Date: Wed, 28 Jul 2021 05:31:57 -0000 (UTC)

 tnn@nygren.pp.se writes:

 >ld4 at nvme0 nsid 1
 >ld4: 238 GB, 7752 cyl, 128 head, 63 sec, 4096 bytes/sect x 62514774 sectors
 >dk3 at ld4: 8388608 blocks at 50397440, type: swap
 >... but only 4 GiB is available.

 That looks like a bug in the dk driver.

 swap/dump are somewhat magic. Drivers have their own entry points to
 handle a swap (and dump) partition:

 DEVsize -> return the number of blocks
 DEVdump -> dump that many bytes to a given block number.

 Users of the driver will use DEV_BSIZE blocks, like for regular I/O.

 But dksize() returns sc_size and dkdump() checks block numbers against
 sc_size and offsets them by sc_offset. Both use the physical sector
 sizes (sc_size is 8388608, sc_offset is 50397440).

 So it's not only reporting the wrong size, but also writes to the
 wrong position on the disk if the physical sector size is not DEV_BSIZE.

 The regular I/O code does the right translation. So maybe (untested):

 Index: dk.c
 ===================================================================
 RCS file: /cvsroot/src/sys/dev/dkwedge/dk.c,v
 retrieving revision 1.105
 diff -p -u -r1.105 dk.c
 --- dk.c	2 Jun 2021 17:56:40 -0000	1.105
 +++ dk.c	28 Jul 2021 05:31:14 -0000
 @@ -1639,6 +1639,7 @@ static int
  dksize(dev_t dev)
  {
  	struct dkwedge_softc *sc = dkwedge_lookup(dev);
 +	uint64_t p_size;
  	int rv = -1;

  	if (sc == NULL)
 @@ -1651,12 +1652,13 @@ dksize(dev_t dev)

  	/* Our content type is static, no need to open the device. */

 +	p_size   = sc->sc_size << sc->sc_parent->dk_blkshift;
  	if (strcmp(sc->sc_ptype, DKW_PTYPE_SWAP) == 0) {
  		/* Saturate if we are larger than INT_MAX. */
 -		if (sc->sc_size > INT_MAX)
 +		if (p_size > INT_MAX)
  			rv = INT_MAX;
  		else
 -			rv = (int) sc->sc_size;
 +			rv = (int) p_size;
  	}

  	mutex_exit(&sc->sc_parent->dk_rawlock);
 @@ -1675,6 +1677,7 @@ dkdump(dev_t dev, daddr_t blkno, void *v
  {
  	struct dkwedge_softc *sc = dkwedge_lookup(dev);
  	const struct bdevsw *bdev;
 +	uint64_t p_size, p_offset;
  	int rv = 0;

  	if (sc == NULL)
 @@ -1697,16 +1700,20 @@ dkdump(dev_t dev, daddr_t blkno, void *v
  		rv = EINVAL;
  		goto out;
  	}
 -	if (blkno < 0 || blkno + size / DEV_BSIZE > sc->sc_size) {
 +
 +	p_offset = sc->sc_offset << sc->sc_parent->dk_blkshift;
 +	p_size   = sc->sc_size << sc->sc_parent->dk_blkshift;
 +
 +	if (blkno < 0 || blkno + size / DEV_BSIZE > p_size) {
  		printf("%s: blkno (%" PRIu64 ") + size / DEV_BSIZE (%zu) > "
 -		    "sc->sc_size (%" PRIu64 ")\n", __func__, blkno,
 -		    size / DEV_BSIZE, sc->sc_size);
 +		    "p_size (%" PRIu64 ")\n", __func__, blkno,
 +		    size / DEV_BSIZE, p_size);
  		rv = EINVAL;
  		goto out;
  	}

  	bdev = bdevsw_lookup(sc->sc_pdev);
 -	rv = (*bdev->d_dump)(sc->sc_pdev, blkno + sc->sc_offset, va, size);
 +	rv = (*bdev->d_dump)(sc->sc_pdev, blkno + p_offset, va, size);

  out:
  	mutex_exit(&sc->sc_parent->dk_rawlock);
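
The size translation in the patch can be tried outside the kernel. The sketch
below is a userland model only, not driver code: the function name is
hypothetical, and it assumes dk_blkshift holds log2(sector size / DEV_BSIZE),
i.e. 3 for 4096-byte sectors.

```c
#include <limits.h>
#include <stdint.h>

/*
 * Userland model of the patched dksize() arithmetic; not kernel code.
 * nsectors: wedge size in device sectors (sc_size in this PR).
 * blkshift: assumed to be log2(sector size / 512); 3 for 4k sectors.
 * Returns the wedge size in 512-byte blocks, saturated at INT_MAX.
 */
static int
wedge_size_in_dev_bsize(uint64_t nsectors, unsigned blkshift)
{
	uint64_t p_size = nsectors << blkshift;

	/* Saturate when the block count does not fit in an int. */
	return p_size > INT_MAX ? INT_MAX : (int)p_size;
}
```

For the wedge in this report, wedge_size_in_dev_bsize(8388608, 3) yields
67108864 blocks of 512 bytes, i.e. the full 32 GiB, while a shift of 0
reproduces the unpatched value of 8388608.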

From: Tobias Nygren <tnn@nygren.pp.se>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56332: swap on 4k sector device uses only 1/8 of the
 configured capacity
Date: Wed, 28 Jul 2021 13:27:10 +0200

 On Wed, 28 Jul 2021 05:35:01 +0000 (UTC)
 Michael van Elst <mlelstv@serpens.de> wrote:

 >  So it's not only reporting the wrong size, but also writes to the
 >  wrong position on the disk if the physical sector size is not DEV_BSIZE.

 Uff, that sounds bad. I will deconfigure swap ...

 >  The regular I/O code does the right translation. So maybe (untested):

 ... and set up another machine to test this fix.

From: mlelstv@serpens.de (Michael van Elst)
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/56332: swap on 4k sector device uses only 1/8 of the configured capacity
Date: Wed, 28 Jul 2021 20:33:40 -0000 (UTC)

 tnn@nygren.pp.se (Tobias Nygren) writes:

 > On Wed, 28 Jul 2021 05:35:01 +0000 (UTC)
 > Michael van Elst <mlelstv@serpens.de> wrote:
 > 
 > >  So it's not only reporting the wrong size, but also writes to the
 > >  wrong position on the disk if the physical sector size is not DEV_BSIZE.
 > 
 > Uff, that sounds bad. I will deconfigure swap ...

 Only a crash dump will write to the wrong offset; swap uses the regular
 I/O routines, which are correct.
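
To make the crash dump hazard concrete, here is a small userland sketch
(hypothetical helper name, not kernel code) of where a dump to a given wedge
block would land on the parent disk. It assumes, as stated earlier in the
thread, that the parent's dump routine takes block numbers in DEV_BSIZE
(512-byte) units.

```c
#include <stdint.h>

#define DEV_BSIZE_BYTES 512	/* value of DEV_BSIZE on NetBSD */

/*
 * Hypothetical helper: byte position on the parent disk for wedge
 * block blkno (in 512-byte units), given the wedge offset in device
 * sectors and the shift applied to that offset before adding it.
 * A shift of 0 models the unpatched code; a shift of 3 models the
 * patched code on a disk with 4096-byte sectors.
 */
static uint64_t
dump_byte_pos(uint64_t blkno, uint64_t offset_sectors, unsigned shift)
{
	return (blkno + (offset_sectors << shift)) * DEV_BSIZE_BYTES;
}
```

With the offset from this report, dump_byte_pos(0, 50397440, 3) is
206427914240 bytes (about 192 GiB, the real start of the wedge), whereas
dump_byte_pos(0, 50397440, 0) is 25803489280 bytes (about 24 GiB), which lies
well before the swap wedge.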

From: Tobias Nygren <tnn@nygren.pp.se>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: kern/56332: swap on 4k sector device uses only 1/8 of the
 configured capacity
Date: Wed, 4 Aug 2021 18:05:20 +0200

 Seems there are some issues with the patch.
 I created 8 GiB swap on a 4k sector NVMe device on 4 GiB RAM aarch64.
 swapctl does report the correct size now.
 Then I started untarring multiple copies of pkgsrc to tmpfs.
 Eventually the system grinds to a near halt and top shows this:

 Memory: 28K Inact, 4K Wired, 256K Exec, 28K File, 1380K Free
 Swap: 8192M Total, 3652M Used, 4540M Free

 tmpfs          8.2G   3.8G   4.4G  46% /tmp

 There is clearly more swap available but all of the system
 RAM pages have leaked somewhere and are no longer accounted for.
 Unmounting /tmp released all of the used swap but
 only 50 MiB of system memory came back to the free list.

Copyright © 1994-2020 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.