NetBSD Problem Report #31944

From yamt@mwd.biglobe.ne.jp  Sat Oct 29 05:46:41 2005
Return-Path: <yamt@mwd.biglobe.ne.jp>
Received: from yamt.dyndns.org (FLA1Abm205.kng.mesh.ad.jp [219.107.215.205])
	by narn.netbsd.org (Postfix) with ESMTP id C214863BA03
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 29 Oct 2005 05:46:40 +0000 (UTC)
Message-Id: <1130564753.193205.10137.nullmailer@yamt.dyndns.org>
Date: Sat, 29 Oct 2005 14:45:53 +0900
From: yamt@mwd.biglobe.ne.jp
Reply-To: yamt@mwd.biglobe.ne.jp
To: gnats-bugs@netbsd.org
Subject: tmpfs keeps too much memory
X-Send-Pr-Version: 3.95

>Number:         31944
>Category:       kern
>Synopsis:       tmpfs keeps too much memory
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    kern-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Oct 29 05:47:00 +0000 2005
>Closed-Date:    Tue Jun 22 19:19:38 +0000 2010
>Last-Modified:  Tue Jun 22 19:19:38 +0000 2010
>Originator:     YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
>Release:        NetBSD 3.99.10
>Organization:

>Environment:


System: NetBSD kaeru 3.99.10 NetBSD 3.99.10 (build.kaeru.xen.nodebug) #11: Fri Oct 28 12:14:04 JST 2005 takashi@kaeru:/usr/home/takashi/work/kernel/build.kaeru.xen.nodebug i386
Architecture: i386
Machine: i386
>Description:
	1. tmpfs doesn't free tmpfs_nodes until unmount.

	2. tmpfs ends up to use malloc heavily.  (mainly via uvm_aobj)
	   malloc doesn't return small allocations back to vm.

	3. tmpfs uses too much wired memory.
	   tmpfs_node, uvm_aobj, directory entries, etc.
>How-To-Repeat:
	code inspection.
>Fix:
	1. free them.

	2. don't use malloc.  or modify malloc.

	3. make them pageable.

>Release-Note:

>Audit-Trail:
From: "Julio M. Merino Vidal" <jmmv84@gmail.com>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: pkg/31944
Date: Tue, 7 Mar 2006 09:47:21 +0100

 > 1. tmpfs doesn't free tmpfs_nodes until unmount.

 This is because, once a node has been created, tmpfs needs to keep its id
 and generation number so that if the same node is reused it carries the same
 generation number.

 Maybe the memory needs could be lowered by keeping a id->gen map?  This
 way, when a node was deleted, tmpfs could insert this pair in the map for
 later reuses, avoiding the need to keep the whole node in memory.  The
 problem could not disappear completely (and it looks like it cannot), but it
 would take less memory.

 Does this sound reasonable?

 -- 
 Julio M. Merino Vidal <jmmv84@gmail.com>
 The Julipedia - http://julipedia.blogspot.com/

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	yamt@mwd.biglobe.ne.jp
Cc: 
Subject: Re: pkg/31944
Date: Tue, 7 Mar 2006 09:38:47 -0500

 On Mar 7,  8:50am, jmmv84@gmail.com ("Julio M. Merino Vidal") wrote:
 -- Subject: Re: pkg/31944

 | The following reply was made to PR kern/31944; it has been noted by GNATS.
 | 
 | From: "Julio M. Merino Vidal" <jmmv84@gmail.com>
 | To: gnats-bugs@netbsd.org
 | Cc: 
 | Subject: Re: pkg/31944
 | Date: Tue, 7 Mar 2006 09:47:21 +0100
 | 
 |  > 1. tmpfs doesn't free tmpfs_nodes until unmount.
 |  
 |  This is because, once a node has been created, tmpfs needs to keep its id
 |  and generation number so that if the same node is reused it carries the same
 |  generation number.
 |  
 |  Maybe the memory needs could be lowered by keeping a id->gen map?  This
 |  way, when a node was deleted, tmpfs could insert this pair in the map for
 |  later reuses, avoiding the need to keep the whole node in memory.  The
 |  problem could not disappear completely (and it looks like it cannot), but it
 |  would take less memory.
 |  
 |  Does this sound reasonable?

 just keep a global generation number counter and use it to allocate generation
 numbers.

 christos

From: Jason Thorpe <thorpej@shagadelic.org>
To: christos@zoulas.com (Christos Zoulas)
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	yamt@mwd.biglobe.ne.jp
Subject: Re: pkg/31944
Date: Tue, 7 Mar 2006 07:02:59 -0800

 On Mar 7, 2006, at 6:38 AM, Christos Zoulas wrote:

 > just keep a global generation number counter and use it to allocate  
 > generation
 > numbers.

 Except you would have to increment it each time an inode is  
 allocated, because you don't know what the inode's last generation  
 was.  That will run out pretty quickly, because it's a 32-bit counter.

 The map sounds like a good enough idea for now.

 OTOH, if the metadata were pageable.... :-)

 -- thorpej

From: christos@zoulas.com (Christos Zoulas)
To: Jason Thorpe <thorpej@shagadelic.org>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	yamt@mwd.biglobe.ne.jp
Subject: Re: pkg/31944
Date: Tue, 7 Mar 2006 10:36:38 -0500

 On Mar 7,  7:02am, thorpej@shagadelic.org (Jason Thorpe) wrote:
 -- Subject: Re: pkg/31944

 | 
 | On Mar 7, 2006, at 6:38 AM, Christos Zoulas wrote:
 | 
 | > just keep a global generation number counter and use it to allocate  
 | > generation
 | > numbers.
 | 
 | Except you would have to increment it each time an inode is  
 | allocated, because you don't know what the inode's last generation  
 | was.  That will run out pretty quickly, because it's a 32-bit counter.
 | 
 | The map sounds like a good enough idea for now.
 | 
 | OTOH, if the metadata were pageable.... :-)
 | 
 | -- thorpej

 Right, but it does not matter. The probability that the same generation
 number gets allocated to the same inode is miniscule.

 christos

From: Jason Thorpe <thorpej@shagadelic.org>
To: christos@zoulas.com (Christos Zoulas)
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	yamt@mwd.biglobe.ne.jp
Subject: Re: pkg/31944
Date: Tue, 7 Mar 2006 08:33:24 -0800

 On Mar 7, 2006, at 7:36 AM, Christos Zoulas wrote:

 > Right, but it does not matter. The probability that the same  
 > generation
 > number gets allocated to the same inode is miniscule.

 Would using a random generation number be better?

 -- thorpej

From: christos@zoulas.com (Christos Zoulas)
To: Jason Thorpe <thorpej@shagadelic.org>
Cc: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	yamt@mwd.biglobe.ne.jp
Subject: Re: pkg/31944
Date: Tue, 7 Mar 2006 13:20:56 -0500

 On Mar 7,  8:33am, thorpej@shagadelic.org (Jason Thorpe) wrote:
 -- Subject: Re: pkg/31944

 | 
 | On Mar 7, 2006, at 7:36 AM, Christos Zoulas wrote:
 | 
 | > Right, but it does not matter. The probability that the same  
 | > generation
 | > number gets allocated to the same inode is miniscule.
 | 
 | Would using a random generation number be better?

 Using a linear congruent random jenerator would be cheap and easy, but I
 not sure if it is going to be a lot better (and you have to make sure that
 the cycle frequency is > 2^32 - 1). But the real issue is why does tmpfs
 need a generation number? It should be able to do:

 	nnode->tn_id = tmp->tm_nodes_last++;

 in the inode-reuse case to get a unique inode # anyway. With inodes being
 64_bit this will take a long time to run out. Even if it does run out,
 you can start using the generation number then. Is 2^96 good enough now?

 	nnode->tn_id = tmp->tm_nodes_last++;
 	if (tmp->tm_nodes_last == 0)
 		tmp->tm_gen_last++;

 and in tempfs_getattr:

 	va.va_gen = tmp->tm_gen_last;

 christos

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/31944
Date: Wed, 8 Mar 2006 00:10:13 +0000

 On Tue, Mar 07, 2006 at 08:50:02AM +0000, Julio M. Merino Vidal wrote:
 > The following reply was made to PR kern/31944; it has been noted by GNATS.
 > 
 > From: "Julio M. Merino Vidal" <jmmv84@gmail.com>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: pkg/31944
 > Date: Tue, 7 Mar 2006 09:47:21 +0100
 > 
 >  > 1. tmpfs doesn't free tmpfs_nodes until unmount.
 >  
 >  This is because, once a node has been created, tmpfs needs to keep its id
 >  and generation number so that if the same node is reused it carries the same
 >  generation number.

 Maybe use 64bit inode numbers ?
 Or do something similar to the way process ids are allocated to avoid reusing
 number quickly.

 After all only NFS exports can (or rather will) open files using the file-id
 based on inode+generation number.

 	David

 -- 
 David Laight: david@l8s.co.uk

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: pkg/31944
Date: Wed, 8 Mar 2006 00:10:13 +0000

 On Tue, Mar 07, 2006 at 08:50:02AM +0000, Julio M. Merino Vidal wrote:
 > The following reply was made to PR kern/31944; it has been noted by GNATS.
 > 
 > From: "Julio M. Merino Vidal" <jmmv84@gmail.com>
 > To: gnats-bugs@netbsd.org
 > Cc: 
 > Subject: Re: pkg/31944
 > Date: Tue, 7 Mar 2006 09:47:21 +0100
 > 
 >  > 1. tmpfs doesn't free tmpfs_nodes until unmount.
 >  
 >  This is because, once a node has been created, tmpfs needs to keep its id
 >  and generation number so that if the same node is reused it carries the same
 >  generation number.

 Maybe use 64bit inode numbers ?
 Or do something similar to the way process ids are allocated to avoid reusing
 number quickly.

 After all only NFS exports can (or rather will) open files using the file-id
 based on inode+generation number.

 	David

 -- 
 David Laight: david@l8s.co.uk

From: "Julio M. Merino Vidal" <jmmv84@gmail.com>
To: tech-kern@netbsd.org, gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/31944 - Fix to reduce tmpfs memory usage
Date: Fri, 14 Apr 2006 19:51:14 +0200

 ------=_Part_7760_25248237.1145037074927
 Content-Type: text/plain; charset=ISO-8859-1
 Content-Transfer-Encoding: quoted-printable
 Content-Disposition: inline

 Hello,

 PR kern/31944 lists three issues that make tmpfs consume too much
 memory.  The attached patch addresses the first of them.  It does the
 following:

 - Only keep the node identifier/generation number when deleting a node.
   The size of tmpfs_nid is much smaller than that of tmpfs_node.
 - Avoid keeping nodes that were initialized but then discarded due to some
   error.  See the keepit parameter to the tmpfs_free_node function.

 I have read the replies to that PR but the proposed solutions seem to be
 "not right".  Also, one question that arises... can there be two live files=
  in a
 file system with the same node number?  If not, all those solutions do not
 seem to address this...

 tmpfs passes the regression test suite after the changes.

 Please raise your comments.

 --
 Julio M. Merino Vidal <jmmv84@gmail.com>
 The Julipedia - http://julipedia.blogspot.com/

 ------=_Part_7760_25248237.1145037074927
 Content-Type: text/x-patch; name=patch.diff; charset=us-ascii
 Content-Transfer-Encoding: 7bit
 X-Attachment-Id: f_em0tgium
 Content-Disposition: attachment; filename="patch.diff"

 Index: tmpfs.h
 ===================================================================
 RCS file: /cvsroot/src/sys/fs/tmpfs/tmpfs.h,v
 retrieving revision 1.18
 diff -u -p -r1.18 tmpfs.h
 --- tmpfs.h	31 Mar 2006 20:27:49 -0000	1.18
 +++ tmpfs.h	14 Apr 2006 17:42:54 -0000
 @@ -93,6 +93,25 @@ TAILQ_HEAD(tmpfs_dir, tmpfs_dirent);
  /* --------------------------------------------------------------------- */

  /*
 + * Node identifier (node number/generation number pair).
 + *
 + * This information is kept separate from the tmpfs_node structure so
 + * that it can easily be left alive when an existing node is deleted.
 + * This is because the system needs to reuse node numbers when it reaches
 + * the node limit and therefore it has to assign a different generation
 + * number.
 + */
 +struct tmpfs_nid {
 +	LIST_ENTRY(tmpfs_nid)	tni_entries;
 +
 +	ino_t			tni_id;
 +	unsigned long		tni_gen;
 +};
 +LIST_HEAD(tmpfs_nid_list, tmpfs_nid);
 +
 +/* --------------------------------------------------------------------- */
 +
 +/*
   * Internal representation of a tmpfs file system node.
   *
   * This structure is splitted in two parts: one holds attributes common
 @@ -112,8 +131,8 @@ struct tmpfs_node {
  	 * and faster, as we do not need to convert between two types. */
  	enum vtype		tn_type;

 -	/* Node identifier. */
 -	ino_t			tn_id;
 +	/* Node identifier and generation number. */
 +	struct tmpfs_nid *	tn_nid;

  	/* Node's internal status.  This is used by several file system
  	 * operations to do modifications to the node in a delayed
 @@ -137,7 +156,6 @@ struct tmpfs_node {
  	struct timespec		tn_mtime;
  	struct timespec		tn_ctime;
  	struct timespec		tn_birthtime;
 -	unsigned long		tn_gen;

  	/* Head of byte-level lock list (used by tmpfs_advlock). */
  	struct lockf *		tn_lockf;
 @@ -258,21 +276,23 @@ struct tmpfs_mount {
  	 * i.e., they refer to existing files.  The available list contains
  	 * all nodes that are currently available for use by new files.
  	 * Nodes must be kept in this list (instead of deleting them)
 -	 * because we need to keep track of their generation number (tn_gen
 -	 * field).
 +	 * because we need to keep track of their generation number and
 +	 * because we do not want to use the same node number for two live
 +	 * files.
  	 *
  	 * Note that nodes are lazily allocated: if the available list is
  	 * empty and we have enough space to create more nodes, they will be
  	 * created and inserted in the used list.  Once these are released,
  	 * they will go into the available list, remaining alive until the
  	 * file system is unmounted. */
 +	struct tmpfs_nid_list	tm_nodes_avail;
  	struct tmpfs_node_list	tm_nodes_used;
 -	struct tmpfs_node_list	tm_nodes_avail;

  	/* Pools used to store file system meta data.  These are not shared
  	 * across several instances of tmpfs for the reasons described in
  	 * tmpfs_pool.c. */
  	struct tmpfs_pool	tm_dirent_pool;
 +	struct tmpfs_pool	tm_nid_pool;
  	struct tmpfs_pool	tm_node_pool;
  	struct tmpfs_str_pool	tm_str_pool;
  };
 @@ -300,7 +320,7 @@ struct tmpfs_fid {
  int	tmpfs_alloc_node(struct tmpfs_mount *, enum vtype,
  	    uid_t uid, gid_t gid, mode_t mode, struct tmpfs_node *,
  	    char *, dev_t, struct proc *, struct tmpfs_node **);
 -void	tmpfs_free_node(struct tmpfs_mount *, struct tmpfs_node *);
 +void	tmpfs_free_node(struct tmpfs_mount *, struct tmpfs_node *, boolean_t);
  int	tmpfs_alloc_dirent(struct tmpfs_mount *, struct tmpfs_node *,
  	    const char *, uint16_t, struct tmpfs_dirent **);
  void	tmpfs_free_dirent(struct tmpfs_mount *, struct tmpfs_dirent *,
 Index: tmpfs_subr.c
 ===================================================================
 RCS file: /cvsroot/src/sys/fs/tmpfs/tmpfs_subr.c,v
 retrieving revision 1.18
 diff -u -p -r1.18 tmpfs_subr.c
 --- tmpfs_subr.c	16 Feb 2006 14:57:50 -0000	1.18
 +++ tmpfs_subr.c	14 Apr 2006 17:42:54 -0000
 @@ -93,6 +93,7 @@ tmpfs_alloc_node(struct tmpfs_mount *tmp
      uid_t uid, gid_t gid, mode_t mode, struct tmpfs_node *parent,
      char *target, dev_t rdev, struct proc *p, struct tmpfs_node **node)
  {
 +	struct tmpfs_nid *nid;
  	struct tmpfs_node *nnode;

  	/* If the root directory of the 'tmp' file system is not yet
 @@ -104,24 +105,35 @@ tmpfs_alloc_node(struct tmpfs_mount *tmp

  	KASSERT(uid != VNOVAL && gid != VNOVAL && mode != VNOVAL);

 -	nnode = NULL;
 +	nnode = (struct tmpfs_node *)TMPFS_POOL_GET(&tmp->tm_node_pool, 0);
 +	if (nnode == NULL)
 +		return ENOSPC;
 +	KASSERT(nnode != NULL);
 +
 +	nid = NULL;
  	if (LIST_EMPTY(&tmp->tm_nodes_avail)) {
  		KASSERT(tmp->tm_nodes_last <= tmp->tm_nodes_max);
 -		if (tmp->tm_nodes_last == tmp->tm_nodes_max)
 +		if (tmp->tm_nodes_last == tmp->tm_nodes_max) {
 +			TMPFS_POOL_PUT(&tmp->tm_node_pool, nnode);
  			return ENOSPC;
 +		}

 -		nnode =
 -		    (struct tmpfs_node *)TMPFS_POOL_GET(&tmp->tm_node_pool, 0);
 -		if (nnode == NULL)
 +		nid = (struct tmpfs_nid *)TMPFS_POOL_GET(&tmp->tm_nid_pool, 0);
 +		if (nid == NULL) {
 +			TMPFS_POOL_PUT(&tmp->tm_node_pool, nnode);
  			return ENOSPC;
 -		nnode->tn_id = tmp->tm_nodes_last++;
 -		nnode->tn_gen = 0;
 +		}
 +
 +		nid->tni_id = tmp->tm_nodes_last++;
 +		nid->tni_gen = 0;
  	} else {
 -		nnode = LIST_FIRST(&tmp->tm_nodes_avail);
 -		LIST_REMOVE(nnode, tn_entries);
 -		nnode->tn_gen++;
 +		nid = LIST_FIRST(&tmp->tm_nodes_avail);
 +		LIST_REMOVE(nid, tni_entries);
 +		nid->tni_gen++;
  	}
 -	KASSERT(nnode != NULL);
 +	KASSERT(nid != NULL);
 +	nnode->tn_nid = nid;
 +
  	LIST_INSERT_HEAD(&tmp->tm_nodes_used, nnode, tn_entries);

  	/* Generic initialization. */
 @@ -168,7 +180,7 @@ tmpfs_alloc_node(struct tmpfs_mount *tmp
  		    tmpfs_str_pool_get(&tmp->tm_str_pool, nnode->tn_size, 0);
  		if (nnode->tn_spec.tn_lnk.tn_link == NULL) {
  			nnode->tn_type = VNON;
 -			tmpfs_free_node(tmp, nnode);
 +			tmpfs_free_node(tmp, nnode, FALSE);
  			return ENOSPC;
  		}
  		memcpy(nnode->tn_spec.tn_lnk.tn_link, target, nnode->tn_size);
 @@ -202,17 +214,17 @@ tmpfs_alloc_node(struct tmpfs_mount *tmp
   * to the user are the removal of empty directories and the deletion of
   * individual files.
   *
 - * Note that nodes are not really deleted; in fact, when a node has been
 - * allocated, it cannot be deleted during the whole life of the file
 - * system.  Instead, they are moved to the available list and remain there
 - * until reused.
 + * If keepit is TRUE, the node is not really deleted because, when a node
 + * has been allocated, it cannot be deleted during the whole life of the
 + * file system.  Instead, its identifier is moved to the available list
 + * and remain there in case it needs to be reused.
   */
  void
 -tmpfs_free_node(struct tmpfs_mount *tmp, struct tmpfs_node *node)
 +tmpfs_free_node(struct tmpfs_mount *tmp, struct tmpfs_node *node,
 +    boolean_t keepit)
  {
 -	ino_t id;
 -	unsigned long gen;
  	size_t pages;
 +	struct tmpfs_nid *nid;

  	switch (node->tn_type) {
  	case VNON:
 @@ -253,13 +265,12 @@ tmpfs_free_node(struct tmpfs_mount *tmp,
  	tmp->tm_pages_used -= pages;

  	LIST_REMOVE(node, tn_entries);
 -	id = node->tn_id;
 -	gen = node->tn_gen;
 -	memset(node, 0, sizeof(struct tmpfs_node));
 -	node->tn_id = id;
 -	node->tn_type = VNON;
 -	node->tn_gen = gen;
 -	LIST_INSERT_HEAD(&tmp->tm_nodes_avail, node, tn_entries);
 +	nid = node->tn_nid;
 +	if (keepit)
 +		LIST_INSERT_HEAD(&tmp->tm_nodes_avail, nid, tni_entries);
 +	else
 +		TMPFS_POOL_PUT(&tmp->tm_nid_pool, nid);
 +	TMPFS_POOL_PUT(&tmp->tm_node_pool, node);
  }

  /* --------------------------------------------------------------------- */
 @@ -505,7 +516,7 @@ tmpfs_alloc_file(struct vnode *dvp, stru
  	error = tmpfs_alloc_dirent(tmp, node, cnp->cn_nameptr, cnp->cn_namelen,
  	    &de);
  	if (error != 0) {
 -		tmpfs_free_node(tmp, node);
 +		tmpfs_free_node(tmp, node, FALSE);
  		goto out;
  	}

 @@ -513,7 +524,7 @@ tmpfs_alloc_file(struct vnode *dvp, stru
  	error = tmpfs_alloc_vp(dvp->v_mount, node, vpp);
  	if (error != 0) {
  		tmpfs_free_dirent(tmp, de, TRUE);
 -		tmpfs_free_node(tmp, node);
 +		tmpfs_free_node(tmp, node, FALSE);
  		goto out;
  	}

 @@ -637,7 +648,7 @@ tmpfs_dir_getdotdent(struct tmpfs_node *
  	TMPFS_VALIDATE_DIR(node);
  	KASSERT(uio->uio_offset == TMPFS_DIRCOOKIE_DOT);

 -	dent.d_fileno = node->tn_id;
 +	dent.d_fileno = node->tn_nid->tni_id;
  	dent.d_type = DT_DIR;
  	dent.d_namlen = 1;
  	dent.d_name[0] = '.';
 @@ -675,7 +686,7 @@ tmpfs_dir_getdotdotdent(struct tmpfs_nod
  	TMPFS_VALIDATE_DIR(node);
  	KASSERT(uio->uio_offset == TMPFS_DIRCOOKIE_DOTDOT);

 -	dent.d_fileno = node->tn_spec.tn_dir.tn_parent->tn_id;
 +	dent.d_fileno = node->tn_spec.tn_dir.tn_parent->tn_nid->tni_id;
  	dent.d_type = DT_DIR;
  	dent.d_namlen = 2;
  	dent.d_name[0] = '.';
 @@ -767,7 +778,7 @@ tmpfs_dir_getdents(struct tmpfs_node *no

  		/* Create a dirent structure representing the current
  		 * tmpfs_node and fill it. */
 -		d.d_fileno = de->td_node->tn_id;
 +		d.d_fileno = de->td_node->tn_nid->tni_id;
  		switch (de->td_node->tn_type) {
  		case VBLK:
  			d.d_type = DT_BLK;
 Index: tmpfs_vfsops.c
 ===================================================================
 RCS file: /cvsroot/src/sys/fs/tmpfs/tmpfs_vfsops.c,v
 retrieving revision 1.11
 diff -u -p -r1.11 tmpfs_vfsops.c
 --- tmpfs_vfsops.c	16 Feb 2006 14:57:50 -0000	1.11
 +++ tmpfs_vfsops.c	14 Apr 2006 17:42:55 -0000
 @@ -168,6 +168,8 @@ tmpfs_mount(struct mount *mp, const char
  	tmp->tm_pages_used = 0;
  	tmpfs_pool_init(&tmp->tm_dirent_pool, sizeof(struct tmpfs_dirent),
  	    "dirent", tmp);
 +	tmpfs_pool_init(&tmp->tm_nid_pool, sizeof(struct tmpfs_nid),
 +	    "nid", tmp);
  	tmpfs_pool_init(&tmp->tm_node_pool, sizeof(struct tmpfs_node),
  	    "node", tmp);
  	tmpfs_str_pool_init(&tmp->tm_str_pool, tmp);
 @@ -206,6 +208,7 @@ tmpfs_unmount(struct mount *mp, int mntf
  	int error;
  	int flags = 0;
  	struct tmpfs_mount *tmp;
 +	struct tmpfs_nid *nid;
  	struct tmpfs_node *node;

  	/* Handle forced unmounts. */
 @@ -243,20 +246,21 @@ tmpfs_unmount(struct mount *mp, int mntf
  		}

  		next = LIST_NEXT(node, tn_entries);
 -		tmpfs_free_node(tmp, node);
 +		tmpfs_free_node(tmp, node, FALSE);
  		node = next;
  	}
 -	node = LIST_FIRST(&tmp->tm_nodes_avail);
 -	while (node != NULL) {
 -		struct tmpfs_node *next;
 -
 -		next = LIST_NEXT(node, tn_entries);
 -		LIST_REMOVE(node, tn_entries);
 -		TMPFS_POOL_PUT(&tmp->tm_node_pool, node);
 -		node = next;
 +	nid = LIST_FIRST(&tmp->tm_nodes_avail);
 +	while (nid != NULL) {
 +		struct tmpfs_nid *next;
 +
 +		next = LIST_NEXT(nid, tni_entries);
 +		LIST_REMOVE(nid, tni_entries);
 +		TMPFS_POOL_PUT(&tmp->tm_nid_pool, nid);
 +		nid = next;
  	}

  	tmpfs_pool_destroy(&tmp->tm_dirent_pool);
 +	tmpfs_pool_destroy(&tmp->tm_nid_pool);
  	tmpfs_pool_destroy(&tmp->tm_node_pool);
  	tmpfs_str_pool_destroy(&tmp->tm_str_pool);

 @@ -320,8 +324,8 @@ tmpfs_fhtovp(struct mount *mp, struct fi

  	found = FALSE;
  	LIST_FOREACH(node, &tmp->tm_nodes_used, tn_entries) {
 -		if (node->tn_id == tfhp->tf_id &&
 -		    node->tn_gen == tfhp->tf_gen) {
 +		if (node->tn_nid->tni_id == tfhp->tf_id &&
 +		    node->tn_nid->tni_gen == tfhp->tf_gen) {
  			found = TRUE;
  			break;
  		}
 @@ -342,8 +346,8 @@ tmpfs_vptofh(struct vnode *vp, struct fi
  	node = VP_TO_TMPFS_NODE(vp);

  	tfhp->tf_len = sizeof(struct tmpfs_fid);
 -	tfhp->tf_id = node->tn_id;
 -	tfhp->tf_gen = node->tn_gen;
 +	tfhp->tf_id = node->tn_nid->tni_id;
 +	tfhp->tf_gen = node->tn_nid->tni_gen;

  	return 0;
  }
 @@ -356,6 +360,7 @@ tmpfs_statvfs(struct mount *mp, struct s
  {
  	fsfilcnt_t freenodes, usednodes;
  	struct tmpfs_mount *tmp;
 +	struct tmpfs_nid *dummyid;
  	struct tmpfs_node *dummy;

  	tmp = VFS_TO_TMPFS(mp);
 @@ -368,7 +373,7 @@ tmpfs_statvfs(struct mount *mp, struct s

  	freenodes = MIN(tmp->tm_nodes_max - tmp->tm_nodes_last,
  	    TMPFS_PAGES_AVAIL(tmp) * PAGE_SIZE / sizeof(struct tmpfs_node));
 -	LIST_FOREACH(dummy, &tmp->tm_nodes_avail, tn_entries)
 +	LIST_FOREACH(dummyid, &tmp->tm_nodes_avail, tni_entries)
  		freenodes++;

  	usednodes = 0;
 Index: tmpfs_vnops.c
 ===================================================================
 RCS file: /cvsroot/src/sys/fs/tmpfs/tmpfs_vnops.c,v
 retrieving revision 1.22
 diff -u -p -r1.22 tmpfs_vnops.c
 --- tmpfs_vnops.c	21 Feb 2006 03:19:45 -0000	1.22
 +++ tmpfs_vnops.c	14 Apr 2006 17:42:55 -0000
 @@ -430,14 +430,14 @@ tmpfs_getattr(void *v)
  	vap->va_uid = node->tn_uid;
  	vap->va_gid = node->tn_gid;
  	vap->va_fsid = vp->v_mount->mnt_stat.f_fsidx.__fsid_val[0];
 -	vap->va_fileid = node->tn_id;
 +	vap->va_fileid = node->tn_nid->tni_id;
  	vap->va_size = node->tn_size;
  	vap->va_blocksize = PAGE_SIZE;
  	vap->va_atime = node->tn_atime;
  	vap->va_mtime = node->tn_mtime;
  	vap->va_ctime = node->tn_ctime;
  	vap->va_birthtime = node->tn_birthtime;
 -	vap->va_gen = node->tn_gen;
 +	vap->va_gen = node->tn_nid->tni_gen;
  	vap->va_flags = node->tn_flags;
  	vap->va_rdev = (vp->v_type == VBLK || vp->v_type == VCHR) ?
  		node->tn_spec.tn_dev.tn_rdev : VNOVAL;
 @@ -1266,7 +1266,7 @@ tmpfs_reclaim(void *v)
  	 * we must free its associated data structures (now that the vnode
  	 * is being reclaimed). */
  	if (node->tn_links == 0)
 -		tmpfs_free_node(tmp, node);
 +		tmpfs_free_node(tmp, node, TRUE);

  	KASSERT(!VOP_ISLOCKED(vp));
  	KASSERT(vp->v_data == NULL);


 ------=_Part_7760_25248237.1145037074927--

From: "Steven M. Bellovin" <smb@cs.columbia.edu>
To: "Julio M. Merino Vidal" <jmmv84@gmail.com>
Cc: tech-kern@NetBSD.org, gnats-bugs@netbsd.org
Subject: Re: kern/31944 - Fix to reduce tmpfs memory usage
Date: Fri, 14 Apr 2006 14:13:35 -0400

 On Fri, 14 Apr 2006 19:51:14 +0200, "Julio M. Merino Vidal"
 <jmmv84@gmail.com> wrote:

 > Hello,
 > 
 > PR kern/31944 lists three issues that make tmpfs consume too much
 > memory.  The attached patch addresses the first of them.  It does the
 > following:
 > 
 > - Only keep the node identifier/generation number when deleting a node.
 >   The size of tmpfs_nid is much smaller than that of tmpfs_node.
 > - Avoid keeping nodes that were initialized but then discarded due to some
 >   error.  See the keepit parameter to the tmpfs_free_node function.
 > 
 > I have read the replies to that PR but the proposed solutions seem to be
 > "not right".  Also, one question that arises... can there be two live files in a
 > file system with the same node number?  If not, all those solutions do not
 > seem to address this...
 > 
 There's a lot of code that assumes that a <device,inode> pair uniquely
 identify files.  For example, in src/bin/pax/tables.h, there's this
 comment:

 /*
  * file hard link structure (hashed by dev/ino and chained) used to find the
  * hard links in a file system or with some archive formats (cpio)
  */


 		--Steven M. Bellovin, http://www.cs.columbia.edu/~smb

From: Bill Studenmund <wrstuden@netbsd.org>
To: "Julio M. Merino Vidal" <jmmv84@gmail.com>
Cc: tech-kern@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/31944 - Fix to reduce tmpfs memory usage
Date: Fri, 14 Apr 2006 14:10:36 -0700

 --gBBFr7Ir9EOA20Yy
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline
 Content-Transfer-Encoding: quoted-printable

 On Fri, Apr 14, 2006 at 07:51:14PM +0200, Julio M. Merino Vidal wrote:
 > Hello,
 >=20
 > PR kern/31944 lists three issues that make tmpfs consume too much
 > memory.  The attached patch addresses the first of them.  It does the
 > following:
 >=20
 > - Only keep the node identifier/generation number when deleting a node.
 >   The size of tmpfs_nid is much smaller than that of tmpfs_node.
 > - Avoid keeping nodes that were initialized but then discarded due to some
 >   error.  See the keepit parameter to the tmpfs_free_node function.
 >=20
 > I have read the replies to that PR but the proposed solutions seem to be
 > "not right".  Also, one question that arises... can there be two live fil=
 es in a
 > file system with the same node number?  If not, all those solutions do not
 > seem to address this...

 As Steve noted, lots of code assumes that <dev,ino> pairs ID unique nodes.
 Thus all files that stat to the same inode numbers in a given filesystem
 are hardlinks to the same thing.

 Thus there should not be two live, distinct files that have the same inode
 #.

 The only reason I see for keeping track of old generation #s is so that if=
 =20
 we re-use an inode #, we won't give it the same generation # again.

 But the thing about generation #s is that we make them up. We export them=
 =20
 in file handles, and nothing is really supposed to tamper with them. :-)

 So one solution is to shove 64-bit generation #s in our file handles. Then=
 =20
 just have one 64-bit counter we incriment each time we create a node. Add=
 =20
 that to someting similar to how we reuse PIDs safely, and we should be=20
 set.

 Take care,

 Bill

 --gBBFr7Ir9EOA20Yy
 Content-Type: application/pgp-signature
 Content-Disposition: inline

 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.2.3 (NetBSD)

 iD8DBQFEQA/MWz+3JHUci9cRAierAJ40d7xSDkBra16QLV/TLTLV6KIpHACfbBLR
 2/c40gVwmVTu8kj6X6lnqhs=
 =eRTT
 -----END PGP SIGNATURE-----

 --gBBFr7Ir9EOA20Yy--

From: YAMAMOTO Takashi <yamt@mwd.biglobe.ne.jp>
To: jmmv84@gmail.com
Cc: tech-kern@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/31944 - Fix to reduce tmpfs memory usage
Date: Sat, 22 Apr 2006 16:02:25 +0900

 hi,

 > Hello,
 > 
 > PR kern/31944 lists three issues that make tmpfs consume too much
 > memory.  The attached patch addresses the first of them.  It does the
 > following:
 > 
 > - Only keep the node identifier/generation number when deleting a node.
 >   The size of tmpfs_nid is much smaller than that of tmpfs_node.
 > - Avoid keeping nodes that were initialized but then discarded due to some
 >   error.  See the keepit parameter to the tmpfs_free_node function.
 > 
 > I have read the replies to that PR but the proposed solutions seem to be
 > "not right".  Also, one question that arises... can there be two live files in a
 > file system with the same node number?  If not, all those solutions do not
 > seem to address this...
 > 
 > tmpfs passes the regression test suite after the changes.
 > 
 > Please raise your comments.

 what's wrong with global generation number christos suggested?

 YAMAMOTO Takashi

From: David Laight <david@l8s.co.uk>
To: "Julio M. Merino Vidal" <jmmv84@gmail.com>
Cc: tech-kern@netbsd.org, gnats-bugs@netbsd.org
Subject: Re: kern/31944 - Fix to reduce tmpfs memory usage
Date: Sun, 23 Apr 2006 19:26:28 +0100

 On Fri, Apr 14, 2006 at 07:51:14PM +0200, Julio M. Merino Vidal wrote:
 > Hello,
 > 
 > PR kern/31944 lists three issues that make tmpfs consume too much
 > memory.  The attached patch addresses the first of them.  It does the
 > following:
 > 
 > - Only keep the node identifier/generation number when deleting a node.
 >   The size of tmpfs_nid is much smaller than that of tmpfs_node.
 > - Avoid keeping nodes that were initialized but then discarded due to some
 >   error.  See the keepit parameter to the tmpfs_free_node function.
 > 
 > I have read the replies to that PR but the proposed solutions seem to be
 > "not right".  Also, one question that arises... can there be two live files in
 > a file system with the same node number?  If not, all those solutions do not
 > seem to address this...

 As pointed out by others, POSIX requires that a file be uniquelly identifyable
 by its device/inode pair.  This is visible to userland via stat().

 The NetBSD kernel was recently changed to support 64bit inode numbers, so
 maybe you just use increment a 64bit inode number on every create?
 (with a random(ish) generation number).

 Alternatively you just need to keep the last used generation number for
 each inode number - which should just cost 4 bytes/inode, and ought to
 be pagable.

 	David

 -- 
 David Laight: david@l8s.co.uk

From: "Julio M. Merino Vidal" <jmmv84@gmail.com>
To: tech-kern@netbsd.org, gnats-bugs@netbsd.org
Cc: 
Subject: Re: kern/31944 - Fix to reduce tmpfs memory usage
Date: Sun, 23 Apr 2006 23:04:29 +0200

 On 4/23/06, David Laight <david@l8s.co.uk> wrote:
 > Alternatively you just need to keep the last used generation number for
 > each inode number - which should just cost 4 bytes/inode, and ought to
 > be pagable.

 Which is what the attached patch did, isn't it?  The structure is a
 bit bigger because it is stored in a linked list (an extra pointer to
 do the mapping), and it'd be pageable if we solved all the pageability
 issues in tmpfs (a custom pool allocator?)...

 --
 Julio M. Merino Vidal <jmmv84@gmail.com>
 The Julipedia - http://julipedia.blogspot.com/

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@netbsd.org, kern-bug-people@netbsd.org,
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org,
	yamt@mwd.biglobe.ne.jp
Cc: 
Subject: Re: kern/31944 - Fix to reduce tmpfs memory usage
Date: Sun, 23 Apr 2006 19:57:27 -0400

 On Apr 23,  6:20pm, david@l8s.co.uk (David Laight) wrote:
 -- Subject: Re: kern/31944 - Fix to reduce tmpfs memory usage

 |  The NetBSD kernel was recently changed to support 64bit inode numbers, so
 |  maybe you just use increment a 64bit inode number on every create?
 |  (with a random(ish) generation number).

 That was my proposal (without the random generation number, but keeping
 it sequental).

 |  Alternatively you just need to keep the last used generation number for
 |  each inode number - which should just cost 4 bytes/inode, and ought to
 |  be pagable.

 And you only need to start doing this once you exceed 2^63 files...

 christos

From: Mindaugas Rasiukevicius <rmind@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/31944 CVS commit: src/sys
Date: Tue, 22 Jun 2010 18:32:08 +0000

 Module Name:	src
 Committed By:	rmind
 Date:		Tue Jun 22 18:32:08 UTC 2010

 Modified Files:
 	src/sys/fs/tmpfs: TODO files.tmpfs tmpfs.h tmpfs_subr.c tmpfs_vfsops.c
 	    tmpfs_vnops.c
 	src/sys/modules/tmpfs: Makefile
 	src/sys/rump/fs/lib/libtmpfs: Makefile
 Added Files:
 	src/sys/fs/tmpfs: tmpfs_mem.c
 Removed Files:
 	src/sys/fs/tmpfs: tmpfs_pool.c tmpfs_pool.h

 Log Message:
 Replace tmpfs_pool custom allocator code with a simpler layer for memory
 accounting.  Use wired memory (which can be limited) for meta-data, and
 kmem(9) for string allocations.

 Close PR/31944.  Fix PR/38361 while here.  OK ad@.


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 src/sys/fs/tmpfs/TODO
 cvs rdiff -u -r1.3 -r1.4 src/sys/fs/tmpfs/files.tmpfs
 cvs rdiff -u -r1.37 -r1.38 src/sys/fs/tmpfs/tmpfs.h
 cvs rdiff -u -r0 -r1.1 src/sys/fs/tmpfs/tmpfs_mem.c
 cvs rdiff -u -r1.14 -r0 src/sys/fs/tmpfs/tmpfs_pool.c
 cvs rdiff -u -r1.7 -r0 src/sys/fs/tmpfs/tmpfs_pool.h
 cvs rdiff -u -r1.56 -r1.57 src/sys/fs/tmpfs/tmpfs_subr.c
 cvs rdiff -u -r1.44 -r1.45 src/sys/fs/tmpfs/tmpfs_vfsops.c
 cvs rdiff -u -r1.69 -r1.70 src/sys/fs/tmpfs/tmpfs_vnops.c
 cvs rdiff -u -r1.1 -r1.2 src/sys/modules/tmpfs/Makefile
 cvs rdiff -u -r1.3 -r1.4 src/sys/rump/fs/lib/libtmpfs/Makefile

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: open->closed
State-Changed-By: rmind@NetBSD.org
State-Changed-When: Tue, 22 Jun 2010 19:19:38 +0000
State-Changed-Why:
Close, all three points were solved (in one ot other way) in time.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.