NetBSD Problem Report #18614

Received: (qmail 1754 invoked by uid 605); 10 Oct 2002 23:25:48 -0000
Message-Id: <20021010232547.18561.qmail@kechara.flame.org>
Date: 10 Oct 2002 23:25:47 -0000
From: explorer@flame.org
Sender: gnats-bugs-owner@netbsd.org
Reply-To: explorer@flame.org
To: gnats-bugs@gnats.netbsd.org
Subject: sort in -current 10/10/2002 (multiple issues)
X-Send-Pr-Version: 3.95

>Number:         18614
>Category:       bin
>Synopsis:       sort in -current 10/10/2002 (multiple issues)
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    jdolecek
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Oct 10 23:26:00 +0000 2002
>Closed-Date:    Fri Oct 09 20:36:30 +0000 2009
>Last-Modified:  Wed Oct 14 20:45:00 +0000 2009
>Originator:     User explorer
>Release:        NetBSD 1.6I
>Organization:
flame.org:  yes, we do know everything
>Environment:


System: NetBSD speedy.flame.org 1.6I NetBSD 1.6I (GENERIC.MP) #0: Thu Oct 3 22:57:42 PDT 2002 explorer@speedy.flame.org:/u0/OS/src/sys/arch/i386/compile/GENERIC.MP i386
Architecture: i386
Machine: i386
>Description:
When running sort on a large file (310,199,825 bytes, 46,701,436 lines)
the default file descriptor limits are not sufficient:

	explorer@here-there-be-dragons> sort -u -o foo fename.txt 
	sort: ftmp: mkstemp("/tmp/sort.003822ch"): Too many open files

Running "unlimit" first, which increases my open file litmit
to 12,560 files, produces:

	explorer@here-there-be-dragons> sort -u -o foo fename.txt
	sort: ftmp: mkstemp("/sort.003824xz"): Permission denied

(Note that the variable "tmpdir" is getting smashed here!)

If I put a "char unused_padding[10240];" line in sort.c, between
toutpath and tmpdir, I no longer get this message.  I then get
the "too many open files" again.

sort is quite clearly using too many of these temporary files.  It should
try to limit how many it uses to something that the system-default value
will allow -- 64 is, last I checked, the system default pretty much everywhere.
>How-To-Repeat:
If nothing else works, I can provide a file that I used.
>Fix:
None known.

>Release-Note:
>Audit-Trail:
Responsible-Changed-From-To: bin-bug-people->jdolecek 
Responsible-Changed-By: jdolecek 
Responsible-Changed-When: Sun Dec 8 04:10:33 PST 2002 
Responsible-Changed-Why:  
sort(1) is mine. 
State-Changed-From-To: open->feedback 
State-Changed-By: jdolecek 
State-Changed-When: Mon Dec 23 02:32:57 PST 2002 
State-Changed-Why:  
Can you please make the file available somewhere? I cannot repeat 
the latter problem. 
Thanks. 
State-Changed-From-To: feedback->analyzed 
State-Changed-By: jdolecek 
State-Changed-When: Mon Dec 23 11:19:14 PST 2002 
State-Changed-Why:  
I can repeat also the second problem with file >2GB (on i386). 
From: David Laight <dsl@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/18614 CVS commit: src/usr.bin/sort
Date: Sat, 22 Aug 2009 10:53:29 +0000

 Module Name:	src
 Committed By:	dsl
 Date:		Sat Aug 22 10:53:28 UTC 2009

 Modified Files:
 	src/usr.bin/sort: append.c fields.c files.c fsort.c init.c msort.c
 	    sort.c sort.h

 Log Message:
 Rework the way sort generates sort keys:
 - If we generate a key, it is always sortable using memcmp()
 - If we are sorting the whole record, then a weight-table must be used
   during compares.
 - Major surgery to encoding of numbers to ensure unique keys for equal
   numeric values.  Reverse numerics are handled by inverting the sign.
 - Case folding (-f) is handled when the sort keys are generated. No other
   code has to care at all.
 - Key uniqueness (-u) is done during merge for large datasets. It only
   has to be done when writing the output file for small files.
   Since the file is in key order this is simple!
 Probably fixes all of: PR/27257 PR/25551 PR/22182 PR/31095 PR/30504
 PR/36816 PR/37860 PR/39308
 Also PR/18614 should no longer die, but a little more work needs to be
 done on the merging for very large files.


 To generate a diff of this commit:
 cvs rdiff -u -r1.19 -r1.20 src/usr.bin/sort/append.c src/usr.bin/sort/init.c
 cvs rdiff -u -r1.24 -r1.25 src/usr.bin/sort/fields.c src/usr.bin/sort/sort.h
 cvs rdiff -u -r1.34 -r1.35 src/usr.bin/sort/files.c
 cvs rdiff -u -r1.38 -r1.39 src/usr.bin/sort/fsort.c
 cvs rdiff -u -r1.22 -r1.23 src/usr.bin/sort/msort.c
 cvs rdiff -u -r1.51 -r1.52 src/usr.bin/sort/sort.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: David Laight <david@l8s.co.uk>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: PR/18614 sort using too many open files
Date: Sat, 22 Aug 2009 12:31:19 +0100

 Recent changes mean that sort will always do a merge when there
 are 16 temporary files (16 is the limit for the merge code).

 This isn't currently ideal since for large datasets all but the
 first file fed into the merge will be about 8M (or 128k lines).
 (That might drop the sort to O(n^2) from O(nlogn) !)

 The merge code needs to support more input files, and a 2 (or 3)
 level merge done my fsort().

 	David

 -- 
 David Laight: david@l8s.co.uk

State-Changed-From-To: analyzed->closed
State-Changed-By: dsl@NetBSD.org
State-Changed-When: Fri, 09 Oct 2009 20:36:30 +0000
State-Changed-Why:
Rev 1.28 of msort.c implements a multi-level merge.


From: Stephen Borrill <sborrill@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/18614 CVS commit: [netbsd-5] src/usr.bin/sort
Date: Wed, 14 Oct 2009 20:41:53 +0000

 Module Name:	src
 Committed By:	sborrill
 Date:		Wed Oct 14 20:41:53 UTC 2009

 Modified Files:
 	src/usr.bin/sort [netbsd-5]: Makefile append.c fields.c files.c fsort.c
 	    fsort.h init.c msort.c sort.1 sort.c sort.h tmp.c
 Added Files:
 	src/usr.bin/sort [netbsd-5]: radix_sort.c

 Log Message:
 Pull up the following revisions(s) (requested by dsl in ticket #1084):
 	usr.bin/sort/Makefile:	revision 1.6-1.8
 	usr.bin/sort/append.c:	revision 1.15-1.22
 	usr.bin/sort/fields.c:	revision 1.20-1.30
 	usr.bin/sort/files.c:	revision 1.27-1.40
 	usr.bin/sort/fsort.c:	revision 1.33-1.45
 	usr.bin/sort/fsort.h:	revision 1.14-1.17
 	usr.bin/sort/init.c:	revision 1.19-1.23
 	usr.bin/sort/msort.c:	revision 1.19-1.28
 	usr.bin/sort/radix_sort.c:	revision 1.1-1.4
 	usr.bin/sort/sort.1:	revision 1.27-1.29
 	usr.bin/sort/sort.c:	revision 1.47-1.56
 	usr.bin/sort/sort.h:	revision 1.20-1.30
 	usr.bin/sort/tmp.c:	revision 1.14-1.15

 Only use radix sort for in-memory sort, always merge temporary files.
 Use a local radixsort() function so we can pass record length.
 Avoid use of weight tables for key compares.
 Fix generation of keys for numbers, negate value for reverse sort.
 Write file in reverse-key order for 'sort -n'.
 'sort -S' now does a posix sort (sort matching keys by record data).
 Ensure merge sort doesn't have too many temporary files open.
 Fixes: PR#18614 PR#27257 PR#25551 PR#22182 PR#31095 PR#30504 PR#36816
 PR#37860 PR#39308 PR#42094


 To generate a diff of this commit:
 cvs rdiff -u -r1.5 -r1.5.40.1 src/usr.bin/sort/Makefile
 cvs rdiff -u -r1.14 -r1.14.6.1 src/usr.bin/sort/append.c
 cvs rdiff -u -r1.19 -r1.19.6.1 src/usr.bin/sort/fields.c \
     src/usr.bin/sort/sort.h
 cvs rdiff -u -r1.26 -r1.26.6.1 src/usr.bin/sort/files.c \
     src/usr.bin/sort/sort.1
 cvs rdiff -u -r1.32 -r1.32.6.1 src/usr.bin/sort/fsort.c
 cvs rdiff -u -r1.13 -r1.13.6.1 src/usr.bin/sort/fsort.h \
     src/usr.bin/sort/tmp.c
 cvs rdiff -u -r1.18 -r1.18.6.1 src/usr.bin/sort/init.c \
     src/usr.bin/sort/msort.c
 cvs rdiff -u -r0 -r1.4.2.2 src/usr.bin/sort/radix_sort.c
 cvs rdiff -u -r1.46 -r1.46.4.1 src/usr.bin/sort/sort.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.