NetBSD Problem Report #43192

From www@NetBSD.org  Thu Apr 22 02:27:39 2010
Return-Path: <www@NetBSD.org>
Received: from mail.netbsd.org (mail.netbsd.org [204.152.190.11])
	by www.NetBSD.org (Postfix) with ESMTP id 56E0163B8BC
	for <gnats-bugs@gnats.NetBSD.org>; Thu, 22 Apr 2010 02:27:39 +0000 (UTC)
Message-Id: <20100422022738.D5EA763B873@www.NetBSD.org>
Date: Thu, 22 Apr 2010 02:27:38 +0000 (UTC)
From: lhf@impa.br
Reply-To: lhf@impa.br
To: gnats-bugs@NetBSD.org
Subject: games/factor takes forever  on some large inputs
X-Send-Pr-Version: www-1.0

>Number:         43192
>Category:       misc
>Synopsis:       games/factor takes forever  on some large inputs
>Confidential:   no
>Severity:       serious
>Priority:       low
>Responsible:    misc-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Thu Apr 22 02:30:00 +0000 2010
>Closed-Date:    Wed Apr 28 18:06:13 +0000 2010
>Last-Modified:  Thu Apr 29 00:45:01 +0000 2010
>Originator:     Luiz Henrique de Figueiredo
>Release:        http://cvsweb.netbsd.org/bsdweb.cgi/src/games/factor/
>Organization:
IMPA
>Environment:
I'm running factor(6) in a Linux machine
>Description:
I'm running this version of factor:
/*	$NetBSD: factor.c,v 1.19 2009/08/12 05:54:31 dholland Exp $     */
but compiled in a Linux box with HAVE_OPENSSL so that it can handle big numbers.

While this works fine
        factor 123456789012345678901234567890
        123456789012345678901234567890: 2 3 3 3 5 7 13 31 37 211 241 2161 3607 3803 2906161

this goes on forever:
        factor 1234567890123456789012345678901
(32 hours and no output yet)

Pollard p-1 seems to have a hard time with 1234567890123456789012345678901...
>How-To-Repeat:
factor 1234567890123456789012345678901

>Fix:
No fix known

>Release-Note:

>Audit-Trail:
From: Matthias Drochner <drochner@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43192 CVS commit: src/games/factor
Date: Thu, 22 Apr 2010 14:28:48 +0000

 Module Name:	src
 Committed By:	drochner
 Date:		Thu Apr 22 14:28:48 UTC 2010

 Modified Files:
 	src/games/factor: factor.c

 Log Message:
 fix an obvious flaw in bounds check: the array of precomputed primes
 could be overrun if its last entry (65537) was a factor of the input
 (this does not affect PR misc/43192 -- the factors are much larger
 here: 7742394596501*159455563099482401)


 To generate a diff of this commit:
 cvs rdiff -u -r1.19 -r1.20 src/games/factor/factor.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: David Holland <dholland-bugs@netbsd.org>
To: gnats-bugs@NetBSD.org
Cc: 
Subject: Re: misc/43192: games/factor takes forever  on some large inputs
Date: Thu, 22 Apr 2010 17:32:46 +0000

 On Thu, Apr 22, 2010 at 02:30:01AM +0000, lhf@impa.br wrote:
  > While this works fine
  >         factor 123456789012345678901234567890
  >         123456789012345678901234567890: 2 3 3 3 5 7 13 31 37 211 241 2161 3607 3803 2906161
  > 
  > this goes on forever:
  >         factor 1234567890123456789012345678901
  > (32 hours and no output yet)
  > 
  > Pollard p-1 seems to have a hard time with 1234567890123456789012345678901...

 It looks to me like the code we have is a mixture of the Pollard p-1
 algorithm and the Pollard rho algorithm, and it appears to be, if not
 actually incorrect, at least suboptimal. Compare for example

 http://modular.math.washington.edu/edu/2007/spring/ent/ent-html/node81.html

 I am not sufficiently up on the finer points to want to try to correct
 it without putting in some time on the math, which is not likely to
 happen anytime soon.

 ISTM also, though, that we ought to identify values that are labeled
 prime based on statistical primality testing rather than on exhaustive
 (non-)factoring. The error rate is very small, but it's not zero, and
 the people running this program are nonspecialists who won't in
 general be already aware of this issue.

 -- 
 David A. Holland
 dholland@netbsd.org

From: Matthias Drochner <M.Drochner@fz-juelich.de>
To: <dholland-bugs@netbsd.org>, <lhf@impa.br>
Cc: <gnats-bugs@netbsd.org>
Subject: Re: misc/43192: games/factor takes forever on some large inputs
Date: Fri, 23 Apr 2010 19:59:19 +0200

 --==_Exmh_6376895038440
 Content-Type: text/plain; charset="us-ascii"
 Content-Transfer-Encoding: quoted-printable


 David Holland wrote:
 > It looks to me like the code we have is a mixture of the Pollard p-1
 > algorithm and the Pollard rho algorithm

 For me it looks like p-1, only that the exponent ("i" in the code,
 "m" in the reference you cited) is not calculated by the Least-
 Common-Multiple algorithm but just tried from 2 with increments
 of 1. This way, useful exponents are hit eventually as well, but
 it is extremely wasteful. In particular, since prime factors
 up to ~65000 are already ruled out at this point, it doesn't make
 much sense to start with such small numbers. (Also, since we
 have a table of prime numbers in the code already, it would be
 easy to reuse it for the L.C.M. calculation.)
 Unfortunately, the L.C.M. exponent will grow very quickly for
 useful Power Smoothness values ("B" in the reference). For
 a B of 100000 which is in the order of magnitude of the prime
 factors already considered, log(m) is 100051.6. So it would
 be a number with tenthousends of digits.

 Luiz Henrique de Figueiredo wrote:
 > Pollard p-1 seems to have a hard time with
 > 1234567890123456789012345678901...

 Indeed. The factors are 7742394596501 and 159455563099482401.
 The p-1 values factor into
 2^2*5^3*15484789193 and 2^5*5^2*29*283*24286518079.
 If I understood that stuff correctly, this means that one
 would need an exponent calculated for a "B" of >1.5e10.
 That's certainly too much to be practical.
 There is of course the statistical possibility that a factor
 is found with smaller exponents, but I still got the impression
 that p-1 is not the right algorithm here.

 As a test, I've just modified the pollard_pminus1() function
 within factor.c to do the rho algorithm. The code isn't complete,
 it just uses 2^x+1 as a pseudorandom generator polynom, with a
 fixed start value of 1. It factors Luiz's number within 70s
 on a 2GHz CPU. I'll append the code, if you want to play with it.
 (just replace the function, should just plug in)

 best regards
 Matthias


 ---------------------------------------------------------------------------=
 ---------------------
 ---------------------------------------------------------------------------=
 ---------------------
 Forschungszentrum Juelich GmbH
 52425 Juelich
 Sitz der Gesellschaft: Juelich
 Eingetragen im Handelsregister des Amtsgerichts Dueren Nr. HR B 3498
 Vorsitzende des Aufsichtsrats: MinDir'in Baerbel Brumme-Bothe
 Geschaeftsfuehrung: Prof. Dr. Achim Bachem (Vorsitzender),
 Dr. Ulrich Krafft (stellv. Vorsitzender), Prof. Dr.-Ing. Harald Bolt,
 Prof. Dr. Sebastian M. Schmidt
 ---------------------------------------------------------------------------=
 ---------------------
 ---------------------------------------------------------------------------=
 ---------------------

 --==_Exmh_6376895038440
 Content-Type: text/plain; name="rho.c"; charset="us-ascii"
 Content-Description: rho.c
 Content-Disposition: attachment; filename="rho.c"

 static void
 pollard_pminus1(BIGNUM *val)
 {
 	unsigned int steps_taken, steps_limit, a;
 	BIGNUM *x, *y, *tmp, *diff, *num;

 	x = BN_new();
 	y = BN_new();
 	tmp = BN_new();
 	diff = BN_new();
 	num = BN_new();

 	if (!BN_mod_word(val, 2)) {
 		printf(" 2");
 		BN_div_word(val, 2);
 		if (!BN_is_one(val))
 			pollard_pminus1(val);
 		return;
 	}

 startover:
 	steps_taken = 0;
 	steps_limit = 2;
 	a = 1; /* XXX random */
 	BN_set_word(x, 1); /* XXX random */
 	BN_copy(y, x);

 	for (;;) {
 		BN_sqr(x, x, ctx);
 		BN_add_word(x, a);
 		BN_mod(tmp, x, val, ctx);
 		BN_copy(x, tmp);
 		if (BN_cmp(x, y) > 0)
 			BN_sub(diff, x, y);
 		else
 			BN_sub(diff, y, x);
 		if (BN_is_zero(diff)) {
 			printf("loop\n"); /* XXX retry with other a and x0 */
 			return;
 		}

 		BN_gcd(tmp, diff, val, ctx);

 		if (!BN_is_one(tmp)) {
 			if (BN_is_prime(tmp, PRIME_CHECKS, NULL, NULL,
 			    NULL) == 1) {
 				putchar(' ');
 				BN_print_dec_fp(stdout, tmp);
 			} else
 				pollard_pminus1(tmp);
 			fflush(stdout);

 			BN_div(num, NULL, val, tmp, ctx);
 			if (BN_is_one(num))
 				return;
 			if (BN_is_prime(num, PRIME_CHECKS, NULL, NULL,
 			    NULL) == 1) {
 				putchar(' ');
 				BN_print_dec_fp(stdout, num);
 				fflush(stdout);
 				return;
 			}
 			BN_copy(val, num);
 			goto startover;
 		}

 		steps_taken++;
 		if (steps_taken == steps_limit) {
 			BN_copy(y, x);
 			steps_taken = 0;
 			steps_limit *= 2;
 		}
 	}
 }

 --==_Exmh_6376895038440--

State-Changed-From-To: open->feedback
State-Changed-By: drochner@NetBSD.org
State-Changed-When: Tue, 27 Apr 2010 18:19:05 +0000
State-Changed-Why:
Are you happy with the fix just committed to -current? factor(6) is not
going to be serious number crunching software, but with the "rho"
algorithm the behavior is more smoothly, at least for the cases I tested.


From: Matthias Drochner <drochner@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/43192 CVS commit: src/games/factor
Date: Tue, 27 Apr 2010 18:11:19 +0000

 Module Name:	src
 Committed By:	drochner
 Date:		Tue Apr 27 18:11:19 UTC 2010

 Modified Files:
 	src/games/factor: factor.c

 Log Message:
 -Fix an old bug in the "pollard" code: it gets its argument passed
  by reference, and changes the value behind the pointer under some
  circumstances (basically if it finds more than 2 different factors).
  It also calls itself if it finds a factor which is not considered prime
  (by openssl's miller-rabin check) and uses the call argument afterwards.
  This doesn't work -- we need to copy the argument into its own storage.
 -Modify the code to do the "rho" algorithm as was initially announced.
  It takes somewhat longer in rare cases, but still works in cases where
  the "p-1" algorithm is unusable. This might fix PR misc/43192
  by Luiz Henrique de Figueiredo.
 -Add some optional debug support, minor cleanup.


 To generate a diff of this commit:
 cvs rdiff -u -r1.20 -r1.21 src/games/factor/factor.c

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

State-Changed-From-To: feedback->closed
State-Changed-By: drochner@NetBSD.org
State-Changed-When: Wed, 28 Apr 2010 18:06:13 +0000
State-Changed-Why:
submitter is satisfied


From: matthew green <mrg@eterna.com.au>
To: gnats-bugs@NetBSD.org
Cc: misc-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
    gnats-admin@netbsd.org, drochner@NetBSD.org, lhf@impa.br
Subject: re: misc/43192 (games/factor takes forever on some large inputs)
Date: Thu, 29 Apr 2010 08:30:32 +1000

 wow, new factor is a LOT better for me with a couple of simple
 inputs that were taking forever.

 for the number with 50 1 digits in it, new factor takes ~6
 seconds whereas the old one i ^Ced after 2 minutes just now.


 thanks!

From: matthew green <mrg@eterna.com.au>
To: 
Cc: misc-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
    gnats-admin@netbsd.org, drochner@NetBSD.org, lhf@impa.br,
    gnats-bugs@NetBSD.org
Subject: re: misc/43192 (games/factor takes forever on some large inputs)
Date: Thu, 29 Apr 2010 09:31:53 +1000


    for the number with 50 1 digits in it, new factor takes ~6
    seconds whereas the old one i ^Ced after 2 minutes just now.

 OK, i restarted and let it finish and the difference is so great
 i feel the need to tell you all about them:

 new:  6.265u 0.001s 0:06.26 100.0%    0+0k 0+0io 0pf+0w
 old:  728.262u 0.001s 12:08.28 99.9%  0+0k 0+0io 0pf+0w


 .mrg.

From: David Holland <dholland-bugs@netbsd.org>
To: matthew green <mrg@eterna.com.au>
Cc: misc-bug-people@netbsd.org, netbsd-bugs@netbsd.org,
	gnats-admin@netbsd.org, drochner@NetBSD.org, lhf@impa.br,
	gnats-bugs@NetBSD.org
Subject: Re: misc/43192 (games/factor takes forever on some large inputs)
Date: Thu, 29 Apr 2010 00:42:45 +0000

 On Thu, Apr 29, 2010 at 09:31:53AM +1000, matthew green wrote:
  >    for the number with 50 1 digits in it, new factor takes ~6
  >    seconds whereas the old one i ^Ced after 2 minutes just now.
  > 
  > OK, i restarted and let it finish and the difference is so great
  > i feel the need to tell you all about them:
  > 
  > new:  6.265u 0.001s 0:06.26 100.0%    0+0k 0+0io 0pf+0w
  > old:  728.262u 0.001s 12:08.28 99.9%  0+0k 0+0io 0pf+0w

 This kind of thing is how you get your name on an algorithm :-)

 (yes, the previous code was allegedly Pollard's other algorithm but it
 wasn't really.)

 -- 
 David A. Holland
 dholland@netbsd.org

>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2007 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.