NetBSD Problem Report #51569

From gson@gson.org  Fri Oct 21 17:10:36 2016
Return-Path: <gson@gson.org>
Received: from mail.netbsd.org (mail.netbsd.org [199.233.217.200])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (verified OK))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 48A8A7A217
	for <gnats-bugs@gnats.NetBSD.org>; Fri, 21 Oct 2016 17:10:36 +0000 (UTC)
Message-Id: <20161021171029.87CA27449D1@guava.gson.org>
Date: Fri, 21 Oct 2016 20:10:29 +0300 (EEST)
From: gson@gson.org (Andreas Gustafsson)
Reply-To: gson@gson.org (Andreas Gustafsson)
To: gnats-bugs@NetBSD.org
Subject: crypto tests randomly failing on amd64 with openssl 1.0.2j
X-Send-Pr-Version: 3.95

>Number:         51569
>Category:       lib
>Synopsis:       crypto tests randomly failing on amd64 with openssl 1.0.2j
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri Oct 21 17:15:00 +0000 2016
>Closed-Date:    Thu Jan 19 19:25:26 +0000 2017
>Last-Modified:  Thu Jan 19 19:25:26 +0000 2017
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current, source date >= 2016.10.14.20.34.29
>Organization:

>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

A number of tests in tests/crypto/libcrypto are randomly failing on
the amd64 testbed, in about half the runs.  The failing test cases
are:

  crypto/libcrypto/t_libcrypto:bn
  crypto/libcrypto/t_pubkey:ec
  crypto/libcrypto/t_pubkey:ecdh
  crypto/libcrypto/t_pubkey:ecdsa

This appears to have started with the import of openssl 1.0.2j on
October 14.

The first logged failures are at:

  http://releng.netbsd.org/b5reports/amd64/commits-2016.10.html#2016.10.14.20.34.29

>How-To-Repeat:

>Fix:

>Release-Note:

>Audit-Trail:
From: Andreas Gustafsson <gson@gson.org>
To: gnats-bugs@NetBSD.org
Cc: spz@NetBSD.org
Subject: Re: lib/51569: crypto tests randomly failing on amd64 with openssl 1.0.2j
Date: Wed, 28 Dec 2016 19:21:37 +0200

 Running 

   cd /usr/tests/crypto/libcrypto
   ./t_libcrypto bn

 on a physical machine yielded a core file and a backtrace:

   (gdb) where
   #0  0x00007d4b35f735ca in bn_GF2m_mul_2x2 () from /usr/lib/libcrypto.so.12
   #1  0x00007d4b35f6ebb7 in BN_GF2m_mod_mul_arr () from /usr/lib/libcrypto.so.12
   #2  0x00007d4b35f6f7ce in BN_GF2m_mod_mul () from /usr/lib/libcrypto.so.12
   #3  0x0000000003407cb7 in test_gf2m_mod_mul ()
   #4  0x0000000003409e99 in main ()
   (gdb) x/i $pc
   => 0x7d4b35f735ca <bn_GF2m_mul_2x2+42>:	pclmullqlqdq %xmm1,%xmm0
   (gdb)

 Wikipedia says pclmullqlqdq (I hope I will never have to pronounce
 that) was introduced around 2010, but the CPU on this machine (an
 Intel Core 2 Quad Q6600) is older than that, so it looks like there
 is something wrong with the CPU feature detection.
 -- 
 Andreas Gustafsson, gson@gson.org

From: coypu@SDF.ORG
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/51569: crypto tests randomly failing on amd64 with openssl
 1.0.2j
Date: Thu, 5 Jan 2017 13:44:48 +0000

 --pWyiEgJYm5f9v55/
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: inline

 Hi,

 I can reproduce this on my pre-Westmere laptop.
 The attached diff seems to work for me, and seems to use
 the faster code for my newer desktop.

 --pWyiEgJYm5f9v55/
 Content-Type: text/plain; charset=us-ascii
 Content-Disposition: attachment; filename="pclmul.diff"

 Index: x86_64/x86_64-gf2m.S
 ===================================================================
 RCS file: /cvsroot/src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/x86_64-gf2m.S,v
 retrieving revision 1.3
 diff -u -p -u -r1.3 x86_64-gf2m.S
 --- x86_64/x86_64-gf2m.S	14 Oct 2016 16:09:44 -0000	1.3
 +++ x86_64/x86_64-gf2m.S	5 Jan 2017 13:40:39 -0000
 @@ -202,8 +202,8 @@ _mul_1x1:
  .type	bn_GF2m_mul_2x2,@function
  .align	16
  bn_GF2m_mul_2x2:
 -	movq	OPENSSL_ia32cap_P@GOTPCREL(%rip),%rax
 -	btq	$33,%rax
 +	mov	OPENSSL_ia32cap_P+4(%rip),%eax
 +	bt	$1,%eax
  	jnc	.Lvanilla_mul_2x2

  .byte	102,72,15,110,198

 --pWyiEgJYm5f9v55/--

From: coypu@SDF.ORG
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/51569
Date: Thu, 5 Jan 2017 14:28:47 +0000

 should probably be $2?

From: "Christos Zoulas" <christos@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/crypto/bn/asm
Date: Sat, 7 Jan 2017 20:55:49 -0500

 Module Name:	src
 Committed By:	christos
 Date:		Sun Jan  8 01:55:49 UTC 2017

 Modified Files:
 	src/crypto/external/bsd/openssl/dist/crypto/bn/asm: x86_64-gf2m.pl

 Log Message:
 PR/51569: Andreas Gustafsson: Check the right bit for pclmulqdq:
 Perform a Carry-Less Multiplication of Quadword instruction
 (accelerator for GCM)


 To generate a diff of this commit:
 cvs rdiff -u -r1.4 -r1.5 \
     src/crypto/external/bsd/openssl/dist/crypto/bn/asm/x86_64-gf2m.pl

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Andreas Gustafsson <gson@gson.org>
To: christos@NetBSD.org
Cc: gnats-bugs@netbsd.org, coypu@SDF.ORG, spz@NetBSD.org
Subject: Re: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/crypto/bn/asm
Date: Mon, 9 Jan 2017 17:46:27 +0200

 Christos,

 You wrote:
 >  Log Message:
 >  PR/51569: Andreas Gustafsson: Check the right bit for pclmulqdq:
 >  Perform a Carry-Less Multiplication of Quadword instruction
 >  (accelerator for GCM)
 >  
 >  To generate a diff of this commit:
 >  cvs rdiff -u -r1.4 -r1.5 \
 >      src/crypto/external/bsd/openssl/dist/crypto/bn/asm/x86_64-gf2m.pl

 I think the actual root cause of this bug was the line

   -e 's/\(OPENSSL[A-Za-z0-9_]*\)(%rip)/\1@GOTPCREL(%rip)/' \

 in src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/Makefile,
 which you later removed in revision 1.11.

 I think the change to x86_64-gf2m.pl worked simply because the added
 "+4" caused the regexp not to match, so that the @GOTPCREL was not
 added.  Other than that, it seems to me the old and new code should
 give the same result (i.e., test the same bit).  Now that the
 @GOTPCREL hack is gone from the Makefile, there should no longer be
 any need for the change to x86_64-gf2m.pl, so I think it should be
 reverted in the interest of keeping the differences wrt upstream
 minimal.

 I have a local change to revert the x86_64-gf2m.pl change - OK to
 commit that and the corresponding regenerated .S file?
 -- 
 Andreas Gustafsson, gson@gson.org

From: christos@zoulas.com (Christos Zoulas)
To: Andreas Gustafsson <gson@gson.org>
Cc: gnats-bugs@netbsd.org, coypu@SDF.ORG, spz@NetBSD.org
Subject: Re: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/crypto/bn/asm
Date: Mon, 9 Jan 2017 10:52:50 -0500

 On Jan 9,  5:46pm, gson@gson.org (Andreas Gustafsson) wrote:
 -- Subject: Re: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/cry

 | Christos,
 | 
 | You wrote:
 | >  Log Message:
 | >  PR/51569: Andreas Gustafsson: Check the right bit for pclmulqdq:
 | >  Perform a Carry-Less Multiplication of Quadword instruction
 | >  (accelerator for GCM)
 | >  
 | >  To generate a diff of this commit:
 | >  cvs rdiff -u -r1.4 -r1.5 \
 | >      src/crypto/external/bsd/openssl/dist/crypto/bn/asm/x86_64-gf2m.pl
 | 
 | I think the actual root cause of this bug was the line
 | 
 |   -e 's/\(OPENSSL[A-Za-z0-9_]*\)(%rip)/\1@GOTPCREL(%rip)/' \
 | 
 | in src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/Makefile,
 | which you later removed in revision 1.11.
 | 
 | I think the change to x86_64-gf2m.pl worked simply because the added
 | "+4" caused the regexp not to match, so that the @GOTPCREL was not
 | added.  Other than that, it seems to me the old and new code should
 | give the same result (i.e., test the same bit).  Now that the
 | @GOTPCREL hack is gone from the Makefile, there should no longer be
 | any need for the change to x86_64-gf2m.pl, so I think it should be
 | reverted in the interest of keeping the differences wrt upstream
 | minimal.
 | 
 | I have a local change to revert the x86_64-gf2m.pl change - OK to
 | commit that and the corresponding regenerated .S file?

 If you can test it, go for it!

 christos

From: "Andreas Gustafsson" <gson@netbsd.org>
To: gnats-bugs@gnats.NetBSD.org
Cc: 
Subject: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/crypto/bn/asm
Date: Mon, 9 Jan 2017 17:36:24 +0000

 Module Name:	src
 Committed By:	gson
 Date:		Mon Jan  9 17:36:24 UTC 2017

 Modified Files:
 	src/crypto/external/bsd/openssl/dist/crypto/bn/asm: x86_64-gf2m.pl

 Log Message:
 Revert last two revisions; local changes should no longer be needed
 now that the root cause of PR lib/51569 is fixed by revision 1.11 of
 src/crypto/external/bsd/openssl/lib/libcrypto/arch/x86_64/Makefile.


 To generate a diff of this commit:
 cvs rdiff -u -r1.6 -r1.7 \
     src/crypto/external/bsd/openssl/dist/crypto/bn/asm/x86_64-gf2m.pl

 Please note that diffs are not public domain; they are subject to the
 copyright notices on the relevant files.

From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, Andreas Gustafsson <gson@gson.org>
Subject: Re: PR/51569 CVS commit:
 src/crypto/external/bsd/openssl/dist/crypto/bn/asm
Date: Mon, 9 Jan 2017 21:03:48 +0100

 On Mon, Jan 09, 2017 at 03:50:01PM +0000, Andreas Gustafsson wrote:
 >  I think the change to x86_64-gf2m.pl worked simply because the added
 >  "+4" caused the regexp not to match, so that the @GOTPCREL was not
 >  added.  Other than that, it seems to me the old and new code should
 >  give the same result (i.e., test the same bit).  Now that the
 >  @GOTPCREL hack is gone from the Makefile, there should no longer be
 >  any need for the change to x86_64-gf2m.pl, so I think it should be
 >  reverted in the interest of keeping the differences wrt upstream
 >  minimal.

 The original version penalizes both in terms of code size and
 unnecessary unaligned access. Don't just revert to that crap.

 Joerg

From: Andreas Gustafsson <gson@gson.org>
To: Joerg Sonnenberger <joerg@bec.de>
Cc: gnats-bugs@NetBSD.org
Subject: Re: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/crypto/bn/asm
Date: Mon, 9 Jan 2017 22:39:39 +0200

 Joerg Sonnenberger wrote:
 > The original version penalizes both in terms of code size and
 > unnecessary unaligned access. Don't just revert to that crap.

 I already reverted it, with christos' blessing.  If you don't like
 the openssl code, I suggest you submit your improvements to the
 openssl project rather than make NetBSD diverge.
 -- 
 Andreas Gustafsson, gson@gson.org

From: Joerg Sonnenberger <joerg@bec.de>
To: gnats-bugs@NetBSD.org
Cc: lib-bug-people@netbsd.org, gnats-admin@netbsd.org,
	netbsd-bugs@netbsd.org, Andreas Gustafsson <gson@gson.org>
Subject: Re: PR/51569 CVS commit:
 src/crypto/external/bsd/openssl/dist/crypto/bn/asm
Date: Tue, 10 Jan 2017 02:03:37 +0100

 On Mon, Jan 09, 2017 at 08:40:01PM +0000, Andreas Gustafsson wrote:
 > The following reply was made to PR lib/51569; it has been noted by GNATS.
 > 
 > From: Andreas Gustafsson <gson@gson.org>
 > To: Joerg Sonnenberger <joerg@bec.de>
 > Cc: gnats-bugs@NetBSD.org
 > Subject: Re: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/crypto/bn/asm
 > Date: Mon, 9 Jan 2017 22:39:39 +0200
 > 
 >  Joerg Sonnenberger wrote:
 >  > The original version penalizes both in terms of code size and
 >  > unnecessary unaligned access. Don't just revert to that crap.
 >  
 >  I already reverted it, with christos' blessing.  If you don't like
 >  the openssl code, I suggest you submit your improvements to the
 >  openssl project rather than make NetBSD diverge.

 Ah yes, let's ignore any on-going discussion by trigger happy commits.
 Let's just make code worse, because "upstream" doesn't care about code
 quality. I really like your attitude...

 Joerg

From: christos@zoulas.com (Christos Zoulas)
To: gnats-bugs@NetBSD.org, lib-bug-people@netbsd.org, 
	gnats-admin@netbsd.org, netbsd-bugs@netbsd.org, 
	gson@gson.org (Andreas Gustafsson)
Cc: 
Subject: Re: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/crypto/bn/asm
Date: Mon, 9 Jan 2017 20:32:17 -0500

 On Jan 10,  1:05am, joerg@bec.de (Joerg Sonnenberger) wrote:
 -- Subject: Re: PR/51569 CVS commit: src/crypto/external/bsd/openssl/dist/cry

 |  Ah yes, let's ignore any on-going discussion by trigger happy commits.
 |  Let's just make code worse, because "upstream" doesn't care about code
 |  quality. I really like your attitude...

 I will push it upstream, no reason to fight :-)

 christos

From: coypu@SDF.ORG
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: lib/51569: crypto tests randomly failing on amd64 with openssl
Date: Tue, 10 Jan 2017 12:56:11 +0000

 I have an untested diff for this against upstream openssl here:
   https://github.com/openssl/openssl/compare/master...coypoop:master

 I'm not sure if it will be welcome.

State-Changed-From-To: open->closed
State-Changed-By: gson@NetBSD.org
State-Changed-When: Thu, 19 Jan 2017 19:25:26 +0000
State-Changed-Why:
The bug is fixed.  Discussion about OpenSSL code changes not
needed to fix this particular bug belongs elsewhere.


>Unformatted:

NetBSD Home
NetBSD PR Database Search

(Contact us) $NetBSD: query-full-pr,v 1.39 2013/11/01 18:47:49 spz Exp $
$NetBSD: gnats_config.sh,v 1.8 2006/05/07 09:23:38 tsutsui Exp $
Copyright © 1994-2014 The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.