NetBSD Problem Report #49853

From martin@aprisoft.de  Sat Apr 25 12:18:10 2015
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (not verified))
	by mollari.NetBSD.org (Postfix) with ESMTPS id 0FF04A654B
	for <gnats-bugs@gnats.NetBSD.org>; Sat, 25 Apr 2015 12:18:10 +0000 (UTC)
Message-Id: <20150425121730.5B023ED0E4F@emmas.aprisoft.de>
Date: Sat, 25 Apr 2015 14:17:30 +0200 (CEST)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: x86 fpu save loop may terminate early?
X-Send-Pr-Version: 3.95

>Number:         49853
>Category:       port-amd64
>Synopsis:       x86 fpu save loop may terminate early?
>Confidential:   no
>Severity:       non-critical
>Priority:       medium
>Responsible:    port-amd64-maintainer
>State:          closed
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Sat Apr 25 12:20:00 +0000 2015
>Closed-Date:    Sat Apr 25 15:04:24 +0000 2015
>Last-Modified:  Sat Apr 25 15:04:24 +0000 2015
>Originator:     Martin Husemann
>Release:        NetBSD 7.99.12
>Organization:
The NetBSD Foundation, Inc
>Environment:
System: NetBSD martins.aprisoft.de 7.99.12 NetBSD 7.99.12 (GENERIC) #18: Sat Apr 25 14:06:36 CEST 2015 martin@martins.aprisoft.de:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:

When fpu ownership is removed from a remote cpu, there is a loop in
x86/x86/fpu.c:fpusave_lwp that waits for the other cpu to clear pcb->pcb_fpcpu.
IIUC the loop has a second exit condition to protect against unresponsive
cpus and avoid an endless loop:

               while (pcb->pcb_fpcpu == oci && ticks == hardclock_ticks) {
                       x86_pause();
                       spins++;
               }

That is: give up waiting once hardclock_ticks has increased. Now I don't
understand what prevents this clock tick from happening at basically the same
moment that we send the ipi. This would cause the loop to exit early, and
the function to return while the other cpu is not done saving FPU state.

Should this read something like:

               while (pcb->pcb_fpcpu == oci && (ticks+1) >= hardclock_ticks) {
                       x86_pause();
                       spins++;
               }

instead?

>How-To-Repeat:
code inspection

>Fix:
see above?

>Release-Note:

>Audit-Trail:
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc: 
Subject: Re: port-amd64/49853 x86 fpu save loop may terminate early?
Date: Sat, 25 Apr 2015 15:11:41 +0200

 This is always an "infinite loop" when pcb->pcb_fpcpu isn't changed
 by the remote CPU.


 forever() {

 	if (fpu context is free)
 		break

 	if (fpu context belongs to me) {
 		save fpu
 		break
 	}

 	signal context owner by IPI

 	while (fpu context belongs to same owner && same time slice)
 		pause

 }


 The inner while loop just paces the checks and sending of IPI messages.
 If hardclock_ticks changes when entering the main loop, the only
 consequence would be that the second iteration comes early.



 Greetings,
 -- 
                                 Michael van Elst
 Internet: mlelstv@serpens.de
                                 "A potential Snark may lurk in every tree."

State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Sat, 25 Apr 2015 15:04:24 +0000
State-Changed-Why:
Not a bug (overlooked the outer loop)
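
For illustration, the retry structure described in the audit trail can be
modelled in user space. This is only a sketch under stated assumptions, not
the code in x86/x86/fpu.c: struct sim, sim_step and sim_fpusave are
hypothetical names, the "remote cpu" is simulated by a poll counter, and the
"IPI" is just a counter increment. It is meant to show that an early clock
tick only causes another IPI to be sent, not an early return.

```c
#include <assert.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct sim {
	int owner;           /* stands in for pcb->pcb_fpcpu; 0 means free */
	int now;             /* stands in for hardclock_ticks */
	int polls;           /* how often the owner field has been polled */
	int polls_per_tick;  /* the simulated clock advances every N polls */
	int release_at_poll; /* the simulated remote cpu lets go at this poll */
	int ipis;            /* number of simulated IPIs sent */
};

/* One pass of the inner loop body: advance fake clock and fake remote cpu. */
static void
sim_step(struct sim *s)
{
	s->polls++;
	if (s->polls % s->polls_per_tick == 0)
		s->now++;
	if (s->polls >= s->release_at_poll)
		s->owner = 0;
}

/* Returns the number of IPIs sent; only returns once the context is free. */
static int
sim_fpusave(struct sim *s, int me)
{
	assert(s->polls_per_tick > 0);

	for (;;) {
		if (s->owner == 0)
			break;          /* fpu context is free */
		if (s->owner == me) {
			s->owner = 0;   /* "save fpu" locally */
			break;
		}

		int oci = s->owner;
		int ticks = s->now;
		s->ipis++;              /* "signal context owner by IPI" */

		/* Pace the retries: wait for release or for the next tick. */
		while (s->owner == oci && ticks == s->now)
			sim_step(s);
	}
	return s->ipis;
}
```

With polls_per_tick set to 1 (a tick on every poll, the worst case for the
concern raised above) and release_at_poll set to 5, sim_fpusave still returns
only after owner has become 0; it just sends more IPIs along the way than it
would with a slow clock.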


>Unformatted:
