NetBSD Problem Report #49853
From martin@aprisoft.de Sat Apr 25 12:18:10 2015
Return-Path: <martin@aprisoft.de>
Received: from mail.netbsd.org (mail.netbsd.org [149.20.53.66])
(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
(Client CN "mail.netbsd.org", Issuer "Postmaster NetBSD.org" (not verified))
by mollari.NetBSD.org (Postfix) with ESMTPS id 0FF04A654B
for <gnats-bugs@gnats.NetBSD.org>; Sat, 25 Apr 2015 12:18:10 +0000 (UTC)
Message-Id: <20150425121730.5B023ED0E4F@emmas.aprisoft.de>
Date: Sat, 25 Apr 2015 14:17:30 +0200 (CEST)
From: martin@NetBSD.org
Reply-To: martin@NetBSD.org
To: gnats-bugs@NetBSD.org
Subject: x86 fpu save loop may terminate early?
X-Send-Pr-Version: 3.95
>Number: 49853
>Category: port-amd64
>Synopsis: x86 fpu save loop may terminate early?
>Confidential: no
>Severity: non-critical
>Priority: medium
>Responsible: port-amd64-maintainer
>State: closed
>Class: sw-bug
>Submitter-Id: net
>Arrival-Date: Sat Apr 25 12:20:00 +0000 2015
>Closed-Date: Sat Apr 25 15:04:24 +0000 2015
>Last-Modified: Sat Apr 25 15:04:24 +0000 2015
>Originator: Martin Husemann
>Release: NetBSD 7.99.12
>Organization:
The NetBSD Foundation, Inc
>Environment:
System: NetBSD martins.aprisoft.de 7.99.12 NetBSD 7.99.12 (GENERIC) #18: Sat Apr 25 14:06:36 CEST 2015 martin@martins.aprisoft.de:/usr/src/sys/arch/amd64/compile/GENERIC amd64
Architecture: x86_64
Machine: amd64
>Description:
When fpu ownership is removed from a remote cpu, there is a loop in
x86/x86/fpu.c:fpusave_lwp that waits for the other cpu to clear pcb->pcb_fpcpu.
IIUC the loop has a second exit condition to protect against unresponsive
cpus and avoid an endless loop:
	while (pcb->pcb_fpcpu == oci && ticks == hardclock_ticks) {
		x86_pause();
		spins++;
	}
That is: give up waiting once hardclock_ticks has increased. Now I don't
understand what prevents this clock tick from happening at basically the
same moment that we send the ipi. This would cause the loop to exit early,
and the function to return while the other cpu is not done saving FPU state.
Should this read something like:
	while (pcb->pcb_fpcpu == oci && (ticks+1) >= hardclock_ticks) {
		x86_pause();
		spins++;
	}
instead?
>How-To-Repeat:
code inspection
>Fix:
see above?
>Release-Note:
>Audit-Trail:
From: Michael van Elst <mlelstv@serpens.de>
To: gnats-bugs@netbsd.org
Cc:
Subject: Re: port-amd64/49853 x86 fpu save loop may terminate early?
Date: Sat, 25 Apr 2015 15:11:41 +0200
This is always an "infinite loop" when pcb->pcb_fpcpu isn't changed
by the remote CPU.
	forever() {
		if (fpu context is free)
			break
		if (fpu context belongs to me) {
			save fpu
			break
		}
		signal context owner by IPI
		while (fpu context belongs to same owner && same time slice)
			pause
	}
The inner while loop just paces the checks and sending of IPI messages.
If hardclock_ticks changes when entering the main loop, the only
consequence would be that the second iteration comes early.
Greetings,
--
Michael van Elst
Internet: mlelstv@serpens.de
"A potential Snark may lurk in every tree."
State-Changed-From-To: open->closed
State-Changed-By: martin@NetBSD.org
State-Changed-When: Sat, 25 Apr 2015 15:04:24 +0000
State-Changed-Why:
Not a bug (overlooked the outer loop)
>Unformatted:
Copyright © 1994-2014
The NetBSD Foundation, Inc. ALL RIGHTS RESERVED.