experimental FPU context switch patch

To: "" <>
Subject: experimental FPU context switch patch
From: Jun Sun <>
Date: Mon, 04 Mar 2002 11:51:05 -0800
User-agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:0.9.4) Gecko/20011126 Netscape6/6.2.1

I implemented a new FPU context saving/restoring patch, as previously
suggested by Kevin and Ralf.  The major change is that we will save the FPU
context when we switch out a process, if necessary.

The goal is to guarantee that an off-line process always has its FPU context
saved in memory and is thus free to move to another CPU in an SMP system.

The initial experimental patch can be found at the following URL.
It is a quick hack to study the performance impact.  It should be
further optimized.  It also needs to be extended so that it works
for all CPUs (including the ones without FPU) and becomes truly SMP-safe
(getting rid of the global variable last_task_used_math).

Here is the pseudo code version of the patch:

do_cpu() {

        if (current->used_math) {               /* Using the FPU again.  */
-               lazy_fpu_switch(last_task_used_math);
+               restore_fp(current);    /* we don't need to save for the current proc */
        } else {                                /* First time FPU user.  */


        save non_scratch registers
+       if (current proc owns FPU) {    /* it used FPU in the current run */
+               make it turn off FPU for next run
+               save FPU context to current proc
+               (note we leave last_task_used_math alone)

lmbench was run to compare performance on a UP system (NEC VR5500).
See the output at the following URL.  "orig" is the unpatched kernel.

The results show little performance difference, which is not a surprise.

A couple of attributes of the patch:

1) it does not save FPU if the proc did not use FPU in the current run
2) when proc uses FPU again in next run, we don't have to restore FPU context
   if the hardware context has not been used by another proc yet
   (i.e., last_task_used_math == current)
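
As a rough illustration of the two attributes above (not the patch itself),
the bookkeeping can be modelled in plain C.  The struct fields, the hw_fpu
stand-in for the FPU registers, and the helpers fpu_trap(), fpu_use() and
switch_out() are hypothetical names made up for this sketch; only
used_math and last_task_used_math come from the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a task; field names are illustrative only. */
struct task {
    bool used_math;   /* task has used the FPU at least once */
    bool owns_fpu;    /* task touched the FPU during the current run */
    int  saved_ctx;   /* stand-in for the in-memory FPU context */
    int  saves;       /* number of context saves, for inspection */
    int  restores;    /* number of context restores, for inspection */
};

static struct task *last_task_used_math; /* whose state the hardware holds */
static int hw_fpu;                        /* stand-in for the FPU registers */

/* Models the trap taken on the first FPU instruction of a run. */
static void fpu_trap(struct task *t)
{
    if (t->used_math) {
        /* Attribute 2: skip the restore if the hardware still holds our state. */
        if (last_task_used_math != t) {
            hw_fpu = t->saved_ctx;
            t->restores++;
        }
    } else {
        hw_fpu = 0;   /* first-time FPU user starts from a clean state */
        t->used_math = true;
    }
    t->owns_fpu = true;
    last_task_used_math = t;
}

/* Models one FPU operation: accumulate a value in the "registers". */
static int fpu_use(struct task *t, int value)
{
    assert(t != NULL);
    if (!t->owns_fpu)
        fpu_trap(t);
    hw_fpu += value;
    return hw_fpu;
}

/* Models the switch-out path: save eagerly, but only if the FPU was used. */
static void switch_out(struct task *t)
{
    assert(t != NULL);
    if (t->owns_fpu) {         /* attribute 1: no save for non-FPU runs */
        t->saved_ctx = hw_fpu;
        t->saves++;
        t->owns_fpu = false;   /* FPU is turned off for the next run */
        /* last_task_used_math is deliberately left alone */
    }
}
```

In this model a task that runs, is switched out, and runs again before any
other task touches the FPU pays for a save but not for a restore.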


1) if no processes are actively using FPU, we don't see much overhead other
   than a couple of load/branch instructions in resume

2) if most processes are actively using FPU, then we see the same overhead.
   The saving of FPU context is necessary in this scenario, whether it is done
   in resume() (as in the patch) or a little later in lazy_fpu_switch() as in
   the current kernel.

3) The only pathological case that would make the patch perform badly is a
   process that actively uses the FPU and frequently switches context with
   non-FPU-using processes.  In this case, saving the FPU context each time
   the FPU-using proc is switched out is pure overhead.

   If the FPU-using process runs through a full time slice each time, the
   overhead is a very small percentage.  It is frequent context switching
   in this case that would hurt performance.
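
To put rough numbers on case 3), the overhead can be estimated as the save
cost divided by the total time per run.  The function below is a
back-of-the-envelope sketch; fpu_save_overhead() is a hypothetical helper,
and the figures used with it are made-up inputs, not measurements.

```c
#include <assert.h>

/*
 * Fraction of CPU time spent saving FPU context, assuming one save per run.
 * Both arguments are hypothetical inputs in microseconds.
 */
static double fpu_save_overhead(double save_cost_us, double run_length_us)
{
    assert(save_cost_us >= 0.0 && run_length_us > 0.0);
    return save_cost_us / (run_length_us + save_cost_us);
}
```

With an assumed save cost of 1 us, a full 10 ms time slice gives
1 / 10001, or about 0.01%.  If the process instead blocks after only 20 us
of work, the same save costs 1 / 21, or nearly 5%.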

I am interested in testing any benchmarks that would create case 3).  Please
let me know if you know of any.
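
One possible way to provoke case 3), offered only as a sketch and not as an
established benchmark, is a pipe ping-pong between a process that does a
little floating-point work and a child that does none.  The function name
fpu_pingpong() and the arithmetic inside it are made up for illustration.
On a UP system each blocking read should force a switch between the two;
on SMP they may simply run on different CPUs.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Parent does a small amount of FP work, then passes a byte to a child
 * that uses no floating point and echoes it back.  Returns 0 on success,
 * -1 on error; *result receives the accumulated FP value.
 */
static int fpu_pingpong(int rounds, double *result)
{
    int to_child[2], to_parent[2];
    volatile double acc = 1.0;  /* volatile keeps the FP work from being optimized out */
    char token = 'x';
    int status, rc = 0;
    pid_t pid;

    if (pipe(to_child) < 0)
        return -1;
    if (pipe(to_parent) < 0) {
        close(to_child[0]);
        close(to_child[1]);
        return -1;
    }

    pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {             /* child: integer-only echo loop */
        close(to_child[1]);
        close(to_parent[0]);
        while (read(to_child[0], &token, 1) == 1) {
            if (write(to_parent[1], &token, 1) != 1)
                break;
        }
        _exit(0);
    }

    close(to_child[0]);
    close(to_parent[1]);
    for (int i = 0; i < rounds; i++) {
        acc = acc * 1.000001 + 0.5;     /* touch the FPU in every run */
        if (write(to_child[1], &token, 1) != 1 ||
            read(to_parent[0], &token, 1) != 1) {
            rc = -1;
            break;
        }
    }
    close(to_child[1]);         /* EOF ends the child's loop */
    close(to_parent[0]);
    if (waitpid(pid, &status, 0) < 0)
        return -1;

    *result = acc;
    return rc;
}
```

Timing a large number of rounds on the patched and unpatched kernels, with
and without the floating-point line, would show whether the extra save on
each switch-out is measurable.  This assumes a hard-float build; with
soft-float the parent never touches the FPU.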

So much for rambling.

